lokimedes 4 days ago

There are a couple of “concerns” you can separate to make this a bit more tractable:

1. Learning CUDA - the framework, libraries, and high-level wrappers. This is something that changes with the times and trends.

2. Learning high-performance computing approaches. While a GPU and the NVLink interfaces are Nvidia-specific, working in a massively parallel, distributed computing environment is a general branch of knowledge that translates across HPC architectures.

3. Application specifics. If your thing is Transformers, you may just as well start from Torch, TensorFlow, etc. and rely on the current high-level abstractions to inspire your learning down to the fundamentals.

I’m no longer active in any of the above, so I can’t be more specific, but if you want to master CUDA, I would say that learning how massively parallel programming works is the foundation that translates into transferable skills.

david-gpu 4 days ago

Former GPU guy here. Yeah, that's exactly what I was going to suggest too, with emphasis on #2 and #3. What kind of jobs are they trying to apply for? Is it really CUDA that they need to be familiar with, or CUDA-based libraries like cuDNN, cuBLAS, cuFFT, etc.?
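
To make the distinction concrete, here's a rough sketch (untested, just to show the shape of the two paths) of the same operation written once as a hand-rolled kernel and once as a cuBLAS call; the kernel name and sizes are made up for illustration:

    #include <cstdio>
    #include <cuda_runtime.h>
    #include <cublas_v2.h>

    // Hand-written CUDA kernel: y = a*x + y ("learning CUDA" proper).
    __global__ void saxpy_kernel(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        const float a = 2.0f;
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        // Path 1: raw CUDA -- you choose the launch configuration yourself.
        saxpy_kernel<<<(n + 255) / 256, 256>>>(n, a, x, y);
        cudaDeviceSynchronize();

        // Path 2: cuBLAS -- the same operation through a tuned library call.
        cublasHandle_t handle;
        cublasCreate(&handle);
        cublasSaxpy(handle, n, &a, x, 1, y, 1);
        cudaDeviceSynchronize();
        cublasDestroy(handle);

        printf("y[0] = %f\n", y[0]);  // 2*1 + (2*1 + 2) = 6 after both updates
        cudaFree(x);
        cudaFree(y);
        return 0;
    }

(Compile with something like "nvcc saxpy.cu -lcublas".) A lot of roles only ever touch the second path.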

Understanding the fundamentals of parallel programming comes first, IMO.

chanana 4 days ago

> Understanding the fundamentals of parallel programming comes first, IMO.

Are there any good resources you’d recommend for that?

rramadass 4 days ago

I am not the person you asked the question of, but you might find the following useful (in addition to the ones mentioned in my other comments):

Foundations of Multithreaded, Parallel, and Distributed Programming by Gregory Andrews - An old classic, but it still has very good explanations of concurrent algorithmic concepts.

Parallel Programming: Concepts and Practice by Bertil Schmidt et al. - A relatively recent book with comprehensive coverage.

rramadass 4 days ago

This is the right approach. Without (2), trying to learn (1) will just lead to "confusion worse confounded". I also have a book recommendation here - https://news.ycombinator.com/item?id=44216478

jonas21 4 days ago

I think it depends on your learning style. For me, learning something with a concrete implementation and code that you can play around with is a lot easier than trying to study the abstract general concepts first. Once you have some experience with the code, you start asking why things are done a certain way, and that naturally leads to the more general concepts.
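
For instance, even a throwaway kernel like this (a minimal sketch, nothing canonical about it) starts raising the "why" questions pretty quickly - why blocks and threads, why the explicit synchronization:

    #include <cstdio>

    // Every thread reports its position in the launch grid.
    __global__ void whoami() {
        printf("hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
    }

    int main() {
        whoami<<<2, 4>>>();       // 2 blocks of 4 threads -- tweak and re-run
        cudaDeviceSynchronize();  // flush the device-side printf before exiting
        return 0;
    }

Change the launch configuration, watch the output ordering move around, and you're already asking the questions that lead back to the general concepts.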

rramadass 4 days ago

It has nothing to do with "learning styles". Parallel computing needs knowledge of three things:

a) Certain crucial architectural aspects (logical and physical) of the hardware.

b) Decomposing a problem correctly to map onto that hardware.

c) Algorithms, expressed in a specific language/framework, that combine the above two.

CUDA (and other similar frameworks) only comes in at the last step, so knowledge of the first two is a prerequisite.
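
A rough illustration of how the three interact (a sketch only, not production code): a block-wise sum, where (a) is the on-chip shared memory and the block/grid hierarchy, (b) is the split of the array into per-block tiles, and (c) is the CUDA that ties them together:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each block reduces a 256-element tile of the input in shared memory.
    __global__ void block_sum(const float *in, float *out, int n) {
        __shared__ float tile[256];                     // (a) on-chip shared memory
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // (b) one element per thread
        tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
        __syncthreads();
        // (c) tree reduction within the block, expressed in CUDA
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (threadIdx.x < stride)
                tile[threadIdx.x] += tile[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
    }

    int main() {
        const int n = 1 << 20, threads = 256, blocks = (n + threads - 1) / threads;
        float *in, *partial;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&partial, blocks * sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = 1.0f;

        block_sum<<<blocks, threads>>>(in, partial, n);
        cudaDeviceSynchronize();

        float total = 0.0f;
        for (int b = 0; b < blocks; ++b) total += partial[b];  // finish on the host
        printf("sum = %.0f (expected %d)\n", total, n);

        cudaFree(in);
        cudaFree(partial);
        return 0;
    }

Get (a) or (b) wrong - ignore the shared memory limits, or decompose so that threads in a block touch unrelated data - and no amount of CUDA syntax in (c) will save you.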

lokimedes 4 days ago

This one was my go-to for HPC, but it may be a bit dated by now: https://www.amazon.com/Introduction-Performance-Computing-Sc...

rramadass 4 days ago

That's a good book too (i have it) but more general than the Ridgway Scott book which uses examples from Numerical Computation domains. Here is an overview of the chapters; example domains start from chapter 10 onwards - https://www.jstor.org/stable/j.ctv1ddcxfs

These sorts of books are only "dated" when it comes to specific languages/frameworks/libraries. The methods/techniques are evergreen and are often better explained conceptually in these older books.

For recent, up-to-date works on HPC, the free multi-volume The Art of High Performance Computing by Victor Eijkhout can't be beat - https://news.ycombinator.com/item?id=38815334