GPUs have long been compute monsters. But for a new class of applications with huge datasets and low computational intensity, like graph neural network (GNN) training and retrieval-augmented generation (RAG), GPU-initiated communication from O(100K) GPU threads enables them to also be data access monsters. We'll frame the problem of storage IO, tie it to specific usage models and classes of applications, present breakthrough successes, and open new horizons for practical applications that need storage IO optimization for files and objects, and for both fine- and coarse-grained transfers. We'll cover two members of the GPUDirect Storage family, cuFile and cuObj, as well as SCADA, a new programming model for high-throughput, fine-grained, GPU-initiated access. We'll also touch on security issues and how they can be addressed through careful design and, in some cases, the help of DPUs.
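For a flavor of what the direct storage-to-GPU path looks like in practice, here is a minimal sketch of a CPU-initiated GPUDirect Storage read using the cuFile API. The file path, transfer size, and abbreviated error handling are illustrative assumptions, not details from the session.

```c
/* Minimal cuFile sketch: DMA a file's contents straight into GPU memory,
 * bypassing a CPU bounce buffer. Path and size are illustrative; error
 * handling is abbreviated. Build with: nvcc gds_read.c -lcufile */
#define _GNU_SOURCE
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    const size_t size = 1 << 20;                 /* 1 MiB transfer (illustrative) */
    int fd = open("/mnt/nvme/dataset.bin",       /* hypothetical path */
                  O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    cuFileDriverOpen();                          /* bring up the GDS driver */

    CUfileDescr_t descr;
    memset(&descr, 0, sizeof(descr));
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);           /* register the file with cuFile */

    void *devBuf = NULL;
    cudaMalloc(&devBuf, size);
    cuFileBufRegister(devBuf, size, 0);          /* pin the GPU buffer for DMA */

    /* Direct storage-to-GPU transfer: no staging through host memory */
    ssize_t n = cuFileRead(fh, devBuf, size, /*file_offset=*/0, /*devPtr_offset=*/0);
    printf("read %zd bytes directly into GPU memory\n", n);

    cuFileBufDeregister(devBuf);
    cudaFree(devBuf);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```

cuObj extends this same direct-path idea from files to objects, and SCADA moves the initiation of fine-grained accesses from the CPU onto the GPU threads themselves.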
Key Takeaways:
- GPUs consume datasets directly from storage when those datasets are too big to fit in memory
- Direct transfers between storage and the GPU, sometimes initiated by the GPU itself, remove CPU overheads
- New app classes such as GNN training and RAG, and new usage models such as checkpoint save/restore, are driving this
- New additions to CUDA include cuObj, part of GPUDirect Storage, and SCADA, for scaled accelerated data access
- Enhanced security is achieved with careful architectural design and, in some cases, with the help of DPUs
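SCADA's API isn't spelled out in this abstract, so the sketch below uses plain CUDA to illustrate the access pattern it targets: on the order of 100K concurrent GPU threads, each issuing its own small, data-dependent gather. This is not SCADA's interface; all names and sizes are illustrative.

```cuda
// Plain-CUDA illustration of the fine-grained, GPU-initiated access pattern
// that SCADA targets (NOT SCADA's API): many threads, each fetching one
// small, data-dependent item. All names and sizes are illustrative.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void gather(const float *dataset, const int *indices,
                       float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = dataset[indices[i]];   // one tiny, independent access per thread
}

int main() {
    const int n = 1 << 17;              // ~131K threads (illustrative)
    const int datasetLen = 1 << 24;     // large backing array (illustrative)
    float *dataset, *out;
    int *indices;
    cudaMalloc(&dataset, datasetLen * sizeof(float));
    cudaMalloc(&indices, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(indices, 0, n * sizeof(int));  // stand-in for real, data-dependent indices

    gather<<<(n + 255) / 256, 256>>>(dataset, indices, out, n);
    cudaDeviceSynchronize();

    cudaFree(dataset); cudaFree(indices); cudaFree(out);
    return 0;
}
```

When the items live in storage rather than GPU memory, routing each such access through the CPU is prohibitively expensive, which is the gap GPU-initiated access aims to close.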
Presenters:
- CJ Newburn, Distinguished Engineer, NVIDIA
- Prashant Prabhu, NVIDIA
- Vikram Sharma Mailthody, Senior Research Scientist, NVIDIA