Repository with talks and exercises of our Efficient GPU Programming for Exascale tutorial, to be held at SC25.
- Date: 16 November 2025
- Occasion: SC25 Tutorial
- Tutors: Simon Garcia de Gonzalo (SNL), Andreas Herten (JSC), Lena Oden (Uni Hagen), David Appelhans (NVIDIA); with support from Markus Hrywniak (NVIDIA) and Jiri Kraus (NVIDIA)
The tutorial is interactive: introductory lectures alternate with practical exercises in which participants apply what they have learned. The exercises are derived from the Jacobi solver implementations available in NVIDIA/multi-gpu-programming-models.
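For readers unfamiliar with the pattern behind the exercises, here is a minimal sketch of a Jacobi iteration whose rows are split across two simulated "ranks" that exchange halo rows after every step. It is not the tutorial code: the exercises use CUDA together with MPI, NCCL, or NVSHMEM, whereas this sketch is plain Python. All function names, the grid size, and the boundary values are assumptions chosen for illustration.

```python
# Illustrative sketch only: names, sizes, and boundary values are hypothetical
# and are not taken from the tutorial material.

def jacobi_step(grid):
    """Return a new grid; each interior point becomes the mean of its 4 neighbours."""
    ny, nx = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # first/last row and column are copied unchanged
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            new[y][x] = 0.25 * (grid[y - 1][x] + grid[y + 1][x]
                                + grid[y][x - 1] + grid[y][x + 1])
    return new


def make_grid(ny, nx):
    """Grid of zeros with the left and right columns held at 1.0."""
    grid = [[0.0] * nx for _ in range(ny)]
    for row in grid:
        row[0] = row[-1] = 1.0
    return grid


def solve_single(ny, nx, iters):
    """Reference solve on one undivided domain."""
    grid = make_grid(ny, nx)
    for _ in range(iters):
        grid = jacobi_step(grid)
    return grid


def solve_split(ny, nx, iters):
    """Same solve with rows split across two simulated ranks plus halo rows."""
    full = make_grid(ny, nx)
    mid = ny // 2
    # 'top' owns rows 0..mid-1 and keeps row mid as a halo copy.
    top = [row[:] for row in full[:mid + 1]]
    # 'bottom' owns rows mid..ny-1 and keeps row mid-1 as a halo copy.
    bottom = [row[:] for row in full[mid - 1:]]
    for _ in range(iters):
        top = jacobi_step(top)
        bottom = jacobi_step(bottom)
        # Halo exchange: each rank receives its neighbour's freshly updated edge row.
        top[-1] = bottom[1][:]
        bottom[0] = top[-2][:]
    # Stitch the owned rows back together, dropping the halo copies.
    return top[:-1] + bottom[1:]


if __name__ == "__main__":
    single = solve_single(8, 8, 50)
    split = solve_split(8, 8, 50)
    print("results match:", single == split)
```

In a real multi-GPU code the two assignments in the halo exchange would be replaced by communication between devices, which is where the choice between MPI, NCCL, and NVSHMEM comes in.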
Walk-through (only possible on-site at SC25!):
- Sign up at JuDoor
- Open Jupyter JSC: https://jupyter.jsc.fz-juelich.de
- Create new Jupyter instance on JUPITER, using training2555 account, on LoginNode
- Source course environment: source $PROJECT_training2555/env.sh
- Sync material: jsc-material-sync
- Locally install NVIDIA Nsight Systems: https://developer.nvidia.com/nsight-systems
Curriculum (Note: square-bracketed sessions are skipped at ISC25 because only ½ day was allocated to the tutorial):
- Lecture: Tutorial Overview, Introduction to System + Onboarding (Andreas)
- Lecture: MPI-Distributed Computing with GPUs (Simon)
- Hands-on: Multi-GPU Parallelization
- Lecture: Performance / Debugging Tools (David)
- Lecture: Optimization Techniques for Multi-GPU Applications (Simon)
- Hands-on: Overlap Communication and Computation with MPI
- Lecture: Overview of NCCL and NVSHMEM in MPI (Lena)
- Hands-on: Using NCCL and NVSHMEM
- Lecture: Device-initiated Communication with NVSHMEM (David)
- Hands-on: Using Device-Initiated Communication with NVSHMEM
- Lecture: Conclusion and Outline of Advanced Topics (Andreas)