C++/MPI proxies for distributed training of deep neural networks, including ResNet-50, ResNet-152, BERT-large, CosmoFlow, DLRM, GPT-2, and GPT-3. The proxies cover data parallelism, operator (tensor) parallelism, pipeline parallelism, mixture-of-experts (MoE), and hybrid parallelism that combines these modes.
Compile:
mpicxx gpt2_large.cpp -o gpt2
Run:
mpirun -n 32 ./gpt2
Set the number of Transformer layers and the number of pipeline stages (here, 64 layers across 8 stages):
mpirun -n 32 ./gpt2 64 8