XMA is a repository of fast kernels for model training.
We plan to add lots of experimental and fun model architectures, with support for multiple accelerators: NVIDIA and AMD GPUs, Google TPUs, and AWS Trainium.
| module | CUDA | Pallas | NKI | ROCm | Triton |
|---|---|---|---|---|---|
| GRU | β | β | β | β | β |
| MoE | β | β | β | β | β |
| RNN | β | β | β | β | β |
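
For reference, here is a minimal pure-PyTorch sketch of the computation a fused GRU cell performs (standard GRU equations; the function name and signature are illustrative, not XMA's actual API):

```python
import torch

def gru_cell_reference(x, h, w_ih, w_hh, b_ih, b_hh):
    """One GRU step in plain PyTorch -- the same math a fused kernel computes.

    x: (batch, input_size), h: (batch, hidden_size)
    w_ih: (3*hidden, input_size), w_hh: (3*hidden, hidden), biases: (3*hidden,)
    """
    gi = x @ w_ih.T + b_ih  # input projections for the r, z, n gates
    gh = h @ w_hh.T + b_hh  # hidden projections for the r, z, n gates
    i_r, i_z, i_n = gi.chunk(3, dim=-1)
    h_r, h_z, h_n = gh.chunk(3, dim=-1)
    r = torch.sigmoid(i_r + h_r)   # reset gate
    z = torch.sigmoid(i_z + h_z)   # update gate
    n = torch.tanh(i_n + r * h_n)  # candidate hidden state
    return (1 - z) * n + z * h     # new hidden state
```
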
| functional | CUDA | Pallas | NKI | ROCm | Triton |
|---|---|---|---|---|---|
| bmm | β | β | β | β | β |
| continuous_count | β | β | β | β | β |
| cross_entropy | β | β | β | β | β |
| fused_linear_cross_entropy | β | β | β | β | β |
| fused_residual_add_rmsnorm | β | β | β | β | β |
| grouped_gemm | β | β | β | β | β |
| rmsnorm | β | β | β | β | β |
| pack_sequence | β | β | β | β | β |
| softmax | β | β | β | β | β |
| swiglu | β | β | β | β | β |
| swiglu_packed | β | β | β | β | β |
| unpack_sequence | β | β | β | β | β |
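
Many of the functional kernels above fuse operations that are otherwise memory-bound in eager PyTorch. As an illustration (not XMA's actual API), here are plain-PyTorch references for what rmsnorm and swiglu compute; a fused kernel produces the same result with fewer passes over memory:

```python
import torch
import torch.nn.functional as F

def rmsnorm_reference(x, weight, eps=1e-6):
    # Normalize by the root-mean-square over the last dimension, then scale.
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight

def swiglu_reference(gate, up):
    # SwiGLU activation: SiLU(gate) elementwise-multiplied with up.
    return F.silu(gate) * up
```
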
Join the Discord server if you are interested in LLM architectures or distributed training/inference research.
