open-lm-engine/accelerated-model-architectures

XMA (Accelerated Model Architectures)

XMA is a repository of fast kernels for model training.
We plan to add many experimental and fun model architectures, with support for multiple accelerators: NVIDIA and AMD GPUs, Google TPUs, and Amazon Trainium.

layers

| layer | CUDA | Pallas | NKI | ROCm | Triton |
|-------|------|--------|-----|------|--------|
| GRU   | ❌   | ❌     | ❌  | ❌   | βœ…     |
| MoE   | βœ…   | ❌     | ❌  | ❌   | βœ…     |
| RNN   | ❌   | ❌     | ❌  | ❌   | βœ…     |

functional

| functional                  | CUDA | Pallas | NKI | ROCm | Triton |
|-----------------------------|------|--------|-----|------|--------|
| bmm                         | ❌   | ❌     | ❌  | ❌   | βœ…     |
| continuous_count            | βœ…   | ❌     | ❌  | ❌   | ❌     |
| cross_entropy               | ❌   | ❌     | ❌  | ❌   | βœ…     |
| fused_linear_cross_entropy  | ❌   | ❌     | ❌  | ❌   | βœ…     |
| fused_residual_add_rmsnorm  | ❌   | ❌     | ❌  | ❌   | βœ…     |
| grouped_gemm                | βœ…   | ❌     | ❌  | ❌   | ❌     |
| rmsnorm                     | ❌   | ❌     | ❌  | ❌   | βœ…     |
| pack_sequence               | βœ…   | ❌     | ❌  | ❌   | βœ…     |
| softmax                     | ❌   | ❌     | ❌  | ❌   | βœ…     |
| swiglu                      | βœ…   | ❌     | ❌  | ❌   | βœ…     |
| swiglu_packed               | βœ…   | ❌     | ❌  | ❌   | βœ…     |
| unpack_sequence             | βœ…   | ❌     | ❌  | ❌   | βœ…     |
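As a reference for the semantics that fused kernels like `swiglu` and `rmsnorm` accelerate, here is a plain-Python sketch of the two operations. The function names, signatures, and list-based inputs are illustrative only, not this repository's API; they just spell out the math the kernels compute elementwise on tensors.

```python
import math


def swiglu_reference(gate, up):
    """Reference SwiGLU semantics: SiLU(gate) * up, with SiLU(x) = x * sigmoid(x)."""
    def silu(x):
        return x / (1.0 + math.exp(-x))
    return [silu(g) * u for g, u in zip(gate, up)]


def rmsnorm_reference(x, weight, eps=1e-6):
    """Reference RMSNorm semantics: weight * x / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]
```

A fused kernel computes the same result in a single pass over the data, avoiding the intermediate buffers a naive eager implementation would materialize.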

Discord Server

Join the Discord server if you are interested in LLM architecture or distributed training/inference research.

About

A bunch of kernels that might make stuff slower πŸ˜‰
