llm-d is a well-lit path for serving large language models at scale with the fastest time-to-value and competitive performance per dollar. Built on vLLM, Kubernetes, and Inference Gateway, llm-d provides modular solutions for distributed inference with features like KV-cache aware routing and disaggregated serving.
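To give a flavor of what a Kubernetes deployment looks like, here is a minimal, hypothetical install sketch. The chart repository URL, chart name, and `model.id` values key are placeholders, not llm-d's confirmed interface; follow the Documentation link below for the actual quickstart.

```bash
# Hypothetical sketch only: the repo URL, chart name, and model.id values key
# are placeholders, not llm-d's confirmed API -- see llm-d.ai for real steps.
helm repo add llm-d https://example.com/llm-d-charts   # placeholder URL
helm repo update

# Stand up a serving stack for a single model in its own namespace.
helm install llama-demo llm-d/llm-d \
  --namespace llm-d --create-namespace \
  --set model.id=meta-llama/Llama-3.1-8B-Instruct      # assumed values key
```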
- 📖 Documentation: llm-d.ai
- 🏗️ Architecture: llm-d architecture docs
- 📖 Project Details: PROJECT.md
- 📦 Releases: GitHub Releases
 
- 💬 Slack: Join our development discussions at llm-d.slack.com
- 📧 Google Group: Subscribe to llm-d-contributors for architecture docs and meeting invites
- 🗓️ Weekly Standup: Wednesdays at 12:30 ET - Public Calendar
 
- Read Guidelines: Review our Code of Conduct and contribution process
- Sign Commits: All commits require DCO sign-off (`git commit -s`; see the example after this list)
- 🐛 Bug fixes and small features - Submit PRs directly to component repos
- 🚀 New features with APIs - Require project proposals
- 📚 Documentation - Help improve guides and examples
- 🧪 Testing & Benchmarking - Contribute to our test coverage
- 💡 Experimental features - Start in llm-d-incubation org
 
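The DCO sign-off mentioned above uses standard git flags; the commit message below is only illustrative:

```bash
# -s / --signoff appends a "Signed-off-by:" trailer taken from your
# configured git user.name and user.email.
git commit -s -m "fix: tolerate empty routing table"   # illustrative message

# Forgot the sign-off on your last commit? Amend it in place:
git commit --amend -s --no-edit
```
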
License: Apache 2.0