A production-ready cookie-cutter template for building MCP servers with LangGraph's Functional API. Features comprehensive authentication (JWT), fine-grained authorization (OpenFGA), secrets management (Infisical), and OpenTelemetry-based observability.
🎯 Opinionated, production-grade foundation for your MCP server projects.
# Generate your own MCP server project
uvx cookiecutter gh:vishnu2kmohan/mcp_server_langgraph
# Answer a few questions and get a fully configured project!
See Cookiecutter Template Strategy for detailed information.
For: Creating your own MCP server with custom tools and logic
How:
- Generate project:
uvx cookiecutter gh:vishnu2kmohan/mcp_server_langgraph
- Customize tools in generated
agent.py
- Update authorization model in
scripts/setup/setup_openfga.py
- Deploy your custom server
What gets customized:
- Project name, author, license
- Which features to include (auth, observability, deployment configs)
- LLM provider preferences
- Tool implementations
See: Cookiecutter Template Strategy (ADR-0011)
For: Learning, testing, or using the reference implementation
How:
- Clone:
git clone https://github.com/vishnu2kmohan/mcp-server-langgraph.git
- Install:
uv sync
- Configure: Copy
.env.example
to.env
and add API keys - Run:
make run-streamable
What you get:
- Fully working MCP server with example tools (
agent_chat
,conversation_search
,conversation_get
) - Complete observability stack
- Production-ready deployment configs
- Comprehensive test suite
See: Quick Start below
This project achieves reference-quality implementation of Anthropic's AI agent best practices:
- 🎯 Just-in-Time Context Loading: Dynamic semantic search with Qdrant vector database
- Load only relevant context when needed (60% token reduction)
- Progressive discovery through iterative search
- Token-aware batch loading with configurable budgets
- ⚡ Parallel Tool Execution: Concurrent execution with automatic dependency resolution
- 1.5-2.5x latency reduction for independent operations
- Topological sorting for correct execution order
- Graceful error handling and recovery
- 📝 Enhanced Structured Note-Taking: LLM-based 6-category information extraction
- Automatic categorization: decisions, requirements, facts, action_items, issues, preferences
- Context preservation across multi-turn conversations
- Fallback to rule-based extraction for reliability
- ✅ Complete Agentic Loop: Full gather-action-verify-repeat cycle
- Context compaction (40-60% token reduction)
- LLM-as-judge verification (23% quality improvement)
- Iterative refinement (up to 3 attempts)
- Observable with full tracing
See: Anthropic Best Practices Assessment | ADR-0023 | ADR-0024 | ADR-0025 | Examples
- Multi-LLM Support (LiteLLM): 100+ LLM providers - Anthropic, OpenAI, Google, Azure, AWS Bedrock, Ollama
- Open-Source Models: Llama 3.1, Qwen 2.5, Mistral, DeepSeek, and more via Ollama
- LangGraph Functional API: Stateful agent with conditional routing and checkpointing
- MCP Server: Standard protocol for exposing AI agents as tools (stdio, StreamableHTTP)
- Enterprise Authentication: Pluggable auth providers (InMemory, Keycloak SSO)
- JWT Authentication: Token-based authentication with validation and expiration
- Keycloak Integration: Production-ready SSO with OIDC/OAuth2 (integrations/keycloak.md)
- Token Refresh: Automatic refresh token rotation
- JWKS Verification: Public key verification without shared secrets
- Session Management: Flexible session storage backends
- InMemory: Fast in-memory sessions for development
- Redis: Persistent sessions with TTL, sliding windows, concurrent limits
- Advanced Features: Session lifecycle management, bulk revocation, user tracking
- Fine-Grained Authorization: OpenFGA (Zanzibar-style) relationship-based access control
- Role Mapping: Declarative role mappings with YAML configuration
- Keycloak Sync: Automatic role/group synchronization to OpenFGA
- Hierarchies: Role inheritance and conditional mappings
- Secrets Management: Infisical integration for secure secret storage and retrieval
- Feature Flags: Gradual rollouts with environment-based configuration
- Dual Observability: OpenTelemetry + LangSmith for comprehensive monitoring
- OpenTelemetry: Distributed tracing with Jaeger, metrics with Prometheus (30+ auth metrics)
- LangSmith: LLM-specific tracing, prompt engineering, evaluations
- Structured Logging: JSON logging with trace context correlation
- Full Observability Stack: Docker Compose setup with OpenFGA, Keycloak, Redis, Jaeger, Prometheus, Grafana, and Qdrant
- LangGraph Platform: Deploy to managed LangGraph Cloud with one command
- Automatic Fallback: Resilient multi-model fallback for high availability
The project supports optional feature sets that can be installed on demand:
-
Secrets Management (
[secrets]
): Infisical integration for centralized secrets- Install:
pip install -e ".[secrets]"
oruv sync --extra secrets
- Fallback: Environment variables (
.env
file) - Production: Recommended for secure secret rotation
- See: Infisical Installation Guide
- Install:
-
Self-Hosted Embeddings (
[embeddings]
): sentence-transformers for local embedding generation- Install:
pip install -e ".[embeddings]"
oruv sync --extra embeddings
- Fallback: Google Gemini API (langchain-google-genai, installed by default)
- Production: Use API-based embeddings (lower latency, no GPU required)
- Note: Self-hosted embeddings require significant resources
- Install:
-
GDPR Storage Backend: PostgreSQL or Redis for compliance data persistence
- CRITICAL: In-memory storage is NOT production-ready
- Required for: GDPR compliance endpoints (
/api/v1/users/me/*
) - Config: Set
GDPR_STORAGE_BACKEND=postgres
orredis
in production - See: GDPR Storage Configuration
-
All Features (
[all]
): Install all optional dependencies- Install:
pip install -e ".[all]"
oruv sync --all-extras
- Use for: Development, testing, full feature evaluation
- Install:
Development vs Production:
- Development: All features work with fallbacks (in-memory, env vars, API-based)
- Production: Use persistent backends (Redis, PostgreSQL) and proper secret management
- Property-Based Testing: 27+ Hypothesis tests discovering edge cases automatically
- Contract Testing: 20+ JSON Schema tests ensuring MCP protocol compliance
- Performance Regression Testing: Automated latency tracking against baselines
- Mutation Testing: Test effectiveness verification with mutmut (80%+ target)
- Strict Typing: Gradual mypy strict mode rollout (3 modules complete)
- OpenAPI Validation: Automated schema generation and breaking change detection
- 80% Code Coverage: Comprehensive unit and integration tests
- Kubernetes Ready: Production manifests for GKE, EKS, AKS, Rancher, VMware Tanzu
- Helm Charts: Flexible deployment with customizable values and dependencies
- Kustomize: Environment-specific overlays (dev/staging/production)
- Multi-Platform: Docker Compose, kubectl, Kustomize, Helm deployment options
- CI/CD Pipeline: Automated testing, validation, build, and deployment with GitHub Actions
- Deployment Validation: Comprehensive validation scripts for all deployment configurations
- E2E Testing: Automated deployment tests with kind clusters
- High Availability: Pod anti-affinity, HPA, PDB, rolling updates
- Monitoring: 25+ Prometheus alerts, 4 Grafana dashboards, 9 operational runbooks
- Observability: Full monitoring for Keycloak, Redis, sessions, and application
- Secrets: External secrets operator support, sealed secrets compatible
- Service Mesh: Compatible with Istio, Linkerd, and other service meshes
- Architecture Decision Records (ADRs): 25 documented design decisions (adr/)
- Comprehensive Documentation: Complete documentation index with guides, tutorials, and references
- API Documentation: Interactive OpenAPI/Swagger UI
- Documentation Index - Complete guide to all documentation
- API Documentation - Interactive OpenAPI/Swagger UI (when running locally)
- Mintlify Deployment - Mintlify documentation deployment instructions
- Mutation Testing Guide - Test effectiveness measurement and improvement
- Strict Typing Guide - Gradual mypy strict mode rollout
- Architecture Decision Records - Documented architectural choices
- Deployment Quickstart - Quick deployment guide for all platforms
- Deployment README - Comprehensive deployment documentation
- CI/CD Guide - Continuous integration and deployment pipeline
- Keycloak Integration - Enterprise SSO setup and configuration
- 0001: Multi-Provider LLM Support (LiteLLM)
- 0002: Fine-Grained Authorization (OpenFGA)
- 0003: Dual Observability Strategy
- 0004: MCP Transport Selection (StreamableHTTP)
- 0005: Type-Safe Responses (Pydantic AI)
- 0023: Anthropic Tool Design Best Practices
- 0024: Agentic Loop Implementation
- 0025: Anthropic Best Practices - Advanced Enhancements
- See all 25 ADRs
- Examples Directory - Comprehensive examples demonstrating all features
- Dynamic Context Loading - Just-in-Time semantic search
- Parallel Tool Execution - Concurrent execution patterns
- Enhanced Note-Taking - LLM-based information extraction
- Complete Workflow - Full agentic loop demonstration
- Python: 3.10, 3.11, or 3.12
- Memory: 2GB RAM minimum (4GB recommended for production)
- Disk: 500MB for dependencies + 1GB for optional vector databases
- OS: Linux, macOS, or Windows with WSL2
- Redis: Session storage (or use in-memory mode)
- PostgreSQL: Compliance data storage (optional)
- OpenFGA: Fine-grained authorization (optional)
- Qdrant/Weaviate: Vector database for semantic search
- Jaeger: Distributed tracing visualization
- Prometheus + Grafana: Metrics and monitoring
See Production Checklist for detailed requirements.
Using uv (recommended):
This project uses uv for fast, reliable dependency management:
# Install from PyPI
uv pip install mcp-server-langgraph
# Or clone and develop locally (creates virtual environment automatically)
git clone https://github.com/vishnu2kmohan/mcp-server-langgraph.git
cd mcp-server-langgraph
uv sync # Installs all dependencies from pyproject.toml and uv.lock
Why uv?
- ⚡ 10-100x faster than pip
- 🔒 Reproducible builds via uv.lock lockfile
- 📦 Single source of truth in pyproject.toml
- 🛡️ Better dependency resolution
Alternative: Using pip:
# Install from PyPI
pip install mcp-server-langgraph
# Or install from source
git clone https://github.com/vishnu2kmohan/mcp-server-langgraph.git
cd mcp-server-langgraph
pip install -e .
Note: requirements*.txt files are deprecated. Use
uv sync
instead.
python -c "import mcp_server_langgraph; print(mcp_server_langgraph.__version__)"
See Installation Guide for complete instructions, including:
- Docker installation
- Virtual environment setup
- Dependency management
- Configuration options
┌──────────────────────┐
│ MCP Client │
│ (Claude Desktop │
│ or other) │
└──────────┬───────────┘
│
▼
┌──────────────────────────────────────┐
│ MCP Server │
│ (server_stdio.py/streamable.py) │
│ ┌────────────────────────────┐ │
│ │ Auth Middleware │ │
│ │ - JWT Verification │ │
│ │ - OpenFGA Authorization │ │
│ └────────────────────────────┘ │
│ ┌────────────────────────────┐ │
│ │ LangGraph Agent │ │
│ │ - Context Compaction │ │
│ │ - Pydantic AI Routing │ │
│ │ - Tool Execution │ │
│ │ - Response Generation │ │
│ │ - Output Verification │ │
│ │ - Iterative Refinement │ │
│ └────────────────────────────┘ │
└──────────┬───────────────────────────┘
│
▼
┌──────────────────────────────────────┐
│ Observability Stack │
│ ┌──────────┐ ┌──────────────┐ │
│ │ Traces │ │ Metrics │ │
│ │ (Jaeger) │ │ (Prometheus) │ │
│ └─────┬────┘ └──────┬───────┘ │
│ └────────────────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Grafana │ │
│ └──────────────┘ │
└──────────────────────────────────────┘
Our agent implements Anthropic's full gather-action-verify-repeat cycle with advanced enhancements:
┌─────────────────────────────────────────────────┐
│ LangGraph Agent Workflow │
│ │
│ START │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ 0. Load Context │ Just-in-Time │
│ │ (Dynamic) │ Semantic Search │
│ └──────────┬──────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ 1. Gather Context │ Compaction when │
│ │ (Compact) │ approaching limits │
│ └──────────┬──────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ 2. Take Action │ Route & Execute │
│ │ (Route/Tools) │ (Parallel if enabled) │
│ └──────────┬──────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ (Respond) │ Generate Response │
│ └──────────┬──────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ 3. Verify Work │ LLM-as-Judge │
│ │ (Verify) │ Quality Check │
│ └──────────┬──────────┘ │
│ │ │
│ ┌────┴────┐ │
│ │ │ │
│ Passed Failed │
│ │ │ │
│ │ ▼ │
│ │ ┌─────────────────────┐ │
│ │ │ 4. Repeat │ │
│ │ │ (Refine) │ Max 3× │
│ │ └──────────┬──────────┘ │
│ │ │ │
│ │ └─────►(Respond) │
│ │ │
│ ▼ │
│ END │
│ │
└─────────────────────────────────────────────────┘
Key Features:
- Just-in-Time Context Loading: Dynamic semantic search (60% token reduction)
- Context Compaction: Prevents overflow on long conversations (40-60% token reduction)
- Parallel Tool Execution: Concurrent execution with dependency resolution (1.5-2.5x speedup)
- Enhanced Note-Taking: LLM-based 6-category extraction for long-term context
- Output Verification: LLM-as-judge pattern catches errors before users see them (23% quality improvement)
- Iterative Refinement: Up to 3 self-correction attempts for quality
- Observable: Full tracing of each loop component
See ADR-0024: Agentic Loop Implementation and ADR-0025: Advanced Enhancements for details.
Get the complete stack running in 2 minutes:
# Quick start script handles everything
./scripts/docker-compose-quickstart.sh
This starts:
- Agent API: http://localhost:8000 (MCP agent)
- OpenFGA: http://localhost:8080 (authorization)
- OpenFGA Playground: http://localhost:3001
- Jaeger UI: http://localhost:16686 (distributed tracing)
- Prometheus: http://localhost:9090 (metrics)
- Grafana: http://localhost:3000 (visualization, admin/admin)
- PostgreSQL: localhost:5432 (OpenFGA storage)
Then setup OpenFGA:
python scripts/setup/setup_openfga.py
# Add OPENFGA_STORE_ID and OPENFGA_MODEL_ID to .env
docker-compose restart agent
Test the agent:
curl http://localhost:8000/health
See Docker Compose documentation for details.
- Install dependencies:
uv sync # Install all dependencies and create virtual environment
# Note: Creates .venv automatically with all dependencies from pyproject.toml
- Start infrastructure (without agent):
# Start only supporting services
docker-compose up -d openfga postgres otel-collector jaeger prometheus grafana
- Configure environment:
cp .env.example .env
# Edit .env with your API keys:
# - GOOGLE_API_KEY (get from https://aistudio.google.com/apikey)
# - ANTHROPIC_API_KEY or OPENAI_API_KEY (optional)
- Setup OpenFGA:
python scripts/setup/setup_openfga.py
# Save OPENFGA_STORE_ID and OPENFGA_MODEL_ID to .env
- Run the agent locally:
python -m mcp_server_langgraph.mcp.server_streamable
- Test:
# Test with example client
python examples/client_stdio.py
# Or curl
curl http://localhost:8000/health
python -m mcp_server_langgraph.mcp.server_stdio
python examples/client_stdio.py
Add to your MCP client config (e.g., Claude Desktop):
{
"mcpServers": {
"langgraph-agent": {
"command": "python",
"args": ["/path/to/mcp_server_langgraph/src/mcp_server_langgraph/mcp/server_stdio.py"]
}
}
}
All tool calls now require JWT token authentication for security:
import httpx
# 1. Login to get JWT token
async with httpx.AsyncClient() as client:
response = await client.post(
"http://localhost:8000/auth/login",
json={"username": "alice", "password": "alice123"}
)
data = response.json()
token = data["access_token"]
print(f"Token expires in {data['expires_in']}s")
# 2. Use token in all tool calls
response = await client.post(
"http://localhost:8000/message",
json={
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "agent_chat",
"arguments": {
"message": "Hello!",
"token": token, # ✅ Required
"user_id": "user:alice"
}
}
}
)
See: Authentication Migration Guide for complete details
The system supports multiple authentication backends via the auth factory:
# Development: In-memory user provider (with password validation)
# Set in .env:
AUTH_PROVIDER=inmemory
# Production: Keycloak SSO with OIDC/OAuth2
# Set in .env:
AUTH_PROVIDER=keycloak
KEYCLOAK_SERVER_URL=https://auth.example.com
KEYCLOAK_REALM=production
KEYCLOAK_CLIENT_ID=mcp-server
KEYCLOAK_CLIENT_SECRET=<secret>
Provider Features:
- InMemoryUserProvider: Fast, password-protected, for development/testing
- KeycloakUserProvider: Enterprise SSO, OIDC, automatic role sync to OpenFGA
- Custom Providers: Extend
UserProvider
interface for custom auth systems
Uses relationship-based access control (Google Zanzibar model):
from mcp_server_langgraph.auth.openfga import OpenFGAClient
client = OpenFGAClient(
api_url=settings.openfga_api_url,
store_id=settings.openfga_store_id,
model_id=settings.openfga_model_id
)
# Check permission
allowed = await client.check_permission(
user="user:alice",
relation="executor",
object="tool:agent_chat"
)
# Grant permission
await client.write_tuples([
{"user": "user:alice", "relation": "executor", "object": "tool:agent_chat"}
])
# List accessible resources
resources = await client.list_objects(
user="user:alice",
relation="executor",
object_type="tool"
)
Username | Password | Roles | Description |
---|---|---|---|
alice |
alice123 |
user , premium |
Premium user, member and admin of organization:acme |
bob |
bob123 |
user |
Standard user, member of organization:acme |
admin |
admin123 |
admin |
Admin user with elevated privileges |
For Production:
- Use
AUTH_PROVIDER=keycloak
with proper SSO - Or implement password hashing in
InMemoryUserProvider
- Never use default credentials in production
See:
This project uses a comprehensive, multi-layered testing approach to ensure production quality:
make test-coverage-combined
- 60-65% combined coverage (unit + integration tests)
- Most accurate coverage metric reflecting all test types
- Includes MCP server entry points tested via integration tests
- Generates combined HTML report:
htmlcov-combined/index.html
make test-unit
# OR: pytest -m unit -v
- ~400 tests with comprehensive assertions
- Mock all external dependencies (LLM, OpenFGA, Infisical)
- Test pure logic, validation, and error handling
make test-integration
# OR: pytest -m integration -v
- ~200 tests in isolated Docker environment
- Real OpenFGA authorization checks
- Real observability stack (Jaeger, Prometheus)
- End-to-end workflows with actual dependencies
- Coverage collection enabled (merged with unit tests in CI)
make test-property
# OR: pytest -m property -v
- 27+ Hypothesis tests generating thousands of test cases
- Automatic edge case discovery (empty strings, extreme values, malformed input)
- Tests properties like "JWT encode/decode should be reversible"
- See:
tests/property/test_llm_properties.py
,tests/property/test_auth_properties.py
make test-contract
# OR: pytest -m contract -v
- 20+ JSON Schema tests validating MCP protocol compliance
- Ensures JSON-RPC 2.0 format correctness
- Validates request/response schemas match specification
- See:
tests/contract/test_mcp_contract.py
,tests/contract/mcp_schemas.json
make test-regression
# OR: pytest -m regression -v
- Tracks latency metrics against baselines
- Alerts on >20% performance regressions
- Monitors: agent_response (p95 < 5s), llm_call (p95 < 10s), authorization (p95 < 50ms)
- See:
tests/regression/test_performance_regression.py
,tests/regression/baseline_metrics.json
make test-mutation
# OR: mutmut run && mutmut results
- Measures test quality by introducing code mutations
- Target: 80%+ mutation score on critical modules
- Identifies weak assertions and missing test cases
- See: Mutation Testing Guide
make validate-openapi
# OR: python scripts/validate_openapi.py
- Generates OpenAPI schema from code
- Validates schema correctness
- Detects breaking changes
- Ensures all endpoints documented
# Quick: Run all unit tests (2-5 seconds)
make test-unit
# All automated tests (unit + integration)
make test
# All quality tests (property + contract + regression)
make test-all-quality
# Coverage report
make test-coverage
# Opens htmlcov/index.html with detailed coverage
# Full test suite (including mutation tests - SLOW!)
make test-unit && make test-all-quality && make test-mutation
- Code Coverage: 80% (target: 90%)
- Property Tests: 27+ test classes with thousands of generated cases
- Contract Tests: 20+ protocol compliance tests
- Mutation Score: 80%+ target on critical modules (src/mcp_server_langgraph/core/agent.py, src/mcp_server_langgraph/auth/middleware.py, src/mcp_server_langgraph/core/config.py)
- Type Coverage: Strict mypy on 3 modules (config, feature_flags, observability)
- Performance: All p95 latencies within target thresholds
GitHub Actions runs quality tests on every PR:
# .github/workflows/quality-tests.yaml
jobs:
- property-tests # 15min timeout
- contract-tests # MCP protocol validation
- regression-tests # Performance monitoring
- openapi-validation # API schema validation
- mutation-tests # Weekly schedule (too slow for every PR)
See: .github/workflows/quality-tests.yaml
Control features dynamically without code changes:
# Enable/disable features via environment variables
FF_ENABLE_PYDANTIC_AI_ROUTING=true # Type-safe routing (default: true)
FF_ENABLE_LLM_FALLBACK=true # Multi-model fallback (default: true)
FF_ENABLE_OPENFGA=true # Authorization (default: true)
FF_OPENFGA_STRICT_MODE=false # Fail-closed vs fail-open (default: false)
FF_PYDANTIC_AI_CONFIDENCE_THRESHOLD=0.7 # Routing confidence (default: 0.7)
# All flags with FF_ prefix (20+ available)
Key Flags:
enable_pydantic_ai_routing
: Type-safe routing with confidence scoresenable_llm_fallback
: Automatic fallback to alternative modelsenable_openfga
: Fine-grained authorization (disable for development)openfga_strict_mode
: Fail-closed (deny on error) vs fail-open (allow on error)enable_experimental_*
: Master switches for experimental features
See: src/mcp_server_langgraph/core/feature_flags.py
for all flags and validation
This project supports dual observability: OpenTelemetry for infrastructure metrics and LangSmith for LLM-specific tracing.
LangSmith provides comprehensive LLM and agent observability:
Setup:
# Add to .env
LANGSMITH_API_KEY=your-key-from-smith.langchain.com
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=mcp-server-langgraph
Features:
- 🔍 Automatic Tracing: All LLM calls and agent steps traced
- 🎯 Prompt Engineering: Iterate on prompts with production data
- 📊 Evaluations: Compare model performance on datasets
- 💬 User Feedback: Collect and analyze user ratings
- 💰 Cost Tracking: Monitor LLM API costs per user/session
- 🐛 Debugging: Root cause analysis with full context
View traces: https://smith.langchain.com/
See LangSmith Integration Guide for complete setup guide.
Every request is traced end-to-end with OpenTelemetry:
from mcp_server_langgraph.observability.telemetry import tracer
with tracer.start_as_current_span("my_operation") as span:
span.set_attribute("custom.attribute", "value")
# Your code here
View traces in Jaeger: http://localhost:16686
Standard metrics are automatically collected:
agent.tool.calls
: Tool invocation counteragent.calls.successful
: Successful operation counteragent.calls.failed
: Failed operation counterauth.failures
: Authentication failure counterauthz.failures
: Authorization failure counteragent.response.duration
: Response time histogram
View metrics in Prometheus: http://localhost:9090
Structured logging with trace context:
from mcp_server_langgraph.observability.telemetry import logger
logger.info("Event occurred", extra={
"user_id": "user_123",
"custom_field": "value"
})
Logs include trace_id and span_id for correlation with traces.
The agent uses the functional API with:
- State Management: TypedDict-based state with message history
- Conditional Routing: Dynamic routing based on message content
- Tool Integration: Extensible tool system (extend in
src/mcp_server_langgraph/core/agent.py
) - Checkpointing: Conversation persistence with MemorySaver
Add tools in src/mcp_server_langgraph/core/agent.py
:
def custom_tool(state: AgentState) -> AgentState:
# Your tool logic
return state
workflow.add_node("custom_tool", custom_tool)
workflow.add_edge("router", "custom_tool")
All settings via environment variables, Infisical, or .env
file:
Variable | Description | Default |
---|---|---|
SERVICE_NAME |
Service identifier | mcp-server-langgraph |
OTLP_ENDPOINT |
OpenTelemetry collector | http://localhost:4317 |
JWT_SECRET_KEY |
Secret for JWT signing | (loaded from Infisical) |
ANTHROPIC_API_KEY |
Anthropic API key | (loaded from Infisical) |
MODEL_NAME |
Claude model to use | claude-3-5-sonnet-20241022 |
LOG_LEVEL |
Logging level | INFO |
OPENFGA_API_URL |
OpenFGA server URL | http://localhost:8080 |
OPENFGA_STORE_ID |
OpenFGA store ID | (from setup) |
OPENFGA_MODEL_ID |
OpenFGA model ID | (from setup) |
INFISICAL_CLIENT_ID |
Infisical auth client ID | (optional) |
INFISICAL_CLIENT_SECRET |
Infisical auth secret | (optional) |
INFISICAL_PROJECT_ID |
Infisical project ID | (optional) |
Variable | Description | Default |
---|---|---|
Dynamic Context Loading | ||
ENABLE_DYNAMIC_CONTEXT_LOADING |
Enable just-in-time context loading | false |
QDRANT_URL |
Qdrant server URL | localhost |
QDRANT_PORT |
Qdrant server port | 6333 |
QDRANT_COLLECTION_NAME |
Collection name for contexts | mcp_context |
DYNAMIC_CONTEXT_MAX_TOKENS |
Max tokens per context load | 2000 |
DYNAMIC_CONTEXT_TOP_K |
Number of contexts to retrieve | 3 |
EMBEDDING_MODEL |
SentenceTransformer model | all-MiniLM-L6-v2 |
CONTEXT_CACHE_SIZE |
LRU cache size | 100 |
Parallel Execution | ||
ENABLE_PARALLEL_EXECUTION |
Enable parallel tool execution | false |
MAX_PARALLEL_TOOLS |
Max concurrent tool executions | 5 |
Enhanced Note-Taking | ||
ENABLE_LLM_EXTRACTION |
Enable LLM-based extraction | false |
Context Management | ||
ENABLE_CONTEXT_COMPACTION |
Enable context compaction | true |
COMPACTION_THRESHOLD |
Token count triggering compaction | 8000 |
TARGET_AFTER_COMPACTION |
Target tokens after compaction | 4000 |
RECENT_MESSAGE_COUNT |
Messages to keep uncompacted | 5 |
Verification | ||
ENABLE_VERIFICATION |
Enable response verification | true |
VERIFICATION_QUALITY_THRESHOLD |
Quality score threshold | 0.7 |
MAX_REFINEMENT_ATTEMPTS |
Max refinement iterations | 3 |
See src/mcp_server_langgraph/core/config.py
for all options and .env.example
for complete examples.
- Infisical (if configured)
- Environment variables (fallback)
- Default values (last resort)
Access Grafana at http://localhost:3000 (admin/admin) and create dashboards using:
- Prometheus datasource: Metrics visualization
- Jaeger datasource: Trace exploration
Example queries:
- Request rate:
rate(agent_tool_calls_total[5m])
- Error rate:
rate(agent_calls_failed_total[5m])
- P95 latency:
histogram_quantile(0.95, agent_response_duration_bucket)
🔒 Production Checklist:
- Store JWT secret in Infisical
- Use production Infisical project with proper access controls
- Configure OpenFGA with PostgreSQL backend (not in-memory)
- Enable OpenFGA audit logging
- Enable TLS for all services (OTLP, OpenFGA, PostgreSQL)
- Implement rate limiting on MCP endpoints
- Use production-grade user database
- Review and minimize OpenFGA permissions
- Set up secret rotation in Infisical
- Enable monitoring alerts for auth failures
- Implement token rotation and revocation
- Use separate OpenFGA stores per environment
- Enable MFA for Infisical access
Deploy to LangGraph Platform for fully managed, serverless hosting:
# Login (uvx runs langgraph-cli without installing it)
uvx langgraph-cli login
# Deploy
uvx langgraph-cli deploy
Benefits:
- ✅ Zero infrastructure management
- ✅ Integrated LangSmith observability
- ✅ Automatic versioning and rollbacks
- ✅ Built-in scaling and load balancing
- ✅ One-command deployment
See LangGraph Platform Guide for complete platform deployment guide.
Deploy to Google Cloud Run for fully managed, serverless deployment:
# Quick deploy
cd cloudrun
./deploy.sh --setup
# Or use gcloud directly
gcloud run deploy mcp-server-langgraph \
--source . \
--region us-central1 \
--allow-unauthenticated
Benefits:
- ✅ Serverless autoscaling (0 to 100+ instances)
- ✅ Pay only for actual usage
- ✅ Automatic HTTPS and SSL certificates
- ✅ Integrated with Google Secret Manager
- ✅ Built-in monitoring and logging
See Cloud Run Deployment Guide for complete Cloud Run deployment guide.
The agent is fully containerized and ready for Kubernetes deployment. Supported platforms:
- Google Kubernetes Engine (GKE)
- Amazon Elastic Kubernetes Service (EKS)
- Azure Kubernetes Service (AKS)
- Rancher
- VMware Tanzu
Quick Deploy:
# Build and push image
docker build -t your-registry/langgraph-agent:v1.0.0 .
docker push your-registry/langgraph-agent:v1.0.0
# Deploy with Helm
helm install langgraph-agent ./deployments/helm/langgraph-agent \
--namespace langgraph-agent \
--create-namespace \
--set image.repository=your-registry/langgraph-agent \
--set image.tag=v1.0.0
# Or deploy with Kustomize
kubectl apply -k deployments/kustomize/overlays/production
See Kubernetes Deployment Guide for complete deployment guide.
Kong API Gateway integration provides:
- Rate Limiting: Tiered limits (60-1000 req/min) per consumer/tier
- Authentication: JWT, API Key, OAuth2
- Traffic Control: Request transformation, routing, load balancing
- Security: IP restriction, bot detection, CORS
- Monitoring: Prometheus metrics, request logging
# Deploy with Kong rate limiting
helm install langgraph-agent ./deployments/helm/langgraph-agent \
--set kong.enabled=true \
--set kong.rateLimitTier=premium
# Or apply Kong manifests directly
kubectl apply -k deployments/kubernetes/kong/
See Kong Gateway Integration for complete Kong setup and rate limiting configuration.
The agent supports multiple MCP transports:
- StreamableHTTP (Recommended): Modern HTTP streaming for production
- stdio: For Claude Desktop and local applications
# StreamableHTTP (recommended for web/production)
python -m mcp_server_langgraph.mcp.server_streamable
# stdio (local/desktop)
python -m mcp_server_langgraph.mcp.server_stdio
# Access StreamableHTTP endpoints
POST /message # Main MCP endpoint (streaming or regular)
GET /tools # List tools
GET /resources # List resources
Why StreamableHTTP?
- ✅ Modern HTTP/2+ streaming
- ✅ Better load balancer/proxy compatibility
- ✅ Proper request/response pairs
- ✅ Full MCP spec compliance
- ✅ Works with Kong rate limiting
Registry compliant - Includes manifest files for MCP Registry publication.
See MCP Registry Guide for registry deployment and transport configuration.
This project maintains high code quality through:
Assessed across 7 dimensions:
- ✅ Code Organization: 9/10 - Clear module structure, separation of concerns
- ✅ Testing: 10/10 - Multi-layered testing (unit, integration, property, contract, regression, mutation)
- ✅ Type Safety: 9/10 - Gradual strict mypy rollout (3/11 modules strict, 8 remaining)
- ✅ Documentation: 10/10 - ADRs, guides, API docs, inline documentation
- ✅ Error Handling: 9/10 - Comprehensive error handling, fallback modes
- ✅ Observability: 10/10 - Dual observability (OpenTelemetry + LangSmith)
- ✅ Security: 9/10 - JWT auth, fine-grained authz, secrets management, security scanning
Pre-Commit:
- Code formatting (black, isort)
- Linting (flake8, mypy)
- Security scan (bandit)
CI/CD (GitHub Actions):
- Unit tests (Python 3.10, 3.11, 3.12)
- Integration tests
- Property-based tests
- Contract tests
- Performance regression tests
- OpenAPI validation
- Mutation tests (weekly)
Commands:
# Code quality checks
make format # Format code (black + isort)
make lint # Run linters (flake8 + mypy)
make security-check # Security scan (bandit)
# Test suite
make test-unit # Fast unit tests
make test-all-quality # Property + contract + regression
make test-coverage # Coverage report
- Branch Protection: All changes via Pull Requests
- Conventional Commits:
feat:
,fix:
,test:
,docs:
,refactor:
- Code Review: Required before merge
- Quality Gates: All tests must pass
- Documentation: ADRs for architectural decisions
See: .github/CLAUDE.md for complete development guide
In Progress:
- Expanding strict mypy to all modules (3/11 complete)
- Increasing mutation score to 80%+ on all critical modules
- Adding more property-based tests for edge case discovery
Recent Improvements (2025):
- Implemented Anthropic's agentic loop (ADR-0024) with context compaction and verification
- Adopted Anthropic's tool design best practices (ADR-0023)
- Added 27+ property-based tests (Hypothesis)
- Added 20+ contract tests (JSON Schema)
- Implemented performance regression tracking
- Set up mutation testing with mutmut
- Created 25 Architecture Decision Records
- Implemented feature flag system
Thanks to all the amazing people who have contributed to this project! 🙌
This project follows the all-contributors specification.
Want to be listed here? See CONTRIBUTING.md!
Need help? Check out our Support Guide for:
- 📚 Documentation links
- 💬 Where to ask questions
- 🐛 How to report bugs
- 🔒 Security reporting
MIT - see LICENSE file for details
Built with:
- LangGraph - Agent framework
- MCP - Model Context Protocol
- OpenFGA - Authorization
- LiteLLM - Multi-LLM support
- OpenTelemetry - Observability
Special thanks to the open source community!
We welcome contributions from the community! 🎉
-
Read the guides:
- CONTRIBUTING.md - Contribution guidelines
- Development Guide - Developer setup
-
Find something to work on:
-
Get help:
- 💻 Code: Features, bug fixes, performance improvements
- 📖 Documentation: Guides, tutorials, API docs
- 🧪 Testing: Unit tests, integration tests, test coverage
- 🔒 Security: Security improvements, audits
- 🌐 Translations: i18n support (future)
- 💡 Ideas: Feature requests, architecture discussions
All contributors will be recognized in our Contributors section!