MCP Server with LangGraph + OpenFGA & Infisical

A production-ready cookie-cutter template for building MCP servers with LangGraph's Functional API. Features comprehensive authentication (JWT), fine-grained authorization (OpenFGA), secrets management (Infisical), and OpenTelemetry-based observability.

🎯 Opinionated, production-grade foundation for your MCP server projects.

🚀 Use This Template

# Generate your own MCP server project
uvx cookiecutter gh:vishnu2kmohan/mcp_server_langgraph

# Answer a few questions and get a fully configured project!

See Cookiecutter Template Strategy for detailed information.

📖 Template vs Project Usage

Using This as a Template

For: Creating your own MCP server with custom tools and logic

How:

Generate project: uvx cookiecutter gh:vishnu2kmohan/mcp_server_langgraph
Customize tools in generated agent.py
Update authorization model in scripts/setup/setup_openfga.py
Deploy your custom server

What gets customized:

Project name, author, license
Which features to include (auth, observability, deployment configs)
LLM provider preferences
Tool implementations

See: Cookiecutter Template Strategy (ADR-0011)

Using This Project Directly

For: Learning, testing, or using the reference implementation

How:

Clone: git clone https://github.com/vishnu2kmohan/mcp-server-langgraph.git
Install: uv sync
Configure: Copy .env.example to .env and add API keys
Run: make run-streamable

What you get:

Fully working MCP server with example tools (agent_chat, conversation_search, conversation_get)
Complete observability stack
Production-ready deployment configs
Comprehensive test suite

See: Quick Start below

Features

⭐ Anthropic Best Practices (9.8/10 Adherence)

This project achieves reference-quality implementation of Anthropic's AI agent best practices:

🎯 Just-in-Time Context Loading: Dynamic semantic search with Qdrant vector database
- Load only relevant context when needed (60% token reduction)
- Progressive discovery through iterative search
- Token-aware batch loading with configurable budgets
⚡ Parallel Tool Execution: Concurrent execution with automatic dependency resolution
- 1.5-2.5x latency reduction for independent operations
- Topological sorting for correct execution order
- Graceful error handling and recovery
📝 Enhanced Structured Note-Taking: LLM-based 6-category information extraction
- Automatic categorization: decisions, requirements, facts, action_items, issues, preferences
- Context preservation across multi-turn conversations
- Fallback to rule-based extraction for reliability
✅ Complete Agentic Loop: Full gather-action-verify-repeat cycle
- Context compaction (40-60% token reduction)
- LLM-as-judge verification (23% quality improvement)
- Iterative refinement (up to 3 attempts)
- Observable with full tracing

See: Anthropic Best Practices Assessment | ADR-0023 | ADR-0024 | ADR-0025 | Examples

🎯 Core Capabilities

Multi-LLM Support (LiteLLM): 100+ LLM providers - Anthropic, OpenAI, Google, Azure, AWS Bedrock, Ollama
Open-Source Models: Llama 3.1, Qwen 2.5, Mistral, DeepSeek, and more via Ollama
LangGraph Functional API: Stateful agent with conditional routing and checkpointing
MCP Server: Standard protocol for exposing AI agents as tools (stdio, StreamableHTTP)
Enterprise Authentication: Pluggable auth providers (InMemory, Keycloak SSO)
- JWT Authentication: Token-based authentication with validation and expiration
- Keycloak Integration: Production-ready SSO with OIDC/OAuth2 (integrations/keycloak.md)
- Token Refresh: Automatic refresh token rotation
- JWKS Verification: Public key verification without shared secrets
Session Management: Flexible session storage backends
- InMemory: Fast in-memory sessions for development
- Redis: Persistent sessions with TTL, sliding windows, concurrent limits
- Advanced Features: Session lifecycle management, bulk revocation, user tracking
Fine-Grained Authorization: OpenFGA (Zanzibar-style) relationship-based access control
- Role Mapping: Declarative role mappings with YAML configuration
- Keycloak Sync: Automatic role/group synchronization to OpenFGA
- Hierarchies: Role inheritance and conditional mappings
Secrets Management: Infisical integration for secure secret storage and retrieval
Feature Flags: Gradual rollouts with environment-based configuration
Dual Observability: OpenTelemetry + LangSmith for comprehensive monitoring
- OpenTelemetry: Distributed tracing with Jaeger, metrics with Prometheus (30+ auth metrics)
- LangSmith: LLM-specific tracing, prompt engineering, evaluations
Structured Logging: JSON logging with trace context correlation
Full Observability Stack: Docker Compose setup with OpenFGA, Keycloak, Redis, Jaeger, Prometheus, Grafana, and Qdrant
LangGraph Platform: Deploy to managed LangGraph Cloud with one command
Automatic Fallback: Resilient multi-model fallback for high availability

📦 Optional Dependencies

The project supports optional feature sets that can be installed on demand:

Secrets Management ([secrets]): Infisical integration for centralized secrets
- Install: pip install -e ".[secrets]" or uv sync --extra secrets
- Fallback: Environment variables (.env file)
- Production: Recommended for secure secret rotation
- See: Infisical Installation Guide
Self-Hosted Embeddings ([embeddings]): sentence-transformers for local embedding generation
- Install: pip install -e ".[embeddings]" or uv sync --extra embeddings
- Fallback: Google Gemini API (langchain-google-genai, installed by default)
- Production: Use API-based embeddings (lower latency, no GPU required)
- Note: Self-hosted embeddings require significant resources
GDPR Storage Backend: PostgreSQL or Redis for compliance data persistence
- CRITICAL: In-memory storage is NOT production-ready
- Required for: GDPR compliance endpoints (/api/v1/users/me/*)
- Config: Set GDPR_STORAGE_BACKEND=postgres or redis in production
- See: GDPR Storage Configuration
All Features ([all]): Install all optional dependencies
- Install: pip install -e ".[all]" or uv sync --all-extras
- Use for: Development, testing, full feature evaluation

Development vs Production:

Development: All features work with fallbacks (in-memory, env vars, API-based)
Production: Use persistent backends (Redis, PostgreSQL) and proper secret management

🧪 Quality & Testing

Property-Based Testing: 27+ Hypothesis tests discovering edge cases automatically
Contract Testing: 20+ JSON Schema tests ensuring MCP protocol compliance
Performance Regression Testing: Automated latency tracking against baselines
Mutation Testing: Test effectiveness verification with mutmut (80%+ target)
Strict Typing: Gradual mypy strict mode rollout (3 modules complete)
OpenAPI Validation: Automated schema generation and breaking change detection
80% Code Coverage: Comprehensive unit and integration tests

🚀 Production Deployment

Kubernetes Ready: Production manifests for GKE, EKS, AKS, Rancher, VMware Tanzu
Helm Charts: Flexible deployment with customizable values and dependencies
Kustomize: Environment-specific overlays (dev/staging/production)
Multi-Platform: Docker Compose, kubectl, Kustomize, Helm deployment options
CI/CD Pipeline: Automated testing, validation, build, and deployment with GitHub Actions
Deployment Validation: Comprehensive validation scripts for all deployment configurations
E2E Testing: Automated deployment tests with kind clusters
High Availability: Pod anti-affinity, HPA, PDB, rolling updates
Monitoring: 25+ Prometheus alerts, 4 Grafana dashboards, 9 operational runbooks
Observability: Full monitoring for Keycloak, Redis, sessions, and application
Secrets: External secrets operator support, sealed secrets compatible
Service Mesh: Compatible with Istio, Linkerd, and other service meshes

📚 Documentation & Architecture

Architecture Decision Records (ADRs): 25 documented design decisions (adr/)
Comprehensive Documentation: Complete documentation index with guides, tutorials, and references
API Documentation: Interactive OpenAPI/Swagger UI

📚 Documentation

Documentation Index - Complete guide to all documentation
API Documentation - Interactive OpenAPI/Swagger UI (when running locally)
Mintlify Deployment - Mintlify documentation deployment instructions

📖 Quality & Testing Guides

Mutation Testing Guide - Test effectiveness measurement and improvement
Strict Typing Guide - Gradual mypy strict mode rollout
Architecture Decision Records - Documented architectural choices

🚀 Deployment & Operations

Deployment Quickstart - Quick deployment guide for all platforms
Deployment README - Comprehensive deployment documentation
CI/CD Guide - Continuous integration and deployment pipeline
Keycloak Integration - Enterprise SSO setup and configuration

📝 Architecture Decision Records (ADRs)

💡 Examples & Tutorials

Examples Directory - Comprehensive examples demonstrating all features
- Dynamic Context Loading - Just-in-Time semantic search
- Parallel Tool Execution - Concurrent execution patterns
- Enhanced Note-Taking - LLM-based information extraction
- Complete Workflow - Full agentic loop demonstration

Requirements

System Requirements

Python: 3.10, 3.11, or 3.12
Memory: 2GB RAM minimum (4GB recommended for production)
Disk: 500MB for dependencies + 1GB for optional vector databases
OS: Linux, macOS, or Windows with WSL2

Required Services (for full features)

Redis: Session storage (or use in-memory mode)
PostgreSQL: Compliance data storage (optional)
OpenFGA: Fine-grained authorization (optional)

Optional Components

Qdrant/Weaviate: Vector database for semantic search
Jaeger: Distributed tracing visualization
Prometheus + Grafana: Metrics and monitoring

See Production Checklist for detailed requirements.

Installation

Quick Install

Using uv (recommended):

This project uses uv for fast, reliable dependency management:

# Install from PyPI
uv pip install mcp-server-langgraph

# Or clone and develop locally (creates virtual environment automatically)
git clone https://github.com/vishnu2kmohan/mcp-server-langgraph.git
cd mcp-server-langgraph
uv sync  # Installs all dependencies from pyproject.toml and uv.lock

Why uv?

⚡ 10-100x faster than pip
🔒 Reproducible builds via uv.lock lockfile
📦 Single source of truth in pyproject.toml
🛡️ Better dependency resolution

Alternative: Using pip:

# Install from PyPI
pip install mcp-server-langgraph

# Or install from source
git clone https://github.com/vishnu2kmohan/mcp-server-langgraph.git
cd mcp-server-langgraph
pip install -e .

Note: requirements*.txt files are deprecated. Use uv sync instead.

Verify Installation

python -c "import mcp_server_langgraph; print(mcp_server_langgraph.__version__)"

See Installation Guide for complete instructions, including:

Docker installation
Virtual environment setup
Dependency management
Configuration options

Architecture

System Architecture

┌──────────────────────┐
│    MCP Client        │
│  (Claude Desktop     │
│   or other)          │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────────────────────┐
│         MCP Server                   │
│  (server_stdio.py/streamable.py)    │
│  ┌────────────────────────────┐     │
│  │   Auth Middleware          │     │
│  │   - JWT Verification       │     │
│  │   - OpenFGA Authorization  │     │
│  └────────────────────────────┘     │
│  ┌────────────────────────────┐     │
│  │   LangGraph Agent          │     │
│  │   - Context Compaction     │     │
│  │   - Pydantic AI Routing    │     │
│  │   - Tool Execution         │     │
│  │   - Response Generation    │     │
│  │   - Output Verification    │     │
│  │   - Iterative Refinement   │     │
│  └────────────────────────────┘     │
└──────────┬───────────────────────────┘
           │
           ▼
┌──────────────────────────────────────┐
│    Observability Stack               │
│  ┌──────────┐    ┌──────────────┐   │
│  │ Traces   │    │   Metrics    │   │
│  │ (Jaeger) │    │ (Prometheus) │   │
│  └─────┬────┘    └──────┬───────┘   │
│        └────────────────┘            │
│                ▼                     │
│        ┌──────────────┐              │
│        │   Grafana    │              │
│        └──────────────┘              │
└──────────────────────────────────────┘

Agentic Loop (ADR-0024, ADR-0025)

Our agent implements Anthropic's full gather-action-verify-repeat cycle with advanced enhancements:

┌─────────────────────────────────────────────────┐
│         LangGraph Agent Workflow                │
│                                                 │
│  START                                          │
│    │                                            │
│    ▼                                            │
│  ┌─────────────────────┐                       │
│  │ 0. Load Context     │ Just-in-Time          │
│  │    (Dynamic)        │ Semantic Search       │
│  └──────────┬──────────┘                       │
│             │                                    │
│             ▼                                    │
│  ┌─────────────────────┐                       │
│  │ 1. Gather Context   │ Compaction when       │
│  │    (Compact)        │ approaching limits    │
│  └──────────┬──────────┘                       │
│             │                                    │
│             ▼                                    │
│  ┌─────────────────────┐                       │
│  │ 2. Take Action      │ Route & Execute       │
│  │    (Route/Tools)    │ (Parallel if enabled) │
│  └──────────┬──────────┘                       │
│             │                                    │
│             ▼                                    │
│  ┌─────────────────────┐                       │
│  │    (Respond)        │ Generate Response     │
│  └──────────┬──────────┘                       │
│             │                                    │
│             ▼                                    │
│  ┌─────────────────────┐                       │
│  │ 3. Verify Work      │ LLM-as-Judge          │
│  │    (Verify)         │ Quality Check         │
│  └──────────┬──────────┘                       │
│             │                                    │
│        ┌────┴────┐                              │
│        │         │                              │
│     Passed    Failed                            │
│        │         │                              │
│        │         ▼                              │
│        │    ┌─────────────────────┐            │
│        │    │ 4. Repeat           │            │
│        │    │    (Refine)         │ Max 3×     │
│        │    └──────────┬──────────┘            │
│        │               │                        │
│        │               └─────►(Respond)        │
│        │                                        │
│        ▼                                        │
│      END                                        │
│                                                 │
└─────────────────────────────────────────────────┘

Key Features:

Just-in-Time Context Loading: Dynamic semantic search (60% token reduction)
Context Compaction: Prevents overflow on long conversations (40-60% token reduction)
Parallel Tool Execution: Concurrent execution with dependency resolution (1.5-2.5x speedup)
Enhanced Note-Taking: LLM-based 6-category extraction for long-term context
Output Verification: LLM-as-judge pattern catches errors before users see them (23% quality improvement)
Iterative Refinement: Up to 3 self-correction attempts for quality
Observable: Full tracing of each loop component

See ADR-0024: Agentic Loop Implementation and ADR-0025: Advanced Enhancements for details.

Quick Start

🐳 Docker Compose (Recommended)

Get the complete stack running in 2 minutes:

# Quick start script handles everything
./scripts/docker-compose-quickstart.sh

This starts:

Agent API: http://localhost:8000 (MCP agent)
OpenFGA: http://localhost:8080 (authorization)
OpenFGA Playground: http://localhost:3001
Jaeger UI: http://localhost:16686 (distributed tracing)
Prometheus: http://localhost:9090 (metrics)
Grafana: http://localhost:3000 (visualization, admin/admin)
PostgreSQL: localhost:5432 (OpenFGA storage)

Then setup OpenFGA:

python scripts/setup/setup_openfga.py
# Add OPENFGA_STORE_ID and OPENFGA_MODEL_ID to .env
docker-compose restart agent

Test the agent:

curl http://localhost:8000/health

See Docker Compose documentation for details.

🐍 Local Python Development

Install dependencies:

uv sync  # Install all dependencies and create virtual environment
# Note: Creates .venv automatically with all dependencies from pyproject.toml

Start infrastructure (without agent):

# Start only supporting services
docker-compose up -d openfga postgres otel-collector jaeger prometheus grafana

Configure environment:

cp .env.example .env
# Edit .env with your API keys:
# - GOOGLE_API_KEY (get from https://aistudio.google.com/apikey)
# - ANTHROPIC_API_KEY or OPENAI_API_KEY (optional)

Setup OpenFGA:

python scripts/setup/setup_openfga.py
# Save OPENFGA_STORE_ID and OPENFGA_MODEL_ID to .env

Run the agent locally:

python -m mcp_server_langgraph.mcp.server_streamable

Test:

# Test with example client
python examples/client_stdio.py

# Or curl
curl http://localhost:8000/health

Usage

Running the MCP Server

python -m mcp_server_langgraph.mcp.server_stdio

Testing with Example Client

python examples/client_stdio.py

MCP Client Configuration

Add to your MCP client config (e.g., Claude Desktop):

{
  "mcpServers": {
    "langgraph-agent": {
      "command": "python",
      "args": ["/path/to/mcp_server_langgraph/src/mcp_server_langgraph/mcp/server_stdio.py"]
    }
  }
}

Authentication & Authorization

Token-Based Authentication (v2.8.0)

All tool calls now require JWT token authentication for security:

import httpx

# 1. Login to get JWT token
async with httpx.AsyncClient() as client:
    response = await client.post(
        "http://localhost:8000/auth/login",
        json={"username": "alice", "password": "alice123"}
    )
    data = response.json()
    token = data["access_token"]
    print(f"Token expires in {data['expires_in']}s")

# 2. Use token in all tool calls
response = await client.post(
    "http://localhost:8000/message",
    json={
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
            "name": "agent_chat",
            "arguments": {
                "message": "Hello!",
                "token": token,  # ✅ Required
                "user_id": "user:alice"
            }
        }
    }
)

See: Authentication Migration Guide for complete details

Configurable Authentication Providers

The system supports multiple authentication backends via the auth factory:

# Development: In-memory user provider (with password validation)
# Set in .env:
AUTH_PROVIDER=inmemory

# Production: Keycloak SSO with OIDC/OAuth2
# Set in .env:
AUTH_PROVIDER=keycloak
KEYCLOAK_SERVER_URL=https://auth.example.com
KEYCLOAK_REALM=production
KEYCLOAK_CLIENT_ID=mcp-server
KEYCLOAK_CLIENT_SECRET=<secret>

Provider Features:

InMemoryUserProvider: Fast, password-protected, for development/testing
KeycloakUserProvider: Enterprise SSO, OIDC, automatic role sync to OpenFGA
Custom Providers: Extend UserProvider interface for custom auth systems

OpenFGA Fine-Grained Authorization

Uses relationship-based access control (Google Zanzibar model):

from mcp_server_langgraph.auth.openfga import OpenFGAClient

client = OpenFGAClient(
    api_url=settings.openfga_api_url,
    store_id=settings.openfga_store_id,
    model_id=settings.openfga_model_id
)

# Check permission
allowed = await client.check_permission(
    user="user:alice",
    relation="executor",
    object="tool:agent_chat"
)

# Grant permission
await client.write_tuples([
    {"user": "user:alice", "relation": "executor", "object": "tool:agent_chat"}
])

# List accessible resources
resources = await client.list_objects(
    user="user:alice",
    relation="executor",
    object_type="tool"
)

Default Users (Development Only)

Username	Password	Roles	Description
`alice`	`alice123`	`user`, `premium`	Premium user, member and admin of organization:acme
`bob`	`bob123`	`user`	Standard user, member of organization:acme
`admin`	`admin123`	`admin`	Admin user with elevated privileges

⚠️ Security Warning: Default users use plaintext passwords for development only.

For Production:

Use AUTH_PROVIDER=keycloak with proper SSO
Or implement password hashing in InMemoryUserProvider
Never use default credentials in production

See:

Testing Strategy

This project uses a comprehensive, multi-layered testing approach to ensure production quality:

🧪 Test Types

Combined Coverage Testing (Recommended)

make test-coverage-combined

60-65% combined coverage (unit + integration tests)
Most accurate coverage metric reflecting all test types
Includes MCP server entry points tested via integration tests
Generates combined HTML report: htmlcov-combined/index.html

Unit Tests (Fast, No External Dependencies)

make test-unit
# OR: pytest -m unit -v

~400 tests with comprehensive assertions
Mock all external dependencies (LLM, OpenFGA, Infisical)
Test pure logic, validation, and error handling

Integration Tests (Require Infrastructure)

make test-integration
# OR: pytest -m integration -v

~200 tests in isolated Docker environment
Real OpenFGA authorization checks
Real observability stack (Jaeger, Prometheus)
End-to-end workflows with actual dependencies
Coverage collection enabled (merged with unit tests in CI)

Property-Based Tests (Edge Case Discovery)

make test-property
# OR: pytest -m property -v

27+ Hypothesis tests generating thousands of test cases
Automatic edge case discovery (empty strings, extreme values, malformed input)
Tests properties like "JWT encode/decode should be reversible"
See: tests/property/test_llm_properties.py, tests/property/test_auth_properties.py

Contract Tests (Protocol Compliance)

make test-contract
# OR: pytest -m contract -v

20+ JSON Schema tests validating MCP protocol compliance
Ensures JSON-RPC 2.0 format correctness
Validates request/response schemas match specification
See: tests/contract/test_mcp_contract.py, tests/contract/mcp_schemas.json

Performance Regression Tests

make test-regression
# OR: pytest -m regression -v

Tracks latency metrics against baselines
Alerts on >20% performance regressions
Monitors: agent_response (p95 < 5s), llm_call (p95 < 10s), authorization (p95 < 50ms)
See: tests/regression/test_performance_regression.py, tests/regression/baseline_metrics.json

Mutation Testing (Test Effectiveness)

make test-mutation
# OR: mutmut run && mutmut results

Measures test quality by introducing code mutations
Target: 80%+ mutation score on critical modules
Identifies weak assertions and missing test cases
See: Mutation Testing Guide

OpenAPI Validation

make validate-openapi
# OR: python scripts/validate_openapi.py

Generates OpenAPI schema from code
Validates schema correctness
Detects breaking changes
Ensures all endpoints documented

🎯 Running Tests

# Quick: Run all unit tests (2-5 seconds)
make test-unit

# All automated tests (unit + integration)
make test

# All quality tests (property + contract + regression)
make test-all-quality

# Coverage report
make test-coverage
# Opens htmlcov/index.html with detailed coverage

# Full test suite (including mutation tests - SLOW!)
make test-unit && make test-all-quality && make test-mutation

📊 Quality Metrics

Code Coverage: 80% (target: 90%)
Property Tests: 27+ test classes with thousands of generated cases
Contract Tests: 20+ protocol compliance tests
Mutation Score: 80%+ target on critical modules (src/mcp_server_langgraph/core/agent.py, src/mcp_server_langgraph/auth/middleware.py, src/mcp_server_langgraph/core/config.py)
Type Coverage: Strict mypy on 3 modules (config, feature_flags, observability)
Performance: All p95 latencies within target thresholds

🔄 CI/CD Integration

GitHub Actions runs quality tests on every PR:

# .github/workflows/quality-tests.yaml
jobs:
  - property-tests     # 15min timeout
  - contract-tests     # MCP protocol validation
  - regression-tests   # Performance monitoring
  - openapi-validation # API schema validation
  - mutation-tests     # Weekly schedule (too slow for every PR)

See: .github/workflows/quality-tests.yaml

Feature Flags

Control features dynamically without code changes:

# Enable/disable features via environment variables
FF_ENABLE_PYDANTIC_AI_ROUTING=true      # Type-safe routing (default: true)
FF_ENABLE_LLM_FALLBACK=true             # Multi-model fallback (default: true)
FF_ENABLE_OPENFGA=true                  # Authorization (default: true)
FF_OPENFGA_STRICT_MODE=false            # Fail-closed vs fail-open (default: false)
FF_PYDANTIC_AI_CONFIDENCE_THRESHOLD=0.7 # Routing confidence (default: 0.7)

# All flags with FF_ prefix (20+ available)

Key Flags:

enable_pydantic_ai_routing: Type-safe routing with confidence scores
enable_llm_fallback: Automatic fallback to alternative models
enable_openfga: Fine-grained authorization (disable for development)
openfga_strict_mode: Fail-closed (deny on error) vs fail-open (allow on error)
enable_experimental_*: Master switches for experimental features

See: src/mcp_server_langgraph/core/feature_flags.py for all flags and validation

Observability

This project supports dual observability: OpenTelemetry for infrastructure metrics and LangSmith for LLM-specific tracing.

LangSmith Tracing (LLM Observability)

LangSmith provides comprehensive LLM and agent observability:

Setup:

# Add to .env
LANGSMITH_API_KEY=your-key-from-smith.langchain.com
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=mcp-server-langgraph

Features:

🔍 Automatic Tracing: All LLM calls and agent steps traced
🎯 Prompt Engineering: Iterate on prompts with production data
📊 Evaluations: Compare model performance on datasets
💬 User Feedback: Collect and analyze user ratings
💰 Cost Tracking: Monitor LLM API costs per user/session
🐛 Debugging: Root cause analysis with full context

View traces: https://smith.langchain.com/

See LangSmith Integration Guide for complete setup guide.

OpenTelemetry Tracing (Infrastructure)

Every request is traced end-to-end with OpenTelemetry:

from mcp_server_langgraph.observability.telemetry import tracer

with tracer.start_as_current_span("my_operation") as span:
    span.set_attribute("custom.attribute", "value")
    # Your code here

View traces in Jaeger: http://localhost:16686

Metrics

Standard metrics are automatically collected:

agent.tool.calls: Tool invocation counter
agent.calls.successful: Successful operation counter
agent.calls.failed: Failed operation counter
auth.failures: Authentication failure counter
authz.failures: Authorization failure counter
agent.response.duration: Response time histogram

View metrics in Prometheus: http://localhost:9090

Logging

Structured logging with trace context:

from mcp_server_langgraph.observability.telemetry import logger

logger.info("Event occurred", extra={
    "user_id": "user_123",
    "custom_field": "value"
})

Logs include trace_id and span_id for correlation with traces.

LangGraph Agent

The agent uses the functional API with:

State Management: TypedDict-based state with message history
Conditional Routing: Dynamic routing based on message content
Tool Integration: Extensible tool system (extend in src/mcp_server_langgraph/core/agent.py)
Checkpointing: Conversation persistence with MemorySaver

Extending the Agent

Add tools in src/mcp_server_langgraph/core/agent.py:

def custom_tool(state: AgentState) -> AgentState:
    # Your tool logic
    return state

workflow.add_node("custom_tool", custom_tool)
workflow.add_edge("router", "custom_tool")

Configuration

All settings via environment variables, Infisical, or .env file:

Core Configuration

Variable	Description	Default
`SERVICE_NAME`	Service identifier	`mcp-server-langgraph`
`OTLP_ENDPOINT`	OpenTelemetry collector	`http://localhost:4317`
`JWT_SECRET_KEY`	Secret for JWT signing	(loaded from Infisical)
`ANTHROPIC_API_KEY`	Anthropic API key	(loaded from Infisical)
`MODEL_NAME`	Claude model to use	`claude-3-5-sonnet-20241022`
`LOG_LEVEL`	Logging level	`INFO`
`OPENFGA_API_URL`	OpenFGA server URL	`http://localhost:8080`
`OPENFGA_STORE_ID`	OpenFGA store ID	(from setup)
`OPENFGA_MODEL_ID`	OpenFGA model ID	(from setup)
`INFISICAL_CLIENT_ID`	Infisical auth client ID	(optional)
`INFISICAL_CLIENT_SECRET`	Infisical auth secret	(optional)
`INFISICAL_PROJECT_ID`	Infisical project ID	(optional)

Anthropic Best Practices Configuration

Variable	Description	Default
Dynamic Context Loading
`ENABLE_DYNAMIC_CONTEXT_LOADING`	Enable just-in-time context loading	`false`
`QDRANT_URL`	Qdrant server URL	`localhost`
`QDRANT_PORT`	Qdrant server port	`6333`
`QDRANT_COLLECTION_NAME`	Collection name for contexts	`mcp_context`
`DYNAMIC_CONTEXT_MAX_TOKENS`	Max tokens per context load	`2000`
`DYNAMIC_CONTEXT_TOP_K`	Number of contexts to retrieve	`3`
`EMBEDDING_MODEL`	SentenceTransformer model	`all-MiniLM-L6-v2`
`CONTEXT_CACHE_SIZE`	LRU cache size	`100`
Parallel Execution
`ENABLE_PARALLEL_EXECUTION`	Enable parallel tool execution	`false`
`MAX_PARALLEL_TOOLS`	Max concurrent tool executions	`5`
Enhanced Note-Taking
`ENABLE_LLM_EXTRACTION`	Enable LLM-based extraction	`false`
Context Management
`ENABLE_CONTEXT_COMPACTION`	Enable context compaction	`true`
`COMPACTION_THRESHOLD`	Token count triggering compaction	`8000`
`TARGET_AFTER_COMPACTION`	Target tokens after compaction	`4000`
`RECENT_MESSAGE_COUNT`	Messages to keep uncompacted	`5`
Verification
`ENABLE_VERIFICATION`	Enable response verification	`true`
`VERIFICATION_QUALITY_THRESHOLD`	Quality score threshold	`0.7`
`MAX_REFINEMENT_ATTEMPTS`	Max refinement iterations	`3`

See src/mcp_server_langgraph/core/config.py for all options and .env.example for complete examples.

Secrets Loading Priority

Infisical (if configured)
Environment variables (fallback)
Default values (last resort)

Monitoring Dashboard

Access Grafana at http://localhost:3000 (admin/admin) and create dashboards using:

Prometheus datasource: Metrics visualization
Jaeger datasource: Trace exploration

Example queries:

Request rate: rate(agent_tool_calls_total[5m])
Error rate: rate(agent_calls_failed_total[5m])
P95 latency: histogram_quantile(0.95, agent_response_duration_bucket)

Security Considerations

🔒 Production Checklist:

Deployment Options

LangGraph Platform (Managed Cloud)

Deploy to LangGraph Platform for fully managed, serverless hosting:

# Login (uvx runs langgraph-cli without installing it)
uvx langgraph-cli login

# Deploy
uvx langgraph-cli deploy

Benefits:

✅ Zero infrastructure management
✅ Integrated LangSmith observability
✅ Automatic versioning and rollbacks
✅ Built-in scaling and load balancing
✅ One-command deployment

See LangGraph Platform Guide for complete platform deployment guide.

Google Cloud Run (Serverless)

Deploy to Google Cloud Run for fully managed, serverless deployment:

# Quick deploy
cd cloudrun
./deploy.sh --setup

# Or use gcloud directly
gcloud run deploy mcp-server-langgraph \
  --source . \
  --region us-central1 \
  --allow-unauthenticated

Benefits:

✅ Serverless autoscaling (0 to 100+ instances)
✅ Pay only for actual usage
✅ Automatic HTTPS and SSL certificates
✅ Integrated with Google Secret Manager
✅ Built-in monitoring and logging

See Cloud Run Deployment Guide for complete Cloud Run deployment guide.

Kubernetes Deployment

The agent is fully containerized and ready for Kubernetes deployment. Supported platforms:

Google Kubernetes Engine (GKE)
Amazon Elastic Kubernetes Service (EKS)
Azure Kubernetes Service (AKS)
Rancher
VMware Tanzu

Quick Deploy:

# Build and push image
docker build -t your-registry/langgraph-agent:v1.0.0 .
docker push your-registry/langgraph-agent:v1.0.0

# Deploy with Helm
helm install langgraph-agent ./deployments/helm/langgraph-agent \
  --namespace langgraph-agent \
  --create-namespace \
  --set image.repository=your-registry/langgraph-agent \
  --set image.tag=v1.0.0

# Or deploy with Kustomize
kubectl apply -k deployments/kustomize/overlays/production

See Kubernetes Deployment Guide for complete deployment guide.

API Gateway & Rate Limiting

Kong API Gateway integration provides:

Rate Limiting: Tiered limits (60-1000 req/min) per consumer/tier
Authentication: JWT, API Key, OAuth2
Traffic Control: Request transformation, routing, load balancing
Security: IP restriction, bot detection, CORS
Monitoring: Prometheus metrics, request logging

# Deploy with Kong rate limiting
helm install langgraph-agent ./deployments/helm/langgraph-agent \
  --set kong.enabled=true \
  --set kong.rateLimitTier=premium

# Or apply Kong manifests directly
kubectl apply -k deployments/kubernetes/kong/

See Kong Gateway Integration for complete Kong setup and rate limiting configuration.

MCP Transports & Registry

The agent supports multiple MCP transports:

StreamableHTTP (Recommended): Modern HTTP streaming for production
stdio: For Claude Desktop and local applications

# StreamableHTTP (recommended for web/production)
python -m mcp_server_langgraph.mcp.server_streamable

# stdio (local/desktop)
python -m mcp_server_langgraph.mcp.server_stdio

# Access StreamableHTTP endpoints
POST /message         # Main MCP endpoint (streaming or regular)
GET /tools            # List tools
GET /resources        # List resources

Why StreamableHTTP?

✅ Modern HTTP/2+ streaming
✅ Better load balancer/proxy compatibility
✅ Proper request/response pairs
✅ Full MCP spec compliance
✅ Works with Kong rate limiting

Registry compliant - Includes manifest files for MCP Registry publication.

See MCP Registry Guide for registry deployment and transport configuration.

Quality Practices

This project maintains high code quality through:

📈 Current Quality Score: 9.6/10

Assessed across 7 dimensions:

✅ Code Organization: 9/10 - Clear module structure, separation of concerns
✅ Testing: 10/10 - Multi-layered testing (unit, integration, property, contract, regression, mutation)
✅ Type Safety: 9/10 - Gradual strict mypy rollout (3/11 modules strict, 8 remaining)
✅ Documentation: 10/10 - ADRs, guides, API docs, inline documentation
✅ Error Handling: 9/10 - Comprehensive error handling, fallback modes
✅ Observability: 10/10 - Dual observability (OpenTelemetry + LangSmith)
✅ Security: 9/10 - JWT auth, fine-grained authz, secrets management, security scanning

🎯 Quality Gates

Pre-Commit:

Code formatting (black, isort)
Linting (flake8, mypy)
Security scan (bandit)

CI/CD (GitHub Actions):

Unit tests (Python 3.10, 3.11, 3.12)
Integration tests
Property-based tests
Contract tests
Performance regression tests
OpenAPI validation
Mutation tests (weekly)

Commands:

# Code quality checks
make format           # Format code (black + isort)
make lint             # Run linters (flake8 + mypy)
make security-check   # Security scan (bandit)

# Test suite
make test-unit        # Fast unit tests
make test-all-quality # Property + contract + regression
make test-coverage    # Coverage report

📝 Development Workflow

Branch Protection: All changes via Pull Requests
Conventional Commits: feat:, fix:, test:, docs:, refactor:
Code Review: Required before merge
Quality Gates: All tests must pass
Documentation: ADRs for architectural decisions

See: .github/CLAUDE.md for complete development guide

🔄 Continuous Improvement

In Progress:

Expanding strict mypy to all modules (3/11 complete)
Increasing mutation score to 80%+ on all critical modules
Adding more property-based tests for edge case discovery

Recent Improvements (2025):

Implemented Anthropic's agentic loop (ADR-0024) with context compaction and verification
Adopted Anthropic's tool design best practices (ADR-0023)
Added 27+ property-based tests (Hypothesis)
Added 20+ contract tests (JSON Schema)
Implemented performance regression tracking
Set up mutation testing with mutmut
Created 25 Architecture Decision Records
Implemented feature flag system

Contributors

Thanks to all the amazing people who have contributed to this project! 🙌

This project follows the all-contributors specification.

Want to be listed here? See CONTRIBUTING.md!

Support

Need help? Check out our Support Guide for:

📚 Documentation links
💬 Where to ask questions
🐛 How to report bugs
🔒 Security reporting

License

MIT - see LICENSE file for details

Acknowledgments

Built with:

LangGraph - Agent framework
MCP - Model Context Protocol
OpenFGA - Authorization
LiteLLM - Multi-LLM support
OpenTelemetry - Observability

Special thanks to the open source community!

Contributing

We welcome contributions from the community! 🎉

Quick Start for Contributors

Read the guides:
- CONTRIBUTING.md - Contribution guidelines
- Development Guide - Developer setup
Find something to work on:
- Good First Issues
- Help Wanted
Get help:
- GitHub Discussions
- Support Guide

Contribution Areas

💻 Code: Features, bug fixes, performance improvements
📖 Documentation: Guides, tutorials, API docs
🧪 Testing: Unit tests, integration tests, test coverage
🔒 Security: Security improvements, audits
🌐 Translations: i18n support (future)
💡 Ideas: Feature requests, architecture discussions

All contributors will be recognized in our Contributors section!

Name		Name	Last commit message	Last commit date
Latest commit History 373 Commits
.ai		.ai
.claude		.claude
.cursor		.cursor
.github		.github
.mcp		.mcp
.openai		.openai
adr		adr
api		api
archive		archive
config		config
deployments		deployments
docker		docker
docs-internal		docs-internal
docs		docs
examples		examples
hooks		hooks
integrations		integrations
logo		logo
monitoring		monitoring
reference		reference
reports		reports
runbooks		runbooks
scripts		scripts
src/mcp_server_langgraph		src/mcp_server_langgraph
template		template
tests		tests
.all-contributorsrc		.all-contributorsrc
.cursorrules		.cursorrules
.editorconfig		.editorconfig
.env.example		.env.example
.env.production.template		.env.production.template
.flake8		.flake8
.gitignore		.gitignore
.markdownlint.json		.markdownlint.json
.mcp.json		.mcp.json
.mcp.json.example		.mcp.json.example
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
RELEASE_COMPLETION_v2.8.0.md		RELEASE_COMPLETION_v2.8.0.md
RELEASE_NOTES_v2.8.0.md		RELEASE_NOTES_v2.8.0.md
REPOSITORY_STRUCTURE.md		REPOSITORY_STRUCTURE.md
SECURITY.md		SECURITY.md
docker-compose.yml		docker-compose.yml
langgraph.json		langgraph.json
openapi.json		openapi.json
package.json		package.json
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Uh oh!