Skip to content

vishnu2kmohan/mcp-server-langgraph

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

MCP Server with LangGraph + OpenFGA & Infisical

License: MIT Python 3.10+ Production Ready Use This Template Docker Kubernetes

CI/CD Pipeline PR Checks Quality Tests Security Scan

Security Audit Code Quality Code Coverage Property Tests Contract Tests Mutation Testing

A production-ready cookie-cutter template for building MCP servers with LangGraph's Functional API. Features comprehensive authentication (JWT), fine-grained authorization (OpenFGA), secrets management (Infisical), and OpenTelemetry-based observability.

🎯 Opinionated, production-grade foundation for your MCP server projects.

🚀 Use This Template

# Generate your own MCP server project
uvx cookiecutter gh:vishnu2kmohan/mcp_server_langgraph

# Answer a few questions and get a fully configured project!

See Cookiecutter Template Strategy for detailed information.


📖 Template vs Project Usage

Using This as a Template

For: Creating your own MCP server with custom tools and logic

How:

  1. Generate project: uvx cookiecutter gh:vishnu2kmohan/mcp_server_langgraph
  2. Customize tools in generated agent.py
  3. Update authorization model in scripts/setup/setup_openfga.py
  4. Deploy your custom server

What gets customized:

  • Project name, author, license
  • Which features to include (auth, observability, deployment configs)
  • LLM provider preferences
  • Tool implementations

See: Cookiecutter Template Strategy (ADR-0011)

Using This Project Directly

For: Learning, testing, or using the reference implementation

How:

  1. Clone: git clone https://github.com/vishnu2kmohan/mcp-server-langgraph.git
  2. Install: uv sync
  3. Configure: Copy .env.example to .env and add API keys
  4. Run: make run-streamable

What you get:

  • Fully working MCP server with example tools (agent_chat, conversation_search, conversation_get)
  • Complete observability stack
  • Production-ready deployment configs
  • Comprehensive test suite

See: Quick Start below


Features

⭐ Anthropic Best Practices (9.8/10 Adherence)

This project achieves reference-quality implementation of Anthropic's AI agent best practices:

  • 🎯 Just-in-Time Context Loading: Dynamic semantic search with Qdrant vector database
    • Load only relevant context when needed (60% token reduction)
    • Progressive discovery through iterative search
    • Token-aware batch loading with configurable budgets
  • ⚡ Parallel Tool Execution: Concurrent execution with automatic dependency resolution
    • 1.5-2.5x latency reduction for independent operations
    • Topological sorting for correct execution order
    • Graceful error handling and recovery
  • 📝 Enhanced Structured Note-Taking: LLM-based 6-category information extraction
    • Automatic categorization: decisions, requirements, facts, action_items, issues, preferences
    • Context preservation across multi-turn conversations
    • Fallback to rule-based extraction for reliability
  • ✅ Complete Agentic Loop: Full gather-action-verify-repeat cycle
    • Context compaction (40-60% token reduction)
    • LLM-as-judge verification (23% quality improvement)
    • Iterative refinement (up to 3 attempts)
    • Observable with full tracing

See: Anthropic Best Practices Assessment | ADR-0023 | ADR-0024 | ADR-0025 | Examples

🎯 Core Capabilities

  • Multi-LLM Support (LiteLLM): 100+ LLM providers - Anthropic, OpenAI, Google, Azure, AWS Bedrock, Ollama
  • Open-Source Models: Llama 3.1, Qwen 2.5, Mistral, DeepSeek, and more via Ollama
  • LangGraph Functional API: Stateful agent with conditional routing and checkpointing
  • MCP Server: Standard protocol for exposing AI agents as tools (stdio, StreamableHTTP)
  • Enterprise Authentication: Pluggable auth providers (InMemory, Keycloak SSO)
    • JWT Authentication: Token-based authentication with validation and expiration
    • Keycloak Integration: Production-ready SSO with OIDC/OAuth2 (integrations/keycloak.md)
    • Token Refresh: Automatic refresh token rotation
    • JWKS Verification: Public key verification without shared secrets
  • Session Management: Flexible session storage backends
    • InMemory: Fast in-memory sessions for development
    • Redis: Persistent sessions with TTL, sliding windows, concurrent limits
    • Advanced Features: Session lifecycle management, bulk revocation, user tracking
  • Fine-Grained Authorization: OpenFGA (Zanzibar-style) relationship-based access control
    • Role Mapping: Declarative role mappings with YAML configuration
    • Keycloak Sync: Automatic role/group synchronization to OpenFGA
    • Hierarchies: Role inheritance and conditional mappings
  • Secrets Management: Infisical integration for secure secret storage and retrieval
  • Feature Flags: Gradual rollouts with environment-based configuration
  • Dual Observability: OpenTelemetry + LangSmith for comprehensive monitoring
    • OpenTelemetry: Distributed tracing with Jaeger, metrics with Prometheus (30+ auth metrics)
    • LangSmith: LLM-specific tracing, prompt engineering, evaluations
  • Structured Logging: JSON logging with trace context correlation
  • Full Observability Stack: Docker Compose setup with OpenFGA, Keycloak, Redis, Jaeger, Prometheus, Grafana, and Qdrant
  • LangGraph Platform: Deploy to managed LangGraph Cloud with one command
  • Automatic Fallback: Resilient multi-model fallback for high availability

📦 Optional Dependencies

The project supports optional feature sets that can be installed on demand:

  • Secrets Management ([secrets]): Infisical integration for centralized secrets

    • Install: pip install -e ".[secrets]" or uv sync --extra secrets
    • Fallback: Environment variables (.env file)
    • Production: Recommended for secure secret rotation
    • See: Infisical Installation Guide
  • Self-Hosted Embeddings ([embeddings]): sentence-transformers for local embedding generation

    • Install: pip install -e ".[embeddings]" or uv sync --extra embeddings
    • Fallback: Google Gemini API (langchain-google-genai, installed by default)
    • Production: Use API-based embeddings (lower latency, no GPU required)
    • Note: Self-hosted embeddings require significant resources
  • GDPR Storage Backend: PostgreSQL or Redis for compliance data persistence

    • CRITICAL: In-memory storage is NOT production-ready
    • Required for: GDPR compliance endpoints (/api/v1/users/me/*)
    • Config: Set GDPR_STORAGE_BACKEND=postgres or redis in production
    • See: GDPR Storage Configuration
  • All Features ([all]): Install all optional dependencies

    • Install: pip install -e ".[all]" or uv sync --all-extras
    • Use for: Development, testing, full feature evaluation

Development vs Production:

  • Development: All features work with fallbacks (in-memory, env vars, API-based)
  • Production: Use persistent backends (Redis, PostgreSQL) and proper secret management

🧪 Quality & Testing

  • Property-Based Testing: 27+ Hypothesis tests discovering edge cases automatically
  • Contract Testing: 20+ JSON Schema tests ensuring MCP protocol compliance
  • Performance Regression Testing: Automated latency tracking against baselines
  • Mutation Testing: Test effectiveness verification with mutmut (80%+ target)
  • Strict Typing: Gradual mypy strict mode rollout (3 modules complete)
  • OpenAPI Validation: Automated schema generation and breaking change detection
  • 80% Code Coverage: Comprehensive unit and integration tests

🚀 Production Deployment

  • Kubernetes Ready: Production manifests for GKE, EKS, AKS, Rancher, VMware Tanzu
  • Helm Charts: Flexible deployment with customizable values and dependencies
  • Kustomize: Environment-specific overlays (dev/staging/production)
  • Multi-Platform: Docker Compose, kubectl, Kustomize, Helm deployment options
  • CI/CD Pipeline: Automated testing, validation, build, and deployment with GitHub Actions
  • Deployment Validation: Comprehensive validation scripts for all deployment configurations
  • E2E Testing: Automated deployment tests with kind clusters
  • High Availability: Pod anti-affinity, HPA, PDB, rolling updates
  • Monitoring: 25+ Prometheus alerts, 4 Grafana dashboards, 9 operational runbooks
  • Observability: Full monitoring for Keycloak, Redis, sessions, and application
  • Secrets: External secrets operator support, sealed secrets compatible
  • Service Mesh: Compatible with Istio, Linkerd, and other service meshes

📚 Documentation & Architecture

  • Architecture Decision Records (ADRs): 25 documented design decisions (adr/)
  • Comprehensive Documentation: Complete documentation index with guides, tutorials, and references
  • API Documentation: Interactive OpenAPI/Swagger UI

📚 Documentation

📖 Quality & Testing Guides

🚀 Deployment & Operations

📝 Architecture Decision Records (ADRs)

💡 Examples & Tutorials

Requirements

System Requirements

  • Python: 3.10, 3.11, or 3.12
  • Memory: 2GB RAM minimum (4GB recommended for production)
  • Disk: 500MB for dependencies + 1GB for optional vector databases
  • OS: Linux, macOS, or Windows with WSL2

Required Services (for full features)

  • Redis: Session storage (or use in-memory mode)
  • PostgreSQL: Compliance data storage (optional)
  • OpenFGA: Fine-grained authorization (optional)

Optional Components

  • Qdrant/Weaviate: Vector database for semantic search
  • Jaeger: Distributed tracing visualization
  • Prometheus + Grafana: Metrics and monitoring

See Production Checklist for detailed requirements.

Installation

Quick Install

Using uv (recommended):

This project uses uv for fast, reliable dependency management:

# Install from PyPI
uv pip install mcp-server-langgraph

# Or clone and develop locally (creates virtual environment automatically)
git clone https://github.com/vishnu2kmohan/mcp-server-langgraph.git
cd mcp-server-langgraph
uv sync  # Installs all dependencies from pyproject.toml and uv.lock

Why uv?

  • 10-100x faster than pip
  • 🔒 Reproducible builds via uv.lock lockfile
  • 📦 Single source of truth in pyproject.toml
  • 🛡️ Better dependency resolution

Alternative: Using pip:

# Install from PyPI
pip install mcp-server-langgraph

# Or install from source
git clone https://github.com/vishnu2kmohan/mcp-server-langgraph.git
cd mcp-server-langgraph
pip install -e .

Note: requirements*.txt files are deprecated. Use uv sync instead.

Verify Installation

python -c "import mcp_server_langgraph; print(mcp_server_langgraph.__version__)"

See Installation Guide for complete instructions, including:

  • Docker installation
  • Virtual environment setup
  • Dependency management
  • Configuration options

Architecture

System Architecture

┌──────────────────────┐
│    MCP Client        │
│  (Claude Desktop     │
│   or other)          │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────────────────────┐
│         MCP Server                   │
│  (server_stdio.py/streamable.py)    │
│  ┌────────────────────────────┐     │
│  │   Auth Middleware          │     │
│  │   - JWT Verification       │     │
│  │   - OpenFGA Authorization  │     │
│  └────────────────────────────┘     │
│  ┌────────────────────────────┐     │
│  │   LangGraph Agent          │     │
│  │   - Context Compaction     │     │
│  │   - Pydantic AI Routing    │     │
│  │   - Tool Execution         │     │
│  │   - Response Generation    │     │
│  │   - Output Verification    │     │
│  │   - Iterative Refinement   │     │
│  └────────────────────────────┘     │
└──────────┬───────────────────────────┘
           │
           ▼
┌──────────────────────────────────────┐
│    Observability Stack               │
│  ┌──────────┐    ┌──────────────┐   │
│  │ Traces   │    │   Metrics    │   │
│  │ (Jaeger) │    │ (Prometheus) │   │
│  └─────┬────┘    └──────┬───────┘   │
│        └────────────────┘            │
│                ▼                     │
│        ┌──────────────┐              │
│        │   Grafana    │              │
│        └──────────────┘              │
└──────────────────────────────────────┘

Agentic Loop (ADR-0024, ADR-0025)

Our agent implements Anthropic's full gather-action-verify-repeat cycle with advanced enhancements:

┌─────────────────────────────────────────────────┐
│         LangGraph Agent Workflow                │
│                                                 │
│  START                                          │
│    │                                            │
│    ▼                                            │
│  ┌─────────────────────┐                       │
│  │ 0. Load Context     │ Just-in-Time          │
│  │    (Dynamic)        │ Semantic Search       │
│  └──────────┬──────────┘                       │
│             │                                    │
│             ▼                                    │
│  ┌─────────────────────┐                       │
│  │ 1. Gather Context   │ Compaction when       │
│  │    (Compact)        │ approaching limits    │
│  └──────────┬──────────┘                       │
│             │                                    │
│             ▼                                    │
│  ┌─────────────────────┐                       │
│  │ 2. Take Action      │ Route & Execute       │
│  │    (Route/Tools)    │ (Parallel if enabled) │
│  └──────────┬──────────┘                       │
│             │                                    │
│             ▼                                    │
│  ┌─────────────────────┐                       │
│  │    (Respond)        │ Generate Response     │
│  └──────────┬──────────┘                       │
│             │                                    │
│             ▼                                    │
│  ┌─────────────────────┐                       │
│  │ 3. Verify Work      │ LLM-as-Judge          │
│  │    (Verify)         │ Quality Check         │
│  └──────────┬──────────┘                       │
│             │                                    │
│        ┌────┴────┐                              │
│        │         │                              │
│     Passed    Failed                            │
│        │         │                              │
│        │         ▼                              │
│        │    ┌─────────────────────┐            │
│        │    │ 4. Repeat           │            │
│        │    │    (Refine)         │ Max 3×     │
│        │    └──────────┬──────────┘            │
│        │               │                        │
│        │               └─────►(Respond)        │
│        │                                        │
│        ▼                                        │
│      END                                        │
│                                                 │
└─────────────────────────────────────────────────┘

Key Features:

  • Just-in-Time Context Loading: Dynamic semantic search (60% token reduction)
  • Context Compaction: Prevents overflow on long conversations (40-60% token reduction)
  • Parallel Tool Execution: Concurrent execution with dependency resolution (1.5-2.5x speedup)
  • Enhanced Note-Taking: LLM-based 6-category extraction for long-term context
  • Output Verification: LLM-as-judge pattern catches errors before users see them (23% quality improvement)
  • Iterative Refinement: Up to 3 self-correction attempts for quality
  • Observable: Full tracing of each loop component

See ADR-0024: Agentic Loop Implementation and ADR-0025: Advanced Enhancements for details.

Quick Start

🐳 Docker Compose (Recommended)

Get the complete stack running in 2 minutes:

# Quick start script handles everything
./scripts/docker-compose-quickstart.sh

This starts:

Then setup OpenFGA:

python scripts/setup/setup_openfga.py
# Add OPENFGA_STORE_ID and OPENFGA_MODEL_ID to .env
docker-compose restart agent

Test the agent:

curl http://localhost:8000/health

See Docker Compose documentation for details.

🐍 Local Python Development

  1. Install dependencies:
uv sync  # Install all dependencies and create virtual environment
# Note: Creates .venv automatically with all dependencies from pyproject.toml
  1. Start infrastructure (without agent):
# Start only supporting services
docker-compose up -d openfga postgres otel-collector jaeger prometheus grafana
  1. Configure environment:
cp .env.example .env
# Edit .env with your API keys:
# - GOOGLE_API_KEY (get from https://aistudio.google.com/apikey)
# - ANTHROPIC_API_KEY or OPENAI_API_KEY (optional)
  1. Setup OpenFGA:
python scripts/setup/setup_openfga.py
# Save OPENFGA_STORE_ID and OPENFGA_MODEL_ID to .env
  1. Run the agent locally:
python -m mcp_server_langgraph.mcp.server_streamable
  1. Test:
# Test with example client
python examples/client_stdio.py

# Or curl
curl http://localhost:8000/health

Usage

Running the MCP Server

python -m mcp_server_langgraph.mcp.server_stdio

Testing with Example Client

python examples/client_stdio.py

MCP Client Configuration

Add to your MCP client config (e.g., Claude Desktop):

{
  "mcpServers": {
    "langgraph-agent": {
      "command": "python",
      "args": ["/path/to/mcp_server_langgraph/src/mcp_server_langgraph/mcp/server_stdio.py"]
    }
  }
}

Authentication & Authorization

Token-Based Authentication (v2.8.0)

All tool calls now require JWT token authentication for security:

import httpx

# 1. Login to get JWT token
async with httpx.AsyncClient() as client:
    response = await client.post(
        "http://localhost:8000/auth/login",
        json={"username": "alice", "password": "alice123"}
    )
    data = response.json()
    token = data["access_token"]
    print(f"Token expires in {data['expires_in']}s")

# 2. Use token in all tool calls
response = await client.post(
    "http://localhost:8000/message",
    json={
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
            "name": "agent_chat",
            "arguments": {
                "message": "Hello!",
                "token": token,  # ✅ Required
                "user_id": "user:alice"
            }
        }
    }
)

See: Authentication Migration Guide for complete details

Configurable Authentication Providers

The system supports multiple authentication backends via the auth factory:

# Development: In-memory user provider (with password validation)
# Set in .env:
AUTH_PROVIDER=inmemory

# Production: Keycloak SSO with OIDC/OAuth2
# Set in .env:
AUTH_PROVIDER=keycloak
KEYCLOAK_SERVER_URL=https://auth.example.com
KEYCLOAK_REALM=production
KEYCLOAK_CLIENT_ID=mcp-server
KEYCLOAK_CLIENT_SECRET=<secret>

Provider Features:

  • InMemoryUserProvider: Fast, password-protected, for development/testing
  • KeycloakUserProvider: Enterprise SSO, OIDC, automatic role sync to OpenFGA
  • Custom Providers: Extend UserProvider interface for custom auth systems

OpenFGA Fine-Grained Authorization

Uses relationship-based access control (Google Zanzibar model):

from mcp_server_langgraph.auth.openfga import OpenFGAClient

client = OpenFGAClient(
    api_url=settings.openfga_api_url,
    store_id=settings.openfga_store_id,
    model_id=settings.openfga_model_id
)

# Check permission
allowed = await client.check_permission(
    user="user:alice",
    relation="executor",
    object="tool:agent_chat"
)

# Grant permission
await client.write_tuples([
    {"user": "user:alice", "relation": "executor", "object": "tool:agent_chat"}
])

# List accessible resources
resources = await client.list_objects(
    user="user:alice",
    relation="executor",
    object_type="tool"
)

Default Users (Development Only)

Username Password Roles Description
alice alice123 user, premium Premium user, member and admin of organization:acme
bob bob123 user Standard user, member of organization:acme
admin admin123 admin Admin user with elevated privileges

⚠️ Security Warning: Default users use plaintext passwords for development only.

For Production:

  • Use AUTH_PROVIDER=keycloak with proper SSO
  • Or implement password hashing in InMemoryUserProvider
  • Never use default credentials in production

See:

Testing Strategy

This project uses a comprehensive, multi-layered testing approach to ensure production quality:

🧪 Test Types

Combined Coverage Testing (Recommended)

make test-coverage-combined
  • 60-65% combined coverage (unit + integration tests)
  • Most accurate coverage metric reflecting all test types
  • Includes MCP server entry points tested via integration tests
  • Generates combined HTML report: htmlcov-combined/index.html

Unit Tests (Fast, No External Dependencies)

make test-unit
# OR: pytest -m unit -v
  • ~400 tests with comprehensive assertions
  • Mock all external dependencies (LLM, OpenFGA, Infisical)
  • Test pure logic, validation, and error handling

Integration Tests (Require Infrastructure)

make test-integration
# OR: pytest -m integration -v
  • ~200 tests in isolated Docker environment
  • Real OpenFGA authorization checks
  • Real observability stack (Jaeger, Prometheus)
  • End-to-end workflows with actual dependencies
  • Coverage collection enabled (merged with unit tests in CI)

Property-Based Tests (Edge Case Discovery)

make test-property
# OR: pytest -m property -v
  • 27+ Hypothesis tests generating thousands of test cases
  • Automatic edge case discovery (empty strings, extreme values, malformed input)
  • Tests properties like "JWT encode/decode should be reversible"
  • See: tests/property/test_llm_properties.py, tests/property/test_auth_properties.py

Contract Tests (Protocol Compliance)

make test-contract
# OR: pytest -m contract -v
  • 20+ JSON Schema tests validating MCP protocol compliance
  • Ensures JSON-RPC 2.0 format correctness
  • Validates request/response schemas match specification
  • See: tests/contract/test_mcp_contract.py, tests/contract/mcp_schemas.json

Performance Regression Tests

make test-regression
# OR: pytest -m regression -v
  • Tracks latency metrics against baselines
  • Alerts on >20% performance regressions
  • Monitors: agent_response (p95 < 5s), llm_call (p95 < 10s), authorization (p95 < 50ms)
  • See: tests/regression/test_performance_regression.py, tests/regression/baseline_metrics.json

Mutation Testing (Test Effectiveness)

make test-mutation
# OR: mutmut run && mutmut results
  • Measures test quality by introducing code mutations
  • Target: 80%+ mutation score on critical modules
  • Identifies weak assertions and missing test cases
  • See: Mutation Testing Guide

OpenAPI Validation

make validate-openapi
# OR: python scripts/validate_openapi.py
  • Generates OpenAPI schema from code
  • Validates schema correctness
  • Detects breaking changes
  • Ensures all endpoints documented

🎯 Running Tests

# Quick: Run all unit tests (2-5 seconds)
make test-unit

# All automated tests (unit + integration)
make test

# All quality tests (property + contract + regression)
make test-all-quality

# Coverage report
make test-coverage
# Opens htmlcov/index.html with detailed coverage

# Full test suite (including mutation tests - SLOW!)
make test-unit && make test-all-quality && make test-mutation

📊 Quality Metrics

  • Code Coverage: 80% (target: 90%)
  • Property Tests: 27+ test classes with thousands of generated cases
  • Contract Tests: 20+ protocol compliance tests
  • Mutation Score: 80%+ target on critical modules (src/mcp_server_langgraph/core/agent.py, src/mcp_server_langgraph/auth/middleware.py, src/mcp_server_langgraph/core/config.py)
  • Type Coverage: Strict mypy on 3 modules (config, feature_flags, observability)
  • Performance: All p95 latencies within target thresholds

🔄 CI/CD Integration

GitHub Actions runs quality tests on every PR:

# .github/workflows/quality-tests.yaml
jobs:
  - property-tests     # 15min timeout
  - contract-tests     # MCP protocol validation
  - regression-tests   # Performance monitoring
  - openapi-validation # API schema validation
  - mutation-tests     # Weekly schedule (too slow for every PR)

See: .github/workflows/quality-tests.yaml

Feature Flags

Control features dynamically without code changes:

# Enable/disable features via environment variables
FF_ENABLE_PYDANTIC_AI_ROUTING=true      # Type-safe routing (default: true)
FF_ENABLE_LLM_FALLBACK=true             # Multi-model fallback (default: true)
FF_ENABLE_OPENFGA=true                  # Authorization (default: true)
FF_OPENFGA_STRICT_MODE=false            # Fail-closed vs fail-open (default: false)
FF_PYDANTIC_AI_CONFIDENCE_THRESHOLD=0.7 # Routing confidence (default: 0.7)

# All flags with FF_ prefix (20+ available)

Key Flags:

  • enable_pydantic_ai_routing: Type-safe routing with confidence scores
  • enable_llm_fallback: Automatic fallback to alternative models
  • enable_openfga: Fine-grained authorization (disable for development)
  • openfga_strict_mode: Fail-closed (deny on error) vs fail-open (allow on error)
  • enable_experimental_*: Master switches for experimental features

See: src/mcp_server_langgraph/core/feature_flags.py for all flags and validation

Observability

This project supports dual observability: OpenTelemetry for infrastructure metrics and LangSmith for LLM-specific tracing.

LangSmith Tracing (LLM Observability)

LangSmith provides comprehensive LLM and agent observability:

Setup:

# Add to .env
LANGSMITH_API_KEY=your-key-from-smith.langchain.com
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=mcp-server-langgraph

Features:

  • 🔍 Automatic Tracing: All LLM calls and agent steps traced
  • 🎯 Prompt Engineering: Iterate on prompts with production data
  • 📊 Evaluations: Compare model performance on datasets
  • 💬 User Feedback: Collect and analyze user ratings
  • 💰 Cost Tracking: Monitor LLM API costs per user/session
  • 🐛 Debugging: Root cause analysis with full context

View traces: https://smith.langchain.com/

See LangSmith Integration Guide for complete setup guide.

OpenTelemetry Tracing (Infrastructure)

Every request is traced end-to-end with OpenTelemetry:

from mcp_server_langgraph.observability.telemetry import tracer

with tracer.start_as_current_span("my_operation") as span:
    span.set_attribute("custom.attribute", "value")
    # Your code here

View traces in Jaeger: http://localhost:16686

Metrics

Standard metrics are automatically collected:

  • agent.tool.calls: Tool invocation counter
  • agent.calls.successful: Successful operation counter
  • agent.calls.failed: Failed operation counter
  • auth.failures: Authentication failure counter
  • authz.failures: Authorization failure counter
  • agent.response.duration: Response time histogram

View metrics in Prometheus: http://localhost:9090

Logging

Structured logging with trace context:

from mcp_server_langgraph.observability.telemetry import logger

logger.info("Event occurred", extra={
    "user_id": "user_123",
    "custom_field": "value"
})

Logs include trace_id and span_id for correlation with traces.

LangGraph Agent

The agent uses the functional API with:

  • State Management: TypedDict-based state with message history
  • Conditional Routing: Dynamic routing based on message content
  • Tool Integration: Extensible tool system (extend in src/mcp_server_langgraph/core/agent.py)
  • Checkpointing: Conversation persistence with MemorySaver

Extending the Agent

Add tools in src/mcp_server_langgraph/core/agent.py:

def custom_tool(state: AgentState) -> AgentState:
    # Your tool logic
    return state

workflow.add_node("custom_tool", custom_tool)
workflow.add_edge("router", "custom_tool")

Configuration

All settings via environment variables, Infisical, or .env file:

Core Configuration

Variable Description Default
SERVICE_NAME Service identifier mcp-server-langgraph
OTLP_ENDPOINT OpenTelemetry collector http://localhost:4317
JWT_SECRET_KEY Secret for JWT signing (loaded from Infisical)
ANTHROPIC_API_KEY Anthropic API key (loaded from Infisical)
MODEL_NAME Claude model to use claude-3-5-sonnet-20241022
LOG_LEVEL Logging level INFO
OPENFGA_API_URL OpenFGA server URL http://localhost:8080
OPENFGA_STORE_ID OpenFGA store ID (from setup)
OPENFGA_MODEL_ID OpenFGA model ID (from setup)
INFISICAL_CLIENT_ID Infisical auth client ID (optional)
INFISICAL_CLIENT_SECRET Infisical auth secret (optional)
INFISICAL_PROJECT_ID Infisical project ID (optional)

Anthropic Best Practices Configuration

Variable Description Default
Dynamic Context Loading
ENABLE_DYNAMIC_CONTEXT_LOADING Enable just-in-time context loading false
QDRANT_URL Qdrant server URL localhost
QDRANT_PORT Qdrant server port 6333
QDRANT_COLLECTION_NAME Collection name for contexts mcp_context
DYNAMIC_CONTEXT_MAX_TOKENS Max tokens per context load 2000
DYNAMIC_CONTEXT_TOP_K Number of contexts to retrieve 3
EMBEDDING_MODEL SentenceTransformer model all-MiniLM-L6-v2
CONTEXT_CACHE_SIZE LRU cache size 100
Parallel Execution
ENABLE_PARALLEL_EXECUTION Enable parallel tool execution false
MAX_PARALLEL_TOOLS Max concurrent tool executions 5
Enhanced Note-Taking
ENABLE_LLM_EXTRACTION Enable LLM-based extraction false
Context Management
ENABLE_CONTEXT_COMPACTION Enable context compaction true
COMPACTION_THRESHOLD Token count triggering compaction 8000
TARGET_AFTER_COMPACTION Target tokens after compaction 4000
RECENT_MESSAGE_COUNT Messages to keep uncompacted 5
Verification
ENABLE_VERIFICATION Enable response verification true
VERIFICATION_QUALITY_THRESHOLD Quality score threshold 0.7
MAX_REFINEMENT_ATTEMPTS Max refinement iterations 3

See src/mcp_server_langgraph/core/config.py for all options and .env.example for complete examples.

Secrets Loading Priority

  1. Infisical (if configured)
  2. Environment variables (fallback)
  3. Default values (last resort)

Monitoring Dashboard

Access Grafana at http://localhost:3000 (admin/admin) and create dashboards using:

  • Prometheus datasource: Metrics visualization
  • Jaeger datasource: Trace exploration

Example queries:

  • Request rate: rate(agent_tool_calls_total[5m])
  • Error rate: rate(agent_calls_failed_total[5m])
  • P95 latency: histogram_quantile(0.95, agent_response_duration_bucket)

Security Considerations

🔒 Production Checklist:

  • Store JWT secret in Infisical
  • Use production Infisical project with proper access controls
  • Configure OpenFGA with PostgreSQL backend (not in-memory)
  • Enable OpenFGA audit logging
  • Enable TLS for all services (OTLP, OpenFGA, PostgreSQL)
  • Implement rate limiting on MCP endpoints
  • Use production-grade user database
  • Review and minimize OpenFGA permissions
  • Set up secret rotation in Infisical
  • Enable monitoring alerts for auth failures
  • Implement token rotation and revocation
  • Use separate OpenFGA stores per environment
  • Enable MFA for Infisical access

Deployment Options

LangGraph Platform (Managed Cloud)

Deploy to LangGraph Platform for fully managed, serverless hosting:

# Login (uvx runs langgraph-cli without installing it)
uvx langgraph-cli login

# Deploy
uvx langgraph-cli deploy

Benefits:

  • ✅ Zero infrastructure management
  • ✅ Integrated LangSmith observability
  • ✅ Automatic versioning and rollbacks
  • ✅ Built-in scaling and load balancing
  • ✅ One-command deployment

See LangGraph Platform Guide for complete platform deployment guide.

Google Cloud Run (Serverless)

Deploy to Google Cloud Run for fully managed, serverless deployment:

# Quick deploy
cd cloudrun
./deploy.sh --setup

# Or use gcloud directly
gcloud run deploy mcp-server-langgraph \
  --source . \
  --region us-central1 \
  --allow-unauthenticated

Benefits:

  • ✅ Serverless autoscaling (0 to 100+ instances)
  • ✅ Pay only for actual usage
  • ✅ Automatic HTTPS and SSL certificates
  • ✅ Integrated with Google Secret Manager
  • ✅ Built-in monitoring and logging

See Cloud Run Deployment Guide for complete Cloud Run deployment guide.

Kubernetes Deployment

The agent is fully containerized and ready for Kubernetes deployment. Supported platforms:

  • Google Kubernetes Engine (GKE)
  • Amazon Elastic Kubernetes Service (EKS)
  • Azure Kubernetes Service (AKS)
  • Rancher
  • VMware Tanzu

Quick Deploy:

# Build and push image
docker build -t your-registry/langgraph-agent:v1.0.0 .
docker push your-registry/langgraph-agent:v1.0.0

# Deploy with Helm
helm install langgraph-agent ./deployments/helm/langgraph-agent \
  --namespace langgraph-agent \
  --create-namespace \
  --set image.repository=your-registry/langgraph-agent \
  --set image.tag=v1.0.0

# Or deploy with Kustomize
kubectl apply -k deployments/kustomize/overlays/production

See Kubernetes Deployment Guide for complete deployment guide.

API Gateway & Rate Limiting

Kong API Gateway integration provides:

  • Rate Limiting: Tiered limits (60-1000 req/min) per consumer/tier
  • Authentication: JWT, API Key, OAuth2
  • Traffic Control: Request transformation, routing, load balancing
  • Security: IP restriction, bot detection, CORS
  • Monitoring: Prometheus metrics, request logging
# Deploy with Kong rate limiting
helm install langgraph-agent ./deployments/helm/langgraph-agent \
  --set kong.enabled=true \
  --set kong.rateLimitTier=premium

# Or apply Kong manifests directly
kubectl apply -k deployments/kubernetes/kong/

See Kong Gateway Integration for complete Kong setup and rate limiting configuration.

MCP Transports & Registry

The agent supports multiple MCP transports:

  • StreamableHTTP (Recommended): Modern HTTP streaming for production
  • stdio: For Claude Desktop and local applications
# StreamableHTTP (recommended for web/production)
python -m mcp_server_langgraph.mcp.server_streamable

# stdio (local/desktop)
python -m mcp_server_langgraph.mcp.server_stdio

# Access StreamableHTTP endpoints
POST /message         # Main MCP endpoint (streaming or regular)
GET /tools            # List tools
GET /resources        # List resources

Why StreamableHTTP?

  • ✅ Modern HTTP/2+ streaming
  • ✅ Better load balancer/proxy compatibility
  • ✅ Proper request/response pairs
  • ✅ Full MCP spec compliance
  • ✅ Works with Kong rate limiting

Registry compliant - Includes manifest files for MCP Registry publication.

See MCP Registry Guide for registry deployment and transport configuration.

Quality Practices

This project maintains high code quality through:

📈 Current Quality Score: 9.6/10

Assessed across 7 dimensions:

  • Code Organization: 9/10 - Clear module structure, separation of concerns
  • Testing: 10/10 - Multi-layered testing (unit, integration, property, contract, regression, mutation)
  • Type Safety: 9/10 - Gradual strict mypy rollout (3/11 modules strict, 8 remaining)
  • Documentation: 10/10 - ADRs, guides, API docs, inline documentation
  • Error Handling: 9/10 - Comprehensive error handling, fallback modes
  • Observability: 10/10 - Dual observability (OpenTelemetry + LangSmith)
  • Security: 9/10 - JWT auth, fine-grained authz, secrets management, security scanning

🎯 Quality Gates

Pre-Commit:

  • Code formatting (black, isort)
  • Linting (flake8, mypy)
  • Security scan (bandit)

CI/CD (GitHub Actions):

  • Unit tests (Python 3.10, 3.11, 3.12)
  • Integration tests
  • Property-based tests
  • Contract tests
  • Performance regression tests
  • OpenAPI validation
  • Mutation tests (weekly)

Commands:

# Code quality checks
make format           # Format code (black + isort)
make lint             # Run linters (flake8 + mypy)
make security-check   # Security scan (bandit)

# Test suite
make test-unit        # Fast unit tests
make test-all-quality # Property + contract + regression
make test-coverage    # Coverage report

📝 Development Workflow

  1. Branch Protection: All changes via Pull Requests
  2. Conventional Commits: feat:, fix:, test:, docs:, refactor:
  3. Code Review: Required before merge
  4. Quality Gates: All tests must pass
  5. Documentation: ADRs for architectural decisions

See: .github/CLAUDE.md for complete development guide

🔄 Continuous Improvement

In Progress:

  • Expanding strict mypy to all modules (3/11 complete)
  • Increasing mutation score to 80%+ on all critical modules
  • Adding more property-based tests for edge case discovery

Recent Improvements (2025):

  • Implemented Anthropic's agentic loop (ADR-0024) with context compaction and verification
  • Adopted Anthropic's tool design best practices (ADR-0023)
  • Added 27+ property-based tests (Hypothesis)
  • Added 20+ contract tests (JSON Schema)
  • Implemented performance regression tracking
  • Set up mutation testing with mutmut
  • Created 25 Architecture Decision Records
  • Implemented feature flag system

Contributors

Thanks to all the amazing people who have contributed to this project! 🙌

This project follows the all-contributors specification.

Want to be listed here? See CONTRIBUTING.md!

Support

Need help? Check out our Support Guide for:

  • 📚 Documentation links
  • 💬 Where to ask questions
  • 🐛 How to report bugs
  • 🔒 Security reporting

License

MIT - see LICENSE file for details

Acknowledgments

Built with:

Special thanks to the open source community!

Contributing

We welcome contributions from the community! 🎉

Quick Start for Contributors

  1. Read the guides:

  2. Find something to work on:

  3. Get help:

Contribution Areas

  • 💻 Code: Features, bug fixes, performance improvements
  • 📖 Documentation: Guides, tutorials, API docs
  • 🧪 Testing: Unit tests, integration tests, test coverage
  • 🔒 Security: Security improvements, audits
  • 🌐 Translations: i18n support (future)
  • 💡 Ideas: Feature requests, architecture discussions

All contributors will be recognized in our Contributors section!