MCP Code Analysis Server

An intelligent MCP (Model Context Protocol) server that provides advanced code analysis and search capabilities for large codebases. Built as a pure FastMCP implementation, it uses TreeSitter for parsing, PostgreSQL with pgvector for vector storage, and OpenAI embeddings for semantic search.

Features

🔍 Semantic Code Search: Natural language queries to find relevant code
🏛️ Domain-Driven Analysis: Extract business entities and bounded contexts using an LLM
📊 Code Structure Analysis: Hierarchical understanding of modules, classes, and functions
🔄 Incremental Updates: Git-based change tracking for efficient re-indexing
🎯 Smart Code Explanations: AI-powered explanations with context aggregation
🔗 Dependency Analysis: Understand code relationships and dependencies
🌐 Knowledge Graph: Build semantic graphs with community detection (Leiden algorithm)
💡 DDD Refactoring: Domain-Driven Design suggestions and improvements
🚀 High Performance: Handles codebases with millions of lines of code
🐍 Python Support: Full support for Python with more languages coming

MCP Tools Available

Core Search Tools

search_code - Search for code using natural language queries with semantic understanding
find_definition - Find where symbols (functions, classes, modules) are defined
find_similar_code - Find code patterns similar to a given snippet using vector similarity
get_code_structure - Get the hierarchical structure of a code file
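
The vector similarity behind find_similar_code can be pictured with a small standalone sketch. It assumes cosine similarity as the metric and treats threshold and limit the way the usage examples suggest (minimum score, maximum number of results); the function names and toy vectors are hypothetical and are not taken from the server's source, which delegates this work to pgvector.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_similar(query_vec, candidates, threshold=0.7, limit=5):
    """Return up to `limit` (name, score) pairs scoring at least `threshold`.

    `candidates` maps a hypothetical snippet name to its embedding vector.
    Results are sorted from most to least similar.
    """
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in candidates.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for name, score in rank_similar([1.0, 0.0], toy_index):
        print(f"{name}: {score:.3f}")
```

Real embeddings have far more dimensions, and a database index replaces the linear scan, but the filtering and ordering logic is the same idea.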

Code Analysis Tools

explain_code - Get hierarchical explanations of code elements (modules, classes, functions)
suggest_refactoring - Get AI-powered refactoring suggestions for code improvements
analyze_dependencies - Analyze dependencies and relationships between code entities

Repository Management Tools

sync_repository - Manually trigger synchronization for a specific repository

Domain-Driven Design Analysis Tools

extract_domain_model - Extract domain entities and relationships using LLM analysis
find_aggregate_roots - Find aggregate roots in the codebase using domain analysis
analyze_bounded_context - Analyze a bounded context and its relationships
suggest_ddd_refactoring - Suggest Domain-Driven Design refactoring improvements
find_bounded_contexts - Find all bounded contexts in the codebase
generate_context_map - Generate context maps (JSON, Mermaid, PlantUML)

Advanced Analysis Tools

analyze_coupling - Analyze coupling between bounded contexts with metrics
suggest_context_splits - Suggest how to split large bounded contexts
detect_anti_patterns - Detect DDD anti-patterns (anemic models, god objects, etc.)
analyze_domain_evolution - Track domain model changes over time
get_domain_metrics - Get comprehensive domain health metrics and insights

Quick Start

Prerequisites

Docker and Docker Compose
OpenAI API key (for semantic search capabilities)
Nix with flakes (recommended for development)

Docker Deployment (Recommended)

The easiest way to get started is with Docker Compose, which provides a complete, isolated environment with PostgreSQL and pgvector.

Clone the repository:

git clone https://github.com/johannhartmann/mcp-code-analysis-server.git
cd mcp-code-analysis-server

Set up environment variables:

export OPENAI_API_KEY="your-api-key-here"
# Or add to .env file

Configure repositories: Create a config.yaml file to specify which repositories to track:

repositories:
  - url: https://github.com/owner/repo1
    branch: main
  - url: https://github.com/owner/repo2
    branch: develop
  - url: https://github.com/owner/private-repo
    access_token: "github_pat_..."  # For private repos

# Scanner configuration
scanner:
  storage_path: ./repositories
  exclude_patterns:
    - "__pycache__"
    - "*.pyc"
    - ".git"
    - "venv"
    - "node_modules"
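
To make the exclude_patterns setting concrete, here is a minimal sketch of how such patterns could be applied while walking a repository. It assumes each pattern is a shell-style glob matched against every path component; that matching rule and the is_excluded helper are assumptions for illustration, since the scanner's actual behavior is not documented here.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The same patterns as in the scanner configuration above.
EXCLUDE_PATTERNS = ["__pycache__", "*.pyc", ".git", "venv", "node_modules"]


def is_excluded(relative_path, patterns=EXCLUDE_PATTERNS):
    """Return True if any component of the path matches any glob pattern.

    Assumption: patterns apply to single path components (directory or
    file names), not to the full path string.
    """
    parts = PurePosixPath(relative_path).parts
    return any(fnmatchcase(part, pattern)
               for part in parts
               for pattern in patterns)


if __name__ == "__main__":
    for path in ["src/app.py",
                 "src/__pycache__/app.cpython-311.pyc",
                 ".github/workflows/ci.yml"]:
        print(path, "->", "skip" if is_excluded(path) else "scan")
```

Matching per component keeps ".git" from accidentally excluding ".github", which a plain substring check would do.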

Start the services with Docker Compose:

docker-compose up -d

This will:

Start PostgreSQL with pgvector extension
Build and start the MCP Code Analysis Server
Initialize the database with required schemas
Begin scanning configured repositories automatically

The server runs as a pure MCP implementation and can be accessed via any MCP-compatible client.

Development Environment (Local)

For development work, use the Nix development environment, which provides all necessary tools and dependencies:

# Enter the Nix development environment
nix develop

# Install Python dependencies
uv sync

# Start PostgreSQL (if not using Docker Compose)
docker-compose up -d postgres

# Run the scanner to populate the database
python -m src.scanner

# Start the MCP server
python -m src.mcp_server

# Or run tests
pytest

# Check code quality
ruff check .
black --check .
mypy .
vulture src vulture_whitelist.py

The Nix environment includes:

Python 3.11 with all dependencies
Code formatting tools (black, isort)
Linters (ruff, pylint, bandit)
Type checker (mypy)
Dead code detection (vulture)
Test runner (pytest)
Pre-commit hooks

Configuration

Edit config.yaml to customize:

# OpenAI API key (can also use OPENAI_API_KEY env var)
openai_api_key: "sk-..."

# Repositories to track
repositories:
  - url: https://github.com/owner/repo
    branch: main  # Optional, uses default branch if not specified
  - url: https://github.com/owner/private-repo
    access_token: "github_pat_..."  # For private repos

# Scanner configuration
scanner:
  storage_path: ./repositories
  exclude_patterns:
    - "__pycache__"
    - "*.pyc"
    - ".git"
    - "venv"
    - "node_modules"

# Embeddings configuration
embeddings:
  model: "text-embedding-ada-002"
  batch_size: 100
  max_tokens: 8000

# MCP server configuration
mcp:
  host: "0.0.0.0"
  port: 8080

# Database configuration
database:
  host: localhost
  port: 5432
  database: code_analysis
  user: codeanalyzer
  password: your-secure-password
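
The embeddings.batch_size setting implies that code chunks are sent to the embedding API in groups rather than one at a time. The sketch below shows that grouping under the assumption that a batch is simply the next batch_size items in order; the batched helper is hypothetical, and the max_tokens limit is not modeled because it would need a real tokenizer.

```python
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch


if __name__ == "__main__":
    chunks = [f"def f{i}(): pass" for i in range(250)]
    for number, batch in enumerate(batched(chunks, batch_size=100), start=1):
        # An embedding request for `batch` would be issued here.
        print(f"batch {number}: {len(batch)} chunks")
```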

Usage Examples

Using the MCP Tools

Once the server is running, you can use the tools via any MCP client:

# `mcp` is assumed to be an already-connected MCP client session that exposes call_tool()

# Search for code using natural language
await mcp.call_tool("search_code", {
    "query": "functions that handle user authentication",
    "limit": 10
})

# Find where a symbol is defined
await mcp.call_tool("find_definition", {
    "name": "UserService",
    "entity_type": "class"
})

# Get hierarchical code explanation
await mcp.call_tool("explain_code", {
    "path": "src.auth.user_service.UserService"
})

# Find similar code patterns
await mcp.call_tool("find_similar_code", {
    "code_snippet": "def authenticate_user(username, password):",
    "limit": 5,
    "threshold": 0.7
})

# Get code structure
await mcp.call_tool("get_code_structure", {
    "file_path": "src/auth/user_service.py"
})

# Get refactoring suggestions
await mcp.call_tool("suggest_refactoring", {
    "file_path": "src/auth/user_service.py",
    "focus_area": "performance"
})

# Extract domain model from code
await mcp.call_tool("extract_domain_model", {
    "code_path": "src/domain/user.py",
    "include_relationships": True
})

# Find aggregate roots
await mcp.call_tool("find_aggregate_roots", {
    "context_name": "user_management"  # optional
})

# Analyze bounded context
await mcp.call_tool("analyze_bounded_context", {
    "context_name": "authentication"
})

# Generate context map
await mcp.call_tool("generate_context_map", {
    "output_format": "mermaid"  # json, mermaid, or plantuml
})
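
For readers who want to see what such a call looks like on the wire, the sketch below builds the JSON-RPC 2.0 request that the MCP protocol uses for tool invocation (method tools/call, with the tool name and arguments under params). The build_tool_call helper and the request ID counter are illustrative assumptions; a real client library also handles the initialization handshake and transport.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request IDs (illustrative).
_request_ids = count(1)


def build_tool_call(name, arguments):
    """Return a JSON-RPC 2.0 request dict for an MCP `tools/call` invocation."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call("search_code", {
    "query": "functions that handle user authentication",
    "limit": 10,
})

if __name__ == "__main__":
    print(json.dumps(request, indent=2))
```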

With Claude Desktop

Configure the MCP server in your Claude Desktop settings:

For stdio mode (when running locally):

{
  "mcpServers": {
    "code-analysis": {
      "command": "python",
      "args": ["-m", "src.mcp_server"],
      "cwd": "/path/to/mcp-code-analysis-server",
      "env": {
        "OPENAI_API_KEY": "your-api-key"
      }
    }
  }
}

For HTTP mode (when using Docker):

{
  "mcpServers": {
    "code-analysis": {
      "url": "http://localhost:8000"
    }
  }
}

Then in Claude Desktop:

"Search for functions that handle authentication"
"Show me the implementation of the UserService class"
"Find all usages of the database connection pool"
"What files import the utils module?"

Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test types
pytest tests/unit/
pytest tests/integration/

Code Quality

The project uses comprehensive code quality tools integrated into the Nix development environment:

# Run all linters
ruff check .

# Format code
black .
isort .

# Type checking
mypy .

# Find dead code
vulture src vulture_whitelist.py

# Run pre-commit hooks
nix-pre-commit

Pre-commit Hooks

Install the pre-commit hooks for automatic code quality checks:

echo '#!/bin/sh' > .git/hooks/pre-commit
echo 'nix-pre-commit' >> .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

Architecture

The server consists of several key components:

Scanner Module: Monitors and synchronizes Git repositories with incremental updates
Parser Module: Extracts code structure using TreeSitter for accurate AST parsing
Embeddings Module: Generates semantic embeddings via OpenAI for vector search
Database Module: PostgreSQL with pgvector extension for efficient vector storage
Query Module: Processes natural language queries and symbol lookup
MCP Server: Pure FastMCP implementation exposing code analysis tools
Domain Module: Extracts domain entities and relationships for DDD analysis
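
As an illustration of the Scanner Module's incremental updates, the sketch below turns the output of git diff --name-status into two sets: files to re-index and files to drop from the index. It assumes the scanner compares the last indexed commit with the current one and only cares about Python files; the function name and that workflow are assumptions, not the project's actual implementation.

```python
def parse_name_status(output, suffix=".py"):
    """Split `git diff --name-status` output into (to_index, to_remove).

    Each line is tab-separated: a status code (A, M, D, T, or R/C followed
    by a similarity score) and one path, or two paths for renames and copies.
    """
    to_index, to_remove = set(), set()
    for line in output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        kind = status[0]
        if kind == "D":
            removed, added = [paths[0]], []
        elif kind == "R":  # rename: old path disappears, new path appears
            removed, added = [paths[0]], [paths[1]]
        elif kind == "C":  # copy: only the new path needs indexing
            removed, added = [], [paths[1]]
        else:  # A (added), M (modified), T (type changed)
            removed, added = [], [paths[0]]
        to_remove.update(p for p in removed if p.endswith(suffix))
        to_index.update(p for p in added if p.endswith(suffix))
    return to_index, to_remove


if __name__ == "__main__":
    sample = "M\tsrc/a.py\nD\tsrc/c.py\nR100\tsrc/old.py\tsrc/new.py\n"
    changed, deleted = parse_name_status(sample)
    print("re-index:", sorted(changed))
    print("remove:", sorted(deleted))
```

In practice the input would come from running git diff --name-status between the two commits, for example through subprocess.run.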

Performance

Initial indexing: ~1000 files/minute with parallel processing
Incremental updates: <10 seconds for 100 changed files using Git tracking
Query response: <2 seconds for semantic search with pgvector
Scalability: Supports codebases of 10M+ lines of code
Memory efficiency: Optimized database sessions and batch processing

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Johann-Peter Hartmann
Email: johann-peter.hartmann@mayflower.de
GitHub: @johannhartmann

Key Technologies

FastMCP: Pure MCP protocol implementation
TreeSitter: Robust code parsing and AST generation
pgvector: High-performance vector similarity search
OpenAI Embeddings: Semantic understanding of code
PostgreSQL: Reliable data persistence and complex queries
Nix: Reproducible development environment
Docker: Containerized deployment and isolation
