Provides a user-friendly web interface for the knowledge graph builder with dual JSON/SVG output visualization
Stores knowledge graph data as nodes with properties and relationships, enabling graph database persistence with enhanced metadata, lineage tracking, and visualization metadata
Uses local Ollama models for entity extraction and knowledge graph generation, supporting various models like llama3.2, mistral, codellama, and phi3
Knowledge Graph Builder MCP Server
A Knowledge Graph Builder that transforms text or web content into structured knowledge graphs using local AI models with MCP (Model Context Protocol) integration for persistent storage in Neo4j and Qdrant.
🚀 Features
- Local AI Processing: Uses local models via Ollama or LM Studio for entity extraction
- Large Content Support: Handles arbitrarily large content (300MB+) via intelligent chunking
- Web Content Extraction: Scrapes and analyzes full web pages without size limits
- Knowledge Graph Generation: Creates structured graphs with entities and relationships
- Smart Chunking: Automatically chunks large content with sentence boundary detection
- Entity Merging: Intelligently merges duplicate entities across chunks
- Real-Time Visualization: Live SVG graph updates as chunks are processed
- Interactive SVG Output: Color-coded entity types with progress tracking
- MCP Integration: Stores data in Neo4j (graph database) and Qdrant (vector database)
- UUID Tracking: Generates UUIDv8 for unified entity tracking across systems
- Gradio Interface: User-friendly web interface with dual JSON/SVG output
📊 Entity Types Extracted
- 👥 PERSON: Names, individuals, key figures
- 🏢 ORGANIZATION: Companies, institutions, groups
- 📍 LOCATION: Places, countries, regions, addresses
- 💡 CONCEPT: Ideas, technologies, abstract concepts
- 📅 EVENT: Specific events, occurrences, incidents
- 🔧 OTHER: Miscellaneous entities not fitting other categories
🔧 Setup
Requirements
Environment Variables
For detailed configuration instructions and complete environment variables reference, see the Configuration section below.
Quick Start Configuration:
Note: All environment variables are optional and have sensible defaults. The application will run without any configuration.
Local Model Setup
For Ollama:
For LM Studio:
- Download and install LM Studio
- Load a model in the local server
- Start the local server on port 1234
🏃 Running the Application
The application will launch a Gradio interface with MCP server capabilities enabled.
📝 Usage
Text Input
Paste any text content to analyze:
URL Input
Provide a web URL to extract and analyze:
Large Content Processing (300MB+ Files)
For very large content like LLM conversation extracts:
Output Format
The system returns a structured JSON knowledge graph:
🎨 Real-Time Graph Visualization
SVG Generation Features
- Color-Coded Entity Types: Each entity type has a distinct color (Person=Red, Organization=Teal, Location=Blue, Concept=Green, Event=Yellow, Other=Plum)
- Interactive Layout: Automatic graph layout using NetworkX spring layout algorithm
- Relationship Labels: Edge labels showing relationship types between entities
- Entity Information: Node labels with entity names and types
- Legend: Automatic legend generation based on entity types present
- Statistics: Real-time entity and relationship counts
Real-Time Processing for Large Content
- Progress Tracking: Visual progress bar showing chunk processing completion
- Incremental Updates: Graph updates after each chunk is processed
- Live Statistics: Running totals of entities and relationships discovered
- Incremental File Saves: Each chunk creates a timestamped SVG file
- Final Visualization: Complete graph saved as final SVG
File Output
- Single Content:
knowledge_graph_<uuid8>.svg
- Large Content (Chunked):
- Incremental:
knowledge_graph_<uuid8>_chunk_0001.svg
,chunk_0002.svg
, etc. - Final:
knowledge_graph_<uuid8>.svg
- Incremental:
Example Large Content Processing
🗄️ hKG (Hybrid Knowledge Graph) Storage with Visualization Integration
Neo4j Integration (Graph Database)
- Stores entities as nodes with properties and enhanced metadata
- Creates relationships between entities with lineage tracking
- Maintains UUIDv8 for entity tracking across all databases
- Tracks chunking metadata for large content processing
- Records processing method (single vs chunked)
- NEW: Visualization metadata in entity observations including:
- SVG file paths and availability status
- Entity color mappings for graph visualization
- Real-time update tracking for chunked processing
- Incremental file counts for large content processing
- Accessible via MCP server tools
Qdrant Integration (Vector Database)
- Stores knowledge graphs as vector embeddings with enhanced metadata
- Enables semantic search across graphs of any size
- Maintains metadata for each knowledge graph including chunk information
- Tracks content length, processing method, and chunk count
- Supports similarity search across large document collections
- NEW: Visualization lineage tracking including:
- Entity type and color mapping information
- SVG generation timestamps and file paths
- Real-time visualization update history
- Incremental SVG file tracking for large content
- Accessible via MCP server tools
hKG Unified Tracking with Visualization Lineage
- UUIDv8 Across All Systems: Common ancestry-encoded identifiers
- Content Lineage: Track how large content was processed and chunked
- Processing Metadata: Record chunk size, overlap, and processing method
- Entity Provenance: Track which chunks contributed to each entity
- Relationship Mapping: Maintain relationships across chunk boundaries
- Semantic Coherence: Ensure knowledge graph consistency across databases
- NEW - Visualization Lineage: Complete tracking of visual representation:
- SVG File Provenance: Track all generated visualization files
- Color Mapping Consistency: Maintain entity color assignments across chunks
- Real-Time Update History: Log all incremental visualization updates
- Cross-Database Visual Metadata: Synchronized visualization tracking in both Neo4j and Qdrant
- Incremental Visualization Tracking: Complete audit trail of real-time graph updates
🔧 Architecture
Core Components
app.py
: Main application file with Gradio interfaceextract_text_from_url()
: Web scraping functionality (app.py:41)chunk_text()
: Smart content chunking with sentence boundary detection (app.py:214)merge_extraction_results()
: Intelligent merging of chunk results (app.py:250)get_entity_color()
: Entity type color mapping (app.py:299)create_knowledge_graph_svg()
: SVG graph generation (app.py:311)RealTimeGraphVisualizer
: Real-time incremental visualization (app.py:453)extract_entities_and_relationships()
: AI-powered entity extraction with real-time updates (app.py:645)extract_entities_and_relationships_single()
: Single chunk processing (app.py:722)build_knowledge_graph()
: Main orchestration function with visualization (app.py:795)generate_uuidv8()
: UUID generation for entity tracking (app.py:68)
Data Flow with hKG Integration and Real-Time Visualization
- Input Processing: Text or URL input validation
- Content Extraction: Web scraping for URLs, direct text for text input
- Real-Time Visualizer Setup: Initialize incremental graph visualization system
- Content Chunking: Smart chunking for large content (>2000 chars) with sentence boundary detection
- AI Analysis with Live Updates: Local model processes each chunk for entities/relationships
- Incremental Visualization: Real-time SVG graph updates after each chunk completion
- Result Merging: Intelligent deduplication and merging of entities/relationships across chunks
- hKG Metadata Creation: Generate processing metadata for lineage tracking
- Graph Generation: Structured knowledge graph creation with enhanced metadata
- Final Visualization: Generate complete SVG graph with all entities and relationships
- hKG Storage: Persistence in Neo4j (graph) and Qdrant (vector) with unified UUIDv8 tracking
- Output: JSON response with complete knowledge graph, hKG metadata, and SVG visualization
🎛️ Configuration
Environment Variables Reference
All configuration is handled through environment variables. The application provides sensible defaults for all settings, allowing it to run without any configuration while still offering full customization.
Complete Environment Variables Table
Variable | Type | Default | Required | Description | Example Values |
---|---|---|---|---|---|
MODEL_PROVIDER | string | "ollama" | No | AI model provider to use | "ollama" , "lmstudio" |
LOCAL_MODEL | string | "llama3.2:latest" | No | Local model identifier | "llama3.2:latest" , "mistral:7b" , "codellama:13b" |
OLLAMA_BASE_URL | string | "http://localhost:11434" | No | Ollama API endpoint | "http://localhost:11434" , "http://192.168.1.100:11434" |
LMSTUDIO_BASE_URL | string | "http://localhost:1234" | No | LM Studio API endpoint | "http://localhost:1234" , "http://127.0.0.1:1234" |
CHUNK_SIZE | integer | 2000 | No | Characters per chunk for AI processing | 1000 , 2000 , 4000 , 8000 |
CHUNK_OVERLAP | integer | 200 | No | Overlap between chunks for context | 100 , 200 , 400 , 500 |
MAX_CHUNKS | integer | 0 | No | Maximum chunks to process (0=unlimited) | 0 , 100 , 1000 , 5000 |
HF_TOKEN | string | None | No | HuggingFace API token (legacy, unused) | "hf_xxxxxxxxxxxx" |
Configuration Methods
1. Environment Variables (Recommended)
2. Shell Configuration (.bashrc/.zshrc)
3. Python Environment File (.env)
Model Provider Configuration
Ollama Configuration (Default)
LM Studio Configuration
Large Content Processing Configuration
Chunk Size Optimization
Processing Limits
Performance Tuning Guidelines
For Speed Optimization
For Quality Optimization
For Memory-Constrained Systems
Configuration Validation
The application performs automatic validation of configuration settings:
- Model Provider: Validates
MODEL_PROVIDER
is either"ollama"
or"lmstudio"
- URLs: Validates that provider URLs are accessible
- Numeric Values: Ensures
CHUNK_SIZE
,CHUNK_OVERLAP
, andMAX_CHUNKS
are valid integers - Model Availability: Checks if the specified model is available on the provider
Configuration Troubleshooting
Common Issues and Solutions
1. Model Provider Not Responding
2. Model Not Found
3. Memory Issues with Large Content
4. Slow Processing
Example Configuration Scenarios
Scenario 1: Development Setup
Scenario 2: Production Setup
Scenario 3: Large Dataset Processing
Scenario 4: Resource-Constrained Environment
Advanced Configuration
Custom Model Endpoints
Dynamic Configuration
The application reads environment variables at startup. To change configuration:
- Set new environment variables
- Restart the application
- Configuration changes take effect immediately
Error Handling
Comprehensive error handling for:
- Invalid URLs or network failures
- Missing local models or API endpoints
- JSON parsing errors from LLM responses
- Malformed or empty inputs
- Database connection issues
- Invalid configuration values
- Model provider connectivity issues
- Memory constraints during large content processing
🔍 hKG MCP Integration with Visual Lineage
The application integrates with MCP servers for hybrid knowledge graph storage with complete visualization tracking:
- Neo4j: Graph database storage and querying with enhanced metadata + visualization lineage
- Qdrant: Vector database for semantic search with chunk tracking + visual metadata
- Unified Tracking: UUIDv8 across all storage systems for entity lineage + visualization provenance
- Metadata Persistence: Processing method, chunk count, content lineage + SVG generation tracking
- Large Content Support: Seamless handling of 300MB+ content via chunking + real-time visualization
- Visualization Integration: Complete visual representation tracking across all storage systems
Enhanced hKG Features via MCP
- Entity Provenance: Track which content chunks contributed to each entity + their visual representation
- Relationship Lineage: Maintain relationships across chunk boundaries + visual edge tracking
- Content Ancestry: UUIDv8 encoding for hierarchical content tracking + visualization file lineage
- Processing Audit: Complete record of how large content was processed + visualization generation
- Semantic Search: Vector similarity across knowledge graphs of any size + visual metadata search
- NEW - Visual Lineage: Complete visualization tracking including:
- SVG File Provenance: Track all generated visualization files with timestamps
- Entity Color Consistency: Maintain color mappings across all chunks and storage systems
- Real-Time Visualization History: Log every incremental graph update during processing
- Cross-Database Visual Sync: Synchronized visualization metadata in Neo4j and Qdrant
- Incremental Visualization Audit: Complete trail of real-time updates for large content
Visualization-Enhanced Storage
- Neo4j Entity Observations now include:
- SVG file paths and generation status
- Entity color assignments for visual consistency
- Real-time update counts for chunked processing
- Visualization availability and engine information
- Qdrant Vector Content now includes:
- Entity color mapping information for similarity search
- SVG generation timestamps and file paths
- Real-time visualization update metadata
- Incremental file tracking for large content visualization
MCP tools are automatically available when running in Claude Code environment with MCP servers configured.
🎯 hKG Visualization Architecture
Integrated Visualization Lineage System
The hKG system now maintains complete visualization lineage alongside traditional knowledge graph storage:
Visualization Metadata Flow
- Real-Time Updates: Each chunk generates incremental SVG with progress tracking
- Color Consistency: Entity colors maintained across all chunks and storage systems
- File Lineage: Complete audit trail of all generated SVG files
- Cross-Database Sync: Visualization metadata synchronized in both Neo4j and Qdrant
- Provenance Tracking: Link between source chunks, entities, and their visual representation
hKG Benefits for Large Content (300MB+)
- Visual Progress Monitoring: Real-time graph evolution during processing
- Chunk-Level Visualization: Individual SVG files for each processing stage
- Complete Audit Trail: Full lineage from source text to final visualization
- Cross-Reference Capability: Link entities back to their source chunks and visual appearance
- Scalable Visualization: Handles arbitrarily large graphs with consistent performance
📊 Development
Project Structure
Testing
Transform any content into structured knowledge graphs with the power of local AI and MCP integration!
This server cannot be installed
Transforms text or web content into structured knowledge graphs using local AI models with MCP integration for persistent storage in Neo4j and Qdrant.
Related MCP Servers
- -securityFlicense-qualityFacilitates knowledge graph representation with semantic search using Qdrant, supporting OpenAI embeddings for semantic similarity and robust HTTPS integration with file-based graph persistence.Last updated -1111TypeScript
- -securityFlicense-qualityAn MCP server that enables graph database interactions with Neo4j, allowing users to access and manipulate graph data through natural language commands.Last updated -Python
- -securityAlicense-qualityEnhanced knowledge graph memory server for AI assistants that uses Neo4j as the backend storage engine, enabling powerful graph queries and efficient storage of user interaction information with full MCP protocol compatibility.Last updated -14TypeScriptMIT License
- -securityFlicense-qualityEnables storage and retrieval of knowledge in a graph database format, allowing users to create, update, search, and delete entities and relationships in a Neo4j-powered knowledge graph through natural language.Last updated -3Python