# AnyDB MCP Server
A Model Context Protocol (MCP) server that provides intelligent database operations through natural language processing. This server integrates SQLite databases with Ollama for AI-powered SQL generation and execution.
## Features
### Core Database Operations
- **Natural Language to SQL**: Convert plain English instructions into SQL queries using Ollama
- **Universal Database Operations**: Works with any SQLite table/entity without predefined schemas
- **Automatic Schema Evolution**: Dynamically adds columns to tables when new data requires them
- **MCP Integration**: Seamlessly integrates with Claude Desktop and other MCP-compatible clients
- **Async Operations**: Built on modern Python async/await for high performance
- **Safety First**: Separate tools for read and write operations
### Vector Database & RAG
- **File Embedding**: Automatically convert files into vector embeddings for semantic search
- **Semantic Search**: Find relevant content using natural language queries instead of exact keyword matching
- **RAG Support**: Enable Claude Desktop to answer questions about uploaded documents with context
- **Smart Chunking**: Intelligently splits large documents into overlapping chunks for better retrieval
- **Persistent Storage**: ChromaDB-powered vector database with automatic embedding generation
### Web Scraping & Knowledge Base (NEW!)
- **URL Scraping**: Extract and store content from web pages automatically
- **Smart Content Extraction**: Strip HTML markup and extract meaningful text along with metadata
- **URL Security**: Built-in validation and security checks to prevent malicious URLs
- **Semantic Web Search**: Query scraped web content using natural language
- **Web Knowledge Management**: List, search, and manage scraped web pages
## Available Tools
### Database Tools
#### 1. `query_entity`
Query any table with natural language instructions.
**Parameters**:
- `entity_name` (required): Name of the table to query
- `instruction` (optional): Natural language query instruction (default: "SELECT all records")
**Example**: Query users table for active accounts
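On the wire, a client invokes this as an MCP `tools/call` JSON-RPC request. A representative payload (the `users` table and instruction are illustrative, not part of the server):
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "query_entity",
    "arguments": {
      "entity_name": "users",
      "instruction": "Show all active accounts created this month"
    }
  }
}
```
The remaining tools use the same envelope; the examples below show only the `arguments` object.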
#### 2. `insert_entity`
Insert records into any table using natural language descriptions. Automatically adds new columns if the data contains fields not present in the current table schema.
**Parameters**:
- `entity_name` (required): Name of the table
- `data` (required): Data to insert (JSON or natural description)
**Example**: Insert a new user with email and name
**Auto-Schema**: If inserting "user with name John, email john@test.com, and premium status" into a table that only has name/email columns, the system will automatically add a "status" column
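Illustrative `arguments` for the auto-schema scenario above:
```json
{
  "entity_name": "users",
  "data": "user with name John, email john@test.com, and premium status"
}
```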
#### 3. `update_entity`
Update records in any table with conditions. Automatically adds new columns if the update introduces new fields.
**Parameters**:
- `entity_name` (required): Name of the table
- `instruction` (required): Update instruction
- `conditions` (optional): WHERE conditions
**Example**: Update user status to active where email matches
**Auto-Schema**: If updating with "set premium status to gold and loyalty points to 100" on a table without these columns, they will be added automatically
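Illustrative `arguments`; this sketch assumes `conditions` accepts a SQL-style WHERE fragment (plain English may also work):
```json
{
  "entity_name": "users",
  "instruction": "set status to active",
  "conditions": "email = 'john@test.com'"
}
```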
#### 4. `delete_entity`
Delete records from any table with optional conditions.
**Parameters**:
- `entity_name` (required): Name of the table
- `conditions` (optional): WHERE conditions for deletion
**Example**: Delete inactive users older than 30 days
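Illustrative `arguments` for the example above, again assuming a SQL-style fragment for `conditions`:
```json
{
  "entity_name": "users",
  "conditions": "status = 'inactive' AND created_at < date('now', '-30 days')"
}
```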
#### 5. `create_table`
Create new tables with AI-generated schemas.
**Parameters**:
- `entity_name` (required): Name of the new table
- `schema_description` (required): Description of table schema
**Example**: Create a products table with name, price, and category
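Illustrative `arguments` (`schema_description` is free-form natural language; this phrasing is one possibility):
```json
{
  "entity_name": "products",
  "schema_description": "name as text, price as decimal, category as text"
}
```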
#### 6. `sql_query`
Execute raw SQL SELECT queries directly.
**Parameters**:
- `query` (required): SQL query to execute
**Example**: Direct SQL for complex joins and analytics
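Illustrative `arguments`, assuming a hypothetical `products` table:
```json
{
  "query": "SELECT category, COUNT(*) AS items, AVG(price) AS avg_price FROM products GROUP BY category"
}
```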
#### 7. `sql_execute`
Execute raw SQL modification queries (INSERT, UPDATE, DELETE, CREATE, etc.).
**Parameters**:
- `query` (required): SQL query to execute
**Example**: Direct SQL for complex data modifications
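Illustrative `arguments`, assuming a hypothetical `orders` table:
```json
{
  "query": "UPDATE orders SET status = 'processed' WHERE status = 'pending' AND amount > 100"
}
```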
### Vector Database Tools (NEW!)
#### 8. `add_file_to_vector_db`
Add a file to the vector database for semantic search and RAG (Retrieval Augmented Generation).
**Parameters**:
- `filename` (required): Name of the file
- `content` (required): Content of the file (text)
- `metadata` (optional): Optional metadata for the file
**Example**: Add a document about machine learning for later semantic search
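Illustrative `arguments`; this sketch assumes `metadata` accepts a flat JSON object:
```json
{
  "filename": "ml_notes.md",
  "content": "Gradient descent minimizes a loss function by stepping along its negative gradient...",
  "metadata": { "topic": "machine learning", "source": "personal notes" }
}
```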
#### 9. `search_vector_db`
Search the vector database for relevant file content using semantic similarity.
**Parameters**:
- `query` (required): Search query for semantic similarity
- `max_results` (optional): Maximum number of results to return (default: 5)
**Example**: Find documents related to "neural networks and AI"
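Illustrative `arguments`:
```json
{
  "query": "neural networks and AI",
  "max_results": 3
}
```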
#### 10. `list_vector_files`
List all files stored in the vector database.
**Parameters**: None
**Example**: View all documents available for search
#### 11. `remove_file_from_vector_db`
Remove a file from the vector database.
**Parameters**:
- `filename` (required): Name of the file to remove
**Example**: Delete outdated documents from the knowledge base
### Web Scraping Tools (NEW!)
#### 12. `scrape_url`
Scrape content from a web page and store it in the vector database for semantic search and RAG.
**Parameters**:
- `url` (required): URL of the web page to scrape
- `custom_filename` (optional): Custom filename for the scraped content
**Example**: Scrape a Wikipedia article or documentation page for later querying
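Illustrative `arguments`, reusing the Wikipedia example from the usage section below (the custom filename is hypothetical):
```json
{
  "url": "https://en.wikipedia.org/wiki/Neural_network",
  "custom_filename": "wikipedia_neural_network"
}
```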
#### 13. `query_web_content`
Query scraped web page content using semantic search to find relevant information.
**Parameters**:
- `query` (required): Search query for finding relevant web content
- `max_results` (optional): Maximum number of results to return (default: 5)
**Example**: Search scraped web pages for information about specific topics
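Illustrative `arguments`:
```json
{
  "query": "Python best practices",
  "max_results": 5
}
```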
#### 14. `list_scraped_pages`
List all scraped web pages stored in the vector database with metadata.
**Parameters**: None
**Example**: View all websites you've scraped and stored
#### 15. `remove_scraped_page`
Remove a scraped web page from the vector database.
**Parameters**:
- `filename` (required): Filename of the scraped page to remove
**Example**: Clean up outdated or unwanted scraped web content
## Installation
### Prerequisites
- Python 3.8+
- [Ollama](https://ollama.ai/) running locally
- Claude Desktop (for MCP integration)
### Setup
1. **Clone the repository:**
```bash
git clone https://github.com/iamayuppie/AnyDbApp.git
cd AnyDbApp
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Start Ollama:**
```bash
OLLAMA_HOST=127.0.0.1:1434 ollama serve
OLLAMA_HOST=127.0.0.1:1434 ollama pull llama3.1  # or your preferred model
```
4. **Run the server:**
```bash
python main.py
```
## Claude Desktop Integration
Add this server to Claude Desktop by editing your config file:
**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
```json
{
"mcpServers": {
"anydb": {
"command": "python",
"args": ["C:\\Path\\To\\AnyDbApp\\mcp_server_stdio.py"],
"env": {
"PYTHONPATH": "C:\\Path\\To\\AnyDbApp"
}
}
}
}
```
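A minimal macOS sketch of the same config, assuming the repository was cloned to `/path/to/AnyDbApp` (adjust to your location):
```json
{
  "mcpServers": {
    "anydb": {
      "command": "python",
      "args": ["/path/to/AnyDbApp/mcp_server_stdio.py"],
      "env": {
        "PYTHONPATH": "/path/to/AnyDbApp"
      }
    }
  }
}
```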
Restart Claude Desktop to connect the server.
## Configuration
### Ollama Settings
Default configuration in `mcp_server.py`:
- **Host**: localhost
- **Port**: 1434
- **Model**: llama3.1
### Database Settings
- **Default DB**: `anydb.sqlite` (created automatically)
- **Location**: Same directory as the server
- **Type**: SQLite with foreign key constraints enabled
## Usage Examples
Once integrated with Claude Desktop, you can use natural language:
### Database Operations
- *"Create a users table with id, name, email, and created_at fields"*
- *"Show me all active users from the last 30 days"*
- *"Insert a new product: iPhone 15, price $999, category Electronics"*
- *"Update all pending orders to processed where amount > 100"*
- *"Delete test users where email contains 'test'"*
### Smart Schema Evolution
- *"Insert a book: Title 'AI Handbook', Author 'Jane Doe', ISBN '123456', Format 'Hardcover'"* (automatically adds Format column if missing)
- *"Add employee with name John, salary 75000, department IT, and remote status"* (adds department and remote columns as needed)
- *"Update product with warranty period 2 years and eco-friendly rating A+"* (dynamically expands product schema)
### Vector Database & File Operations
- *"Add this document to the knowledge base"* (when attaching a file in Claude Desktop)
- *"Search for information about machine learning algorithms"*
- *"Find documents related to user authentication and security"*
- *"What does the uploaded contract say about payment terms?"*
- *"Show me all documents I've added to the database"*
- *"Remove the old privacy policy document"*
### Web Scraping & Knowledge Base
- *"Scrape this Wikipedia article about neural networks: https://en.wikipedia.org/wiki/Neural_network"*
- *"Save the content from this blog post for later reference"*
- *"What information did I scrape about machine learning from that research paper website?"*
- *"Search my scraped web pages for information about Python best practices"*
- *"Show me all the web pages I've scraped and stored"*
- *"Remove the outdated documentation I scraped last month"*
## Architecture
```
┌─────────────────┐    ┌──────────────┐    ┌─────────────────┐
│     Claude      │────│  MCP Server  │────│     Ollama      │
│    Desktop      │    │   (stdio)    │    │   (localhost)   │
│  + File Upload  │    │              │    │                 │
└─────────────────┘    └──────────────┘    └─────────────────┘
                              │                     │
                              ▼                     ▼
                  ┌──────────────────┐    ┌──────────────────┐
                  │   Dual Storage   │    │   Web Scraper    │
                  │                  │    │      Engine      │
                  │ ┌──────────────┐ │    │                  │
                  │ │    SQLite    │ │◄───┤ • URL Validation │
                  │ │   Database   │ │    │ • Content Extract│
                  │ └──────────────┘ │    │ • Security Checks│
                  │                  │    └──────────────────┘
                  │ ┌──────────────┐ │             │
                  │ │   ChromaDB   │ │◄────────────┘
                  │ │ Vector Store │ │  ← Web Content + Document
                  │ └──────────────┘ │    Embeddings & Semantic Search
                  └──────────────────┘
```
## Development
### Project Structure
```
AnyDbApp/
├── main.py # Clean entry point with startup info
├── mcp_server.py # MCP server setup and tool routing
├── dbtool.py # Database operations and SQL tools
├── filetool.py # Vector database and file operations
├── webscrapertool.py # Web scraping and content extraction
├── requirements.txt # Python dependencies
├── pyproject.toml # Project metadata
└── README.md # This file
```
### Key Components
**Core Modules:**
- **main.py**: Entry point with dependency checking and startup information
- **mcp_server.py**: MCP protocol implementation, tool registration, and request routing
- **dbtool.py**: Database operations, SQL generation, and data management
- **filetool.py**: Vector database operations, file processing, and semantic search
- **webscrapertool.py**: Web scraping, content extraction, and URL processing
**Business Logic Classes:**
- **DatabaseManager**: Handles async SQLite operations and database connections
- **DatabaseTools**: High-level database operations with natural language support
- **OllamaClient**: Manages AI model communication for SQL generation
- **VectorDatabaseManager**: Manages ChromaDB operations and document embeddings
- **FileTools**: High-level file operations and semantic search functionality
- **WebScraperManager**: Handles web page fetching, content extraction, and URL validation
- **WebScraperTools**: High-level web scraping operations with vector database integration
## Troubleshooting
### Common Issues
1. **Server won't start**: Check if Ollama is running on port 1434
2. **No tools showing in Claude**: Verify MCP config path and restart Claude Desktop
3. **SQL errors**: Check table names and ensure proper natural language descriptions
4. **Ollama connection failed**: Confirm Ollama model is installed and accessible
### Debug Mode
Run with Python's verbose flag to trace module imports during startup (helpful when a dependency is missing or the server exits before logging):
```bash
python -v main.py
```
## License
This project is open source. See LICENSE file for details.
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## Support
For issues and questions:
- Check the troubleshooting section
- Review Ollama and MCP documentation
- Open an issue on the repository