
WebSearch MCP Server

Python 3.12+ · License: MIT · Pylint Score: 10.00/10

High-performance Model Context Protocol (MCP) server for web search and content extraction with intelligent fallback system.

✨ Features

  • 🚀 Fast: Async implementation with parallel execution

  • 🔍 Multi-Engine: Google, Bing, DuckDuckGo, Startpage, Brave Search

  • 🛡️ Intelligent Fallbacks: Google → Startpage, Bing → DuckDuckGo, Brave (standalone); see the sketch after this list

  • 📄 Content Extraction: Clean text extraction from web pages

  • 💾 Smart Caching: LRU cache with compression and deduplication

  • 🔑 API Integration: Google Custom Search and Brave Search APIs with quota management

  • 🔄 Auto-Rotation: Timestamped logs (weekly) and metrics (monthly) with auto-cleanup

  • ⚡ Resilient: Automatic failover and comprehensive error handling
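
The fallback behaviour amounts to trying a primary engine and switching to its partner when it errors or returns nothing. Below is a minimal sketch of one chain (Google → Startpage); the engine functions are stand-in placeholders, not this package's actual API.

# Minimal sketch of one fallback chain; illustrative only.
from typing import Callable, Optional

Engine = Callable[[str, int], list]

def search_with_fallback(primary: Engine, fallback: Optional[Engine],
                         query: str, num_results: int) -> list:
    """Try the primary engine; on error or empty results, use the fallback."""
    try:
        results = primary(query, num_results)
        if results:
            return results
    except Exception:
        pass  # swallow the failure and fall through to the fallback
    return fallback(query, num_results) if fallback else []

# Stand-in engines so the sketch runs on its own.
def google_api(query, n):
    raise RuntimeError("quota exhausted")

def startpage(query, n):
    return [f"startpage result for {query!r}"][:n]

print(search_with_fallback(google_api, startpage, "python tutorials", 5))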

📦 Installation

Quick Start (Recommended)

# Install uv
brew install uv

# Run directly - no setup needed
uvx --from git+https://github.com/vishalkg/web-search websearch-server

Development

git clone https://github.com/vishalkg/web-search.git
cd web-search
uv pip install -e .

⚙️ Configuration

API Keys (Optional but Recommended)

For best results, configure API keys for Google Custom Search and Brave Search. Without API keys, the server falls back to web scraping, which is less reliable.

Get API Keys: see the "How to Get API Keys" section under Advanced Configuration below.

Q CLI

# Add to Q CLI with API keys
q mcp add --name websearch --command "uvx --from git+https://github.com/vishalkg/web-search websearch-server"

# Then edit ~/.aws/amazonq/mcp.json to add API keys in the env section:
{
  "websearch": {
    "command": "/opt/homebrew/bin/uvx",
    "args": ["--from", "git+https://github.com/vishalkg/web-search", "websearch-server"],
    "env": {
      "GOOGLE_CSE_API_KEY": "your-google-api-key",
      "GOOGLE_CSE_ID": "your-search-engine-id",
      "BRAVE_SEARCH_API_KEY": "your-brave-api-key"
    }
  }
}

Test

q chat "search for python tutorials"

Claude Desktop

Add to your MCP settings file with API keys:

{ "mcpServers": { "websearch": { "command": "uvx", "args": ["--from", "git+https://github.com/vishalkg/web-search", "websearch-server"], "env": { "GOOGLE_CSE_API_KEY": "your-google-api-key", "GOOGLE_CSE_ID": "your-search-engine-id", "BRAVE_SEARCH_API_KEY": "your-brave-api-key" } } } }

🗂️ File Structure

The server automatically manages files in OS-appropriate locations:

macOS:

~/Library/Application Support/websearch/   # Data
~/Library/Logs/websearch/                  # Logs
~/Library/Application Support/websearch/   # Config

Linux:

~/.local/share/websearch/   # Data
~/.local/state/websearch/   # Logs
~/.config/websearch/        # Config

Files:

data/
├── search-metrics.jsonl   # Search analytics (auto-rotated)
└── quota/
    └── quotas.json        # API quota tracking
logs/
└── web-search.log         # Application logs (auto-rotated)
config/
├── .env                   # Configuration file
└── cache/                 # Optional caching

Environment Variable Overrides

  • WEBSEARCH_HOME: Base directory (default: ~/.websearch)

  • WEBSEARCH_CONFIG_DIR: Config directory override

  • WEBSEARCH_LOG_DIR: Log directory override
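
As a rough illustration of how these overrides could combine, here is a sketch of config-directory resolution under the assumption that the most specific variable wins, then WEBSEARCH_HOME, then the default; this is hypothetical logic, not the package's actual code.

# Illustrative resolution of the override variables above (hypothetical).
import os
from pathlib import Path

def resolve_config_dir() -> Path:
    # The most specific override wins, then the base directory, then the default.
    override = os.getenv("WEBSEARCH_CONFIG_DIR")
    if override:
        return Path(override).expanduser()
    base = Path(os.getenv("WEBSEARCH_HOME", "~/.websearch")).expanduser()
    return base / "config"

print(resolve_config_dir())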

🔧 Usage

The server provides two main tools with multiple search modes:

Search Web

# Standard 5-engine search (backward compatible)
search_web("quantum computing applications", num_results=10)

# New 3-engine fallback search (optimized)
search_web_fallback("machine learning tutorials", num_results=5)

Search Engines:

  • Google Custom Search API (with Startpage fallback)

  • Bing (with DuckDuckGo fallback)

  • Brave Search API (standalone)

  • DuckDuckGo (scraping)

  • Startpage (scraping)

Fetch Page Content

# Extract clean text from URLs
fetch_page_content("https://example.com")

# Batch processing
fetch_page_content(["https://site1.com", "https://site2.com"])

πŸ—οΈ Architecture

websearch/
├── core/
│   ├── search.py                  # Sync search orchestration
│   ├── async_search.py            # Async search orchestration
│   ├── fallback_search.py         # 3-engine fallback system
│   ├── async_fallback_search.py   # Async fallback system
│   ├── ranking.py                 # Quality-first result ranking
│   └── common.py                  # Shared utilities
├── engines/
│   ├── google_api.py              # Google Custom Search API
│   ├── brave_api.py               # Brave Search API
│   ├── bing.py                    # Bing scraping
│   ├── duckduckgo.py              # DuckDuckGo scraping
│   └── startpage.py               # Startpage scraping
├── utils/
│   ├── unified_quota.py           # Unified API quota management
│   ├── deduplication.py           # Result deduplication
│   ├── advanced_cache.py          # Enhanced caching system
│   └── http.py                    # HTTP utilities
└── server.py                      # FastMCP server

🔧 Advanced Configuration

Environment Variables

# API Configuration
export GOOGLE_CSE_API_KEY=your_google_api_key
export GOOGLE_CSE_ID=your_google_cse_id
export BRAVE_SEARCH_API_KEY=your_brave_api_key

# Quota Management (Optional)
export GOOGLE_DAILY_QUOTA=100      # Default: 100 requests/day
export BRAVE_MONTHLY_QUOTA=2000    # Default: 2000 requests/month

# Performance Tuning
export WEBSEARCH_CACHE_SIZE=1000
export WEBSEARCH_TIMEOUT=10
export WEBSEARCH_LOG_LEVEL=INFO

How to Get API Keys

Google Custom Search API

  1. API Key: Go to https://developers.google.com/custom-search/v1/introduction and click "Get a Key"

  2. CSE ID: Go to https://cse.google.com/cse/ and follow prompts to create a search engine

Brave Search API

  1. Go to Brave Search API

  2. Sign up for a free account

  3. Go to your dashboard

  4. Copy the API key and set it as BRAVE_SEARCH_API_KEY

  5. Free tier: 2000 requests/month

Quota Management

  • Unified System: Single quota manager for all APIs

  • Google: Daily quota (default 100 requests/day)

  • Brave: Monthly quota (default 2000 requests/month)

  • Storage: Quota files stored in ~/.websearch/ directory

  • Auto-reset: Quotas automatically reset at period boundaries

  • Fallback: Automatic fallback to scraping when quotas are exhausted (a minimal sketch of this bookkeeping follows)
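
A minimal sketch of this per-API bookkeeping, assuming the defaults and the quotas.json location listed above; this is hypothetical code, not the real utils/unified_quota.py.

# Sketch of per-API quota tracking with period-boundary resets (hypothetical).
import datetime as dt
import json
from pathlib import Path

QUOTA_FILE = Path("~/.websearch/quota/quotas.json").expanduser()
LIMITS = {"google": ("daily", 100), "brave": ("monthly", 2000)}

def _current_period(kind: str) -> str:
    today = dt.date.today()
    return today.isoformat() if kind == "daily" else today.strftime("%Y-%m")

def consume(api: str) -> bool:
    """Return True if a request is allowed, False once the quota is exhausted."""
    kind, limit = LIMITS[api]
    state = json.loads(QUOTA_FILE.read_text()) if QUOTA_FILE.exists() else {}
    entry = state.get(api, {})
    if entry.get("period") != _current_period(kind):   # auto-reset at the boundary
        entry = {"period": _current_period(kind), "used": 0}
    if entry["used"] >= limit:
        return False                                    # caller falls back to scraping
    entry["used"] += 1
    state[api] = entry
    QUOTA_FILE.parent.mkdir(parents=True, exist_ok=True)
    QUOTA_FILE.write_text(json.dumps(state, indent=2))
    return True

print(consume("google"))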

Search Modes

  • Standard Mode: Uses all 5 engines for maximum coverage

  • Fallback Mode: Uses 3 engines with intelligent fallbacks for efficiency

  • API-First Mode: Prioritizes API calls over scraping when keys available
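
One way these modes could map to engine line-ups is sketched below; the mode names match the list above, but the helper and the exact selection logic are illustrative assumptions, not the package's actual code.

# Illustrative mapping from search mode to engine line-up (hypothetical).
import os

def engines_for(mode: str) -> list[str]:
    if mode == "standard":   # all 5 engines for maximum coverage
        return ["google_api", "bing", "brave_api", "duckduckgo", "startpage"]
    if mode == "fallback":   # 3 chains: Google->Startpage, Bing->DuckDuckGo, Brave
        return ["google_api|startpage", "bing|duckduckgo", "brave_api"]
    if mode == "api_first":  # prefer APIs whenever keys are configured
        apis = []
        if os.getenv("GOOGLE_CSE_API_KEY") and os.getenv("GOOGLE_CSE_ID"):
            apis.append("google_api")
        if os.getenv("BRAVE_SEARCH_API_KEY"):
            apis.append("brave_api")
        return apis or ["duckduckgo", "startpage"]   # no keys: scraping only
    raise ValueError(f"unknown mode: {mode}")

print(engines_for("fallback"))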

πŸ› Troubleshooting

Issue                     | Solution
--------------------------|-----------------------------------------------
No results                | Check internet connection and logs
API quota exhausted       | System automatically falls back to scraping
Google API errors         | Verify GOOGLE_CSE_API_KEY and GOOGLE_CSE_ID
Brave API errors          | Check BRAVE_SEARCH_API_KEY and quota status
Permission denied         | chmod +x start.sh
Import errors             | Ensure Python 3.12+ and dependencies installed
Circular import warnings  | Fixed in v2.0+ (10.00/10 pylint score)

Debug Mode

# Enable detailed logging
export WEBSEARCH_LOG_LEVEL=DEBUG
python -m websearch.server

API Status Check

# Test API connectivity
cd debug/
python test_brave_api.py    # Test Brave API
python test_fallback.py     # Test fallback system

📈 Performance & Monitoring

Metrics

  • Pylint Score: 10.00/10 (perfect code quality)

  • Search Speed: ~2-3 seconds for 5-engine search

  • Fallback Speed: ~1-2 seconds for 3-engine search

  • Cache Hit Rate: ~85% for repeated queries

  • API Quota Efficiency: Automatic fallback prevents service interruption

Monitoring

Logs are written to web-search.log in a structured format:

tail -f web-search.log | grep "search completed"

🔒 Security

  • No hardcoded secrets: All API keys via environment variables

  • Clean git history: Secrets scrubbed from all commits

  • Input validation: Comprehensive sanitization of search queries (see the sketch after this list)

  • Rate limiting: Built-in quota management for API calls

  • Secure defaults: HTTPS-only requests, timeout protection
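
What query sanitization can look like in spirit is shown below; this is illustrative only, not the package's actual validation code, and the length cap is an arbitrary assumption.

# Illustrative query sanitization (hypothetical helper, arbitrary length cap).
MAX_QUERY_LEN = 400

def sanitize_query(query: str) -> str:
    """Strip non-printable characters, collapse whitespace, and cap the length."""
    printable = "".join(ch for ch in query if ch.isprintable())
    collapsed = " ".join(printable.split())
    return collapsed[:MAX_QUERY_LEN]

print(sanitize_query("  python\x00  tutorials \n "))  # -> 'python tutorials'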

🚀 Performance Tips

  1. Use fallback mode for faster searches when you don't need maximum coverage

  2. Set API keys to reduce reliance on scraping (faster + more reliable)

  3. Enable caching for repeated queries (enabled by default)

  4. Tune batch sizes for content extraction based on your needs (a simple batching approach is sketched below)
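
For tip 4, a minimal approach is to chunk a large URL list before handing it to content extraction. The chunk size below is arbitrary, and fetch_page_content is only a stand-in stub for the MCP tool shown in the Usage section.

# Illustrative batching for content extraction (tip 4).
def fetch_page_content(urls):
    return [f"<extracted text of {u}>" for u in urls]   # stand-in for the MCP tool

def chunked(items, size):
    """Yield successive fixed-size batches from a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

urls = [f"https://example.com/page{i}" for i in range(20)]
for batch in chunked(urls, 5):
    fetch_page_content(batch)   # one call per manageable batch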

🤝 Contributing

  1. Fork the repository

  2. Create feature branch (git checkout -b feature/amazing-feature)

  3. Run tests (pytest)

  4. Commit changes (git commit -m 'Add amazing feature')

  5. Push to branch (git push origin feature/amazing-feature)

  6. Open Pull Request

📄 License

MIT License - see LICENSE file for details.
