# Web Research Assistant MCP Server
[PyPI](https://pypi.org/project/web-research-assistant/) · [License](https://github.com/elad12390/web-research-assistant/blob/main/LICENSE) · [CI](https://github.com/elad12390/web-research-assistant/actions)
A comprehensive Model Context Protocol (MCP) server that provides web research and discovery capabilities.
It includes **13 tools**, **4 resources**, and **5 prompts** for searching, crawling, and analyzing web content, powered by your local Docker SearXNG instance, the [`crawl4ai`](https://github.com/unclecode/crawl4ai) project, and the Pixabay API:
1. `web_search` — federated search across multiple engines via SearXNG
2. `search_examples` — find code examples, tutorials, and articles (defaults to recent content)
3. `search_images` — find high-quality stock photos, illustrations, and vectors via Pixabay
4. `crawl_url` — full page content extraction with advanced crawling
5. `package_info` — detailed package metadata from npm, PyPI, crates.io, Go
6. `package_search` — discover packages by keywords and functionality
7. `github_repo` — repository health metrics and development activity
8. `translate_error` — find solutions for error messages and stack traces from Stack Overflow (auto-detects CORS, fetch, and web errors)
9. `api_docs` — auto-discover and crawl official API documentation with examples (works for any API - no hardcoded URLs)
10. `extract_data` — extract structured data (tables, lists, fields, JSON-LD) from web pages with automatic detection
11. `compare_tech` — compare technologies side-by-side with NPM downloads, GitHub stars, and aspect analysis (React vs Vue, PostgreSQL vs MongoDB, etc.)
12. `get_changelog` — **NEW!** Get release notes and changelogs with breaking change detection (upgrade safely from version X to Y)
13. `check_service_status` — **NEW!** Instant health checks for 25+ services (Stripe, AWS, GitHub, OpenAI, etc.) - "Is it down or just me?"
All tools feature comprehensive error handling, response size limits, usage tracking, and clear documentation
for optimal AI agent integration.
### MCP Resources (Direct Data Lookups)
- `package://{registry}/{name}` - Package info from npm, PyPI, crates.io, or Go modules
- `github://{owner}/{repo}` - Repository information and health metrics
- `status://{service}` - Service health status for 25+ services
- `changelog://{registry}/{package}` - Package release notes and changelogs
### MCP Prompts (Reusable Workflows)
- `research_package` - Comprehensive package evaluation
- `debug_error` - Structured error debugging with solutions
- `compare_technologies` - Side-by-side technology comparison
- `evaluate_repository` - GitHub repository health assessment
- `check_service_health` - Multi-service status monitoring
## Quick Start
1. **Set up SearXNG** (5 minutes):
```bash
# Using Docker (recommended)
docker run -d -p 2288:8080 searxng/searxng:latest
```
Then configure search engines - see [SEARXNG_SETUP.md](SEARXNG_SETUP.md) for optimized settings. A quick reachability check is sketched just after this list.
2. **Install the MCP server**:
```bash
uvx web-research-assistant # or: pip install web-research-assistant
```
3. **Configure Claude Desktop** - add to `claude_desktop_config.json`:
```json
{
"mcpServers": {
"web-research-assistant": {
"command": "uvx",
"args": ["web-research-assistant"]
}
}
}
```
4. **Restart Claude Desktop** and start researching!
> ⚠️ **For best results**: Configure SearXNG with GitHub, Stack Overflow, and other code-focused search engines. See [SEARXNG_SETUP.md](SEARXNG_SETUP.md) for the recommended configuration.
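Before moving on, it can help to confirm the SearXNG instance from step 1 is actually reachable. Here is a minimal sketch using `httpx` (the HTTP client is an arbitrary choice, and `format=json` only works once the JSON output format is enabled in SearXNG's `settings.yml` - see [SEARXNG_SETUP.md](SEARXNG_SETUP.md)):

```python
import httpx

# Query the SearXNG instance started in step 1.
# NOTE: format=json requires the JSON output format to be enabled in settings.yml;
# a 403 response usually means it is not enabled yet.
resp = httpx.get(
    "http://localhost:2288/search",
    params={"q": "model context protocol", "format": "json"},
    timeout=10,
)
resp.raise_for_status()
print(f"SearXNG is up: {len(resp.json().get('results', []))} results returned")
```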
## Prerequisites
### Required
- **Python 3.10+**
- **A running SearXNG instance** on `http://localhost:2288`
- **📖 See [SEARXNG_SETUP.md](SEARXNG_SETUP.md) for complete Docker setup guide**
- ⚠️ **IMPORTANT**: For best results, enable these search engines in SearXNG:
- **GitHub, Stack Overflow, GitLab** (for code search - critical!)
- **DuckDuckGo, Brave** (for web search)
- **MDN, Wikipedia** (for documentation)
- **Reddit, HackerNews** (for tutorials and discussions)
- See [SEARXNG_SETUP.md](SEARXNG_SETUP.md) for the full optimized configuration
### Optional
- **Pixabay API key** for image search - [Get free key](https://pixabay.com/api/docs/)
- **Playwright browsers** for advanced crawling (auto-installed with `crawl4ai-setup`)
### Developer Setup (if running from source)
```bash
pip install uv          # if you do not already have uv (or use the standalone installer)
uv sync # creates the virtual environment
uv run crawl4ai-setup # installs Chromium for crawl4ai
```
> You can also use `pip install -r requirements.txt` if you prefer pip over uv.
## Installation
### Option 1: Using uvx (Recommended - No installation needed!)
```bash
uvx web-research-assistant
```
This runs the server directly from PyPI without installing it globally.
### Option 2: Install with pip
```bash
pip install web-research-assistant
web-research-assistant
```
### Option 3: Install with uv
```bash
uv tool install web-research-assistant
web-research-assistant
```
By default the server communicates over stdio, which makes it easy to wire into
Claude Desktop or any other MCP host.
### MCP Client Configuration
#### Claude Desktop
Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS; on Windows the file lives at `%APPDATA%\Claude\claude_desktop_config.json`):
**Option 1: Using uvx (Recommended - No installation needed!)**
```json
{
"mcpServers": {
"web-research-assistant": {
"command": "uvx",
"args": ["web-research-assistant"]
}
}
}
```
**Option 2: Using installed package**
```json
{
"mcpServers": {
"web-research-assistant": {
"command": "web-research-assistant"
}
}
}
```
#### OpenCode
Add to `~/.config/opencode/opencode.json`:
**Using uvx (Recommended)**
```json
{
"mcp": {
"web-research-assistant": {
"type": "local",
"command": ["uvx", "web-research-assistant"],
"enabled": true
}
}
}
```
**Using installed package**
```json
{
"mcp": {
"web-research-assistant": {
"type": "local",
"command": ["web-research-assistant"],
"enabled": true
}
}
}
```
#### Development (Running from source)
For Claude Desktop:
```json
{
"mcpServers": {
"web-research-assistant": {
"command": "uv",
"args": [
"--directory",
"/ABSOLUTE/PATH/TO/web-research-assistant",
"run",
"web-research-assistant"
]
}
}
}
```
For OpenCode:
```json
{
"mcp": {
"web-research-assistant": {
"type": "local",
"command": [
"uv",
"--directory",
"/ABSOLUTE/PATH/TO/web-research-assistant",
"run",
"web-research-assistant"
],
"enabled": true
}
}
}
```
**Restart your MCP client** afterwards. The MCP tools will be available immediately.
## Tool behavior
| Tool | When to use | Arguments |
| ---- | ----------- | --------- |
| `web_search` | Use first to gather recent information and URLs from SearXNG. Returns 1–10 ranked snippets with clickable URLs. | `query` (required), `reasoning` (required), optional `category` (defaults to `general`), and `max_results` (defaults to 5). |
| `search_examples` | Find code examples, tutorials, and technical articles. Optimized for technical content with optional time filtering. Perfect for learning APIs or finding usage patterns. | `query` (required, e.g., "Python async examples"), `reasoning` (required), `content_type` (code/articles/both, defaults to both), `time_range` (day/week/month/year/all, defaults to all), optional `max_results` (defaults to 5). |
| `search_images` | Find high-quality royalty-free stock images from Pixabay. Returns photos, illustrations, or vectors. Requires `PIXABAY_API_KEY` environment variable. | `query` (required, e.g., "mountain landscape"), `reasoning` (required), `image_type` (all/photo/illustration/vector, defaults to all), `orientation` (all/horizontal/vertical, defaults to all), optional `max_results` (defaults to 10). |
| `crawl_url` | Call immediately after search when you need the actual article body for quoting, summarizing, or extracting data. | `url` (required), `reasoning` (required), optional `max_chars` (defaults to 8000 characters). |
| `package_info` | Look up specific npm, PyPI, crates.io, or Go package metadata including version, downloads, license, and dependencies. Use when you know the package name. | `name` (required package name), `reasoning` (required), `registry` (npm/pypi/crates/go, defaults to npm). |
| `package_search` | Search for packages by keywords or functionality (e.g., "web framework", "json parser"). Use when you need to find packages that solve a specific problem. | `query` (required search terms), `reasoning` (required), `registry` (npm/pypi/crates/go, defaults to npm), optional `max_results` (defaults to 5). |
| `github_repo` | Get GitHub repository health metrics including stars, forks, issues, recent commits, and project details. Use when evaluating open source projects. | `repo` (required, owner/repo or full URL), `reasoning` (required), optional `include_commits` (defaults to true). |
| `translate_error` | Find Stack Overflow solutions for error messages and stack traces. Auto-detects language/framework, extracts key terms (CORS, map, undefined, etc.), filters irrelevant results, and prioritizes Stack Overflow solutions. Handles web-specific errors (CORS, fetch). | `error_message` (required stack trace or error text), `reasoning` (required), optional `language` (auto-detected), optional `framework` (auto-detected), optional `max_results` (defaults to 5). |
| `api_docs` | Auto-discover and crawl official API documentation. Dynamically finds docs URLs using patterns (docs.{api}.com, {api}.com/docs, etc.), searches for specific topics, crawls pages, and extracts overview, parameters, examples, and related links. Works for ANY API - no hardcoded URLs. Perfect for API integration and learning. | `api_name` (required, e.g., "stripe", "react"), `topic` (required, e.g., "create customer", "hooks"), `reasoning` (required), optional `max_results` (defaults to 2 pages). |
| `extract_data` | Extract structured data from HTML pages. Supports tables, lists, fields (via CSS selectors), JSON-LD, and auto-detection. Returns clean JSON output. More efficient than parsing full page text. Perfect for scraping pricing tables, package specs, release notes, or any structured content. | `url` (required), `reasoning` (required), `extract_type` (table/list/fields/json-ld/auto, defaults to auto), optional `selectors` (CSS selectors for fields mode), optional `max_items` (defaults to 100). |
| `compare_tech` | Compare 2-5 technologies side-by-side. Auto-detects category (framework/database/language) and gathers data from NPM, GitHub, and web search. Returns structured comparison with popularity metrics (downloads, stars), performance insights, and best-use summaries. Fast parallel processing (3-4s). | `technologies` (required list of 2-5 names), `reasoning` (required), optional `category` (auto-detects if not provided), optional `aspects` (auto-selected by category), optional `max_results_per_tech` (defaults to 3). |
| `get_changelog` | **NEW!** Get release notes and changelogs for package upgrades. Fetches GitHub releases, highlights breaking changes, and provides upgrade recommendations. Answers "What changed in version X → Y?" and "Are there breaking changes?" Perfect for planning dependency updates. | `package` (required name), `reasoning` (required), optional `registry` (npm/pypi/auto, defaults to auto), optional `max_releases` (defaults to 5). |
| `check_service_status` | **NEW!** Instantly check if external services are experiencing issues. Covers 25+ popular services (Stripe, AWS, GitHub, OpenAI, Vercel, etc.). Returns operational status, current incidents, and component health. Critical for production debugging - know immediately if the issue is external. Response time < 2s. | `service` (required name, e.g., "stripe", "aws"), `reasoning` (required). |
Results are automatically trimmed (default 8 KB) so they stay well within MCP
response expectations. If truncation happens, the text ends with a note reminding the
model that more detail is available on request.
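As an illustration of the calling convention (note that every tool requires a `reasoning` argument), here is a minimal sketch that invokes `web_search` through the official `mcp` Python SDK's stdio client; the query and reasoning values are made up for the example:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the server over stdio, the same way an MCP host would.
server = StdioServerParameters(command="uvx", args=["web-research-assistant"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "web_search",
                arguments={
                    "query": "crawl4ai documentation",
                    "reasoning": "Looking up crawler docs for a demo",
                    "max_results": 3,
                },
            )
            for block in result.content:
                if getattr(block, "text", None):  # print text content blocks
                    print(block.text)

asyncio.run(main())
```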
## Resources
MCP Resources provide direct data access via URI templates - perfect for quick lookups without tool calls.
| Resource URI | Description | Example |
| ------------ | ----------- | ------- |
| `package://{registry}/{name}` | Package metadata (version, downloads, license, dependencies) | `package://npm/express` |
| `github://{owner}/{repo}` | Repository info (stars, forks, issues, activity) | `github://facebook/react` |
| `status://{service}` | Service health status | `status://stripe` |
| `changelog://{registry}/{package}` | Release notes and changelogs | `changelog://npm/typescript` |
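Given an initialized `ClientSession` like the one in the tool example above, reading a resource is a single call; the URI below is just one example from the table:

```python
# Assumes an initialized ClientSession named `session` (see the tool example above).
result = await session.read_resource("package://npm/express")
for item in result.contents:
    if getattr(item, "text", None):  # text resource contents
        print(item.text)
```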
## Prompts
MCP Prompts are reusable message templates that guide AI assistants through common workflows.
| Prompt | Arguments | Use Case |
| ------ | --------- | -------- |
| `research_package` | `package_name`, `registry` | Evaluate a package before adding it as a dependency |
| `debug_error` | `error_message`, `language` (optional), `framework` (optional) | Debug an error with context and solutions |
| `compare_technologies` | `tech1`, `tech2`, `tech3` (optional), `tech4` (optional), `tech5` (optional) | Compare frameworks, databases, or languages |
| `evaluate_repository` | `owner`, `repo` | Assess a GitHub project's health and activity |
| `check_service_health` | `services` (comma-separated) | Monitor multiple services at once |
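Prompts are retrieved the same way; a minimal sketch using an initialized session, with illustrative argument values:

```python
# Assumes an initialized ClientSession named `session` (see the tool example above).
prompt = await session.get_prompt(
    "research_package",
    arguments={"package_name": "express", "registry": "npm"},
)
for message in prompt.messages:
    print(message.role, message.content)
```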
## Configuration
Environment variables let you adapt the server without touching code:
| Variable | Default | Description |
| -------- | ------- | ----------- |
| `SEARXNG_BASE_URL` | `http://localhost:2288/search` | Endpoint queried by `web_search`. |
| `SEARXNG_DEFAULT_CATEGORY` | `general` | Category used when none is provided. |
| `SEARXNG_DEFAULT_RESULTS` | `5` | Default number of search hits. |
| `SEARXNG_MAX_RESULTS` | `10` | Hard cap on hits per request. |
| `SEARXNG_CRAWL_MAX_CHARS` | `8000` | Default character budget for `crawl_url`. |
| `MCP_MAX_RESPONSE_CHARS` | `8000` | Overall response limit applied to every tool reply. |
| `SEARXNG_MCP_USER_AGENT` | `web-research-assistant/0.1` | User-Agent header for outward HTTP calls. |
| `PIXABAY_API_KEY` | _(empty)_ | API key for Pixabay image search. Get free key at [pixabay.com/api/docs](https://pixabay.com/api/docs/). |
| `MCP_USAGE_LOG` | `~/.config/web-research-assistant/usage.json` | Location for usage analytics data. |
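Internally these follow the usual environment-variable-with-fallback pattern; a minimal sketch of how a few of the defaults above resolve (illustrative only - the real logic lives in `config.py`):

```python
import os

# Fallback values mirror the defaults documented in the table above.
SEARXNG_BASE_URL = os.environ.get("SEARXNG_BASE_URL", "http://localhost:2288/search")
DEFAULT_RESULTS = int(os.environ.get("SEARXNG_DEFAULT_RESULTS", "5"))
MAX_RESPONSE_CHARS = int(os.environ.get("MCP_MAX_RESPONSE_CHARS", "8000"))
```

When the server is launched by an MCP host such as Claude Desktop, these variables can typically be supplied through the `env` field of the server entry in the client's configuration file.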
## Development
The codebase is intentionally modular and organized:
```
web-research-assistant/
├── src/searxng_mcp/       # Source code
│   ├── config.py          # Configuration and environment
│   ├── search.py          # SearXNG integration
│   ├── crawler.py         # Crawl4AI wrapper
│   ├── images.py          # Pixabay client
│   ├── registry.py        # Package registries (npm, PyPI, crates, Go)
│   ├── github.py          # GitHub API client
│   ├── errors.py          # Error parser (language/framework detection)
│   ├── api_docs.py        # API docs discovery (NO hardcoded URLs)
│   ├── tracking.py        # Usage analytics
│   └── server.py          # MCP server + 13 tools
├── docs/                  # Documentation (27 files)
└── [config files]
```
Each module is well under 400 lines, making the codebase easy to understand and extend.
### Usage Analytics
All tools automatically track usage metrics including:
- Tool invocation counts and success rates
- Response times and performance trends
- Common use case patterns (via the `reasoning` parameter)
- Error frequencies and types
Analytics data is stored in `~/.config/web-research-assistant/usage.json` and can be analyzed
to optimize tool usage and identify patterns. Each tool requires a `reasoning` parameter
that helps categorize why tools are being used, enabling better analytics and insights.
**Note:** As of the latest update, the `reasoning` parameter is **required** for all tools (previously optional with defaults). This ensures meaningful analytics data collection.
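Because the log is plain JSON, it can be inspected directly. A minimal sketch (the schema is not documented here, so this just dumps the top-level structure rather than assuming field names):

```python
import json
from pathlib import Path

log_path = Path("~/.config/web-research-assistant/usage.json").expanduser()
data = json.loads(log_path.read_text())

# Top-level shape is an assumption; print whatever the tracker actually writes.
if isinstance(data, dict):
    print("keys:", ", ".join(data))
else:
    print(f"{len(data)} entries recorded")
```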
## Documentation
Comprehensive documentation is available in the [`docs/`](docs/) directory:
- **[Project Status](docs/PROJECT_STATUS.md)** - Current status, metrics, roadmap
- **[API Docs Implementation](docs/API_DOCS_IMPLEMENTATION.md)** - NEW tool documentation
- **[Error Translator Design](docs/ERROR_TRANSLATOR_DESIGN.md)** - Error translator details
- **[Tool Ideas Ranked](docs/tool-ideas-ranked.md)** - Prioritization and progress
- **[SearXNG Configuration](docs/SEARXNG_OPTIMAL_CONFIG.md)** - Recommended setup
- **[Quick Start Examples](docs/QUICK_START_EXAMPLES.md)** - Usage examples
See the [docs README](docs/README.md) for a complete index.