VLLM MCP Server

A Model Context Protocol (MCP) server that enables text models to call multimodal models. This server supports both OpenAI and Dashscope (Alibaba Cloud) multimodal models, allowing text-only models to process images and other media formats through standardized MCP tools.

GitHub Repository: https://github.com/StanleyChanH/vllm-mcp

Features

  • Multi-Provider Support: OpenAI GPT-4 Vision and Dashscope Qwen-VL models

  • Multiple Transport Options: STDIO, HTTP, and Server-Sent Events (SSE)

  • Flexible Deployment: Docker, Docker Compose, and local development

  • Easy Configuration: JSON configuration files and environment variables

  • Comprehensive Tooling: MCP tools for model interaction, validation, and provider management

Quick Start

Prerequisites

  • Python 3.11+

  • uv package manager

  • API keys for OpenAI and/or Dashscope (Alibaba Cloud)

Installation & Setup

  1. Clone the repository:

    git clone https://github.com/StanleyChanH/vllm-mcp.git
    cd vllm-mcp
  2. Set up environment:

    cp .env.example .env
    # Edit .env with your API keys
    nano .env  # or use your preferred editor
  3. Configure API keys (in .env file):

    # Dashscope (Alibaba Cloud) - Required for basic functionality
    DASHSCOPE_API_KEY=sk-your-dashscope-api-key

    # OpenAI - Optional
    OPENAI_API_KEY=sk-your-openai-api-key
  4. Install dependencies:

    uv sync
  5. Verify setup:

    uv run python test_simple.py

Running the Server

  1. Start the server (STDIO transport - default):

    ./scripts/start.sh
  2. Start with HTTP transport:

    ./scripts/start.sh --transport http --host 0.0.0.0 --port 8080
  3. Development mode with hot reload:

    ./scripts/start-dev.sh

Testing & Verification

  1. List available models:

    uv run python examples/list_models.py
  2. Run basic tests:

    uv run python test_simple.py
  3. Test MCP tools:

    uv run python examples/client_example.py

Docker Deployment

  1. Build and run with Docker Compose:

    # Create .env file with your API keys
    cp .env.example .env

    # Start the service
    docker-compose up -d
  2. Build manually:

    docker build -t vllm-mcp .
    docker run -p 8080:8080 --env-file .env vllm-mcp

Configuration

Environment Variables

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional
OPENAI_DEFAULT_MODEL=gpt-4o
OPENAI_SUPPORTED_MODELS=gpt-4o,gpt-4o-mini,gpt-4-turbo,gpt-4-vision-preview

# Dashscope Configuration
DASHSCOPE_API_KEY=your_dashscope_api_key
DASHSCOPE_DEFAULT_MODEL=qwen-vl-plus
DASHSCOPE_SUPPORTED_MODELS=qwen-vl-plus,qwen-vl-max,qwen-vl-chat,qwen2-vl-7b-instruct,qwen2-vl-72b-instruct

# Server Configuration (optional)
VLLM_MCP_HOST=localhost
VLLM_MCP_PORT=8080
VLLM_MCP_TRANSPORT=stdio
VLLM_MCP_LOG_LEVEL=INFO

Configuration File

Create a config.json file:

{ "host": "localhost", "port": 8080, "transport": "stdio", "log_level": "INFO", "providers": [ { "provider_type": "openai", "api_key": "${OPENAI_API_KEY}", "base_url": "${OPENAI_BASE_URL}", "default_model": "gpt-4o", "max_tokens": 4000, "temperature": 0.7 }, { "provider_type": "dashscope", "api_key": "${DASHSCOPE_API_KEY}", "default_model": "qwen-vl-plus", "max_tokens": 4000, "temperature": 0.7 } ] }

MCP Tools

The server provides the following MCP tools:

generate_multimodal_response

Generate responses from multimodal models.

Parameters:

  • model (string): Model name to use

  • prompt (string): Text prompt

  • image_urls (array, optional): List of image URLs

  • file_paths (array, optional): List of file paths

  • system_prompt (string, optional): System prompt

  • max_tokens (integer, optional): Maximum tokens to generate

  • temperature (number, optional): Generation temperature

  • provider (string, optional): Provider name (auto-detected if not specified)

Example:

result = await session.call_tool("generate_multimodal_response", {
    "model": "gpt-4o",
    "prompt": "Describe this image",
    "image_urls": ["https://example.com/image.jpg"],
    "max_tokens": 500
})
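
The optional parameters combine in a single call. A sketch using a local file and a system prompt (the path and file type are placeholders; whether a given provider accepts them depends on its validation rules):

result = await session.call_tool("generate_multimodal_response", {
    "model": "qwen-vl-plus",
    "prompt": "Summarize this document",
    "file_paths": ["./docs/report.pdf"],  # placeholder path
    "system_prompt": "You are a careful document analyst.",
    "temperature": 0.2
})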

list_available_providers

List available model providers and their supported models.

Example:

result = await session.call_tool("list_available_providers", {})

validate_multimodal_request

Validate if a multimodal request is supported by the specified provider.

Parameters:

  • model (string): Model name to validate

  • image_count (integer, optional): Number of images

  • file_count (integer, optional): Number of files

  • provider (string, optional): Provider name
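
Example (a usage sketch; how the result reports validity is an assumption to verify against the server's actual output):

result = await session.call_tool("validate_multimodal_request", {
    "model": "qwen-vl-plus",
    "image_count": 2,
    "file_count": 0
})
# Inspect the returned content for whether the provider accepts
# this model/image/file combination.
print(result.content[0].text)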

Supported Models

OpenAI

  • gpt-4o

  • gpt-4o-mini

  • gpt-4-turbo

  • gpt-4-vision-preview

Dashscope

  • qwen-vl-plus

  • qwen-vl-max

  • qwen-vl-chat

  • qwen2-vl-7b-instruct

  • qwen2-vl-72b-instruct

Model Selection

Using Environment Variables

You can configure default models and supported models through environment variables:

# OpenAI
OPENAI_DEFAULT_MODEL=gpt-4o
OPENAI_SUPPORTED_MODELS=gpt-4o,gpt-4o-mini,gpt-4-turbo

# Dashscope
DASHSCOPE_DEFAULT_MODEL=qwen-vl-plus
DASHSCOPE_SUPPORTED_MODELS=qwen-vl-plus,qwen-vl-max

Listing Available Models

Use the list_available_providers tool to see all available models:

result = await session.call_tool("list_available_providers", {})
print(result.content[0].text)

Model Selection Examples

# Use a specific OpenAI model
result = await session.call_tool("generate_multimodal_response", {
    "model": "gpt-4o-mini",  # Specify exact model
    "prompt": "Analyze this image",
    "image_urls": ["https://example.com/image.jpg"]
})

# Use a specific Dashscope model
result = await session.call_tool("generate_multimodal_response", {
    "model": "qwen-vl-max",  # Specify exact model
    "prompt": "Describe what you see",
    "image_urls": ["https://example.com/image.jpg"]
})

# Auto-detect provider based on model name:
# OpenAI models (gpt-*) will use the OpenAI provider
# Dashscope models (qwen-*) will use the Dashscope provider
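
A sketch of the prefix-based auto-detection described in the comments above (illustrative only; the server's actual dispatch logic lives in src/vllm_mcp/server.py):

def detect_provider(model: str) -> str:
    # Route by model-name prefix, mirroring the convention above
    if model.startswith("gpt-"):
        return "openai"
    if model.startswith("qwen"):
        return "dashscope"
    raise ValueError(f"Cannot infer provider for model: {model}")

assert detect_provider("gpt-4o-mini") == "openai"
assert detect_provider("qwen2-vl-7b-instruct") == "dashscope"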

Model Configuration File

You can also configure models in config.json:

{ "providers": [ { "provider_type": "openai", "api_key": "${OPENAI_API_KEY}", "default_model": "gpt-4o-mini", "supported_models": ["gpt-4o-mini", "gpt-4-turbo"], "max_tokens": 4000, "temperature": 0.7 }, { "provider_type": "dashscope", "api_key": "${DASHSCOPE_API_KEY}", "default_model": "qwen-vl-max", "supported_models": ["qwen-vl-plus", "qwen-vl-max"], "max_tokens": 4000, "temperature": 0.7 } ] }

Client Integration

Python Client

import asyncio

from mcp.client.session import ClientSession
from mcp.client.stdio import StdioServerParameters, stdio_client

async def main():
    server_params = StdioServerParameters(
        command="uv",
        args=["run", "python", "-m", "vllm_mcp.server"],
        env={"PYTHONPATH": "src"}
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Generate multimodal response
            result = await session.call_tool("generate_multimodal_response", {
                "model": "gpt-4o",
                "prompt": "Analyze this image",
                "image_urls": ["https://example.com/image.jpg"]
            })
            print(result.content[0].text)

asyncio.run(main())

MCP Client Configuration

Add to your MCP client configuration:

{ "mcpServers": { "vllm-mcp": { "command": "uv", "args": ["run", "python", "-m", "vllm_mcp.server"], "env": { "PYTHONPATH": "src", "OPENAI_API_KEY": "${OPENAI_API_KEY}", "DASHSCOPE_API_KEY": "${DASHSCOPE_API_KEY}" } } } }

Development

Project Structure

vllm-mcp/
├── src/vllm_mcp/
│   ├── __init__.py
│   ├── server.py              # Main MCP server
│   ├── models.py              # Data models
│   └── providers/
│       ├── __init__.py
│       ├── openai_provider.py
│       └── dashscope_provider.py
├── scripts/
│   ├── start.sh               # Production startup
│   └── start-dev.sh           # Development startup
├── examples/
│   ├── client_example.py      # Example client
│   └── mcp_client_config.json
├── docker-compose.yml
├── Dockerfile
├── config.json
└── README.md

Adding New Providers

  1. Create a new provider class in src/vllm_mcp/providers/

  2. Implement the required methods (see the sketch after this list):

    • generate_response()

    • is_model_supported()

    • validate_request()

  3. Register the provider in src/vllm_mcp/server.py

  4. Update configuration schema
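
A minimal skeleton of such a provider (a sketch only: the MyProvider class and its method signatures are assumptions, not the project's actual interfaces; align them with the existing classes in src/vllm_mcp/providers/):

class MyProvider:
    # Hypothetical provider; match signatures to the existing providers
    SUPPORTED_MODELS = ["my-model-v1"]

    def __init__(self, api_key: str, default_model: str = "my-model-v1"):
        self.api_key = api_key
        self.default_model = default_model

    def is_model_supported(self, model: str) -> bool:
        return model in self.SUPPORTED_MODELS

    def validate_request(self, model: str, image_count: int = 0,
                         file_count: int = 0) -> bool:
        # Enforce whatever limits the upstream API imposes
        return self.is_model_supported(model) and image_count <= 10

    async def generate_response(self, model: str, prompt: str,
                                image_urls: list | None = None, **kwargs):
        # Call the upstream multimodal API here and return its text output
        raise NotImplementedError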

Running Tests

# Install development dependencies
uv add --dev pytest pytest-asyncio

# Run tests
uv run pytest

Deployment Options

STDIO Transport (Default)

Best for MCP client integrations and local development.

vllm-mcp --transport stdio

HTTP Transport

Suitable for web service deployments.

vllm-mcp --transport http --host 0.0.0.0 --port 8080

SSE Transport

For real-time streaming responses.

vllm-mcp --transport sse --host 0.0.0.0 --port 8080
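
A sketch of consuming the SSE transport with the MCP Python SDK (the /sse endpoint path is an assumption based on common MCP server defaults; adjust it to whatever path this server actually exposes):

import asyncio

from mcp.client.session import ClientSession
from mcp.client.sse import sse_client

async def main():
    # "/sse" is an assumed endpoint path
    async with sse_client("http://localhost:8080/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("list_available_providers", {})
            print(result.content[0].text)

asyncio.run(main())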

Troubleshooting

Common Issues

  1. Import Error: No module named 'vllm_mcp'

    # Make sure you're in the project root, then run:
    uv sync
    export PYTHONPATH="src:$PYTHONPATH"
  2. API Key Not Found

    # Ensure your .env file is properly configured:
    cp .env.example .env
    # Edit .env with your actual API keys
  3. Dashscope API Errors

    • Verify your API key is valid and active

    • Check if you have sufficient quota

    • Ensure network connectivity to Dashscope services

  4. Server Startup Issues

    # Check for port conflicts:
    lsof -i :8080

    # Try a different port:
    ./scripts/start.sh --port 8081
  5. Docker Issues

    # Rebuild the Docker image:
    docker-compose down
    docker-compose build --no-cache
    docker-compose up -d

Debug Mode

Enable debug logging for troubleshooting:

./scripts/start.sh --log-level DEBUG

Getting Help

  • Check SETUP_GUIDE.md for detailed setup instructions

  • Run uv run python test_simple.py to verify basic functionality

  • Review logs for error messages and warnings

License

MIT License

Contributing

  1. Fork the repository

  2. Create a feature branch

  3. Make your changes

  4. Add tests if applicable

  5. Submit a pull request
