DINO-X Image Detection MCP Server

DINO-X MCP

DINO-X Official MCP Server — powered by the DINO-X and Grounding DINO models — brings fine-grained object detection and image understanding to your multimodal applications.

Why DINO-X MCP?

With DINO-X MCP, you can:

  • Fine-Grained Understanding: Full image detection, object detection, and region-level descriptions.

  • Structured Outputs: Get object categories, counts, locations, and attributes for VQA and multi-step reasoning tasks.

  • Composable: Works seamlessly with other MCP servers to build end-to-end visual agents or automation pipelines.

Transport Modes

DINO-X MCP supports two transport modes:

| Feature | STDIO (default) | Streamable HTTP |
|---|---|---|
| Runtime | Local | Local or Cloud |
| Transport | Standard I/O | HTTP (streaming responses) |
| Input source | file:// and https:// | https:// only |
| Visualization | Supported (saves annotated images locally) | Not supported (for now) |
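
The input-source rules above can be checked on the client side before a tool is called. The following is a minimal Python sketch, not part of DINO-X MCP: the names `ALLOWED_SCHEMES` and `is_supported_source`, and the short mode labels `"stdio"` and `"http"`, are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse

# URI schemes accepted per transport mode, as listed in this README.
# The keys "stdio" and "http" are illustrative labels, not server flags.
ALLOWED_SCHEMES = {
    "stdio": {"file", "https"},
    "http": {"https"},
}


def is_supported_source(image_uri: str, transport: str) -> bool:
    """Return True if the URI scheme is accepted in the given transport mode."""
    allowed = ALLOWED_SCHEMES.get(transport)
    if allowed is None:
        raise ValueError(f"unknown transport mode: {transport!r}")
    return urlparse(image_uri).scheme.lower() in allowed
```

A client could call `is_supported_source("file:///tmp/cat.jpg", "http")` and, on `False`, upload the image somewhere reachable over https:// before invoking a tool.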

Quick Start

1. Prepare an MCP client

Any MCP-compatible client works.

2. Get your API key

Apply on the DINO-X platform: Request API Key (new users get free quota).

3. Configure MCP

Option A: Official Hosted Streamable HTTP (Recommended)

Add the following to your MCP client config, replacing your-api-key with your API key:

{
  "mcpServers": {
    "dinox-mcp": {
      "url": "https://mcp.deepdataspace.com/mcp?key=your-api-key"
    }
  }
}
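
If the config is generated by a script rather than edited by hand, the key should be URL-encoded before it goes into the query string. The sketch below assumes only the endpoint and JSON shape shown above. The helper name `hosted_config` is hypothetical.

```python
import json
from urllib.parse import urlencode

# Hosted endpoint as given in this README.
HOSTED_ENDPOINT = "https://mcp.deepdataspace.com/mcp"


def hosted_config(api_key: str, server_name: str = "dinox-mcp") -> dict:
    """Build the mcpServers entry for the hosted Streamable HTTP endpoint."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    # urlencode escapes characters that are not safe in a query string.
    url = f"{HOSTED_ENDPOINT}?{urlencode({'key': api_key})}"
    return {"mcpServers": {server_name: {"url": url}}}


if __name__ == "__main__":
    print(json.dumps(hosted_config("your-api-key"), indent=2))
```

Because the key is embedded in the URL, the generated file should be treated as a secret.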

Option B: Use the NPM package locally (STDIO)

Install Node.js first

  • Download the installer from nodejs.org

  • Or install it from the command line:

# macOS / Linux
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# or
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

# load nvm into current shell (choose the one you use)
source ~/.bashrc || true
source ~/.zshrc || true

# install and use LTS Node.js
nvm install --lts
nvm use --lts

# Windows (one of the following)
winget install OpenJS.NodeJS.LTS
# or with Chocolatey (in admin PowerShell)
iwr -useb https://raw.githubusercontent.com/chocolatey/chocolatey/master/chocolateyInstall/InstallChocolatey.ps1 | iex
choco install nodejs-lts -y

Configure your MCP client:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": ["-y", "@deepdataspace/dinox-mcp"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

Note: Replace your-api-key-here with your real key.

Option C: Run from source locally

Make sure Node.js is installed (see Option B), then:

# clone
git clone https://github.com/IDEA-Research/DINO-X-MCP.git
cd DINO-X-MCP

# install deps
npm install

# build
npm run build

Configure your MCP client:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "node",
      "args": ["/path/to/DINO-X-MCP/build/index.js"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

CLI Flags & Environment Variables

  • Common flags

    • --http: start in Streamable HTTP mode (otherwise STDIO by default)

    • --stdio: force STDIO mode

    • --dinox-api-key=...: set API key

    • --enable-client-key: allow API key via URL ?key= (Streamable HTTP only)

    • --port=8080: HTTP port (default 3020)

  • Environment variables

    • DINOX_API_KEY (required, unless the server runs in Streamable HTTP mode with --enable-client-key and clients supply the key themselves): DINO-X platform API key

    • IMAGE_STORAGE_DIRECTORY (optional, STDIO): directory to save annotated images

    • AUTH_TOKEN (optional, HTTP): if set, client must send Authorization: Bearer <token>

    Examples:

# STDIO (local)
node build/index.js --dinox-api-key=your-api-key

# Streamable HTTP (server provides a shared API key)
node build/index.js --http --dinox-api-key=your-api-key

# Streamable HTTP (custom port)
node build/index.js --http --dinox-api-key=your-api-key --port=8080

# Streamable HTTP (require client-provided API key via URL)
node build/index.js --http --enable-client-key
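
When the server is launched from a script, the flag combinations above can be assembled programmatically. The following Python sketch builds an argument list from the documented flags only. The function name `build_server_args` and its parameters are hypothetical, and the check that rejects --port or --enable-client-key outside HTTP mode is an assumption about sensible usage, not documented server behavior.

```python
from typing import List, Optional


def build_server_args(
    http: bool = False,
    api_key: Optional[str] = None,
    port: Optional[int] = None,
    enable_client_key: bool = False,
    entry: str = "build/index.js",
) -> List[str]:
    """Assemble a `node build/index.js ...` command from the documented flags."""
    # Assumption: HTTP-only options are rejected here rather than passed through.
    if not http and (port is not None or enable_client_key):
        raise ValueError("--port and --enable-client-key apply to Streamable HTTP mode only")
    args = ["node", entry]
    if http:
        args.append("--http")
    if api_key:
        args.append(f"--dinox-api-key={api_key}")
    if port is not None:
        args.append(f"--port={port}")
    if enable_client_key:
        args.append("--enable-client-key")
    return args
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues with the key.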

Client config when using ?key=:

{
  "mcpServers": {
    "dinox-mcp": {
      "url": "http://localhost:3020/mcp?key=your-api-key"
    }
  }
}

Using AUTH_TOKEN with a gateway that injects Authorization: Bearer <token>:

AUTH_TOKEN=my-token node build/index.js --http --enable-client-key

Client example with supergateway:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "supergateway",
        "--streamableHttp",
        "http://localhost:3020/mcp?key=your-api-key",
        "--oauth2Bearer",
        "my-token"
      ]
    }
  }
}
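
To make the AUTH_TOKEN rule concrete, here is a sketch of how a bearer check of this kind is commonly written. It is not the server's actual implementation: the function `is_authorized` is hypothetical, and treating an unset AUTH_TOKEN as "no check" is an inference from the description "if set, client must send Authorization: Bearer <token>".

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], auth_token: Optional[str]) -> bool:
    """Check an Authorization: Bearer <token> header against the expected token."""
    if not auth_token:
        # Assumption: with AUTH_TOKEN unset, no bearer check is performed.
        return True
    # Header names are case-insensitive, so look the header up accordingly.
    value = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(token.strip().encode(), auth_token.encode())
```

With the supergateway config above, the gateway would send `Authorization: Bearer my-token`, which this check accepts when AUTH_TOKEN is `my-token`.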

Tools

| Capability | Tool ID | Transport | Input | Output |
|---|---|---|---|---|
| Full-scene object detection | detect-all-objects | STDIO / HTTP | Image URL | Category + bbox + (optional) captions |
| Text-prompted object detection | detect-objects-by-text | STDIO / HTTP | Image URL + English nouns (dot-separated for multiple, e.g., person.car) | Target object bbox + (optional) captions |
| Human pose estimation | detect-human-pose-keypoints | STDIO / HTTP | Image URL | 17 keypoints + bbox + (optional) captions |
| Visualization | visualize-detection-result | STDIO only | Image URL + detection results array | Local path to annotated image |
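
Two small client-side helpers illustrate how these inputs and outputs might be handled. The first joins nouns into the dot-separated prompt that detect-objects-by-text expects. The second tallies detections per category, which is the core of an object-counting task. This is a sketch under assumptions: the function names are hypothetical, and the `"category"` field name is an assumed shape for a detection record, since the exact response schema is not given here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def build_text_prompt(nouns: Iterable[str]) -> str:
    """Join nouns into a dot-separated prompt such as 'person.car'."""
    cleaned = [n.strip() for n in nouns if n.strip()]
    if not cleaned:
        raise ValueError("at least one noun is required")
    for noun in cleaned:
        if "." in noun:
            raise ValueError(f"noun must not contain '.': {noun!r}")
    return ".".join(cleaned)


def count_by_category(detections: List[Mapping[str, object]]) -> Dict[str, int]:
    """Count detections per category; 'category' is an assumed field name."""
    return dict(Counter(str(d["category"]) for d in detections))
```

For example, `build_text_prompt(["person", "car"])` yields the `person.car` prompt shown in the table.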

🎬 Use Cases

| 🎯 Scenario | 📝 Input | ✨ Output |
|---|---|---|
| Detection & Localization | 💬 Prompt: Detect and visualize the fire areas in the forest. 🖼️ Input Image: [image 1-1] | [image 1-2] |
| Object Counting | 💬 Prompt: Please analyze this warehouse image, detect all the cardboard boxes, count the total number. 🖼️ Input Image: [image 2-1] | |
| Feature Detection | 💬 Prompt: Find all red cars in the image. 🖼️ Input Image: [image 4-1] | [image 4-2] |
| Attribute Reasoning | 💬 Prompt: Find the tallest person in the image, describe their clothing. 🖼️ Input Image: [image 5-1] | [image 5-2] |
| Full Scene Detection | 💬 Prompt: Find the fruit with the highest vitamin C content in the image. 🖼️ Input Image: [image 6-1] | [image 6-3] Answer: Kiwi fruit (93mg/100g) |
| Pose Analysis | 💬 Prompt: Please analyze what yoga pose this is. 🖼️ Input Image: [image 3-1] | [image 3-3] |

FAQ

  • Supported image sources?

    • STDIO: file:// and https://

    • Streamable HTTP: https:// only

  • Supported image formats?

    • jpg, jpeg, webp, png
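
A client can pre-filter inputs by file extension before calling a tool. The sketch below assumes the extension reflects the actual format, which is only a hint since the server may inspect the image content itself. `SUPPORTED_EXTENSIONS` and `has_supported_extension` are hypothetical names.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats listed in the FAQ above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".webp", ".png"}


def has_supported_extension(image_uri: str) -> bool:
    """Return True if the URI path ends in one of the listed image extensions."""
    # urlparse drops the query string, so '?size=large' does not affect the check.
    path = urlparse(image_uri).path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```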

Development & Debugging

Use watch mode to auto-rebuild during development:

npm run watch

Use MCP Inspector for debugging:

npm run inspector

License

Apache License 2.0
