Deepgram MCP Server

by reddheeraj

A Model Context Protocol (MCP) server that provides access to Deepgram's speech recognition and text-to-speech capabilities.

Features

  • Audio Transcription: Convert audio to text with high accuracy
  • Text-to-Speech: Generate natural-sounding speech from text with automatic compression
  • Audio Analysis: Extract insights like sentiment, topics, intents, and entities
  • Speaker Diarization: Identify different speakers in audio
  • Language Detection: Automatically detect the language of audio
  • Multiple Models: Support for various Deepgram models optimized for different use cases
  • Smart Audio Compression: Automatically compresses generated audio files for efficient transfer

Installation

  1. Clone this repository
  2. Install dependencies:
    npm install
  3. Copy the environment file and add your Deepgram API key:
    cp env.example .env
    # Edit .env and add your DEEPGRAM_API_KEY, OPENAI_API_KEY, or GROQ_API_KEY (whichever you want to use)
  4. Build the project:
    npm run build

Usage

HTTP Transport (Recommended for Production)

npm start # or node dist/index.js

The server will start on port 8080 by default. You can specify a different port:

node dist/index.js --port 8081

STDIO Transport (For Development)

npm run start:stdio # or node dist/index.js --stdio --port 8081

Available Tools

1. transcribe_audio

Transcribe audio to text with various options for customization.

Parameters:

  • audioUrl or audioData: Audio source (URL or base64)
  • model: Deepgram model to use (default: "nova-2-general")
  • language: Language code (default: "en")
  • punctuate: Add punctuation (default: true)
  • diarize: Speaker identification (default: false)
  • sentiment: Sentiment analysis (default: false)
  • And many more options...
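
To make the parameter list concrete, here is a minimal sketch of the JSON-RPC message an MCP client would send to invoke this tool. The `tools/call` method and its `name`/`arguments` fields come from the MCP specification. The `buildTranscribeRequest` helper, the rule that exactly one audio source must be given, and the example URL are assumptions made for illustration, not part of this server's code.

```typescript
// Arguments for transcribe_audio, limited to the parameters listed above.
interface TranscribeAudioArgs {
  audioUrl?: string;  // URL of the audio to transcribe
  audioData?: string; // base64-encoded audio, as an alternative to audioUrl
  model?: string;
  language?: string;
  punctuate?: boolean;
  diarize?: boolean;
  sentiment?: boolean;
}

// Shape of an MCP "tools/call" JSON-RPC 2.0 request.
interface ToolCallRequest<A> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: A };
}

// Hypothetical helper: assumes exactly one audio source should be supplied.
function buildTranscribeRequest(
  id: number,
  args: TranscribeAudioArgs,
): ToolCallRequest<TranscribeAudioArgs> {
  const hasUrl = args.audioUrl !== undefined;
  const hasData = args.audioData !== undefined;
  if (hasUrl === hasData) {
    throw new Error("Provide exactly one of audioUrl or audioData");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "transcribe_audio", arguments: args },
  };
}

const request = buildTranscribeRequest(1, {
  audioUrl: "https://example.com/meeting.wav", // placeholder URL
  diarize: true, // override the documented default of false
});
console.log(JSON.stringify(request, null, 2));
```

Parameters left out of `arguments` fall back to the defaults listed above, so the request only needs to carry the values that differ.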

2. text_to_speech

Convert text to speech using Deepgram's TTS models with automatic compression.

Parameters:

  • text: Text to convert to speech (required)
  • model: TTS model to use (default: "aura-asteria-en")
  • voice: Voice selection
  • format: Output format (default: "mp3")
  • speed: Speech speed (default: 1.0)

Output:

  • Original audio file saved to generated_audio/ folder
  • Compressed audio data saved to compressed_audio/ folder
  • Response includes file paths and compression metadata
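
The documented defaults for text_to_speech can be sketched as a small client-side merge, which shows the effective values when a caller overrides only some parameters. The default values are taken from the parameter list above. The `withTtsDefaults` helper is hypothetical, and the server may apply its defaults differently.

```typescript
// Arguments for text_to_speech, as listed above.
interface TextToSpeechArgs {
  text: string; // required
  model?: string;
  voice?: string;
  format?: string;
  speed?: number;
}

// Defaults as documented in the parameter list.
const TTS_DEFAULTS = { model: "aura-asteria-en", format: "mp3", speed: 1.0 };

// Hypothetical helper: merges caller arguments over the documented defaults.
// Assumes omitted keys are absent rather than explicitly set to undefined.
function withTtsDefaults(args: TextToSpeechArgs) {
  if (args.text.trim() === "") {
    throw new Error("text is required");
  }
  return { ...TTS_DEFAULTS, ...args };
}

const effective = withTtsDefaults({ text: "Hello from Deepgram", speed: 1.25 });
console.log(effective);
```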

3. analyze_audio

Perform advanced audio analysis including sentiment, topics, intents, and entities.

Parameters:

  • audioUrl or audioData: Audio source
  • features: Analysis features to enable
  • model: Model for analysis

4. get_models

Get information about available Deepgram models.

Parameters:

  • model_type: Filter by model type ("transcription", "tts", or "all")

Client Configuration

For MCP clients, use this configuration:

{ "mcpServers": { "deepgram": { "url": "http://localhost:8080/mcp" } } }
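
If the server is started on a different port, the URL in this configuration has to change with it. A minimal sketch that generates the same configuration for any port follows. The `mcpClientConfig` helper and its `host` parameter are hypothetical. The `/mcp` path and the default port 8080 are taken from the configuration and usage notes above.

```typescript
// Hypothetical helper: builds the client configuration for a given port.
function mcpClientConfig(port: number = 8080, host: string = "localhost") {
  return {
    mcpServers: {
      deepgram: { url: `http://${host}:${port}/mcp` },
    },
  };
}

// Matches a server started with: node dist/index.js --port 8081
console.log(JSON.stringify(mcpClientConfig(8081), null, 2));
```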

Development

# Watch mode for development
npm run watch

# Development with STDIO
npm run dev:stdio

# Development with HTTP
npm run dev

API Key

Get your Deepgram API key from the Deepgram Console.

Audio Compression System

The TTS functionality includes an intelligent compression system that:

  • Automatically compresses generated audio files using gzip compression
  • Saves compressed data to separate files to avoid large agent responses
  • Provides decompression tools for easy audio file extraction
  • Preserves audio quality, since gzip is lossless, while reducing file sizes by roughly 2-4x
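
The compress-then-extract cycle described above can be sketched with Node's built-in `zlib` module. This is an illustration under stated assumptions, not the server's implementation: the `CompressedAudioRecord` layout and both helper functions are hypothetical, since the actual schema of the compressed JSON files is not documented here.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical record layout; the server's real JSON schema may differ.
interface CompressedAudioRecord {
  encoding: "gzip+base64";
  originalBytes: number;
  compressedBytes: number;
  data: string; // gzip output, base64-encoded so it fits in JSON
}

// Gzip the audio bytes and wrap them in a JSON-friendly record.
function compressAudio(audio: Buffer): CompressedAudioRecord {
  const gz = gzipSync(audio);
  return {
    encoding: "gzip+base64",
    originalBytes: audio.length,
    compressedBytes: gz.length,
    data: gz.toString("base64"),
  };
}

// Reverse the process: base64-decode, then gunzip back to the original bytes.
function decompressAudio(record: CompressedAudioRecord): Buffer {
  return gunzipSync(Buffer.from(record.data, "base64"));
}

// Stand-in bytes; real input would be an MP3 file from generated_audio/.
const sample = Buffer.alloc(4096, 0x2a);
const record = compressAudio(sample);
const restored = decompressAudio(record);
console.log(restored.equals(sample)); // true: gzip is lossless
```

Because gzip is lossless, the restored bytes are identical to the input. The size reduction depends on the input: highly repetitive data like the stand-in buffer shrinks far more than already-compressed audio would.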

File Structure

generated_audio/                 # Original audio files
├── tts_2025-01-16T...mp3
compressed_audio/                # Compressed audio data
├── compressed_audio_2025-01-16T...json
decompressed_audio/              # Decompressed audio files (after extraction)
├── decompressed_2025-01-16T...mp3

Decompression Tools

Python Script (Recommended):

python decompress_audio.py <response_file_or_compressed_file>

Node.js Script:

npm run decompress <compressed_data_file>

Agno Integration

This MCP server also includes integration with Agno, a high-performance runtime for multi-agent systems.

Agno Tests

# Text-to-Speech test (saves audio to generated_audio/ and compressed_audio/)
npm run test:agno:tts

# Speech-to-Text test (transcribes sample audio)
npm run test:agno:stt

The TTS test will:

  1. Generate audio with automatic compression
  2. Save the response to tts_response.json
  3. Decompress the audio file to generated_audio/

License

MIT

Developer

