Voice Recognition MCP Service

This service provides voice recognition and text extraction in both stdio and MCP modes.

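Features

Voice recognition from files
Voice recognition from base64-encoded data
Text extraction
Support for stdio and MCP modes
Structured voice recognition results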

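Project Structure

voice_service.py - Core service implementation
stdio_server.py - stdio mode entry point
mcp_server.py - MCP mode entry point
build.py - Build script for the executables
build_exec.sh - Build execution script
test_*.sh - Test scripts for the different features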

Installation

Clone the repository:

git clone https://github.com/AIO-2030/mcp_voice_identify.git
cd mcp_voice_identify

Install the dependencies:

pip install -r requirements.txt

Set the environment variables in .env:

API_URL=your_api_url
API_KEY=your_api_key
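
The two variables above could be read at startup roughly as follows. This is a minimal sketch, not the project's actual loader: `parse_env_text` and `load_settings` are hypothetical helper names, and the parser only handles simple KEY=VALUE lines.

```python
import os


def parse_env_text(text):
    """Parse simple KEY=VALUE lines; skip blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env_path=".env"):
    """Return API_URL and API_KEY, preferring real environment variables."""
    file_values = {}
    if os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as handle:
            file_values = parse_env_text(handle.read())
    settings = {}
    for name in ("API_URL", "API_KEY"):
        value = os.environ.get(name) or file_values.get(name)
        if not value:
            raise RuntimeError(f"Missing required setting: {name}")
        settings[name] = value
    return settings
```

Failing early when a setting is missing makes a misconfigured .env visible before any request is sent.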

Usage

stdio Mode

Run the service:

python stdio_server.py

Send JSON-RPC requests via stdin:

{
    "jsonrpc": "2.0",
    "method": "help",
    "params": {},
    "id": 1
}

Or use the executable:

./dist/voice_stdio
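
The `help` request shown above can also be generated from a script. The sketch below builds the same JSON-RPC 2.0 envelope and decodes a response. `build_request` and `parse_response` are hypothetical helper names, and the code assumes the server reads one JSON object per line, which the source does not state.

```python
import json


def build_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request as a single-line JSON string."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params if params is not None else {},
        "id": request_id,
    }
    return json.dumps(payload)


def parse_response(line):
    """Decode one response line; raise if it carries an error member."""
    response = json.loads(line)
    if "error" in response:
        raise RuntimeError(f"JSON-RPC error: {response['error']}")
    return response.get("result")


if __name__ == "__main__":
    # Print the request so it can be piped into the server's stdin
    print(build_request("help"))
```

Saved under a name such as make_request.py (hypothetical), the output could be piped into the service with `python make_request.py | python stdio_server.py`.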

MCP Mode

Run the service:

python mcp_server.py

Or use the executable:

./dist/voice_mcp

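Voice Recognition Results

The service returns structured voice recognition results. The examples below show the response format:

Original API Response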

{
    "jsonrpc": "2.0",
    "result": {
        "message": "input processed successfully",
        "results": "test test test",
        "label_result": "<|en|><|EMO_UNKNOWN|><|Speech|><|woitn|>test test test"
    },
    "id": 1
}

Restructured Response

{
    "jsonrpc": "2.0",
    "result": {
        "message": "input processed successfully",
        "results": "test test test",
        "label_result": {
            "lan": "en",
            "emo": "unknown",
            "type": "speech",
            "speaker": "woitn",
            "text": "test test test"
        }
    },
    "id": 1
}

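Label Result Fields

The label_result field contains the following structured information:

Field	Description	Example Value
lan	Language code	"en"
emo	Emotion state	"unknown"
type	Audio type	"speech"
speaker	Speaker identifier	"woitn"
text	Recognized text content	"test test test"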

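Special Tags

The service recognizes and processes the following special tags in the original response:

<|en|> - Language code
<|EMO_UNKNOWN|> - Emotion state
<|Speech|> - Audio type
<|woitn|> - Speaker identifier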
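
The mapping from the tagged string to the structured label_result can be sketched as below. This is an illustration, not the service's actual code: `parse_label_result` is a hypothetical name, and the sketch assumes the tags always arrive in the order language, emotion, audio type, speaker, as in the example response.

```python
import re

# Matches tags of the form <|...|>
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")
# Assumed positional order of the tags in the raw string
FIELD_ORDER = ("lan", "emo", "type", "speaker")


def parse_label_result(raw):
    """Split a tagged string into the structured label_result fields."""
    tags = TAG_PATTERN.findall(raw)
    text = TAG_PATTERN.sub("", raw).strip()
    result = dict.fromkeys(FIELD_ORDER)  # missing tags stay None
    for name, tag in zip(FIELD_ORDER, tags):
        if name == "emo" and tag.upper().startswith("EMO_"):
            tag = tag[len("EMO_"):]
        # The example response lowercases the emotion and type values
        result[name] = tag.lower() if name in ("emo", "type") else tag
    result["text"] = text
    return result
```

Applied to the label_result string from the original API response, this yields the same five fields as the restructured response. A positional mapping is the simplest choice, but it would mislabel fields if the upstream API ever omitted or reordered a tag.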

Building Executables

Make the build script executable:

chmod +x build_exec.sh

Build the stdio mode executable:

./build_exec.sh

Build the MCP mode executable:

./build_exec.sh mcp

The executables are created at:

stdio mode: dist/voice_stdio
MCP mode: dist/voice_mcp

Testing

Run the test scripts:

chmod +x test_*.sh
./test_help.sh
./test_voice_file.sh
./test_voice_base64.sh

License

This project is licensed under the MIT License. See the LICENSE file for details.
