PDFtotext MCP Server
A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven pdftotext utility from poppler-utils.
🚀 Why This Server?
Unlike other PDF MCP servers that suffer from logging interference, complex dependencies, and reliability issues, pdftotext-mcp is:
- ✅ Actually works - Clean JSON-RPC communication without stdout pollution
- ✅ Reliable - Built on mature pdftotext from poppler-utils (used by millions)
- ✅ Lightweight - Minimal dependencies, maximum compatibility
- ✅ Production tested - Successfully tested with Claude Desktop and other MCP clients
- ✅ Feature complete - Page-specific extraction, layout preservation, encoding options
- ✅ Error handling - Comprehensive validation and helpful error messages
📋 Features
- 📄 Extract text from entire PDF documents or specific pages
- 🎨 Preserve original layout formatting (optional)
- 🔤 Multiple text encoding support (UTF-8, Latin1, ASCII)
- 📊 Comprehensive metadata in responses (word count, file info, etc.)
- 🛡️ File validation and security checks
- ⚡ Fast processing with configurable timeouts
- 🔍 Detailed error reporting with troubleshooting hints
🔧 Prerequisites
You must have pdftotext installed on your system:
Ubuntu/Debian
macOS
Windows
Verify Installation
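Besides running pdftotext by hand, availability can be checked from Node.js. The sketch below is only an illustration: the helper name isCommandAvailable is hypothetical and not part of pdftotext-mcp. It relies on spawnSync reporting an error (typically ENOENT) when a binary cannot be started.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: returns true if `command` can be started at all.
function isCommandAvailable(command, args = ["-v"]) {
  const result = spawnSync(command, args, { stdio: "ignore" });
  // spawnSync sets `error` (e.g. code ENOENT) when the process could not be launched.
  return !result.error;
}

console.log("pdftotext available:", isCommandAvailable("pdftotext"));
```

If this prints false, install poppler-utils for your platform and make sure the binary is on the PATH seen by the MCP client.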
📦 Installation
Option 1: Global Installation (Recommended)
Option 2: Use with npx (No Installation)
Option 3: Local Development
⚙️ Configuration
Add to your MCP client configuration:
Claude Desktop
Add to claude_desktop_config.json:
Or with npx:
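As a rough illustration of the two variants, the Node.js sketch below prints JSON that could be pasted into claude_desktop_config.json. The server key "pdftotext" is an arbitrary name chosen here, and the npx variant assumes the npm package is published under the same name as the command, pdftotext-mcp; adjust both if your setup differs.

```javascript
// Variant 1: globally installed command (assumed to be on PATH).
const globalInstallConfig = {
  mcpServers: {
    pdftotext: { command: "pdftotext-mcp" },
  },
};

// Variant 2: run through npx (assumes the package name is pdftotext-mcp).
const npxConfig = {
  mcpServers: {
    pdftotext: { command: "npx", args: ["pdftotext-mcp"] },
  },
};

console.log(JSON.stringify(globalInstallConfig, null, 2));
console.log(JSON.stringify(npxConfig, null, 2));
```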
Other MCP Clients
The server works with any MCP-compatible client. Use pdftotext-mcp as the command.
🎯 Usage
The server provides a single, powerful tool: read_pdf_text
Basic Usage
Extract entire document
Extract specific page
Preserve layout formatting
Custom encoding
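The four cases above differ only in the arguments passed to read_pdf_text. As a sketch, this is how an MCP client might frame them as JSON-RPC 2.0 tools/call requests; the helper buildReadPdfCall and the file name report.pdf are placeholders for illustration, not part of the server.

```javascript
// Builds a JSON-RPC 2.0 "tools/call" request for the read_pdf_text tool.
function buildReadPdfCall(id, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_pdf_text", arguments: args },
  };
}

const examples = [
  buildReadPdfCall(1, { path: "report.pdf" }),                     // entire document
  buildReadPdfCall(2, { path: "report.pdf", page: 3 }),            // specific page
  buildReadPdfCall(3, { path: "report.pdf", layout: true }),       // preserve layout
  buildReadPdfCall(4, { path: "report.pdf", encoding: "Latin1" }), // custom encoding
];

for (const request of examples) console.log(JSON.stringify(request));
```

Most MCP clients build these requests for you; the sketch only makes the argument shapes explicit.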
Response Format
Success Response
Error Response
📚 API Reference
Tool: read_pdf_text
Extracts text content from PDF files using pdftotext.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| path | string | ✅ | - | Path to PDF file (relative or absolute) |
| page | number | ❌ | all pages | Specific page to extract (1-based) |
| layout | boolean | ❌ | false | Preserve original text layout |
| encoding | string | ❌ | "UTF-8" | Output text encoding |
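To make the parameters concrete, here is a minimal sketch of how they could map onto pdftotext command-line options (-f, -l, -layout, -enc, and "-" for stdout output). It is not the server's actual implementation: the helper names, the 30 second default timeout, and the mapping of ASCII to pdftotext's built-in ASCII7 encoding name are all assumptions made for illustration.

```javascript
const { execFile } = require("node:child_process");

// Assumed mapping from the documented encoding names to pdftotext -enc names.
const ENCODING_NAMES = { "UTF-8": "UTF-8", Latin1: "Latin1", ASCII: "ASCII7" };

// Hypothetical helper: validates tool parameters and returns pdftotext arguments.
function buildPdftotextArgs({ path, page, layout = false, encoding = "UTF-8" }) {
  if (typeof path !== "string" || path.length === 0) {
    throw new TypeError("path must be a non-empty string");
  }
  if (!Object.hasOwn(ENCODING_NAMES, encoding)) {
    throw new RangeError(`unsupported encoding: ${encoding}`);
  }
  const args = [];
  if (page !== undefined) {
    if (!Number.isInteger(page) || page < 1) {
      throw new RangeError("page must be a positive integer (1-based)");
    }
    // -f and -l set the first and last page; equal values select a single page.
    args.push("-f", String(page), "-l", String(page));
  }
  if (layout) args.push("-layout");
  args.push("-enc", ENCODING_NAMES[encoding]);
  // "-" as the output file makes pdftotext write the text to stdout.
  args.push(path, "-");
  return args;
}

// Runs pdftotext and resolves with the extracted text; rejects on error or timeout.
function runPdftotext(params, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    execFile("pdftotext", buildPdftotextArgs(params), { timeout: timeoutMs }, (err, stdout) =>
      err ? reject(err) : resolve(stdout)
    );
  });
}
```

Keeping argument construction separate from process execution makes the validation rules easy to test without a PDF or a pdftotext binary at hand.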
Supported Encodings
- UTF-8 (default)
- Latin1
- ASCII
Error Types
- FILE_NOT_FOUND - PDF file doesn't exist
- PERMISSION_DENIED - Cannot read the file
- INVALID_PDF - File is not a valid PDF
- PDFTOTEXT_ERROR - pdftotext utility error
- UNKNOWN_ERROR - Unexpected error
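One plausible way to arrive at the first three error types is a pre-flight check before pdftotext is invoked. The sketch below is an assumption about how such a check could work, not the server's code: preflightErrorType is a hypothetical name, and the test for a leading "%PDF-" marker is stricter than some PDF readers, which tolerate junk before the header. PDFTOTEXT_ERROR is not covered here; it would presumably correspond to pdftotext itself exiting with a failure.

```javascript
const fs = require("node:fs");

// Hypothetical pre-flight check: returns null when the file is readable and
// starts with the PDF marker, otherwise one of the documented error types.
function preflightErrorType(filePath) {
  let fd;
  try {
    fd = fs.openSync(filePath, "r");
    const header = Buffer.alloc(5);
    const bytesRead = fs.readSync(fd, header, 0, 5, 0);
    // PDF files normally begin with the bytes "%PDF-".
    if (bytesRead < 5 || header.toString("latin1") !== "%PDF-") return "INVALID_PDF";
    return null;
  } catch (err) {
    if (err.code === "ENOENT") return "FILE_NOT_FOUND";
    if (err.code === "EACCES" || err.code === "EPERM") return "PERMISSION_DENIED";
    return "UNKNOWN_ERROR";
  } finally {
    // Always release the descriptor if the file was opened.
    if (fd !== undefined) fs.closeSync(fd);
  }
}
```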
🔧 Troubleshooting
"pdftotext is not available"
Solution: Install poppler-utils (see Prerequisites)
"File not found"
Solutions:
- Use absolute paths: /home/user/document.pdf
- Check file exists: ls -la /path/to/file.pdf
- Verify MCP server working directory
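Relative paths are a common cause of this error because the MCP client, not your shell, decides the server's working directory. Assuming the server resolves relative paths against its own process directory (the source does not state this explicitly), the following sketch shows what a relative path would turn into; resolvePdfPath and docs/report.pdf are illustrative names.

```javascript
const path = require("node:path");

// Resolves a possibly relative PDF path against a base directory
// (by default the current working directory of this process).
function resolvePdfPath(inputPath, baseDir = process.cwd()) {
  return path.resolve(baseDir, inputPath);
}

console.log("working directory:", process.cwd());
console.log("resolved path:", resolvePdfPath("docs/report.pdf"));
```

An absolute input path is returned unchanged apart from normalization, which is why absolute paths sidestep the problem.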
"Permission denied"
Solutions:
- Check file permissions: chmod 644 document.pdf
- Ensure directory is readable: chmod 755 /path/to/directory/
"File is not a valid PDF"
Solutions:
- Verify file is actually a PDF: file document.pdf
- Check for file corruption
- Try with a different PDF file
MCP Connection Issues
Solutions:
- Restart your MCP client completely
- Check configuration syntax in config file
- Verify pdftotext-mcp is accessible in PATH
- Check MCP client logs for detailed errors
🧪 Testing
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Setup
Running Locally
Code Style
This project uses ESLint. Run npm run lint to check code style.
📄 License
MIT - see LICENSE file for details.
🙏 Acknowledgments
- Built for the Model Context Protocol ecosystem
- Uses poppler-utils pdftotext utility
- Inspired by the need for reliable PDF processing in MCP environments
🔗 Related
Made for the MCP community