PDF Reader MCP Server

CLAUDE.md•3.69 kB

# CLAUDE.md This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. ## Project Overview This is a Model Context Protocol (MCP) server for PDF content extraction and processing. It's built in TypeScript using the `@modelcontextprotocol/sdk` and provides tools for extracting text, metadata, and validating PDF files. The server uses stdio transport and follows MCP specification 2025-06-18. ## Common Commands ### Development ```bash npm run dev # Run development server with tsx npm run build # Compile TypeScript to dist/ npm start # Run compiled server from dist/ npm run typecheck # Type check without emitting files npm run lint # ESLint on src/**/*.ts npm test # Run Vitest tests ``` ### Testing ```bash npm test # Run all tests with Vitest npm test -- --run # Run tests once without watch mode ``` Tests are located in `src/**/*.test.ts` and use Vitest with Node.js environment. ## Architecture ### Core Structure - **Main Server**: `src/index.ts` - PDFExtractionServer class with stdio transport - **Tools Directory**: `src/tools/` - Individual MCP tool implementations - **Services**: `src/services/` - PDF processing business logic using pdf-parse - **Types**: `src/types/` - TypeScript definitions for PDF and MCP types - **Utils**: `src/utils/` - Validation and error handling utilities - **Config**: `src/config/` - Server configuration with environment variables ### MCP Tools Implementation The server provides 4 main tools: 1. `extract_pdf_text` - Text extraction with formatting options 2. `extract_pdf_metadata` - Document metadata extraction 3. `extract_pdf_pages` - Page-specific content extraction 4. `validate_pdf` - PDF validation and integrity checks ### Key Dependencies - `pdf-parse` - Primary PDF processing library - `pdf-lib` - Additional PDF manipulation - `zod` - Input validation schemas - `winston` - Structured logging - `@modelcontextprotocol/sdk` - MCP framework ### Known Issues - **PDF-Parse Warning**: The `pdf-parse` library may output "Warning: Indexing all PDF objects" to stderr during initialization. This is a harmless informational message from the native PDF parsing library and can be safely ignored. To suppress it when running the server, redirect stderr: `node dist/index.js 2>/dev/null` ### Error Handling Uses standardized MCP error codes: - `-32602`: Validation errors - `-32603`: File access errors - `-32604`: Size/resource errors - `-32605`: Format errors ### Configuration Environment variables in `src/config/server-config.ts`: - `PDF_MAX_FILE_SIZE`: Max file size (default: 100MB) - `PDF_PROCESSING_TIMEOUT`: Processing timeout (default: 60s) - `PDF_MAX_MEMORY_USAGE`: Memory limit (default: 500MB) - `LOG_LEVEL`: Logging level (default: 'info') ## Development Guidelines ### Adding New Tools 1. Create tool definition in `src/tools/[tool-name].ts` 2. Export both tool schema and handler function 3. Add tool to `src/index.ts` tools array and switch statement 4. Update types in `src/types/` if needed ### PDF Processing All PDF operations go through `PDFProcessor` class in `src/services/pdf-processor.ts`. This class handles file validation, timeout management, and metadata extraction consistently across tools. ### Type Safety The project uses strict TypeScript with comprehensive type definitions. All tool parameters are validated using Zod schemas defined in `src/types/mcp-types.ts`. ### Security Considerations - File path validation prevents directory traversal - File size limits enforced before processing - Processing timeouts prevent resource exhaustion - No temporary file persistence

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pablontiv/pdf-reader-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server