Skip to main content
Glama

PDF Reader MCP Server

by sylphxltd
systemPatterns.md5.05 kB
# System Patterns: PDF Reader MCP Server ## 1. Architecture Overview The PDF Reader MCP server is a standalone Node.js application based on the original Filesystem MCP. It's designed to run as a child process, communicating with its parent (the AI agent host) via standard input/output (stdio) using the Model Context Protocol (MCP) to provide PDF reading capabilities. ```mermaid graph LR A[Agent Host Environment] -- MCP over Stdio --> B(PDF Reader MCP Server); B -- Node.js fs/path/pdfjs-dist --> C[User Filesystem (Project Root)]; C -- Results/Data --> B; B -- MCP over Stdio --> A; ``` ## 2. Key Technical Decisions & Patterns - **MCP SDK Usage:** Leverages the `@modelcontextprotocol/sdk` for handling MCP communication (request parsing, response formatting, error handling). This standardizes interaction and reduces boilerplate code. - **Stdio Transport:** Uses `StdioServerTransport` from the SDK for communication, suitable for running as a managed child process. - **Asynchronous Operations:** All filesystem interactions and request handling are implemented using `async/await` and Node.js's promise-based `fs` module (`fs.promises`) for non-blocking I/O. - **Strict Path Resolution:** A dedicated `resolvePath` function is used for _every_ path received from the agent. - It normalizes the path. - It resolves the path relative to the server process's current working directory (`process.cwd()`), which is treated as the `PROJECT_ROOT`. **Crucially, this requires the process launching the server (e.g., the agent host) to set the correct `cwd` for the target project.** - It explicitly checks if the resolved absolute path still starts with the `PROJECT_ROOT` absolute path to prevent path traversal vulnerabilities (e.g., `../../sensitive-file`). - It rejects absolute paths provided by the agent. - **Zod for Schemas & Validation:** Uses `zod` library to define input schemas for tools and perform robust validation within each handler. JSON schemas for MCP listing are generated from Zod schemas. - **Tool Definition Aggregation:** Tool definitions (name, description, Zod schema, handler function) are defined in their respective handler files and aggregated in `src/handlers/index.ts` for registration in `src/index.ts`. - **`edit_file` Logic:** - Processes multiple changes per file, applying them sequentially from bottom-to-top to minimize line number conflicts. - Handles insertion, text replacement, and deletion. - Implements basic indentation detection (`detect-indent`) and preservation for insertions/replacements. - Uses `diff` library to generate unified diff output. - **Error Handling:** - Uses `try...catch` blocks within each tool handler. - Catches specific Node.js filesystem errors (like `ENOENT`, `EPERM`, `EACCES`) and maps them to appropriate MCP error codes (`InvalidRequest`). - Uses custom `McpError` objects for standardized error reporting back to the agent. - Logs unexpected errors to the server's console (`stderr`) for debugging. - **Glob for Listing/Searching:** Uses the `glob` library for flexible and powerful file listing and searching based on glob patterns, including recursive operations and stat retrieval. Careful handling of `glob`'s different output types based on options (`string[]`, `Path[]`, `Path[]` with `stats`) is implemented. - **TypeScript:** Provides static typing for better code maintainability, early error detection, and improved developer experience. Uses ES module syntax (`import`/`export`). - **PDF Parsing:** Uses Mozilla's `pdfjs-dist` library to load PDF documents and extract text content, metadata, and page information. The `read_pdf` handler uses its API. ## 3. Component Relationships - **`index.ts`:** Main entry point. Sets up the MCP server instance, defines tool schemas, registers request handlers, and starts the server connection. - **`Server` (from SDK):** Core MCP server class handling protocol logic. - **`StdioServerTransport` (from SDK):** Handles reading/writing MCP messages via stdio. - **Tool Handler Function (`handleReadPdfFunc`):** Contains the logic for the consolidated `read_pdf` tool, including Zod argument validation, path resolution, PDF loading/parsing via `pdfjs-dist`, and result formatting based on input parameters. - **`resolvePath` Helper:** Centralized security function for path validation. - **`formatStats` Helper:** Utility to create a consistent stats object structure. - **Node.js Modules (`fs`, `path`):** Used for actual filesystem operations and path manipulation. - **`glob` Library:** Used for pattern-based file searching and listing. - **`zod` Library:** Used for defining and validating tool input schemas. - **`diff` Library:** (Inherited, but not used by PDF tools) Used by `edit_file`. - **`detect-indent` Library:** (Inherited, but not used by PDF tools) Used by `edit_file`. - **`pdfjs-dist` Library:** Used by the `read_pdf` handler to load and process PDF documents.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/sylphxltd/pdf-reader-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server