The Document Organizer MCP Server provides comprehensive document management capabilities, including PDF-to-Markdown conversion, automated organization, and Universal Project Documentation Standard implementation.
PDF-to-Markdown Conversion: Convert PDFs using marker (recommended) or pymupdf4llm engines with configurable options for table preservation, image extraction, memory efficiency, and auto-cleaning.
Document Discovery & Management: Recursively scan directories to find PDFs, audit conversion status, and efficiently convert only missing Markdown files through automated workflows.
Content Analysis & Organization: Automatically categorize documents (Research, Planning, Technical, Business, etc.) based on content analysis and organize them into structured directory hierarchies.
Full Automation Pipeline: Execute end-to-end workflows that discover, convert, categorize, and organize all documents from start to finish.
Universal Project Documentation Standard: Initialize standardized project structures with essential files like CURRENT_STATUS.md and ACTIVE_PLAN.md, validate compliance, manage plan lifecycles (ACTIVE, ARCHIVED, SUPERSEDED, BLOCKED), and generate automated progress reports and handoff documentation.
Document Organizer MCP Server
A powerful Model Context Protocol (MCP) server for systematic document organization, PDF-to-Markdown conversion, and Universal Project Documentation Standard implementation.
Features
🔄 PDF Conversion Engine
Dual Engine Support: marker (recommended) and pymupdf4llm
Intelligent Table Preservation: Advanced table-aware cleaning
Image Extraction: Optional embedded image extraction
Memory Efficient: Configurable processing for large documents
Auto-Cleaning: Removes marker formatting artifacts automatically
📊 Document Organization
Recursive PDF Discovery: Comprehensive file system scanning
Conversion Status Auditing: Track converted vs unconverted documents
Intelligent Categorization: Keyword-based content analysis
Automated Folder Organization: Category-based directory structures
Full Workflow Automation: End-to-end document processing pipeline
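The discovery and auditing features above can be illustrated with a minimal sketch. It assumes a PDF counts as converted when a same-named .md file sits next to it; the function names discover_pdfs and audit_conversions are illustrative and are not taken from the server's source.

```python
from pathlib import Path


def discover_pdfs(root: Path) -> list[Path]:
    """Recursively collect every PDF under root, sorted for stable output."""
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf"
    )


def audit_conversions(root: Path) -> dict[str, list[Path]]:
    """Split discovered PDFs by whether a same-named .md file sits beside them.

    Assumption: `<name>.md` in the same directory means "converted";
    the real server may track conversion status differently.
    """
    report: dict[str, list[Path]] = {"converted": [], "missing": []}
    for pdf in discover_pdfs(root):
        key = "converted" if pdf.with_suffix(".md").exists() else "missing"
        report[key].append(pdf)
    return report
```

Converting only the entries under "missing" is what keeps repeated runs cheap: PDFs that already have a Markdown counterpart are skipped.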
📋 Universal Project Documentation Standard
Standardized Structure: Consistent documentation across all projects
Status-Driven Plans: ACTIVE, ARCHIVED, SUPERSEDED, BLOCKED statuses
Weekly Progress Tracking: Automated handoff documentation
Compliance Validation: Ensure adherence to documentation standards
Template Generation: Project-specific documentation templates
Installation
Dependencies
For PDF conversion functionality, install one or both engines:
Usage
MCP Configuration
Add to your MCP client configuration:
Available Tools
PDF Conversion Tools
convert_pdf - Convert PDF to Markdown with configurable options
check_dependency - Verify and optionally install conversion engines
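As a rough idea of what a dependency check could look like, the sketch below tests whether each engine's Python module can be found without importing it. It assumes the engines are importable as marker (distributed on PyPI as marker-pdf) and pymupdf4llm; the helper names are illustrative, and the server's own check_dependency tool may work differently.

```python
import importlib.util

# Assumption: import names of the two engines. The marker engine is
# installed from the `marker-pdf` package and imported as `marker`.
ENGINE_MODULES = {"marker": "marker", "pymupdf4llm": "pymupdf4llm"}


def check_dependency(engine: str) -> bool:
    """Return True if the engine's module can be located, without importing it."""
    module = ENGINE_MODULES.get(engine)
    if module is None:
        raise ValueError(f"unknown engine: {engine!r}")
    return importlib.util.find_spec(module) is not None


def available_engines() -> list[str]:
    """List installed engines, with marker first since it is the recommended one."""
    return [name for name in ENGINE_MODULES if check_dependency(name)]
```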
Document Organization Tools
document_organizer__discover_pdfs - Recursively find all PDF files
document_organizer__check_conversions - Audit conversion status
document_organizer__convert_missing - Convert only unconverted PDFs
document_organizer__analyze_content - Categorize documents by content
document_organizer__organize_structure - Create organized folder hierarchies
document_organizer__full_workflow - Complete automation pipeline
Documentation Standard Tools
document_organizer__init_project_docs - Initialize standard documentation structure
document_organizer__validate_doc_structure - Validate compliance
document_organizer__archive_plan - Archive development plans
document_organizer__create_weekly_handoff - Generate progress reports
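MCP clients invoke tools such as the ones listed above by sending a JSON-RPC 2.0 request with the tools/call method. The sketch below builds such a request for convert_pdf. The argument names pdf_path and engine are hypothetical; page_chunks is the option this README mentions under Performance Considerations. Consult the tool's published input schema for the real parameter names.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def build_tool_call(tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


# `pdf_path` and `engine` are hypothetical argument names used for illustration.
message = build_tool_call(
    "convert_pdf",
    {"pdf_path": "/docs/report.pdf", "engine": "marker", "page_chunks": True},
)
```

In practice an MCP client library builds and sends this message for you; the sketch only shows the shape of the payload.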
Examples
Basic PDF Conversion
Full Document Organization Workflow
Initialize Project Documentation
Configuration Options
PDF Conversion Options
Document Categories
Automatic categorization supports:
Research: Analysis, studies, investigations
Planning: Strategies, roadmaps, discussions
Documentation: Guides, manuals, references
Technical: Implementation, architecture, APIs
Business: Market analysis, commercial strategies
General: Uncategorized content
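Keyword-based categorization along these lines can be sketched as a simple word-count score per category. The keyword lists below are hypothetical, loosely derived from the category descriptions above, and the scoring rule (most keyword hits wins, General when nothing matches) is an assumption and not the server's actual algorithm.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the server's real lists are not shown in this README.
CATEGORY_KEYWORDS = {
    "Research": ["analysis", "study", "investigation"],
    "Planning": ["strategy", "roadmap", "discussion"],
    "Documentation": ["guide", "manual", "reference"],
    "Technical": ["implementation", "architecture", "api"],
    "Business": ["market", "commercial", "revenue"],
}


def categorize(text: str) -> str:
    """Return the category whose keywords occur most often, or "General"."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        category: sum(words[keyword] for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the first category listed
    return best if scores[best] > 0 else "General"
```

For example, categorize("Product roadmap and strategy") scores two hits for Planning and none elsewhere, while text with no keyword hits falls through to General.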
Universal Project Documentation Standard
Required Files
CURRENT_STATUS.md - Real-time project status
ACTIVE_PLAN.md - Currently executing plan
.claude-instructions.md - AI assistant instructions
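A compliance check for these files could be as small as the sketch below. It assumes the required files live directly in the project root, which this README does not confirm, and missing_required_files is an illustrative name and not the server's API.

```python
from pathlib import Path

# File names taken from the Required Files list above.
REQUIRED_FILES = ("CURRENT_STATUS.md", "ACTIVE_PLAN.md", ".claude-instructions.md")


def missing_required_files(project_root: Path) -> list[str]:
    """Return the required files that are absent from the project root.

    Assumption: the files sit at the top level of the project.
    """
    return [name for name in REQUIRED_FILES if not (project_root / name).is_file()]
```

An empty result means the project passes this particular check.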
Directory Structure
Status Management
ACTIVE: Currently executing plan
ARCHIVED: Historical/completed plan
SUPERSEDED: Replaced by newer plan
BLOCKED: Waiting for external input
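The four statuses map naturally onto an enumeration. The sketch below is one possible representation, with a hypothetical parse_status helper for reading a status label out of a plan file; how the server actually stores and reads statuses is not described here.

```python
from enum import Enum


class PlanStatus(Enum):
    """Plan statuses, with the descriptions given in this README."""

    ACTIVE = "Currently executing plan"
    ARCHIVED = "Historical/completed plan"
    SUPERSEDED = "Replaced by newer plan"
    BLOCKED = "Waiting for external input"


def parse_status(label: str) -> PlanStatus:
    """Map a label such as "active" or " BLOCKED " to a PlanStatus."""
    try:
        return PlanStatus[label.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown plan status: {label!r}") from None
```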
Development
Performance Considerations
Memory Efficiency: Use page_chunks: true for large PDFs
Processing Speed: marker is slower than pymupdf4llm but produces higher-quality output
Batch Processing: The document_organizer__convert_missing tool optimizes bulk conversions
Table Preservation: marker with auto-cleaning provides the best table formatting
Error Handling
The server provides comprehensive error handling:
Dependency validation before operations
Graceful fallback between conversion engines
Detailed error messages with context
Progress tracking for long-running operations
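Graceful fallback between engines, combined with error messages that carry context, can be sketched as follows. The converters are passed in as plain callables so the example stays independent of either engine; convert_with_fallback and its signature are assumptions for illustration, not the server's implementation.

```python
from typing import Callable, Sequence

Converter = Callable[[str], str]  # takes a PDF path, returns Markdown text


def convert_with_fallback(
    pdf_path: str, engines: Sequence[tuple[str, Converter]]
) -> tuple[str, str]:
    """Try each engine in order; return (engine_name, markdown) from the first success."""
    errors: list[str] = []
    for name, convert in engines:
        try:
            return name, convert(pdf_path)
        except Exception as exc:  # record the failure and move on to the next engine
            errors.append(f"{name}: {exc}")
    # Every engine failed: report all collected errors together with the file path.
    raise RuntimeError(f"all engines failed for {pdf_path}: " + "; ".join(errors))
```

Listing marker first and pymupdf4llm second would match the recommendation in this README while still producing output when marker is unavailable.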
Contributing
Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Ensure all tests pass
Submit a pull request
License
MIT License - see LICENSE file for details.
Support
local-only server
The server can only run on the client's local machine because it depends on local resources.