Mac Commander MCP Server
🤖 Enable AI assistants to visually interact with your macOS applications
An MCP (Model Context Protocol) server that allows AI coding tools like Claude Desktop, Claude Code, and Cursor to see, control, and test macOS applications. Perfect for automated testing, UI debugging, and error detection.
🎆 What makes this special?
- ✨ Visual AI: Your AI can actually see what's on your screen
- 🎯 Smart UI Detection: Advanced element detection finds buttons, forms, and controls without relying on text
- 🗾 Error Detection: Automatically finds bugs and error dialogs
- 🔄 Full Control: Click, type, and navigate just like a human
- 📱 App Testing: Perfect for testing mobile apps, desktop software, or web interfaces
- ⚡ High Performance: Optimized memory usage and 60-80% faster text operations
- 🚀 Easy Setup: Get started in under 5 minutes
🚀 Quick Start
Option 1: Automated Install (Easiest)
The installer will:
- ✅ Check your Node.js version
- ✅ Install dependencies and build the project
- ✅ Show you the exact configuration to copy
- ✅ Offer to open System Settings for permissions
Option 2: Manual Install
✨ In 2 minutes, your AI will be able to see and control your Mac!
📚 Table of Contents
- 📖 Complete Documentation Index - Start here for all documentation
- ✨ Features
- 🛠️ Prerequisites
- 📦 Installation
- ⚙️ Configuration
- 📊 Implementation Status & Available Tools
- 🚀 Usage Examples
- 📖 Tool Parameter Reference
- 🚀 Performance Improvements
- ⚠️ Troubleshooting
- 🔒 Security
- 📚 Additional Resources
✨ Features
Core Features
- 📸 Screenshot Capture: High-performance screenshots with optional base64 compression and metadata-only responses
- 🖱️ Mouse Control: Click, double-click, and move the mouse cursor
- ⌨️ Keyboard Input: Type text and press key combinations
- 🪟 Window Management: List, find, focus, and get information about application windows
- 🔍 OCR Text Recognition: Extract and find text on screen using Tesseract.js
- ⚠️ Error Detection: Automatically detect error dialogs and messages using OCR
- 📏 Screen Information: Get display dimensions and coordinates
Advanced UI Element Detection (New!)
- 🎯 Multi-Strategy Detection Engine: Combines visual analysis, OCR, color patterns, and shape detection
- 🔍 20+ UI Element Types: Detects buttons, text fields, links, dialogs, menus, checkboxes, dropdowns, tabs, toolbars, scrollbars, and more
- 🍎 macOS-Specific Patterns: Optimized for Apple Human Interface Guidelines and native UI components
- 📊 Confidence Scoring: Each detected element includes reliability scores and validation methods
- 🔄 Element Classification: Advanced categorization with state detection (enabled/disabled/selected/focused)
- 🎨 Visual Feature Analysis: Color, shape, border radius, and spatial relationship analysis for accurate detection
- 🧠 Context-Aware Detection: Groups related elements and understands form patterns, dialog layouts, and menu structures
- ✅ Interactive Element Validation: Verifies clickability and interactivity through visual characteristics
Advanced Automation Features (New!)
- 🎯 Drag & Drop: Drag operations for moving UI elements and files
- 📜 Advanced Scrolling: Directional scrolling with customizable amounts
- 🖱️ Mouse Gestures: Hover, right-click, and mouse movement controls
- ⌨️ Keyboard Input: Text typing with configurable delays between keystrokes
- 🔄 Complex Interactions: Chain multiple actions for sophisticated automation
- ⏱️ Precise Timing: Built-in wait functionality and timing controls
Performance Features (New!)
- ⚡ Optimized Memory Usage: Reduced memory consumption from 99% to ~60-70% through intelligent buffering
- 🚀 Fast Text Search: 60-80% faster `find_text` operations with optimized OCR processing
- 💾 Smart Caching System: Intelligent cache with 30-70% hit rates for frequently accessed screenshots
- 🖼️ Chunked Image Processing: Efficient handling of large images through intelligent chunking
- 🎛️ Automatic Memory Management: Built-in throttling and cleanup to prevent memory exhaustion
- 📊 Performance Monitoring: Real-time tracking of memory usage, cache performance, and operation timings
- 🔄 Request Batching: Optimized handling of multiple simultaneous operations
🛠️ Prerequisites
System Requirements
- macOS 13+ (Ventura or later)
- Node.js 18+ and npm
- AI client with MCP support:
  - Claude Desktop (recommended)
  - Claude Code
  - Cursor with MCP support
  - Any other MCP-compatible client
Required macOS Permissions
⚠️ Important: You must grant these permissions or the server won't work!
- Screen Recording Permission:
  - Go to System Settings → Privacy & Security → Screen Recording
  - Click the + button and add your AI client (Claude Desktop, Cursor, etc.)
  - ✅ Check the box next to your AI client
- Accessibility Permission:
  - Go to System Settings → Privacy & Security → Accessibility
  - Click the + button and add your AI client
  - ✅ Check the box next to your AI client
💡 Tip: You might need to restart your AI client after granting permissions.
📦 Installation
💿 Automated Installation
Recommended for beginners:
The installer script will guide you through everything!
🔧 Manual Installation
For advanced users:
Option 2: Global Install
🔧 Verify Installation
Run the test script with `node test-server.js` to make sure everything works.
You should see the server start and respond to test commands.
⚙️ Configuration
🖥️ Claude Desktop Setup
- Open Claude Desktop and go to Settings (gear icon)
- Click on the Developer tab
- Click Edit Config to open the configuration file
- Add the MCP server configuration:
🚨 Important: Replace `/FULL/PATH/TO/` with the actual absolute path to where you cloned this repository!
Example with real path:
- Save the file and restart Claude Desktop
- Start a new chat - you should see a 🔨 hammer icon indicating MCP is active
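The configuration entry from the steps above can be sketched in TypeScript. This is a minimal sketch, assuming the usual `mcpServers` layout used by Claude Desktop-style config files; `buildMacCommanderConfig` is a hypothetical helper, not part of this project. The server name, command, and `build/index.js` entry point are the ones named in this README.

```typescript
// Shape of one server entry in an MCP client config file (assumed layout).
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: build the config for a given clone location.
// `repoDir` must be absolute, because the AI client does not start the
// server from inside the repository directory.
function buildMacCommanderConfig(repoDir: string): McpConfig {
  if (!repoDir.startsWith("/")) {
    throw new Error(`Expected an absolute path, got: ${repoDir}`);
  }
  const trimmed = repoDir.replace(/\/+$/, ""); // drop trailing slashes
  return {
    mcpServers: {
      "mac-commander": {
        command: "node",
        args: [`${trimmed}/build/index.js`],
      },
    },
  };
}

// Example location; substitute the directory where you cloned the repository.
const config = buildMacCommanderConfig("/Users/yourname/Developer/mac-commander");
console.log(JSON.stringify(config, null, 2));
```

The printed JSON is what would go into the config file opened by Edit Config. A relative path is rejected on purpose, since that is the most common reason the server fails to start.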
💻 Claude Code Setup
- Navigate to your project folder in terminal
- Create or edit `.claude/config.json` in your project root:
- Start Claude Code in that project folder:
🎯 Cursor Setup
- Open Cursor and go to Settings → Cursor Settings → MCP
- Click "Add new global MCP server"
- Add the configuration:
  - Name: `mac-commander`
  - Command: `node`
  - Args: `/FULL/PATH/TO/mac-commander/build/index.js`
Or create `~/.cursor/mcp.json`:
🔍 Finding Your Full Path
Not sure what your full path is? Run this in the project directory:
Example output: /Users/yourname/Developer/mac-commander/build/index.js
Copy this exact path and use it in your configuration files above.
✅ Verify It's Working
After configuration:
- Restart your AI client (Claude Desktop, Cursor, etc.)
- Start a new chat/session
- Look for the MCP indicator (hammer icon in Claude Desktop)
- Try a test command: "Take a screenshot of my screen"
If it works, you'll see the AI successfully take a screenshot! 🎉
📖 Tool Parameter Reference
screenshot
Capture a screenshot of the screen or a specific region with optimized performance.
Parameters:
- `outputPath` (optional): Path to save the screenshot as PNG
- `region` (optional): Object with `x`, `y`, `width`, `height` to capture a specific area
- `returnBase64` (optional): Return base64 data in response (default: false)
- `compressionQuality` (optional): JPEG compression quality 10-100 for base64 responses (default: 80)
Performance Features:
- Default mode returns metadata only (fast, small responses)
- Base64 mode includes compressed image data when `returnBase64: true`
- 60-80% size reduction through JPEG compression
- Always saves to temp folder for later access regardless of mode
Usage Examples:
Metadata-only mode (recommended for performance):
With base64 data for immediate processing:
Region capture with high compression:
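The three modes above differ only in the arguments passed to `screenshot`. The following is a rough sketch of those argument objects in TypeScript, using the parameter names from the list above. `checkScreenshotArgs` is a hypothetical client-side check, and the region values are illustrative.

```typescript
interface Region {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ScreenshotArgs {
  outputPath?: string;
  region?: Region;
  returnBase64?: boolean;      // default: false
  compressionQuality?: number; // 10-100, default: 80
}

// Hypothetical check: reject values outside the documented 10-100 range.
function checkScreenshotArgs(args: ScreenshotArgs): ScreenshotArgs {
  const q = args.compressionQuality;
  if (q !== undefined && (q < 10 || q > 100)) {
    throw new RangeError(`compressionQuality must be 10-100, got ${q}`);
  }
  return args;
}

// Metadata-only mode: no arguments, so no image data is returned.
const metadataOnly = checkScreenshotArgs({});

// Base64 mode with the default compression quality.
const withBase64 = checkScreenshotArgs({ returnBase64: true });

// Region capture with high compression (a low quality value).
const compressedRegion = checkScreenshotArgs({
  region: { x: 0, y: 0, width: 800, height: 600 },
  returnBase64: true,
  compressionQuality: 30,
});

console.log(JSON.stringify([metadataOnly, withBase64, compressedRegion]));
```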
click
Click at specific coordinates on the screen.
Parameters:
- `x`: X coordinate
- `y`: Y coordinate
- `button`: "left", "right", or "middle" (default: "left")
- `doubleClick`: boolean (default: false)
- `verify`: boolean (default: false) - Take a screenshot after clicking to verify the action
type_text
Type text using the keyboard.
Parameters:
- `text`: Text to type
- `delay`: Delay between keystrokes in milliseconds (default: 50)
mouse_move
Move the mouse to specific coordinates.
Parameters:
- `x`: X coordinate
- `y`: Y coordinate
key_press
Press a key or key combination.
Parameters:
- `key`: Key to press (e.g., "Enter", "Escape", "cmd+a")
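To show how a combination string such as "cmd+a" breaks down into modifiers plus a final key, here is a small illustrative parser. It is only a sketch: the set of accepted modifier names is an assumption (this README only shows `cmd`), and the server's own parsing may differ.

```typescript
// Assumed modifier names; only "cmd" appears in this README.
const MODIFIERS = ["cmd", "ctrl", "alt", "shift"] as const;
type Modifier = (typeof MODIFIERS)[number];

interface KeyCombo {
  modifiers: Modifier[];
  key: string;
}

function isModifier(s: string): s is Modifier {
  return (MODIFIERS as readonly string[]).indexOf(s) !== -1;
}

// Split "cmd+shift+s" into its modifiers and the final key.
function parseKeyCombo(input: string): KeyCombo {
  const parts = input
    .split("+")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const key = parts.pop(); // the last part is the key itself
  if (key === undefined) {
    throw new Error("Empty key combination");
  }
  const modifiers: Modifier[] = [];
  for (const part of parts) {
    const lower = part.toLowerCase();
    if (!isModifier(lower)) {
      throw new Error(`Unknown modifier: ${part}`);
    }
    modifiers.push(lower);
  }
  return { modifiers, key };
}

console.log(parseKeyCombo("cmd+a"));
console.log(parseKeyCombo("Enter"));
```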
check_for_errors
Check the screen for common error indicators.
Parameters:
- `region` (optional): Specific region to check
wait
Wait for a specified amount of time.
Parameters:
- `milliseconds`: Time to wait
wait_for_element
Wait for specific text or UI element to appear on screen before continuing. Essential for handling dynamic content, loading screens, and asynchronous UI updates.
Parameters:
- `text`: Text to wait for on screen
- `timeout`: Maximum wait time in milliseconds (default: 10000)
- `pollInterval`: How often to check in milliseconds (default: 500)
- `region` (optional): Specific region to search in with `x`, `y`, `width`, `height` coordinates
Returns success/failure status and location of found element if successful. Perfect for waiting for buttons to become available, dialogs to appear, or loading indicators to disappear.
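The relationship between `timeout` and `pollInterval` is easiest to see as a polling loop. Below is a minimal sketch of that pattern, not the server's actual implementation: the `Finder` callback stands in for an OCR lookup, and `waitForText`, `Point`, and `WaitResult` are hypothetical names.

```typescript
interface Point {
  x: number;
  y: number;
}

// Stand-in for an OCR lookup: resolves to a location, or null if not found.
type Finder = () => Promise<Point | null>;

interface WaitResult {
  found: boolean;
  location?: Point;
  elapsedMs: number;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `find` until it reports a location or the timeout would be exceeded.
async function waitForText(
  find: Finder,
  timeout = 10000,
  pollInterval = 500,
): Promise<WaitResult> {
  const start = Date.now();
  while (true) {
    const location = await find();
    const elapsedMs = Date.now() - start;
    if (location !== null) {
      return { found: true, location, elapsedMs };
    }
    // Give up if another full interval would overshoot the timeout.
    if (elapsedMs + pollInterval > timeout) {
      return { found: false, elapsedMs };
    }
    await sleep(pollInterval);
  }
}

// Usage with a fake finder that succeeds on the third poll.
let polls = 0;
const fakeFind: Finder = async () => (++polls >= 3 ? { x: 120, y: 48 } : null);
waitForText(fakeFind, 1000, 10).then((result) => console.log(result.found, polls));
```

A shorter `pollInterval` reacts faster but runs more lookups, and each lookup here would be a full OCR pass, so the 500 ms default is a reasonable starting point.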
get_screen_info
Get information about the screen dimensions.
No parameters required.
list_windows
List all open windows with their titles and positions.
No parameters required.
get_active_window
Get information about the currently active window.
No parameters required.
find_window
Find a window by its title (partial match supported).
Parameters:
- `title`: Window title to search for
focus_window
Focus/activate a window by its title.
Parameters:
- `title`: Window title to focus
get_window_info
Get detailed information about a specific window.
Parameters:
- `title`: Window title to get info for
extract_text
Extract and read text from the screen or specific regions using advanced Optical Character Recognition (OCR). Features improved caching system for better performance, confidence scoring, and enhanced text recognition accuracy. Supports fuzzy text matching and configurable OCR settings for optimal results.
Parameters:
- `region` (optional): Specific region to extract text from with `x`, `y`, `width`, `height` coordinates
Enhanced Features:
- Smart Caching: Multi-level caching system with image hash-based keys for better performance
- Confidence Filtering: Configurable minimum confidence thresholds (default: 50%)
- Optimized Processing: Uses worker pool for concurrent OCR operations
- Error Handling: Comprehensive error detection and recovery
- Performance Tracking: Built-in timing and performance metrics
find_text
Locate specific text on the screen using advanced OCR with fuzzy matching capabilities. Returns precise coordinates, confidence scores, and handles OCR variations automatically. Essential for robust UI automation that adapts to text rendering differences.
Parameters:
- `text`: Text to search for (supports fuzzy matching for OCR variations)
- `region` (optional): Specific region to search in with `x`, `y`, `width`, `height` coordinates
Enhanced Features:
- Fuzzy Text Matching: Handles OCR variations with configurable similarity thresholds
  - Standard threshold: 70% similarity (configurable)
  - Relaxed threshold: 50% similarity for difficult text
  - Levenshtein distance algorithm for accurate matching
- Smart Sorting: Results sorted by similarity score and confidence level
- Multiple Match Support: Returns all matching text locations with coordinates
- Center Point Calculation: Provides precise click coordinates for each match
- Confidence Scoring: Each match includes OCR confidence level
- Performance Optimized: Cached results and memory management
Example Fuzzy Matching:
- Search for "Submit" → Finds "Subm1t", "SUBMIT", "submit" (OCR variations)
- Search for "Login" → Matches "Log1n", "LOGIN", "Iog in" (common OCR errors)
- Search for "Cancel" → Finds "Cancei", "CANCEL", "cancel" (character misrecognition)
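The fuzzy matching described above can be sketched with a plain Levenshtein distance. This is an illustration only: the normalization used here (case-insensitive, `1 - distance / longer length`) is an assumption about how similarity is scored, and `similarity` and `classify` are hypothetical names. The 0.7 and 0.5 thresholds are the documented defaults.

```typescript
// Classic dynamic-programming edit distance, keeping one row at a time.
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed scoring: case-insensitive, normalized by the longer string.
function similarity(a: string, b: string): number {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  const maxLen = Math.max(x.length, y.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(x, y) / maxLen;
}

type MatchLevel = "standard" | "relaxed" | "none";

// Try the standard threshold first, then fall back to the relaxed one.
function classify(
  query: string,
  ocrText: string,
  standard = 0.7,
  relaxed = 0.5,
): MatchLevel {
  const score = similarity(query, ocrText);
  if (score >= standard) return "standard";
  if (score >= relaxed) return "relaxed";
  return "none";
}

console.log(classify("Submit", "Subm1t")); // one substitution in six characters
console.log(classify("Login", "Iog in"));  // two edits in six characters
```

Under this scoring, "Subm1t" clears the standard threshold, while "Iog in" only matches "Login" at the relaxed threshold, which is the kind of difficult text the fallback is meant for.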
find_ui_elements
Advanced UI element detection system that intelligently identifies interactive components using multiple detection strategies. Unlike text-only detection, this tool can accurately find buttons, text fields, dropdowns, and other UI elements even when they don't contain visible text. Perfect for modern applications with visual-only buttons, icons, and complex layouts.
Parameters:
- `autoSave` (optional): Whether to save the screenshot for analysis (default: true)
- `elementTypes` (optional): Array of specific element types to detect: `['button', 'text_field', 'link', 'image', 'icon', 'dialog', 'menu', 'window', 'checkbox', 'radio_button', 'dropdown', 'slider', 'tab', 'toolbar', 'list', 'table', 'scrollbar', 'other']`
- `region` (optional): Specific region to analyze with `x`, `y`, `width`, `height` coordinates
Detection Strategies:
- Visual Analysis: Detects UI elements based on shape, color, and visual patterns
- OCR Text Recognition: Identifies elements with text content and labels
- Color Pattern Analysis: Recognizes macOS system colors and UI themes
- Shape Detection: Finds rectangular buttons, rounded elements, and geometric patterns
- Context Analysis: Groups related elements and understands spatial relationships
macOS-Specific Features:
- Apple HIG Compliance: Optimized for Apple Human Interface Guidelines
- System Color Recognition: Detects standard macOS button colors (#007AFF, #34C759, #FF3B30, etc.)
- Touch Target Validation: Ensures elements meet minimum 44x44 pixel requirements
- Native UI Patterns: Recognizes standard macOS dialogs, menus, and controls
Output Format: Returns comprehensive element information including:
- Element type and subtype classification
- Precise coordinates and clickable center points
- Confidence scores and detection methods used
- Visual features (colors, border radius, shadows)
- Interactive validation results
- Element state (enabled/disabled/selected)
- Contextual relationships with nearby elements
Example Use Cases:
- Find all clickable buttons in a dialog: `elementTypes: ['button']`
- Detect form elements: `elementTypes: ['text_field', 'button', 'dropdown']`
- Locate menu items: `elementTypes: ['menu', 'link']`
- Find all interactive elements: (no `elementTypes` filter)
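Once `find_ui_elements` returns its results, a client typically filters them and turns bounding boxes into click points. The sketch below shows one way to do that. The field names in `DetectedElement` are hypothetical (the real result schema may differ), the confidence scale is assumed to be 0-1, and the 0.6 cutoff is illustrative.

```typescript
// Field names below are hypothetical; the real result schema may differ.
interface Bounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface DetectedElement {
  type: string;       // e.g. "button", "text_field", "dropdown"
  bounds: Bounds;
  confidence: number; // assumed to be on a 0-1 scale
  enabled: boolean;
}

// Clickable center point of an element's bounding box.
function centerOf(b: Bounds): { x: number; y: number } {
  return { x: Math.round(b.x + b.width / 2), y: Math.round(b.y + b.height / 2) };
}

// Whether the element meets the 44x44 minimum size mentioned above.
function meetsTouchTarget(b: Bounds, min = 44): boolean {
  return b.width >= min && b.height >= min;
}

// Keep enabled elements of the wanted types, most confident first.
function pickTargets(
  elements: DetectedElement[],
  types: string[],
  minConfidence = 0.6,
): DetectedElement[] {
  return elements
    .filter((e) => e.enabled && types.indexOf(e.type) !== -1 && e.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
}

const sample: DetectedElement[] = [
  { type: "button", bounds: { x: 100, y: 200, width: 80, height: 44 }, confidence: 0.92, enabled: true },
  { type: "button", bounds: { x: 300, y: 200, width: 80, height: 44 }, confidence: 0.41, enabled: true },
  { type: "text_field", bounds: { x: 100, y: 120, width: 280, height: 30 }, confidence: 0.88, enabled: true },
];

for (const el of pickTargets(sample, ["button", "text_field"])) {
  console.log(el.type, centerOf(el.bounds), meetsTouchTarget(el.bounds));
}
```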
drag
Drag from one point to another using mouse button hold.
Parameters:
- `startX`: Starting X coordinate
- `startY`: Starting Y coordinate
- `endX`: Ending X coordinate
- `endY`: Ending Y coordinate
- `button`: Mouse button to use for dragging (default: "left")
scroll
Scroll in any direction within the current window or a specific region.
Parameters:
- `direction`: Direction to scroll ("up", "down", "left", "right")
- `amount`: Number of scroll units (default: 5)
- `x` (optional): X coordinate to scroll at (defaults to current mouse position)
- `y` (optional): Y coordinate to scroll at (defaults to current mouse position)
hover
Hover the mouse at a specific position for a duration.
Parameters:
- `x`: X coordinate to hover at
- `y`: Y coordinate to hover at
- `duration`: Duration to hover in milliseconds (default: 1000)
right_click
Right-click at specific coordinates to open context menus.
Parameters:
- `x`: X coordinate to right-click
- `y`: Y coordinate to right-click
list_screenshots
List all screenshots saved in the temporary folder.
No parameters required.
list_recent_screenshots
List recently captured screenshots with detailed metadata including timestamps, file sizes, and dimensions.
Parameters:
- `limit`: Maximum number of screenshots to list (default: 10, max: 50)
view_screenshot
View/display a specific screenshot from the temporary folder.
Parameters:
- `filename`: Name of the screenshot file to view
cleanup_screenshots
Clean up old screenshots from temporary folder, keeping only recent ones.
Parameters:
- `keepLast`: Number of recent screenshots to keep (default: 10)
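The effect of `keepLast` can be illustrated with a small selection function: sort by capture time, keep the newest N, and treat the rest as candidates for deletion. This is a sketch of the idea only. `ScreenshotFile` and `selectForDeletion` are hypothetical names, and the server may order files differently (for example by file modification time).

```typescript
interface ScreenshotFile {
  filename: string;
  capturedAt: number; // epoch milliseconds
}

// Return the filenames that fall outside the `keepLast` most recent captures.
function selectForDeletion(files: ScreenshotFile[], keepLast = 10): string[] {
  if (!Number.isInteger(keepLast) || keepLast < 0) {
    throw new RangeError(`keepLast must be a non-negative integer, got ${keepLast}`);
  }
  return [...files] // copy so the caller's array is not reordered
    .sort((a, b) => b.capturedAt - a.capturedAt) // newest first
    .slice(keepLast)
    .map((f) => f.filename);
}

const files: ScreenshotFile[] = [
  { filename: "shot-1.png", capturedAt: 1000 },
  { filename: "shot-3.png", capturedAt: 3000 },
  { filename: "shot-2.png", capturedAt: 2000 },
];

console.log(selectForDeletion(files, 2));
```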
compare_screenshots
Compare two previously saved screenshots to identify differences and changes.
Parameters:
- `screenshot1`: Filename of the first screenshot
- `screenshot2`: Filename of the second screenshot
describe_screenshot
Capture and analyze a screenshot with AI-powered insights, combining OCR text extraction and UI element detection.
Parameters:
- `region` (optional): Specific region to analyze with `x`, `y`, `width`, `height` coordinates
- `savePath` (optional): Path to save the analyzed screenshot
performance_dashboard
Comprehensive performance monitoring dashboard providing real-time system health, metrics, and optimization recommendations.
Parameters:
- `includeMetrics` (optional): Include detailed metrics in response (default: true)
- `includeRecommendations` (optional): Include optimization recommendations (default: true)
- `includeHistory` (optional): Include performance history and trends (default: false)
- `timeRangeMs` (optional): Time range for trends in milliseconds (default: 3600000, i.e. 1 hour)
🔧 OCR Configuration Options
The OCR system can be customized with various configuration options to optimize performance and accuracy for different use cases:
configureOCR(options)
Configure OCR settings globally for all text recognition operations.
Available Options:
- `minConfidence`: Minimum confidence score for text recognition (default: 50, range: 0-100)
- `fuzzyMatchThreshold`: Standard similarity threshold for fuzzy matching (default: 0.7, range: 0-1)
- `relaxedFuzzyThreshold`: Fallback threshold for difficult text (default: 0.5, range: 0-1)
- `cacheEnabled`: Enable/disable OCR result caching (default: true)
- `cacheTTL`: Cache time-to-live in milliseconds (default: 30000)
- `maxCacheSize`: Maximum number of cached results (default: 100)
- `timeoutMs`: OCR operation timeout in milliseconds (default: 30000)
Example Configuration:
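As a rough illustration, the options above can be modeled as a typed object with the documented defaults. `resolveOCRConfig` is a hypothetical stand-in that merges overrides and checks the documented ranges; it is not the project's `configureOCR`, whose exact behavior is not shown here.

```typescript
interface OCRConfig {
  minConfidence: number;         // 0-100
  fuzzyMatchThreshold: number;   // 0-1
  relaxedFuzzyThreshold: number; // 0-1
  cacheEnabled: boolean;
  cacheTTL: number;              // milliseconds
  maxCacheSize: number;          // number of cached results
  timeoutMs: number;             // milliseconds
}

// Defaults as listed in the options above.
const DEFAULT_OCR_CONFIG: OCRConfig = {
  minConfidence: 50,
  fuzzyMatchThreshold: 0.7,
  relaxedFuzzyThreshold: 0.5,
  cacheEnabled: true,
  cacheTTL: 30000,
  maxCacheSize: 100,
  timeoutMs: 30000,
};

function inRange(name: string, value: number, min: number, max: number): void {
  if (value < min || value > max) {
    throw new RangeError(`${name} must be between ${min} and ${max}, got ${value}`);
  }
}

// Hypothetical helper: merge overrides onto the defaults and check ranges.
function resolveOCRConfig(overrides: Partial<OCRConfig> = {}): OCRConfig {
  const cfg: OCRConfig = { ...DEFAULT_OCR_CONFIG, ...overrides };
  inRange("minConfidence", cfg.minConfidence, 0, 100);
  inRange("fuzzyMatchThreshold", cfg.fuzzyMatchThreshold, 0, 1);
  inRange("relaxedFuzzyThreshold", cfg.relaxedFuzzyThreshold, 0, 1);
  return cfg;
}

// Stricter matching for small or low-contrast text, with a longer timeout.
const strict = resolveOCRConfig({
  minConfidence: 70,
  fuzzyMatchThreshold: 0.85,
  timeoutMs: 60000,
});
console.log(strict);
```

Raising `minConfidence` and `fuzzyMatchThreshold` trades recall for fewer false matches; options that are not overridden keep their defaults.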
OCR Performance Features
Worker Pool Architecture:
- Concurrent OCR processing with multiple worker threads
- Automatic load balancing and task prioritization
- Graceful fallback to single worker if pool initialization fails
Intelligent Caching:
- Multi-level caching with image hash and region-based keys
- Automatic cache cleanup and size management
- Configurable TTL and cache size limits
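A cache with a TTL and a size limit, as described above, can be sketched in a few lines. This is an illustrative sketch, not the server's cache: `TtlCache` is a hypothetical class, the string key stands in for the image-hash-plus-region key, and eviction here simply drops the oldest inserted entry. The defaults mirror `cacheTTL` (30000 ms) and `maxCacheSize` (100).

```typescript
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private hits = 0;
  private lookups = 0;
  private ttlMs: number;
  private maxSize: number;
  private now: () => number;

  // `now` is injectable so expiry can be exercised without real waiting.
  constructor(ttlMs = 30000, maxSize = 100, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.maxSize = maxSize;
    this.now = now;
  }

  get(key: string): V | undefined {
    this.lookups++;
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    this.hits++;
    return entry.value;
  }

  set(key: string, value: V): void {
    // Map preserves insertion order, so the first key is the oldest entry.
    if (!this.entries.has(key) && this.entries.size >= this.maxSize) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Fraction of lookups served from the cache.
  hitRate(): number {
    return this.lookups === 0 ? 0 : this.hits / this.lookups;
  }
}

// Key format is an assumption: "<image hash>:<region>".
const ocrCache = new TtlCache<string>();
ocrCache.set("a1b2c3:0,0,800,600", "Submit");
console.log(ocrCache.get("a1b2c3:0,0,800,600"), ocrCache.hitRate());
```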
Memory Management:
- Automatic garbage collection triggers for large OCR operations
- Memory usage monitoring and cleanup
- Efficient image processing and buffer management
Error Handling:
- Comprehensive error detection and recovery
- Timeout protection for long-running OCR operations
- Detailed error reporting with context information
📊 Implementation Status & Available Tools
✅ Fully Implemented Tools
Screenshot Management:
- `screenshot` - Screen capture with region support and compression
- `list_screenshots` - List all saved screenshots
- `list_recent_screenshots` - List recent screenshots with metadata
- `view_screenshot` - View specific screenshot files
- `cleanup_screenshots` - Clean up old screenshot files
- `compare_screenshots` - Compare two screenshots
- `describe_screenshot` - AI-powered screenshot analysis
Mouse & Keyboard Control:
- `click` - Click with multiple button support and verification
- `type_text` - Text input with configurable delays
- `key_press` - Key combinations and shortcuts
- `mouse_move` - Move mouse cursor
- `drag` - Drag and drop operations
- `scroll` - Directional scrolling
- `hover` - Mouse hover with duration
- `right_click` - Context menu access
Window Management:
- `list_windows` - List all open windows
- `get_active_window` - Get current window info
- `find_window` - Find window by title
- `focus_window` - Bring window to front
- `get_window_info` - Detailed window information
OCR & Text Recognition:
- `extract_text` - OCR text extraction with caching
- `find_text` - Locate text on screen with fuzzy matching
- `wait_for_element` - Wait for text/elements to appear
- `find_ui_elements` - Advanced visual UI element detection
System & Utilities:
- `get_screen_info` - Screen dimensions
- `check_for_errors` - Visual error detection
- `wait` - Pause execution
- `diagnostic` - System health check
- `performance_dashboard` - Performance monitoring
🚧 Planned Features (Not Yet Implemented)
The following features mentioned in examples are planned for future releases:
- `click_hold` - Click and hold operations
- `relative_mouse_move` - Relative mouse positioning
- `key_hold` - Hold keys for duration
- `type_with_delay` - Human-like typing with variable delays and typos
- Advanced smooth scrolling with easing
- Pixel-perfect scrolling controls
🚀 Usage Examples
🎯 Basic Commands
Once configured, you can ask your AI assistant to:
Screenshots & Visual Inspection:
- "Take a screenshot of my app" (metadata-only for fast responses)
- "Capture just the top-left corner of the screen"
- "Save a screenshot to ~/Desktop/app-screenshot.png"
- "Take a screenshot and return the base64 data with 70% compression"
- "Capture a region and return compressed image data for processing"
Mouse & Keyboard Control:
- "Click the button at coordinates 100, 200"
- "Double-click on the center of the screen"
- "Type 'Hello World' in the current field"
- "Press cmd+s to save the file"
- "Press Enter to submit"
Window Management:
- "List all open windows"
- "Focus the Safari window"
- "Get information about the active window"
- "Find the window with 'Calculator' in the title"
Text Recognition & Search:
- "Extract all text from the screen"
- "Find the 'Submit' button on screen"
- "Look for any text containing 'error' on screen"
- "Read the text in the dialog box"
UI Element Detection:
- "Find all clickable buttons on this screen"
- "Detect text fields and form elements in this dialog"
- "Locate all interactive elements (buttons, links, dropdowns)"
- "Identify the toolbar and menu elements visually"
- "Find UI elements by type: buttons, text fields, and checkboxes"
Error Detection:
- "Check if there are any error dialogs on screen"
- "Look for error messages in my app"
- "Scan for any warning or error indicators"
🔧 Advanced Automation Examples
UI Testing Workflow:
Bug Investigation:
Automated Form Filling:
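In practice the AI assistant decides which tools to call, but a form-filling run usually reduces to a short sequence of the tools documented in the Tool Parameter Reference. The sketch below plans such a sequence as plain data. `planFormFill`, `Field`, and the sample coordinates and text are hypothetical; only the tool names and their parameters come from this README.

```typescript
// A subset of the documented tools, typed by their documented parameters.
type ToolCall =
  | { tool: "focus_window"; args: { title: string } }
  | { tool: "click"; args: { x: number; y: number } }
  | { tool: "type_text"; args: { text: string; delay?: number } }
  | { tool: "key_press"; args: { key: string } }
  | { tool: "wait_for_element"; args: { text: string; timeout?: number } }
  | { tool: "check_for_errors"; args: Record<string, never> };

interface Field {
  x: number;     // click position of the input field
  y: number;
  value: string; // text to type into it
}

// Hypothetical planner: focus the window, fill each field, submit, then verify.
function planFormFill(windowTitle: string, fields: Field[], successText: string): ToolCall[] {
  const steps: ToolCall[] = [{ tool: "focus_window", args: { title: windowTitle } }];
  for (const field of fields) {
    steps.push({ tool: "click", args: { x: field.x, y: field.y } });
    steps.push({ tool: "type_text", args: { text: field.value } });
  }
  steps.push({ tool: "key_press", args: { key: "Enter" } });
  steps.push({ tool: "wait_for_element", args: { text: successText, timeout: 10000 } });
  steps.push({ tool: "check_for_errors", args: {} });
  return steps;
}

const plan = planFormFill(
  "Safari",
  [
    { x: 400, y: 300, value: "user@example.com" },
    { x: 400, y: 360, value: "Hello World" },
  ],
  "Thank you",
);
for (const step of plan) {
  console.log(step.tool, JSON.stringify(step.args));
}
```

Ending with `wait_for_element` and `check_for_errors` means the run confirms that the submission succeeded instead of assuming it.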
🚀 Advanced Automation Features
Drag and Drop Operations:
Natural Scrolling:
Text Input with Timing:
Drag and Drop Operations:
Keyboard Shortcuts:
Menu Navigation:
🎯 UI Element Detection Examples
Smart Button Detection:
Form Automation with Visual Detection:
Modern App UI Navigation:
macOS Dialog Interaction:
Complex Layout Analysis:
Responsive UI Testing:
Visual-Only Element Detection:
🚀 Performance Improvements
Mac Commander has been optimized for high-performance automation with significant improvements in memory usage, processing speed, and reliability:
Memory Optimization
- 99% → 60-70% Memory Usage: Intelligent memory management and buffer optimization
- Automatic Cleanup: Built-in garbage collection and memory throttling
- Smart Buffering: Efficient image processing with minimal memory footprint
Processing Speed
- 60-80% Faster Text Operations: Optimized OCR processing and text search algorithms
- Chunked Image Processing: Large images are processed in efficient chunks
- Parallel Processing: Multiple operations can run concurrently without blocking
Caching System
- 30-70% Cache Hit Rates: Intelligent caching of frequently accessed screenshots
- Smart Cache Management: Automatic cache invalidation and memory-conscious storage
- Performance Monitoring: Real-time tracking of cache effectiveness
Request Batching
- Optimized Concurrency: Multiple simultaneous requests are handled efficiently
- Resource Throttling: Prevents system overload during intensive operations
- Performance Metrics: Built-in monitoring of operation timings and resource usage
These improvements make Mac Commander suitable for intensive automation tasks and long-running operations without performance degradation.
Development
- Run in development mode: `npm run dev`
- Test with MCP Inspector: `npm run inspector`
⚠️ Limitations & Troubleshooting
Known Limitations
- OCR Accuracy: Text recognition depends on font size, contrast, and clarity
- Permission Requirements: Must manually grant Screen Recording and Accessibility permissions
- First OCR Run: Initial text extraction may be slower due to model loading
- macOS Only: This server only works on macOS systems
🐛 Common Issues
"Permission denied" or "Screen recording not allowed"
- ✅ Grant Screen Recording permission to your AI client
- ✅ Grant Accessibility permission to your AI client
- 🔄 Restart your AI client after granting permissions
"Command not found" or "Cannot find module"
- ✅ Make sure you ran `npm install` and `npm run build`
- ✅ Use the absolute path to `build/index.js` in your config
- ✅ Verify Node.js is installed: `node --version`
"MCP server not showing up"
- ✅ Check your configuration JSON syntax is valid
- ✅ Restart your AI client completely
- ✅ Try the test script: `node test-server.js`
"Screenshots are black or empty"
- ✅ Grant Screen Recording permission
- ✅ Make sure the app you're screenshotting is visible (not minimized)
🆘 Getting Help
If you're still having issues:
- Run the test script: `node test-server.js` to verify basic functionality
- Check the console: Look for error messages in your AI client
- Open an issue: Create a GitHub issue with:
  - Your macOS version
  - Your AI client (Claude Desktop, Cursor, etc.)
  - The exact error message
  - Your configuration file (with paths anonymized)
🔒 Security & Privacy
Important Security Notes
⚠️ This server has powerful capabilities and requires significant system permissions.
What this server can access:
- ✅ Screen content: Can take screenshots of anything visible
- ✅ Keyboard input: Can type any text or key combinations
- ✅ Mouse control: Can click anywhere on screen
- ✅ Window information: Can see and control application windows
- ✅ Text recognition: Can read any text visible on screen
Security best practices:
- 🏠 Only use in trusted environments: Don't use on shared or public computers
- 🤝 Review AI requests: Be mindful of what you ask the AI to do
- 🔐 Sensitive data: Avoid using when sensitive information is visible
- 🚫 Revoke access: You can remove permissions anytime in System Settings
Privacy Notes
- No data is sent externally by this MCP server itself
- Your AI client (Claude Desktop, etc.) may process screenshots/data according to their privacy policies
- Screenshots are temporary and not permanently stored unless you specify a save path
- OCR processing happens locally on your machine
🤝 Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
📄 License
MIT License - see LICENSE file for details.
📚 Additional Resources
📖 Complete Documentation Suite
- DOCS.md - Complete documentation index and navigation guide
- API.md - Comprehensive API reference and technical documentation
- PERFORMANCE.md - Performance optimization and tuning guide
- MIGRATION.md - Migration guide and version changes
🔗 Related Links
- MCP Protocol - Learn about the Model Context Protocol
- Claude Desktop - Download Claude Desktop
- Claude Code - Web-based Claude Code interface
- Cursor - AI-powered code editor
🛠️ Development Resources
- GitHub Repository - Source code and issue tracking
- Contributing Guidelines - How to contribute
- Release Notes - Version history and changes
💬 Community and Support
- GitHub Issues - Bug reports and feature requests
- GitHub Discussions - Community discussions
- MCP Community - Broader MCP ecosystem
🙏 Acknowledgments
- Built with Model Context Protocol (MCP)
- Uses @nut-tree-fork/nut-js for system automation
- OCR powered by Tesseract.js
- Image processing with node-canvas
Made with ❤️ for the MCP community
Having issues? Open a GitHub issue • Want to contribute? Check our contributing guide