Mac Commander MCP Server

███╗   ███╗ █████╗  ██████╗                                                      
████╗ ████║██╔══██╗██╔════╝                                                      
██╔████╔██║███████║██║                                                           
██║╚██╔╝██║██╔══██║██║                                                           
██║ ╚═╝ ██║██║  ██║╚██████╗                                                      
╚═╝     ╚═╝╚═╝  ╚═╝ ╚═════╝                                                      
 ██████╗ ██████╗ ███╗   ███╗███╗   ███╗ █████╗ ███╗   ██╗██████╗ ███████╗██████╗ 
██╔════╝██╔═══██╗████╗ ████║████╗ ████║██╔══██╗████╗  ██║██╔══██╗██╔════╝██╔══██╗
██║     ██║   ██║██╔████╔██║██╔████╔██║███████║██╔██╗ ██║██║  ██║█████╗  ██████╔╝
██║     ██║   ██║██║╚██╔╝██║██║╚██╔╝██║██╔══██║██║╚██╗██║██║  ██║██╔══╝  ██╔══██╗
╚██████╗╚██████╔╝██║ ╚═╝ ██║██║ ╚═╝ ██║██║  ██║██║ ╚████║██████╔╝███████╗██║  ██║
 ╚═════╝ ╚═════╝ ╚═╝     ╚═╝╚═╝     ╚═╝╚═╝  ╚═╝╚═╝  ╚═══╝╚═════╝ ╚══════╝╚═╝  ╚═╝

🤖 Enable AI assistants to visually interact with your macOS applications

An MCP (Model Context Protocol) server that allows AI coding tools like Claude Desktop, Claude Code, and Cursor to see, control, and test macOS applications. Perfect for automated testing, UI debugging, and error detection.

🎆 What makes this special?

✨ Visual AI: Your AI can actually see what's on your screen
🎯 Smart UI Detection: Advanced element detection finds buttons, forms, and controls without relying on text
🗾 Error Detection: Automatically finds bugs and error dialogs
🔄 Full Control: Click, type, and navigate just like a human
📱 App Testing: Perfect for testing mobile apps, desktop software, or web interfaces
⚡ High Performance: Optimized memory usage and 60-80% faster text operations
🚀 Easy Setup: Get started in under 5 minutes

🚀 Quick Start

Option 1: Automated Install (Easiest)

# Clone and run the installer
git clone https://github.com/ohqay/mac-commander.git
cd mac-commander
./install.sh

The installer will:

✅ Check your Node.js version
✅ Install dependencies and build the project
✅ Show you the exact configuration to copy
✅ Offer to open System Settings for permissions

Option 2: Manual Install

# 1. Clone and install
git clone https://github.com/ohqay/mac-commander.git
cd mac-commander
npm install && npm run build

# 2. Get the full path for configuration
echo "$(pwd)/build/index.js"

# 3. Add to Claude Desktop config (see Configuration section below)
# 4. Grant Screen Recording & Accessibility permissions
# 5. Restart Claude Desktop and try: "Take a screenshot"

✨ In a few minutes, your AI will be able to see and control your Mac!

✨ Features

Core Features

📸 Screenshot Capture: High-performance screenshots with optional base64 compression and metadata-only responses
🖱️ Mouse Control: Click, double-click, and move the mouse cursor
⌨️ Keyboard Input: Type text and press key combinations
🪟 Window Management: List, find, focus, and get information about application windows
🔍 OCR Text Recognition: Extract and find text on screen using Tesseract.js
⚠️ Error Detection: Automatically detect error dialogs and messages using OCR
📏 Screen Information: Get display dimensions and coordinates

Advanced UI Element Detection (New!)

🎯 Multi-Strategy Detection Engine: Combines visual analysis, OCR, color patterns, and shape detection
🔍 20+ UI Element Types: Detects buttons, text fields, links, dialogs, menus, checkboxes, dropdowns, tabs, toolbars, scrollbars, and more
🍎 macOS-Specific Patterns: Optimized for Apple Human Interface Guidelines and native UI components
📊 Confidence Scoring: Each detected element includes reliability scores and validation methods
🔄 Element Classification: Advanced categorization with state detection (enabled/disabled/selected/focused)
🎨 Visual Feature Analysis: Color, shape, border radius, and spatial relationship analysis for accurate detection
🧠 Context-Aware Detection: Groups related elements and understands form patterns, dialog layouts, and menu structures
✅ Interactive Element Validation: Verifies clickability and interactivity through visual characteristics

Advanced Automation Features (New!)

🎯 Drag & Drop: Drag operations for moving UI elements and files
📜 Advanced Scrolling: Directional scrolling with customizable amounts
🖱️ Mouse Gestures: Hover, right-click, and mouse movement controls
⌨️ Keyboard Input: Text typing with configurable delays between keystrokes
🔄 Complex Interactions: Chain multiple actions for sophisticated automation
⏱️ Precise Timing: Built-in wait functionality and timing controls

Performance Features (New!)

⚡ Optimized Memory Usage: Reduced memory consumption from 99% to ~60-70% through intelligent buffering
🚀 Fast Text Search: 60-80% faster find_text operations with optimized OCR processing
💾 Smart Caching System: Intelligent cache with 30-70% hit rates for frequently accessed screenshots
🖼️ Chunked Image Processing: Efficient handling of large images through intelligent chunking
🎛️ Automatic Memory Management: Built-in throttling and cleanup to prevent memory exhaustion
📊 Performance Monitoring: Real-time tracking of memory usage, cache performance, and operation timings
🔄 Request Batching: Optimized handling of multiple simultaneous operations

🛠️ Prerequisites

System Requirements

macOS 13+ (Ventura or later)
Node.js 18+ and npm
AI client with MCP support:
- Claude Desktop (recommended)
- Claude Code
- Cursor with MCP support
- Any other MCP-compatible client

Required macOS Permissions

⚠️ Important: You must grant these permissions or the server won't work!

Screen Recording Permission:
- Go to System Settings → Privacy & Security → Screen Recording
- Click the + button and add your AI client (Claude Desktop, Cursor, etc.)
- ✅ Check the box next to your AI client
Accessibility Permission:
- Go to System Settings → Privacy & Security → Accessibility
- Click the + button and add your AI client
- ✅ Check the box next to your AI client

💡 Tip: You might need to restart your AI client after granting permissions.

📦 Installation

💿 Automated Installation

Recommended for beginners:

# Clone and run the installer
git clone https://github.com/ohqay/mac-commander.git
cd mac-commander
./install.sh

The installer script will guide you through everything!

🔧 Manual Installation

For advanced users:

# Clone the repository
git clone https://github.com/ohqay/mac-commander.git
cd mac-commander

# Install dependencies and build
npm install
npm run build

# Test that it works
npm run inspector

Global Install (Coming Soon)

# Not yet published to npm; this will work once the package is released
npm install -g mac-commander

🔧 Verify Installation

Run the test script to make sure everything works:

node test-server.js

You should see the server start and respond to test commands.

⚙️ Configuration

🖥️ Claude Desktop Setup

Open Claude Desktop and go to Settings (gear icon)
Click on the Developer tab
Click Edit Config to open the configuration file
Add the MCP server configuration:

{
  "mcpServers": {
    "mac-commander": {
      "command": "node",
      "args": ["/FULL/PATH/TO/mac-commander/build/index.js"]
    }
  }
}

🚨 Important: Replace /FULL/PATH/TO/ with the actual absolute path to where you cloned this repository!

Example with real path:

{
  "mcpServers": {
    "mac-commander": {
      "command": "node",
      "args": ["/Users/yourname/Developer/mac-commander/build/index.js"]
    }
  }
}

Save the file and restart Claude Desktop
Start a new chat - you should see a 🔨 hammer icon indicating MCP is active

💻 Claude Code Setup

Navigate to your project folder in terminal
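Create or edit .claude/config.json in your project root (recent Claude Code releases read project-scoped servers from a .mcp.json file in the project root instead, so check which file your version expects):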

# Note: the echo below overwrites any existing .claude/config.json
mkdir -p .claude
echo '{
  "mcpServers": {
    "mac-commander": {
      "command": "node",
      "args": ["/FULL/PATH/TO/mac-commander/build/index.js"]
    }
  }
}' > .claude/config.json

Start Claude Code in that project folder:

claude

🎯 Cursor Setup

Open Cursor and go to Settings → Cursor Settings → MCP
Click "Add new global MCP server"
Add the configuration:
- Name: mac-commander
- Command: node
- Args: /FULL/PATH/TO/mac-commander/build/index.js

Or create ~/.cursor/mcp.json:

{
  "mcpServers": {
    "mac-commander": {
      "command": "node",
      "args": ["/FULL/PATH/TO/mac-commander/build/index.js"]
    }
  }
}

🔍 Finding Your Full Path

Not sure what your full path is? Run this in the project directory:

echo "$(pwd)/build/index.js"

Example output: /Users/yourname/Developer/mac-commander/build/index.js

Copy this exact path and use it in your configuration files above.
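
Before restarting the client, it can help to sanity-check the entry you are about to paste. The sketch below is not part of mac-commander: `checkServerEntry` is a hypothetical helper that flags the two mistakes the configuration steps warn about, a leftover /FULL/PATH/TO/ placeholder and a non-absolute path.

```javascript
// Hypothetical helper (not part of mac-commander): flag common mistakes
// in a single "mcpServers" entry before pasting it into a client config.
function checkServerEntry(entry) {
  const problems = [];
  const scriptPath = (entry.args || [])[0];

  if (entry.command !== "node") {
    problems.push('command should be "node"');
  }
  if (typeof scriptPath !== "string") {
    problems.push("args must contain the path to build/index.js");
    return problems;
  }
  if (scriptPath.includes("/FULL/PATH/TO/")) {
    problems.push("placeholder /FULL/PATH/TO/ was not replaced");
  } else if (!scriptPath.startsWith("/")) {
    problems.push("path must be absolute (start with /)");
  }
  if (!scriptPath.endsWith("/build/index.js")) {
    problems.push("path should end with /build/index.js");
  }
  return problems;
}

console.log(checkServerEntry({
  command: "node",
  args: ["/Users/yourname/Developer/mac-commander/build/index.js"],
})); // prints []
```

An empty array means none of these checks failed; it does not prove that the file exists or that the build succeeded.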

✅ Verify It's Working

After configuration:

Restart your AI client (Claude Desktop, Cursor, etc.)
Start a new chat/session
Look for the MCP indicator (hammer icon in Claude Desktop)
Try a test command: "Take a screenshot of my screen"

If it works, you'll see the AI successfully take a screenshot! 🎉

📖 Tool Parameter Reference

screenshot

Capture a screenshot of the screen or a specific region with optimized performance.

Parameters:

outputPath (optional): Path to save the screenshot as PNG
region (optional): Object with x, y, width, height to capture specific area
returnBase64 (optional): Return base64 data in response (default: false)
compressionQuality (optional): JPEG compression quality 10-100 for base64 responses (default: 80)

Performance Features:

Default mode returns metadata only (fast, small responses)
Base64 mode includes compressed image data when returnBase64: true
60-80% size reduction through JPEG compression
Always saves to temp folder for later access regardless of mode

Usage Examples:

Metadata-only mode (recommended for performance):

{
  "outputPath": "/tmp/screenshot.png"
}

With base64 data for immediate processing:

{
  "returnBase64": true,
  "compressionQuality": 60
}

Region capture with high quality (light compression):

{
  "region": {"x": 100, "y": 100, "width": 400, "height": 300},
  "returnBase64": true,
  "compressionQuality": 90
}

click

Click at specific coordinates on the screen.

Parameters:

x: X coordinate
y: Y coordinate
button: "left", "right", or "middle" (default: "left")
doubleClick: boolean (default: false)
verify: boolean (default: false) - Take a screenshot after clicking to verify the action
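
MCP clients normally build tool calls for you, but seeing the request can help when debugging with the MCP Inspector. The sketch below builds a JSON-RPC 2.0 `tools/call` request for `click`. The argument names and defaults come from the parameter list above; the `clickRequest` helper and its id counter are illustrative only.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request as used by MCP.
// Argument names (x, y, button, doubleClick, verify) come from the
// parameter list above; the helper itself is illustrative.
let nextId = 1;

function clickRequest(x, y, options = {}) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: {
      name: "click",
      arguments: {
        x,
        y,
        button: options.button ?? "left",
        doubleClick: options.doubleClick ?? false,
        verify: options.verify ?? false,
      },
    },
  };
}

console.log(JSON.stringify(clickRequest(100, 200, { doubleClick: true }), null, 2));
```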

type_text

Type text using the keyboard.

Parameters:

text: Text to type
delay: Delay between keystrokes in milliseconds (default: 50)

mouse_move

Move the mouse to specific coordinates.

Parameters:

x: X coordinate
y: Y coordinate

key_press

Press a key or key combination.

Parameters:

key: Key to press (e.g., "Enter", "Escape", "cmd+a")
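
A combination such as "cmd+a" has to be split into modifiers and a final key before it can be sent to the keyboard. The sketch below shows one way to do that. The accepted modifier names are an assumption, and `parseKeyCombo` is a hypothetical helper, not the server's actual parser.

```javascript
// Hypothetical parser: split a combination such as "cmd+shift+s" into
// modifier keys and the final key. The modifier list is an assumption.
const MODIFIERS = new Set(["cmd", "ctrl", "alt", "shift"]);

function parseKeyCombo(combo) {
  const parts = combo.split("+").map((p) => p.trim()).filter(Boolean);
  if (parts.length === 0) throw new Error("empty key combination");
  const key = parts[parts.length - 1];
  const modifiers = parts.slice(0, -1).map((m) => m.toLowerCase());
  const unknown = modifiers.filter((m) => !MODIFIERS.has(m));
  if (unknown.length > 0) {
    throw new Error(`unknown modifier(s): ${unknown.join(", ")}`);
  }
  return { modifiers, key };
}

console.log(parseKeyCombo("cmd+a"));  // modifiers: cmd, key: a
console.log(parseKeyCombo("Enter"));  // no modifiers, key: Enter
```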

check_for_errors

Check the screen for common error indicators.

Parameters:

region (optional): Specific region to check

wait

Wait for a specified amount of time.

Parameters:

milliseconds: Time to wait

wait_for_element

Wait for specific text or UI element to appear on screen before continuing. Essential for handling dynamic content, loading screens, and asynchronous UI updates.

Parameters:

text: Text to wait for on screen
timeout: Maximum wait time in milliseconds (default: 10000)
pollInterval: How often to check in milliseconds (default: 500)
region (optional): Specific region to search in with x, y, width, height coordinates

Returns success/failure status and the location of the found element if successful. Useful for waiting for buttons to become available, dialogs to appear, or content to show up once loading has finished.
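
The interaction between timeout and pollInterval can be sketched as a generic polling loop. This is a sketch under assumptions rather than the server's code: `waitFor` and `check` are hypothetical names, and in the real tool each check would presumably be an OCR search of the screen.

```javascript
// Generic polling loop mirroring the documented timeout and pollInterval
// defaults. `check` is any async function returning a result or null;
// in the real tool this step would be an OCR search (assumption).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitFor(check, { timeout = 10000, pollInterval = 500 } = {}) {
  const deadline = Date.now() + timeout;
  while (true) {
    const result = await check();
    if (result !== null && result !== undefined) {
      return { success: true, result };
    }
    // Give up if the next poll would land after the deadline.
    if (Date.now() + pollInterval > deadline) {
      return { success: false, result: null };
    }
    await sleep(pollInterval);
  }
}

// Usage: the simulated element "appears" on the third check.
let checks = 0;
const fakeCheck = async () => (++checks >= 3 ? { x: 120, y: 48 } : null);

waitFor(fakeCheck, { timeout: 1000, pollInterval: 10 }).then((outcome) => {
  console.log(outcome.success, outcome.result);
});
```

A shorter pollInterval reacts faster but runs more checks. With OCR behind each check, that trade-off matters.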

get_screen_info

Get information about the screen dimensions.

No parameters required.

list_windows

List all open windows with their titles and positions.

No parameters required.

get_active_window

Get information about the currently active window.

No parameters required.

find_window

Find a window by its title (partial match supported).

Parameters:

title: Window title to search for

focus_window

Focus/activate a window by its title.

Parameters:

title: Window title to focus

get_window_info

Get detailed information about a specific window.

Parameters:

title: Window title to get info for

extract_text

Extract and read text from the screen or specific regions using Optical Character Recognition (OCR). Features an improved caching system for better performance, confidence scoring, and enhanced text recognition accuracy. Supports fuzzy text matching and configurable OCR settings.

Parameters:

region (optional): Specific region to extract text from with x, y, width, height coordinates

Enhanced Features:

Smart Caching: Multi-level caching system with image hash-based keys for better performance
Confidence Filtering: Configurable minimum confidence thresholds (default: 50%)
Optimized Processing: Uses worker pool for concurrent OCR operations
Error Handling: Comprehensive error detection and recovery
Performance Tracking: Built-in timing and performance metrics

find_text

Locate specific text on the screen using advanced OCR with fuzzy matching capabilities. Returns precise coordinates, confidence scores, and handles OCR variations automatically. Essential for robust UI automation that adapts to text rendering differences.

Parameters:

text: Text to search for (supports fuzzy matching for OCR variations)
region (optional): Specific region to search in with x, y, width, height coordinates

Enhanced Features:

Fuzzy Text Matching: Handles OCR variations with configurable similarity thresholds
- Standard threshold: 70% similarity (configurable)
- Relaxed threshold: 50% similarity for difficult text
- Levenshtein distance algorithm for accurate matching
Smart Sorting: Results sorted by similarity score and confidence level
Multiple Match Support: Returns all matching text locations with coordinates
Center Point Calculation: Provides precise click coordinates for each match
Confidence Scoring: Each match includes OCR confidence level
Performance Optimized: Cached results and memory management

Example Fuzzy Matching:

Search for "Submit" → Finds "Subm1t", "SUBMIT", "submit" (OCR variations)
Search for "Login" → Matches "Log1n", "LOGIN", "Iog in" (common OCR errors)
Search for "Cancel" → Finds "Cancei", "CANCEL", "cancel" (character misrecognition)
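
To make the thresholds concrete, the sketch below computes a Levenshtein-based similarity and reports which threshold a candidate passes. The normalisation (one minus distance divided by the longer string's length, case-insensitive) is an assumption; the source only states that Levenshtein distance and the 70% and 50% thresholds are used.

```javascript
// Classic dynamic-programming Levenshtein distance (two-row variant).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1], case-insensitive. Normalising by the longer
// string is an assumption, not necessarily what mac-commander does.
function similarity(a, b) {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Report which documented threshold (if any) a candidate passes.
function classify(query, candidate, standard = 0.7, relaxed = 0.5) {
  const score = similarity(query, candidate);
  if (score >= standard) return "standard";
  if (score >= relaxed) return "relaxed";
  return "none";
}

console.log(classify("Submit", "Subm1t")); // standard (1 edit over 6 chars, about 0.83)
console.log(classify("Login", "Iog in"));  // relaxed  (2 edits over 6 chars, about 0.67)
console.log(classify("Cancel", "Open"));   // none
```

Under this normalisation "Iog in" only matches "Login" at the relaxed threshold, which shows why a fallback threshold is useful for short strings.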

find_ui_elements

Advanced UI element detection system that intelligently identifies interactive components using multiple detection strategies. Unlike text-only detection, this tool can accurately find buttons, text fields, dropdowns, and other UI elements even when they don't contain visible text. Perfect for modern applications with visual-only buttons, icons, and complex layouts.

Parameters:

autoSave (optional): Whether to save the screenshot for analysis (default: true)
elementTypes (optional): Array of specific element types to detect: ['button', 'text_field', 'link', 'image', 'icon', 'dialog', 'menu', 'window', 'checkbox', 'radio_button', 'dropdown', 'slider', 'tab', 'toolbar', 'list', 'table', 'scrollbar', 'other']
region (optional): Specific region to analyze with x, y, width, height coordinates

Detection Strategies:

Visual Analysis: Detects UI elements based on shape, color, and visual patterns
OCR Text Recognition: Identifies elements with text content and labels
Color Pattern Analysis: Recognizes macOS system colors and UI themes
Shape Detection: Finds rectangular buttons, rounded elements, and geometric patterns
Context Analysis: Groups related elements and understands spatial relationships

macOS-Specific Features:

Apple HIG Compliance: Optimized for Apple Human Interface Guidelines
System Color Recognition: Detects standard macOS button colors (#007AFF, #34C759, #FF3B30, etc.)
Touch Target Validation: Ensures elements meet minimum 44x44 pixel requirements
Native UI Patterns: Recognizes standard macOS dialogs, menus, and controls

Output Format: Returns comprehensive element information including:

Element type and subtype classification
Precise coordinates and clickable center points
Confidence scores and detection methods used
Visual features (colors, border radius, shadows)
Interactive validation results
Element state (enabled/disabled/selected)
Contextual relationships with nearby elements

Example Use Cases:

Find all clickable buttons in a dialog: elementTypes: ['button']
Detect form elements: elementTypes: ['text_field', 'button', 'dropdown']
Locate menu items: elementTypes: ['menu', 'link']
Find all interactive elements: (no elementTypes filter)
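
Once elements are detected, a client typically filters them and derives a click point. The sketch below assumes a result shape of `{ type, confidence, bounds: { x, y, width, height } }` with confidence on a 0 to 1 scale. Neither is confirmed by the source, so check the actual find_ui_elements output before relying on it.

```javascript
// Assumed result shape (not confirmed by the source):
// { type, confidence, bounds: { x, y, width, height } }
function clickTargets(elements, { types, minConfidence = 0.5 } = {}) {
  return elements
    .filter((el) => !types || types.includes(el.type))
    .filter((el) => el.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence)
    .map((el) => ({
      type: el.type,
      confidence: el.confidence,
      // Centre of the bounding box, rounded to whole pixels.
      x: Math.round(el.bounds.x + el.bounds.width / 2),
      y: Math.round(el.bounds.y + el.bounds.height / 2),
    }));
}

const detected = [
  { type: "button", confidence: 0.92, bounds: { x: 400, y: 300, width: 80, height: 30 } },
  { type: "text_field", confidence: 0.81, bounds: { x: 100, y: 120, width: 240, height: 28 } },
  { type: "button", confidence: 0.35, bounds: { x: 10, y: 10, width: 20, height: 20 } },
];

console.log(clickTargets(detected, { types: ["button"] }));
// one target: the 0.92 button, centred at (440, 315)
```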

drag

Drag from one point to another using mouse button hold.

Parameters:

startX: Starting X coordinate
startY: Starting Y coordinate
endX: Ending X coordinate
endY: Ending Y coordinate
button: Mouse button to use for dragging (default: "left")

scroll

Scroll in any direction within the current window or a specific region.

Parameters:

direction: Direction to scroll ("up", "down", "left", "right")
amount: Number of scroll units (default: 5)
x (optional): X coordinate to scroll at (defaults to current mouse position)
y (optional): Y coordinate to scroll at (defaults to current mouse position)

hover

Hover the mouse at a specific position for a duration.

Parameters:

x: X coordinate to hover at
y: Y coordinate to hover at
duration: Duration to hover in milliseconds (default: 1000)

right_click

Right-click at specific coordinates to open context menus.

Parameters:

x: X coordinate to right-click
y: Y coordinate to right-click

list_screenshots

List all screenshots saved in the temporary folder.

No parameters required.

list_recent_screenshots

List recently captured screenshots with detailed metadata including timestamps, file sizes, and dimensions.

Parameters:

limit: Maximum number of screenshots to list (default: 10, max: 50)

view_screenshot

View/display a specific screenshot from the temporary folder.

Parameters:

filename: Name of the screenshot file to view

cleanup_screenshots

Clean up old screenshots from temporary folder, keeping only recent ones.

Parameters:

keepLast: Number of recent screenshots to keep (default: 10)
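
The keepLast policy amounts to sorting by modification time and dropping everything past the newest N. A minimal sketch, assuming each file is described by a hypothetical `{ name, mtimeMs }` record (the server's internal bookkeeping may differ):

```javascript
// Given file metadata, decide which screenshots a keepLast policy removes.
// `files` is an array of { name, mtimeMs }; this shape is an assumption.
function selectForCleanup(files, keepLast = 10) {
  if (!Number.isInteger(keepLast) || keepLast < 0) {
    throw new RangeError("keepLast must be a non-negative integer");
  }
  const newestFirst = [...files].sort((a, b) => b.mtimeMs - a.mtimeMs);
  return newestFirst.slice(keepLast).map((f) => f.name);
}

const files = [
  { name: "a.png", mtimeMs: 1000 },
  { name: "b.png", mtimeMs: 3000 },
  { name: "c.png", mtimeMs: 2000 },
];
console.log(selectForCleanup(files, 2)); // [ 'a.png' ]
```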

compare_screenshots

Compare two previously saved screenshots to identify differences and changes.

Parameters:

screenshot1: Filename of the first screenshot
screenshot2: Filename of the second screenshot
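
The source does not describe how the comparison is computed. As an illustration of the general idea only, the sketch below counts differing pixels between two decoded RGBA buffers of the same size. `diffPixels` and its `tolerance` parameter are hypothetical.

```javascript
// Count differing pixels between two RGBA buffers of identical dimensions.
// `tolerance` is the per-channel difference that is still treated as equal.
function diffPixels(a, b, width, height, tolerance = 0) {
  const expected = width * height * 4;
  if (a.length !== expected || b.length !== expected) {
    throw new Error("buffers must both be width * height * 4 bytes");
  }
  let changed = 0;
  for (let i = 0; i < expected; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > tolerance) {
        changed++;
        break; // count each pixel once
      }
    }
  }
  return { changed, ratio: changed / (width * height) };
}

const before = new Uint8Array(2 * 2 * 4).fill(255); // 2x2 white image
const after = Uint8Array.from(before);
after[4] = 0; // change the red channel of the second pixel
console.log(diffPixels(before, after, 2, 2)); // { changed: 1, ratio: 0.25 }
```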

describe_screenshot

Capture and analyze a screenshot with AI-powered insights, combining OCR text extraction and UI element detection.

Parameters:

region (optional): Specific region to analyze with x, y, width, height coordinates
savePath (optional): Path to save the analyzed screenshot

performance_dashboard

Comprehensive performance monitoring dashboard providing real-time system health, metrics, and optimization recommendations.

Parameters:

includeMetrics (optional): Include detailed metrics in response (default: true)
includeRecommendations (optional): Include optimization recommendations (default: true)
includeHistory (optional): Include performance history and trends (default: false)
timeRangeMs (optional): Time range for trends in milliseconds (default: 3600000, i.e. 1 hour)

🔧 OCR Configuration Options

The OCR system can be customized with various configuration options to optimize performance and accuracy for different use cases:

configureOCR(options)

Configure OCR settings globally for all text recognition operations.

Available Options:

minConfidence: Minimum confidence score for text recognition (default: 50, range: 0-100)
fuzzyMatchThreshold: Standard similarity threshold for fuzzy matching (default: 0.7, range: 0-1)
relaxedFuzzyThreshold: Fallback threshold for difficult text (default: 0.5, range: 0-1)
cacheEnabled: Enable/disable OCR result caching (default: true)
cacheTTL: Cache time-to-live in milliseconds (default: 30000)
maxCacheSize: Maximum number of cached results (default: 100)
timeoutMs: OCR operation timeout in milliseconds (default: 30000)

Example Configuration:

// For high-accuracy applications
configureOCR({
  minConfidence: 80,
  fuzzyMatchThreshold: 0.9,
  relaxedFuzzyThreshold: 0.7
});

// For fast processing with lower accuracy requirements
configureOCR({
  minConfidence: 30,
  fuzzyMatchThreshold: 0.6,
  relaxedFuzzyThreshold: 0.4,
  cacheTTL: 60000  // Longer cache for faster responses
});

// For memory-constrained environments
configureOCR({
  cacheEnabled: false,
  maxCacheSize: 50
});

OCR Performance Features

Worker Pool Architecture:

Concurrent OCR processing with multiple worker threads
Automatic load balancing and task prioritization
Graceful fallback to single worker if pool initialization fails

Intelligent Caching:

Multi-level caching with image hash and region-based keys
Automatic cache cleanup and size management
Configurable TTL and cache size limits
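
How a TTL and a size limit interact can be sketched with a small cache. The option names mirror cacheTTL and maxCacheSize from the configuration list. The class itself, the key format, and the eviction policy (oldest insertion first) are assumptions rather than the server's actual implementation.

```javascript
// Minimal TTL + size-bounded cache. Option names mirror cacheTTL and
// maxCacheSize; the eviction policy (oldest insertion first) is an
// assumption.
class TtlCache {
  constructor({ cacheTTL = 30000, maxCacheSize = 100, now = Date.now } = {}) {
    this.ttl = cacheTTL;
    this.max = maxCacheSize;
    this.now = now; // injectable clock, handy for testing
    this.entries = new Map(); // Map preserves insertion order
  }

  get(key) {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() - hit.storedAt > this.ttl) {
      this.entries.delete(key); // expired
      return undefined;
    }
    return hit.value;
  }

  set(key, value) {
    this.entries.delete(key); // re-inserting moves the key to the end
    this.entries.set(key, { value, storedAt: this.now() });
    while (this.entries.size > this.max) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}

const cache = new TtlCache({ maxCacheSize: 2 });
cache.set("hash-1:full", "text A");
cache.set("hash-2:full", "text B");
cache.set("hash-3:full", "text C"); // evicts hash-1
console.log(cache.get("hash-1:full")); // undefined
console.log(cache.get("hash-3:full")); // text C
```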

Memory Management:

Automatic garbage collection triggers for large OCR operations
Memory usage monitoring and cleanup
Efficient image processing and buffer management

Error Handling:

Comprehensive error detection and recovery
Timeout protection for long-running OCR operations
Detailed error reporting with context information

📊 Implementation Status & Available Tools

✅ Fully Implemented Tools

Screenshot Management:

screenshot - Screen capture with region support and compression
list_screenshots - List all saved screenshots
list_recent_screenshots - List recent screenshots with metadata
view_screenshot - View specific screenshot files
cleanup_screenshots - Clean up old screenshot files
compare_screenshots - Compare two screenshots
describe_screenshot - AI-powered screenshot analysis

Mouse & Keyboard Control:

click - Click with multiple button support and verification
type_text - Text input with configurable delays
key_press - Key combinations and shortcuts
mouse_move - Move mouse cursor
drag - Drag and drop operations
scroll - Directional scrolling
hover - Mouse hover with duration
right_click - Context menu access

Window Management:

list_windows - List all open windows
get_active_window - Get current window info
find_window - Find window by title
focus_window - Bring window to front
get_window_info - Detailed window information

OCR & Text Recognition:

extract_text - OCR text extraction with caching
find_text - Locate text on screen with fuzzy matching
wait_for_element - Wait for text/elements to appear
find_ui_elements - Advanced visual UI element detection

System & Utilities:

get_screen_info - Screen dimensions
check_for_errors - Visual error detection
wait - Pause execution
diagnostic - System health check
performance_dashboard - Performance monitoring

🚧 Planned Features (Not Yet Implemented)

The following features mentioned in examples are planned for future releases:

click_hold - Click and hold operations
relative_mouse_move - Relative mouse positioning
key_hold - Hold keys for duration
type_with_delay - Human-like typing with variable delays and typos
Advanced smooth scrolling with easing
Pixel-perfect scrolling controls

🚀 Usage Examples

🎯 Basic Commands

Once configured, you can ask your AI assistant to:

Screenshots & Visual Inspection:

"Take a screenshot of my app" (metadata-only for fast responses)
"Capture just the top-left corner of the screen"
"Save a screenshot to ~/Desktop/app-screenshot.png"
"Take a screenshot and return the base64 data at JPEG quality 70"
"Capture a region and return compressed image data for processing"

Mouse & Keyboard Control:

"Click the button at coordinates 100, 200"
"Double-click on the center of the screen"
"Type 'Hello World' in the current field"
"Press cmd+s to save the file"
"Press Enter to submit"

Window Management:

"List all open windows"
"Focus the Safari window"
"Get information about the active window"
"Find the window with 'Calculator' in the title"

Text Recognition & Search:

"Extract all text from the screen"
"Find the 'Submit' button on screen"
"Look for any text containing 'error' on screen"
"Read the text in the dialog box"

UI Element Detection:

"Find all clickable buttons on this screen"
"Detect text fields and form elements in this dialog"
"Locate all interactive elements (buttons, links, dropdowns)"
"Identify the toolbar and menu elements visually"
"Find UI elements by type: buttons, text fields, and checkboxes"

Error Detection:

"Check if there are any error dialogs on screen"
"Look for error messages in my app"
"Scan for any warning or error indicators"

🔧 Advanced Automation Examples

UI Testing Workflow:

"Please help me test my app:
Take a screenshot first
Click the 'Start' button
Wait 2 seconds
Check if any errors appeared
Take another screenshot to compare"
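
For reference, a prompt like this typically resolves into an ordered series of tool calls. The sketch below writes that series out as data and converts it into JSON-RPC `tools/call` requests. Tool and parameter names come from the Tool Parameter Reference. The output paths, the step list format, and the `toRequests` helper are illustrative assumptions.

```javascript
// The five-step test prompt expressed as an ordered list of tool calls.
// Tool and parameter names come from the reference; the output paths and
// the idea of a client-side step list are illustrative.
const testWorkflow = [
  { tool: "screenshot", arguments: { outputPath: "/tmp/before.png" } },
  // A real run would click the coordinates returned by find_text next.
  { tool: "find_text", arguments: { text: "Start" } },
  { tool: "wait", arguments: { milliseconds: 2000 } },
  { tool: "check_for_errors", arguments: {} },
  { tool: "screenshot", arguments: { outputPath: "/tmp/after.png" } },
];

// Convert each step into a JSON-RPC 2.0 "tools/call" request.
function toRequests(steps) {
  return steps.map((step, index) => ({
    jsonrpc: "2.0",
    id: index + 1,
    method: "tools/call",
    params: { name: step.tool, arguments: step.arguments },
  }));
}

for (const request of toRequests(testWorkflow)) {
  console.log(request.id, request.params.name);
}
```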

Bug Investigation:

"I'm having issues with my app:
Focus the MyApp window
Extract all visible text
Look for any error messages
Take a screenshot of the current state"

Automated Form Filling:

"Help me fill out this form:
Click at coordinates 300, 150 (username field)
Type 'testuser'
Press Tab to move to next field
Type 'password123'
Find and click the Submit button"

🚀 Advanced Automation Features

Drag a File to the Trash:

"Drag the file from the desktop to the trash:
Find the file icon at coordinates 100, 200
Drag it to the trash at 800, 600
Verify the file was moved"

Scrolling Through a Document:

"Scroll through the document:
Scroll down by 5 units
Wait 1 second
Keep scrolling until the text 'Chapter 3' is found
Hover over the heading"

Text Input with Timing:

"Type this email with proper timing:
Click the compose button
Type the email address with 50ms delays
Press Tab to move to subject
Type 'Meeting Tomorrow'
Tab to body and type the message"

Drag and Drop Operations:

"Perform a drag operation:
Move to the start position
Drag from coordinates 100,200 to 300,400
Wait for the operation to complete
Verify the item was moved"

Keyboard Shortcuts:

"Use developer tools effectively:
Press cmd+option+i to open the browser's inspector
Wait 2000ms for tools to load
Type 'console.log' in the console
Press Enter to execute"

Menu Navigation:

"Navigate the UI effectively:
Move mouse to coordinates 400, 200
Hover over the menu for 1 second
Click and wait for dropdown
Move to coordinates 400, 250 and click the option"

🎯 UI Element Detection Examples

Smart Button Detection:

"Find all clickable buttons in this dialog:
Use find_ui_elements to detect all interactive buttons
Identify the 'Cancel' and 'OK' buttons by their visual characteristics
Click the 'OK' button using the returned coordinates
Verify the action was successful"

Form Automation with Visual Detection:

"Help me fill out this form automatically:
Use find_ui_elements with elementTypes: ['text_field', 'button', 'dropdown']
Locate the username text field (even if it has no label)
Click the field and enter 'john.doe@example.com'
Find the password field using visual detection
Enter the password and click the submit button"

Modern App UI Navigation:

"Navigate this modern app with icon-only buttons:
Use find_ui_elements to detect all clickable elements
Find buttons by their visual characteristics (not text)
Identify the settings gear icon using shape and color analysis
Click the settings button and verify the menu opens"

macOS Dialog Interaction:

"Handle this system dialog intelligently:
Use find_ui_elements to detect dialog structure
Identify the dialog type (alert, confirmation, file picker)
Find all available actions (buttons, checkboxes, dropdowns)
Choose the appropriate action based on the dialog context
Verify the dialog was dismissed correctly"

Complex Layout Analysis:

"Analyze this complex application layout:
Use find_ui_elements to map all UI components
Group related elements (toolbars, sidebars, content areas)
Identify interactive vs. static elements
Create a clickable element map for automation
Test clicking each interactive element"

Responsive UI Testing:

"Test this responsive interface across different states:
Take a screenshot and analyze initial UI state
Use find_ui_elements to detect all interactive components
Click elements to change the interface state
Re-analyze the UI to detect new/changed elements
Verify all states are working correctly"

Visual-Only Element Detection:

"Find elements without any text labels:
Use find_ui_elements with visual analysis only
Detect icon buttons, image buttons, and graphic controls
Identify element states (enabled/disabled/selected)
Test interactivity of visual-only elements
Map the visual UI structure for automation"

🚀 Performance Improvements

Mac Commander has been optimized for high-performance automation with significant improvements in memory usage, processing speed, and reliability:

Memory Optimization

99% → 60-70% Memory Usage: Intelligent memory management and buffer optimization
Automatic Cleanup: Built-in garbage collection and memory throttling
Smart Buffering: Efficient image processing with minimal memory footprint

Processing Speed

60-80% Faster Text Operations: Optimized OCR processing and text search algorithms
Chunked Image Processing: Large images are processed in efficient chunks
Parallel Processing: Multiple operations can run concurrently without blocking

Caching System

30-70% Cache Hit Rates: Intelligent caching of frequently accessed screenshots
Smart Cache Management: Automatic cache invalidation and memory-conscious storage
Performance Monitoring: Real-time tracking of cache effectiveness

Request Batching

Optimized Concurrency: Multiple simultaneous requests are handled efficiently
Resource Throttling: Prevents system overload during intensive operations
Performance Metrics: Built-in monitoring of operation timings and resource usage

These improvements make Mac Commander suitable for intensive automation tasks and long-running operations without performance degradation.

Development

Run in development mode: npm run dev
Test with MCP Inspector: npm run inspector

⚠️ Limitations & Troubleshooting

Known Limitations

OCR Accuracy: Text recognition depends on font size, contrast, and clarity
Permission Requirements: Must manually grant Screen Recording and Accessibility permissions
First OCR Run: Initial text extraction may be slower due to model loading
macOS Only: This server only works on macOS systems

🐛 Common Issues

"Permission denied" or "Screen recording not allowed"

✅ Grant Screen Recording permission to your AI client
✅ Grant Accessibility permission to your AI client
🔄 Restart your AI client after granting permissions

"Command not found" or "Cannot find module"

✅ Make sure you ran npm install and npm run build
✅ Use the absolute path to build/index.js in your config
✅ Verify Node.js is installed: node --version

"MCP server not showing up"

✅ Check your configuration JSON syntax is valid
✅ Restart your AI client completely
✅ Try the test script: node test-server.js

"Screenshots are black or empty"

✅ Grant Screen Recording permission
✅ Make sure the app you're screenshotting is visible (not minimized)

🆘 Getting Help

If you're still having issues:

Run the test script: node test-server.js to verify basic functionality
Check the console: Look for error messages in your AI client
Open an issue: Create a GitHub issue with:
- Your macOS version
- Your AI client (Claude Desktop, Cursor, etc.)
- The exact error message
- Your configuration file (with paths anonymized)

🔒 Security & Privacy

Important Security Notes

⚠️ This server has powerful capabilities and requires significant system permissions.

What this server can access:

✅ Screen content: Can take screenshots of anything visible
✅ Keyboard input: Can type any text or key combinations
✅ Mouse control: Can click anywhere on screen
✅ Window information: Can see and control application windows
✅ Text recognition: Can read any text visible on screen

Security best practices:

🏠 Only use in trusted environments: Don't use on shared or public computers
🤝 Review AI requests: Be mindful of what you ask the AI to do
🔐 Sensitive data: Avoid using when sensitive information is visible
🚫 Revoke access: You can remove permissions anytime in System Settings

Privacy Notes

No data is sent externally by this MCP server itself
Your AI client (Claude Desktop, etc.) may process screenshots/data according to their privacy policies
Screenshots are temporary and not permanently stored unless you specify a save path
OCR processing happens locally on your machine

🤝 Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

📄 License

MIT License - see LICENSE file for details.

📚 Additional Resources

📖 Complete Documentation Suite

DOCS.md - Complete documentation index and navigation guide
API.md - Comprehensive API reference and technical documentation
PERFORMANCE.md - Performance optimization and tuning guide
MIGRATION.md - Migration guide and version changes

MCP Protocol - Learn about the Model Context Protocol
Claude Desktop - Download Claude Desktop
Claude Code - Web-based Claude Code interface
Cursor - AI-powered code editor

🛠️ Development Resources

GitHub Repository - Source code and issue tracking
Contributing Guidelines - How to contribute
Release Notes - Version history and changes

💬 Community and Support

GitHub Issues - Bug reports and feature requests
GitHub Discussions - Community discussions
MCP Community - Broader MCP ecosystem

🙏 Acknowledgments

Built with Model Context Protocol (MCP)
Uses @nut-tree-fork/nut-js for system automation
OCR powered by Tesseract.js
Image processing with node-canvas

Made with ❤️ for the MCP community

Having issues? Open a GitHub issue • Want to contribute? Check our contributing guide
