macOS Simulator MCP Server

by ohqay

Mac Commander MCP Server


🤖 Enable AI assistants to visually interact with your macOS applications

An MCP (Model Context Protocol) server that allows AI coding tools like Claude Desktop, Claude Code, and Cursor to see, control, and test macOS applications. Perfect for automated testing, UI debugging, and error detection.

🎆 What makes this special?

  • Visual AI: Your AI can actually see what's on your screen
  • 🎯 Smart UI Detection: Advanced element detection finds buttons, forms, and controls without relying on text
  • 🗾 Error Detection: Automatically finds bugs and error dialogs
  • 🔄 Full Control: Click, type, and navigate just like a human
  • 📱 App Testing: Perfect for testing mobile apps, desktop software, or web interfaces
  • High Performance: Optimized memory usage and 60-80% faster text operations
  • 🚀 Easy Setup: Get started in under 5 minutes

🚀 Quick Start

Option 1: Automated Install (Easiest)

# Clone and run the installer
git clone https://github.com/ohqay/mac-commander.git
cd mac-commander
./install.sh

The installer will:

  • ✅ Check your Node.js version
  • ✅ Install dependencies and build the project
  • ✅ Show you the exact configuration to copy
  • ✅ Offer to open System Settings for permissions

Option 2: Manual Install

# 1. Clone and install
git clone https://github.com/ohqay/mac-commander.git
cd mac-commander
npm install && npm run build

# 2. Get the full path for configuration
echo "$(pwd)/build/index.js"

# 3. Add to Claude Desktop config (see Configuration section below)
# 4. Grant Screen Recording & Accessibility permissions
# 5. Restart Claude Desktop and try: "Take a screenshot"

✨ In 2 minutes, your AI will be able to see and control your Mac!

✨ Features

Core Features

  • 📸 Screenshot Capture: High-performance screenshots with optional base64 compression and metadata-only responses
  • 🖱️ Mouse Control: Click, double-click, and move the mouse cursor
  • ⌨️ Keyboard Input: Type text and press key combinations
  • 🪟 Window Management: List, find, focus, and get information about application windows
  • 🔍 OCR Text Recognition: Extract and find text on screen using Tesseract.js
  • ⚠️ Error Detection: Automatically detect error dialogs and messages using OCR
  • 📏 Screen Information: Get display dimensions and coordinates

Advanced UI Element Detection (New!)

  • 🎯 Multi-Strategy Detection Engine: Combines visual analysis, OCR, color patterns, and shape detection
  • 🔍 20+ UI Element Types: Detects buttons, text fields, links, dialogs, menus, checkboxes, dropdowns, tabs, toolbars, scrollbars, and more
  • 🍎 macOS-Specific Patterns: Optimized for Apple Human Interface Guidelines and native UI components
  • 📊 Confidence Scoring: Each detected element includes reliability scores and validation methods
  • 🔄 Element Classification: Advanced categorization with state detection (enabled/disabled/selected/focused)
  • 🎨 Visual Feature Analysis: Color, shape, border radius, and spatial relationship analysis for accurate detection
  • 🧠 Context-Aware Detection: Groups related elements and understands form patterns, dialog layouts, and menu structures
  • Interactive Element Validation: Verifies clickability and interactivity through visual characteristics

Advanced Automation Features (New!)

  • 🎯 Drag & Drop: Drag operations for moving UI elements and files
  • 📜 Advanced Scrolling: Directional scrolling with customizable amounts
  • 🖱️ Mouse Gestures: Hover, right-click, and mouse movement controls
  • ⌨️ Keyboard Input: Text typing with configurable delays between keystrokes
  • 🔄 Complex Interactions: Chain multiple actions for sophisticated automation
  • ⏱️ Precise Timing: Built-in wait functionality and timing controls

Performance Features (New!)

  • Optimized Memory Usage: Reduced memory consumption from 99% to ~60-70% through intelligent buffering
  • 🚀 Fast Text Search: 60-80% faster find_text operations with optimized OCR processing
  • 💾 Smart Caching System: Intelligent cache with 30-70% hit rates for frequently accessed screenshots
  • 🖼️ Chunked Image Processing: Efficient handling of large images through intelligent chunking
  • 🎛️ Automatic Memory Management: Built-in throttling and cleanup to prevent memory exhaustion
  • 📊 Performance Monitoring: Real-time tracking of memory usage, cache performance, and operation timings
  • 🔄 Request Batching: Optimized handling of multiple simultaneous operations

🛠️ Prerequisites

System Requirements

  • macOS 13+ (Ventura or later)
  • Node.js 18+ and npm
  • AI client with MCP support (e.g., Claude Desktop, Claude Code, or Cursor)

Required macOS Permissions

⚠️ Important: You must grant these permissions or the server won't work!

  1. Screen Recording Permission:
    • Go to System Settings → Privacy & Security → Screen Recording
    • Click the + button and add your AI client (Claude Desktop, Cursor, etc.)
    • ✅ Check the box next to your AI client
  2. Accessibility Permission:
    • Go to System Settings → Privacy & Security → Accessibility
    • Click the + button and add your AI client
    • ✅ Check the box next to your AI client

💡 Tip: You might need to restart your AI client after granting permissions.

📦 Installation

💿 Automated Installation

Recommended for beginners:

# Clone and run the installer
git clone https://github.com/ohqay/mac-commander.git
cd mac-commander
./install.sh

The installer script will guide you through everything!

🔧 Manual Installation

For advanced users:

# Clone the repository
git clone https://github.com/ohqay/mac-commander.git
cd mac-commander

# Install dependencies and build
npm install
npm run build

# Test that it works
npm run inspector

🌐 Global Install (Coming Soon)

# Install globally via npm (not yet available)
npm install -g mac-commander

🔧 Verify Installation

Run the test script to make sure everything works:

node test-server.js

You should see the server start and respond to test commands.

⚙️ Configuration

🖥️ Claude Desktop Setup

  1. Open Claude Desktop and go to Settings (gear icon)
  2. Click on the Developer tab
  3. Click Edit Config to open the configuration file
  4. Add the MCP server configuration:
{
  "mcpServers": {
    "mac-commander": {
      "command": "node",
      "args": ["/FULL/PATH/TO/mac-commander/build/index.js"]
    }
  }
}

🚨 Important: Replace /FULL/PATH/TO/ with the actual absolute path to where you cloned this repository!

Example with real path:

{
  "mcpServers": {
    "mac-commander": {
      "command": "node",
      "args": ["/Users/yourname/Developer/mac-commander/build/index.js"]
    }
  }
}
  5. Save the file and restart Claude Desktop
  6. Start a new chat; you should see a 🔨 hammer icon indicating MCP is active
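If you script your setup, the same `mcpServers` block can be generated programmatically, which also lets you catch the most common mistake (a relative or `~` path) before restarting the client. This is a sketch: `buildMcpConfig` is a hypothetical helper, not part of mac-commander, and it assumes the client launches `node` without a shell, so a `~` would not be expanded.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper (not part of mac-commander): builds the config block shown above.
function buildMcpConfig(indexJsPath: string, serverName = "mac-commander"): McpConfig {
  // Assumption: no shell expansion, so "~" or a relative path would reach node literally.
  if (!indexJsPath.startsWith("/")) {
    throw new Error(`Expected an absolute path, got: ${indexJsPath}`);
  }
  if (!indexJsPath.endsWith("/build/index.js")) {
    throw new Error("The path should point at build/index.js inside the cloned repository");
  }
  return {
    mcpServers: {
      [serverName]: { command: "node", args: [indexJsPath] },
    },
  };
}

// Prints JSON in the same shape as the configuration examples.
const config = buildMcpConfig("/Users/yourname/Developer/mac-commander/build/index.js");
console.log(JSON.stringify(config, null, 2));
```

Paste the printed JSON into your client's config file, merging it with any servers you already have.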

💻 Claude Code Setup

  1. Navigate to your project folder in terminal
  2. Create or edit .claude/config.json in your project root:
mkdir -p .claude
echo '{
  "mcpServers": {
    "mac-commander": {
      "command": "node",
      "args": ["/FULL/PATH/TO/mac-commander/build/index.js"]
    }
  }
}' > .claude/config.json
  3. Start Claude Code in that project folder:
claude

🎯 Cursor Setup

  1. Open Cursor and go to Settings → Cursor Settings → MCP
  2. Click "Add new global MCP server"
  3. Add the configuration:
    • Name: mac-commander
    • Command: node
    • Args: /FULL/PATH/TO/mac-commander/build/index.js

Or create ~/.cursor/mcp.json:

{
  "mcpServers": {
    "mac-commander": {
      "command": "node",
      "args": ["/FULL/PATH/TO/mac-commander/build/index.js"]
    }
  }
}

🔍 Finding Your Full Path

Not sure what your full path is? Run this in the project directory:

echo "$(pwd)/build/index.js"

Example output: /Users/yourname/Developer/mac-commander/build/index.js

Copy this exact path and use it in your configuration files above.

✅ Verify It's Working

After configuration:

  1. Restart your AI client (Claude Desktop, Cursor, etc.)
  2. Start a new chat/session
  3. Look for the MCP indicator (hammer icon in Claude Desktop)
  4. Try a test command: "Take a screenshot of my screen"

If it works, you'll see the AI successfully take a screenshot! 🎉

📖 Tool Parameter Reference

screenshot

Capture a screenshot of the screen or a specific region with optimized performance.

Parameters:

  • outputPath (optional): Path to save the screenshot as PNG
  • region (optional): Object with x, y, width, height to capture specific area
  • returnBase64 (optional): Return base64 data in response (default: false)
  • compressionQuality (optional): JPEG compression quality 10-100 for base64 responses (default: 80)

Performance Features:

  • Default mode returns metadata only (fast, small responses)
  • Base64 mode includes compressed image data when returnBase64: true
  • 60-80% size reduction through JPEG compression
  • Always saves to temp folder for later access regardless of mode

Usage Examples:

Metadata-only mode (recommended for performance):

{ "outputPath": "/tmp/screenshot.png" }

With base64 data for immediate processing:

{ "returnBase64": true, "compressionQuality": 60 }

Region capture at high quality (light compression):

{ "region": {"x": 100, "y": 100, "width": 400, "height": 300}, "returnBase64": true, "compressionQuality": 90 }
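The parameter rules above can be checked client-side before calling the tool. This is a hypothetical sketch: the `ScreenshotArgs` type mirrors the documented parameters but is not the server's own schema, and treating `compressionQuality` as an integer is an assumption.

```typescript
// Hypothetical client-side types mirroring the documented screenshot parameters.
interface Region {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ScreenshotArgs {
  outputPath?: string;         // where to save the PNG
  region?: Region;             // capture only this rectangle
  returnBase64?: boolean;      // default false: metadata-only response
  compressionQuality?: number; // 10-100, default 80, used only for base64 responses
}

// Returns a list of problems; an empty list means the arguments look valid.
function validateScreenshotArgs(args: ScreenshotArgs): string[] {
  const problems: string[] = [];
  const q = args.compressionQuality;
  if (q !== undefined) {
    // Assumption: quality is an integer percentage.
    if (!Number.isInteger(q) || q < 10 || q > 100) {
      problems.push("compressionQuality must be an integer from 10 to 100");
    }
    if (args.returnBase64 !== true) {
      problems.push("compressionQuality only applies when returnBase64 is true");
    }
  }
  if (args.region && (args.region.width <= 0 || args.region.height <= 0)) {
    problems.push("region width and height must be positive");
  }
  return problems;
}

console.log(validateScreenshotArgs({ compressionQuality: 60 }));
// [ 'compressionQuality only applies when returnBase64 is true' ]
```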

click

Click at specific coordinates on the screen.

Parameters:

  • x: X coordinate
  • y: Y coordinate
  • button: "left", "right", or "middle" (default: "left")
  • doubleClick: boolean (default: false)
  • verify: boolean (default: false) - Take a screenshot after clicking to verify the action

type_text

Type text using the keyboard.

Parameters:

  • text: Text to type
  • delay: Delay between keystrokes in milliseconds (default: 50)

mouse_move

Move the mouse to specific coordinates.

Parameters:

  • x: X coordinate
  • y: Y coordinate

key_press

Press a key or key combination.

Parameters:

  • key: Key to press (e.g., "Enter", "Escape", "cmd+a")
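Key strings such as `"cmd+a"` combine zero or more modifiers with a final key. A sketch of how a client might split and normalize them before sending is shown below; the modifier aliases and the `parseKeyCombo` name are hypothetical, and the server may accept a different set of names.

```typescript
// Hypothetical modifier aliases; the server may accept a different set.
const MODIFIERS = new Map<string, string>([
  ["cmd", "command"], ["command", "command"],
  ["ctrl", "control"], ["control", "control"],
  ["alt", "option"], ["option", "option"],
  ["shift", "shift"],
]);

interface KeyCombo {
  modifiers: string[];
  key: string;
}

// Splits "cmd+shift+i" into canonical modifiers plus the final key.
function parseKeyCombo(input: string): KeyCombo {
  const parts = input.split("+").map((p) => p.trim()).filter((p) => p.length > 0);
  if (parts.length === 0) throw new Error("empty key string");
  const key = parts[parts.length - 1];
  const modifiers = parts.slice(0, -1).map((p) => {
    const canonical = MODIFIERS.get(p.toLowerCase());
    if (canonical === undefined) throw new Error(`unknown modifier: ${p}`);
    return canonical;
  });
  return { modifiers, key };
}

console.log(parseKeyCombo("cmd+shift+i")); // { modifiers: [ 'command', 'shift' ], key: 'i' }
```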

check_for_errors

Check the screen for common error indicators.

Parameters:

  • region (optional): Specific region to check

wait

Wait for a specified amount of time.

Parameters:

  • milliseconds: Time to wait

wait_for_element

Wait for specific text or UI element to appear on screen before continuing. Essential for handling dynamic content, loading screens, and asynchronous UI updates.

Parameters:

  • text: Text to wait for on screen
  • timeout: Maximum wait time in milliseconds (default: 10000)
  • pollInterval: How often to check in milliseconds (default: 500)
  • region (optional): Specific region to search in with x, y, width, height coordinates

Returns success/failure status and location of found element if successful. Perfect for waiting for buttons to become available, dialogs to appear, or loading indicators to disappear.
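The `timeout`/`pollInterval` behaviour amounts to a poll-until-found loop. A minimal sketch with the documented defaults (10000 ms and 500 ms) follows; the `check` callback is a hypothetical stand-in for an OCR text search, and the server's own loop may differ in details such as whether it checks once more exactly at the deadline.

```typescript
// Result of a wait; `value` is whatever the check returned when it succeeded.
interface WaitResult<T> {
  found: boolean;
  value?: T;
  attempts: number;
  elapsedMs: number;
}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Polls `check` until it returns a value or the next poll would pass the deadline.
async function waitFor<T>(
  check: () => Promise<T | undefined>,
  timeoutMs = 10000,    // documented default for `timeout`
  pollIntervalMs = 500, // documented default for `pollInterval`
): Promise<WaitResult<T>> {
  const start = Date.now();
  let attempts = 0;
  while (true) {
    attempts++;
    const value = await check();
    const elapsedMs = Date.now() - start;
    if (value !== undefined) {
      return { found: true, value, attempts, elapsedMs };
    }
    if (elapsedMs + pollIntervalMs > timeoutMs) {
      return { found: false, attempts, elapsedMs };
    }
    await sleep(pollIntervalMs);
  }
}

// Example: a fake check that "finds" the text on the third poll.
let polls = 0;
waitFor(async () => (++polls >= 3 ? { x: 120, y: 48 } : undefined), 2000, 100)
  .then((r) => console.log(r.found, r.attempts)); // true 3
```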

get_screen_info

Get information about the screen dimensions.

No parameters required.

list_windows

List all open windows with their titles and positions.

No parameters required.

get_active_window

Get information about the currently active window.

No parameters required.

find_window

Find a window by its title (partial match supported).

Parameters:

  • title: Window title to search for

focus_window

Focus/activate a window by its title.

Parameters:

  • title: Window title to focus

get_window_info

Get detailed information about a specific window.

Parameters:

  • title: Window title to get info for

extract_text

Extract and read text from the screen or a specific region using Optical Character Recognition (OCR). Includes result caching and confidence scoring, and supports fuzzy text matching and configurable OCR settings.

Parameters:

  • region (optional): Specific region to extract text from with x, y, width, height coordinates

Enhanced Features:

  • Smart Caching: Multi-level caching system with image hash-based keys for better performance
  • Confidence Filtering: Configurable minimum confidence thresholds (default: 50%)
  • Optimized Processing: Uses worker pool for concurrent OCR operations
  • Error Handling: Comprehensive error detection and recovery
  • Performance Tracking: Built-in timing and performance metrics

find_text

Locate specific text on the screen using advanced OCR with fuzzy matching capabilities. Returns precise coordinates, confidence scores, and handles OCR variations automatically. Essential for robust UI automation that adapts to text rendering differences.

Parameters:

  • text: Text to search for (supports fuzzy matching for OCR variations)
  • region (optional): Specific region to search in with x, y, width, height coordinates

Enhanced Features:

  • Fuzzy Text Matching: Handles OCR variations with configurable similarity thresholds
    • Standard threshold: 70% similarity (configurable)
    • Relaxed threshold: 50% similarity for difficult text
    • Levenshtein distance algorithm for accurate matching
  • Smart Sorting: Results sorted by similarity score and confidence level
  • Multiple Match Support: Returns all matching text locations with coordinates
  • Center Point Calculation: Provides precise click coordinates for each match
  • Confidence Scoring: Each match includes OCR confidence level
  • Performance Optimized: Cached results and memory management

Example Fuzzy Matching:

  • Search for "Submit" → Finds "Subm1t", "SUBMIT", "submit" (OCR variations)
  • Search for "Login" → Matches "Log1n", "LOGIN", "Iog in" (common OCR errors)
  • Search for "Cancel" → Finds "Cancei", "CANCEL", "cancel" (character misrecognition)
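The matching rules above (Levenshtein distance, a 0.7 standard threshold, a 0.5 relaxed threshold, sorting by similarity then confidence, and a center click point) can be sketched as follows. Several details are assumptions rather than the server's documented behaviour: similarity is taken as `1 - distance / longerLength` after lowercasing, the relaxed threshold is used only when the standard one finds nothing, and `OcrWord` and `fuzzyFind` are hypothetical names.

```typescript
// Hypothetical OCR word shape; the real find_text response may differ.
interface OcrWord {
  text: string;
  confidence: number; // OCR confidence, 0-100
  x: number;
  y: number;
  width: number;
  height: number;
}

interface TextMatch {
  text: string;
  similarity: number; // 0-1, 1 = identical ignoring case
  confidence: number;
  center: { x: number; y: number }; // click point
}

// Edit distance (insertions, deletions, substitutions) using two DP rows.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Case-insensitive similarity normalised by the longer string's length.
function similarity(a: string, b: string): number {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  const maxLen = Math.max(x.length, y.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(x, y) / maxLen;
}

// Try the standard threshold first; fall back to the relaxed one only if nothing matched.
function fuzzyFind(query: string, words: OcrWord[], threshold = 0.7, relaxedThreshold = 0.5): TextMatch[] {
  const matchesAt = (minSimilarity: number): TextMatch[] =>
    words
      .map((w) => ({ w, s: similarity(query, w.text) }))
      .filter(({ s }) => s >= minSimilarity)
      .map(({ w, s }) => ({
        text: w.text,
        similarity: s,
        confidence: w.confidence,
        center: { x: Math.round(w.x + w.width / 2), y: Math.round(w.y + w.height / 2) },
      }))
      // Best similarity first; OCR confidence breaks ties.
      .sort((p, q) => q.similarity - p.similarity || q.confidence - p.confidence);
  const strict = matchesAt(threshold);
  return strict.length > 0 ? strict : matchesAt(relaxedThreshold);
}

const words: OcrWord[] = [
  { text: "Cancei", confidence: 88, x: 100, y: 200, width: 60, height: 20 },
  { text: "Subm1t", confidence: 92, x: 300, y: 200, width: 60, height: 20 },
];
const [best] = fuzzyFind("Submit", words);
console.log(best.text, best.center); // Subm1t { x: 330, y: 210 }
```

Under these assumptions, "Subm1t" scores about 0.83 and passes the standard threshold, while "Iog in" scores about 0.67 against "Login" and is only found through the relaxed fallback.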

find_ui_elements

Advanced UI element detection system that intelligently identifies interactive components using multiple detection strategies. Unlike text-only detection, this tool can accurately find buttons, text fields, dropdowns, and other UI elements even when they don't contain visible text. Perfect for modern applications with visual-only buttons, icons, and complex layouts.

Parameters:

  • autoSave (optional): Whether to save the screenshot for analysis (default: true)
  • elementTypes (optional): Array of specific element types to detect: ['button', 'text_field', 'link', 'image', 'icon', 'dialog', 'menu', 'window', 'checkbox', 'radio_button', 'dropdown', 'slider', 'tab', 'toolbar', 'list', 'table', 'scrollbar', 'other']
  • region (optional): Specific region to analyze with x, y, width, height coordinates

Detection Strategies:

  • Visual Analysis: Detects UI elements based on shape, color, and visual patterns
  • OCR Text Recognition: Identifies elements with text content and labels
  • Color Pattern Analysis: Recognizes macOS system colors and UI themes
  • Shape Detection: Finds rectangular buttons, rounded elements, and geometric patterns
  • Context Analysis: Groups related elements and understands spatial relationships

macOS-Specific Features:

  • Apple HIG Compliance: Optimized for Apple Human Interface Guidelines
  • System Color Recognition: Detects standard macOS button colors (#007AFF, #34C759, #FF3B30, etc.)
  • Touch Target Validation: Checks whether elements meet the minimum 44x44 pixel target size
  • Native UI Patterns: Recognizes standard macOS dialogs, menus, and controls

Output Format: Returns comprehensive element information including:

  • Element type and subtype classification
  • Precise coordinates and clickable center points
  • Confidence scores and detection methods used
  • Visual features (colors, border radius, shadows)
  • Interactive validation results
  • Element state (enabled/disabled/selected)
  • Contextual relationships with nearby elements

Example Use Cases:

  • Find all clickable buttons in a dialog: elementTypes: ['button']
  • Detect form elements: elementTypes: ['text_field', 'button', 'dropdown']
  • Locate menu items: elementTypes: ['menu', 'link']
  • Find all interactive elements: (no elementTypes filter)
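A client typically post-processes `find_ui_elements` results into click points. The sketch below filters by `elementTypes`, drops disabled or low-confidence detections, and computes center points. The `UiElement` shape, the 0-1 confidence scale, and the 0.6 cutoff are assumptions; the 44-pixel check is reported as a flag rather than used as a filter, since many standard macOS controls are smaller than that.

```typescript
type ElementType =
  | "button" | "text_field" | "link" | "image" | "icon" | "dialog" | "menu" | "window"
  | "checkbox" | "radio_button" | "dropdown" | "slider" | "tab" | "toolbar" | "list"
  | "table" | "scrollbar" | "other";

// Hypothetical result shape; the real response fields may differ.
interface UiElement {
  type: ElementType;
  confidence: number; // assumed 0-1
  bounds: { x: number; y: number; width: number; height: number };
  enabled: boolean;
}

interface ClickTarget {
  type: ElementType;
  x: number;
  y: number;
  confidence: number;
  meetsTargetSize: boolean;
}

// An empty elementTypes list means "all types", matching the tool's optional filter.
function toClickTargets(
  elements: UiElement[],
  elementTypes: ElementType[] = [],
  minConfidence = 0.6,
  minTargetSize = 44,
): ClickTarget[] {
  return elements
    .filter((e) => elementTypes.length === 0 || elementTypes.includes(e.type))
    .filter((e) => e.enabled && e.confidence >= minConfidence)
    .map((e) => ({
      type: e.type,
      x: Math.round(e.bounds.x + e.bounds.width / 2),
      y: Math.round(e.bounds.y + e.bounds.height / 2),
      confidence: e.confidence,
      meetsTargetSize: e.bounds.width >= minTargetSize && e.bounds.height >= minTargetSize,
    }))
    .sort((a, b) => b.confidence - a.confidence); // most reliable first
}
```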

drag

Drag from one point to another using mouse button hold.

Parameters:

  • startX: Starting X coordinate
  • startY: Starting Y coordinate
  • endX: Ending X coordinate
  • endY: Ending Y coordinate
  • button: Mouse button to use for dragging (default: "left")

scroll

Scroll in any direction at the current mouse position or at a specific point.

Parameters:

  • direction: Direction to scroll ("up", "down", "left", "right")
  • amount: Number of scroll units (default: 5)
  • x (optional): X coordinate to scroll at (defaults to current mouse position)
  • y (optional): Y coordinate to scroll at (defaults to current mouse position)

hover

Hover the mouse at a specific position for a duration.

Parameters:

  • x: X coordinate to hover at
  • y: Y coordinate to hover at
  • duration: Duration to hover in milliseconds (default: 1000)

right_click

Right-click at specific coordinates to open context menus.

Parameters:

  • x: X coordinate to right-click
  • y: Y coordinate to right-click

list_screenshots

List all screenshots saved in the temporary folder.

No parameters required.

list_recent_screenshots

List recently captured screenshots with detailed metadata including timestamps, file sizes, and dimensions.

Parameters:

  • limit: Maximum number of screenshots to list (default: 10, max: 50)

view_screenshot

View/display a specific screenshot from the temporary folder.

Parameters:

  • filename: Name of the screenshot file to view

cleanup_screenshots

Clean up old screenshots from temporary folder, keeping only recent ones.

Parameters:

  • keepLast: Number of recent screenshots to keep (default: 10)
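The `keepLast` rule can be read as "sort newest first, keep the first N, delete the rest". A sketch of just the selection step is shown below; `ScreenshotFile` and `filesToDelete` are hypothetical, and judging recency by creation time is an assumption the tool description does not state.

```typescript
// Hypothetical file record; the real tool reads this from the temp folder.
interface ScreenshotFile {
  name: string;
  createdMs: number; // creation time in epoch milliseconds
}

// Returns the names that would be deleted when keeping the `keepLast` newest files.
function filesToDelete(files: ScreenshotFile[], keepLast = 10): string[] {
  if (!Number.isInteger(keepLast) || keepLast < 0) {
    throw new Error("keepLast must be a non-negative integer");
  }
  return [...files]                              // copy so the caller's array is untouched
    .sort((a, b) => b.createdMs - a.createdMs)   // newest first
    .slice(keepLast)                             // everything after the newest N
    .map((f) => f.name);
}
```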

compare_screenshots

Compare two previously saved screenshots to identify differences and changes.

Parameters:

  • screenshot1: Filename of the first screenshot
  • screenshot2: Filename of the second screenshot

describe_screenshot

Capture and analyze a screenshot with AI-powered insights, combining OCR text extraction and UI element detection.

Parameters:

  • region (optional): Specific region to analyze with x, y, width, height coordinates
  • savePath (optional): Optional path to save the analyzed screenshot

performance_dashboard

Comprehensive performance monitoring dashboard providing real-time system health, metrics, and optimization recommendations.

Parameters:

  • includeMetrics (optional): Include detailed metrics in response (default: true)
  • includeRecommendations (optional): Include optimization recommendations (default: true)
  • includeHistory (optional): Include performance history and trends (default: false)
  • timeRangeMs (optional): Time range for trends in milliseconds (default: 3600000, i.e. 1 hour)

🔧 OCR Configuration Options

The OCR system can be customized with various configuration options to optimize performance and accuracy for different use cases:

configureOCR(options)

Configure OCR settings globally for all text recognition operations.

Available Options:

  • minConfidence: Minimum confidence score for text recognition (default: 50, range: 0-100)
  • fuzzyMatchThreshold: Standard similarity threshold for fuzzy matching (default: 0.7, range: 0-1)
  • relaxedFuzzyThreshold: Fallback threshold for difficult text (default: 0.5, range: 0-1)
  • cacheEnabled: Enable/disable OCR result caching (default: true)
  • cacheTTL: Cache time-to-live in milliseconds (default: 30000)
  • maxCacheSize: Maximum number of cached results (default: 100)
  • timeoutMs: OCR operation timeout in milliseconds (default: 30000)

Example Configuration:

// For high-accuracy applications
configureOCR({
  minConfidence: 80,
  fuzzyMatchThreshold: 0.9,
  relaxedFuzzyThreshold: 0.7
});

// For fast processing with lower accuracy requirements
configureOCR({
  minConfidence: 30,
  fuzzyMatchThreshold: 0.6,
  relaxedFuzzyThreshold: 0.4,
  cacheTTL: 60000 // Longer cache for faster responses
});

// For memory-constrained environments
configureOCR({
  cacheEnabled: false,
  maxCacheSize: 50
});
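Since `configureOCR` takes a partial set of options, a natural reading is "merge with defaults, then validate ranges". The sketch below uses the documented names, defaults, and ranges; the merge and validation logic (including the rule that the relaxed threshold must not exceed the standard one) is a hypothetical illustration, not the server's code.

```typescript
// Documented option names and defaults; the merge/validation logic is a hypothetical sketch.
interface OcrOptions {
  minConfidence: number;         // 0-100
  fuzzyMatchThreshold: number;   // 0-1
  relaxedFuzzyThreshold: number; // 0-1
  cacheEnabled: boolean;
  cacheTTL: number;              // milliseconds
  maxCacheSize: number;          // number of cached results
  timeoutMs: number;             // milliseconds
}

const OCR_DEFAULTS: OcrOptions = {
  minConfidence: 50,
  fuzzyMatchThreshold: 0.7,
  relaxedFuzzyThreshold: 0.5,
  cacheEnabled: true,
  cacheTTL: 30000,
  maxCacheSize: 100,
  timeoutMs: 30000,
};

function resolveOcrOptions(overrides: Partial<OcrOptions> = {}): OcrOptions {
  const o: OcrOptions = { ...OCR_DEFAULTS, ...overrides };
  const inRange = (v: number, lo: number, hi: number): boolean => v >= lo && v <= hi;
  if (!inRange(o.minConfidence, 0, 100)) throw new RangeError("minConfidence must be 0-100");
  if (!inRange(o.fuzzyMatchThreshold, 0, 1) || !inRange(o.relaxedFuzzyThreshold, 0, 1)) {
    throw new RangeError("fuzzy thresholds must be between 0 and 1");
  }
  // Assumption: the relaxed fallback should never be stricter than the standard threshold.
  if (o.relaxedFuzzyThreshold > o.fuzzyMatchThreshold) {
    throw new RangeError("relaxedFuzzyThreshold must not exceed fuzzyMatchThreshold");
  }
  if (o.cacheTTL <= 0 || o.maxCacheSize <= 0 || o.timeoutMs <= 0) {
    throw new RangeError("cacheTTL, maxCacheSize and timeoutMs must be positive");
  }
  return o;
}

console.log(resolveOcrOptions({ minConfidence: 80 }).cacheTTL); // 30000
```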

OCR Performance Features

Worker Pool Architecture:

  • Concurrent OCR processing with multiple worker threads
  • Automatic load balancing and task prioritization
  • Graceful fallback to single worker if pool initialization fails
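The worker-pool idea (bounded concurrency, idle workers pulling the next task, and a single-worker fallback) can be illustrated with async tasks in one Node.js thread. This sketch shows only the scheduling pattern: `runPool` is a hypothetical name, it does not use real worker threads, and the server's pool may balance load differently.

```typescript
// Runs tasks with at most `poolSize` in flight; results keep input order.
async function runPool<T>(tasks: Array<() => Promise<T>>, poolSize: number): Promise<T[]> {
  // Fall back to a single worker if the requested size is unusable.
  const size = Number.isFinite(poolSize) && poolSize >= 1 ? Math.floor(poolSize) : 1;
  const results = new Array<T>(tasks.length);
  let next = 0; // index of the next task to start; shared by all workers

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Each worker keeps pulling tasks until the queue is empty (simple load balancing).
  await Promise.all(Array.from({ length: Math.min(size, tasks.length) }, () => worker()));
  return results;
}

// Example with fake OCR jobs of different durations.
const fakeOcr = (label: string, ms: number) => () =>
  new Promise<string>((resolve) => setTimeout(() => resolve(`text from ${label}`), ms));
runPool([fakeOcr("region A", 30), fakeOcr("region B", 10), fakeOcr("region C", 20)], 2)
  .then((texts) => console.log(texts)); // [ 'text from region A', 'text from region B', 'text from region C' ]
```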

Intelligent Caching:

  • Multi-level caching with image hash and region-based keys
  • Automatic cache cleanup and size management
  • Configurable TTL and cache size limits
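The caching bullets (image-hash and region keys, a TTL, and a size limit) can be illustrated with a small Map-based cache. This is a sketch, not the server's implementation: `OcrCache`, its key format, and the evict-oldest-insertion policy are assumptions; the defaults reuse the documented `cacheTTL` (30000 ms) and `maxCacheSize` (100). The clock is injectable so expiry can be tested without waiting.

```typescript
interface Region {
  x: number;
  y: number;
  width: number;
  height: number;
}

class OcrCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly maxSize: number;
  private readonly now: () => number;

  constructor(ttlMs = 30000, maxSize = 100, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.maxSize = Math.max(1, maxSize);
    this.now = now;
  }

  // Hypothetical key format: image hash plus the region (or "full" for the whole screen).
  static key(imageHash: string, region?: Region): string {
    return region
      ? `${imageHash}:${region.x},${region.y},${region.width},${region.height}`
      : `${imageHash}:full`;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key); // re-inserting moves the key to the newest position
    while (this.entries.size >= this.maxSize) {
      const oldest = this.entries.keys().next().value; // Map iterates in insertion order
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```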

Memory Management:

  • Automatic garbage collection triggers for large OCR operations
  • Memory usage monitoring and cleanup
  • Efficient image processing and buffer management

Error Handling:

  • Comprehensive error detection and recovery
  • Timeout protection for long-running OCR operations
  • Detailed error reporting with context information

📊 Implementation Status & Available Tools

✅ Fully Implemented Tools

Screenshot Management:

  • screenshot - Screen capture with region support and compression
  • list_screenshots - List all saved screenshots
  • list_recent_screenshots - List recent screenshots with metadata
  • view_screenshot - View specific screenshot files
  • cleanup_screenshots - Clean up old screenshot files
  • compare_screenshots - Compare two screenshots
  • describe_screenshot - AI-powered screenshot analysis

Mouse & Keyboard Control:

  • click - Click with multiple button support and verification
  • type_text - Text input with configurable delays
  • key_press - Key combinations and shortcuts
  • mouse_move - Move mouse cursor
  • drag - Drag and drop operations
  • scroll - Directional scrolling
  • hover - Mouse hover with duration
  • right_click - Context menu access

Window Management:

  • list_windows - List all open windows
  • get_active_window - Get current window info
  • find_window - Find window by title
  • focus_window - Bring window to front
  • get_window_info - Detailed window information

OCR & Text Recognition:

  • extract_text - OCR text extraction with caching
  • find_text - Locate text on screen with fuzzy matching
  • wait_for_element - Wait for text/elements to appear
  • find_ui_elements - Advanced visual UI element detection

System & Utilities:

  • get_screen_info - Screen dimensions
  • check_for_errors - Visual error detection
  • wait - Pause execution
  • diagnostic - System health check
  • performance_dashboard - Performance monitoring

🚧 Planned Features (Not Yet Implemented)

The following features mentioned in examples are planned for future releases:

  • click_hold - Click and hold operations
  • relative_mouse_move - Relative mouse positioning
  • key_hold - Hold keys for duration
  • type_with_delay - Human-like typing with variable delays and typos
  • Advanced smooth scrolling with easing
  • Pixel-perfect scrolling controls

🚀 Usage Examples

🎯 Basic Commands

Once configured, you can ask your AI assistant to:

Screenshots & Visual Inspection:

  • "Take a screenshot of my app" (metadata-only for fast responses)
  • "Capture just the top-left corner of the screen"
  • "Save a screenshot to ~/Desktop/app-screenshot.png"
  • "Take a screenshot and return the base64 data with 70% compression"
  • "Capture a region and return compressed image data for processing"

Mouse & Keyboard Control:

  • "Click the button at coordinates 100, 200"
  • "Double-click on the center of the screen"
  • "Type 'Hello World' in the current field"
  • "Press cmd+s to save the file"
  • "Press Enter to submit"

Window Management:

  • "List all open windows"
  • "Focus the Safari window"
  • "Get information about the active window"
  • "Find the window with 'Calculator' in the title"

Text Recognition & Search:

  • "Extract all text from the screen"
  • "Find the 'Submit' button on screen"
  • "Look for any text containing 'error' on screen"
  • "Read the text in the dialog box"

UI Element Detection:

  • "Find all clickable buttons on this screen"
  • "Detect text fields and form elements in this dialog"
  • "Locate all interactive elements (buttons, links, dropdowns)"
  • "Identify the toolbar and menu elements visually"
  • "Find UI elements by type: buttons, text fields, and checkboxes"

Error Detection:

  • "Check if there are any error dialogs on screen"
  • "Look for error messages in my app"
  • "Scan for any warning or error indicators"

🔧 Advanced Automation Examples

UI Testing Workflow:

"Please help me test my app: 1. Take a screenshot first 2. Click the 'Start' button 3. Wait 2 seconds 4. Check if any errors appeared 5. Take another screenshot to compare"

Bug Investigation:

"I'm having issues with my app: 1. Focus the MyApp window 2. Extract all visible text 3. Look for any error messages 4. Take a screenshot of the current state"

Automated Form Filling:

"Help me fill out this form: 1. Click at coordinates 300, 150 (username field) 2. Type 'testuser' 3. Press Tab to move to next field 4. Type 'password123' 5. Find and click the Submit button"

🚀 Advanced Automation Features

Drag and Drop Operations:

"Drag the file from the desktop to the trash: 1. Find the file icon at coordinates 100, 200 2. Drag it smoothly to the trash at 800, 600 over 2 seconds 3. Verify the file was moved"

Natural Scrolling:

"Scroll through the document naturally: 1. Smooth scroll down by 500 pixels 2. Wait 1 second 3. Scroll to find the text 'Chapter 3' 4. Hover over the heading for emphasis"

Text Input with Timing:

"Type this email with proper timing: 1. Click the compose button 2. Type the email address with 50ms delays 3. Press Tab to move to subject 4. Type 'Meeting Tomorrow' 5. Tab to body and type the message"

Basic Drag Operation:

"Perform a drag operation: 1. Move to the start position 2. Drag from coordinates 100,200 to 300,400 3. Wait for the operation to complete 4. Verify the item was moved"

Keyboard Shortcuts:

"Use developer tools effectively: 1. Press cmd+shift+i to open inspector 2. Wait 2000ms for tools to load 3. Type 'console.log' in the console 4. Press Enter to execute"

Menu Navigation:

"Navigate the UI effectively: 1. Move mouse to coordinates 400, 200 2. Hover over the menu for 1 second 3. Click and wait for dropdown 4. Move to coordinates 400, 250 and click the option"

🎯 UI Element Detection Examples

Smart Button Detection:

"Find all clickable buttons in this dialog: 1. Use find_ui_elements to detect all interactive buttons 2. Identify the 'Cancel' and 'OK' buttons by their visual characteristics 3. Click the 'OK' button using the returned coordinates 4. Verify the action was successful"

Form Automation with Visual Detection:

"Help me fill out this form automatically: 1. Use find_ui_elements with elementTypes: ['text_field', 'button', 'dropdown'] 2. Locate the username text field (even if it has no label) 3. Click the field and enter 'john.doe@example.com' 4. Find the password field using visual detection 5. Enter the password and click the submit button"

Modern App UI Navigation:

"Navigate this modern app with icon-only buttons: 1. Use find_ui_elements to detect all clickable elements 2. Find buttons by their visual characteristics (not text) 3. Identify the settings gear icon using shape and color analysis 4. Click the settings button and verify the menu opens"

macOS Dialog Interaction:

"Handle this system dialog intelligently: 1. Use find_ui_elements to detect dialog structure 2. Identify the dialog type (alert, confirmation, file picker) 3. Find all available actions (buttons, checkboxes, dropdowns) 4. Choose the appropriate action based on the dialog context 5. Verify the dialog was dismissed correctly"

Complex Layout Analysis:

"Analyze this complex application layout: 1. Use find_ui_elements to map all UI components 2. Group related elements (toolbars, sidebars, content areas) 3. Identify interactive vs. static elements 4. Create a clickable element map for automation 5. Test clicking each interactive element"

Responsive UI Testing:

"Test this responsive interface across different states: 1. Take a screenshot and analyze initial UI state 2. Use find_ui_elements to detect all interactive components 3. Click elements to change the interface state 4. Re-analyze the UI to detect new/changed elements 5. Verify all states are working correctly"

Visual-Only Element Detection:

"Find elements without any text labels: 1. Use find_ui_elements with visual analysis only 2. Detect icon buttons, image buttons, and graphic controls 3. Identify element states (enabled/disabled/selected) 4. Test interactivity of visual-only elements 5. Map the visual UI structure for automation"

🚀 Performance Improvements

Mac Commander has been optimized for high-performance automation with significant improvements in memory usage, processing speed, and reliability:

Memory Optimization

  • 99% → 60-70% Memory Usage: Intelligent memory management and buffer optimization
  • Automatic Cleanup: Built-in garbage collection and memory throttling
  • Smart Buffering: Efficient image processing with minimal memory footprint

Processing Speed

  • 60-80% Faster Text Operations: Optimized OCR processing and text search algorithms
  • Chunked Image Processing: Large images are processed in efficient chunks
  • Parallel Processing: Multiple operations can run concurrently without blocking

Caching System

  • 30-70% Cache Hit Rates: Intelligent caching of frequently accessed screenshots
  • Smart Cache Management: Automatic cache invalidation and memory-conscious storage
  • Performance Monitoring: Real-time tracking of cache effectiveness

Request Batching

  • Optimized Concurrency: Multiple simultaneous requests are handled efficiently
  • Resource Throttling: Prevents system overload during intensive operations
  • Performance Metrics: Built-in monitoring of operation timings and resource usage

These improvements make Mac Commander suitable for intensive automation tasks and long-running operations without performance degradation.

Development

  • Run in development mode: npm run dev
  • Test with MCP Inspector: npm run inspector

⚠️ Limitations & Troubleshooting

Known Limitations

  • OCR Accuracy: Text recognition depends on font size, contrast, and clarity
  • Permission Requirements: Must manually grant Screen Recording and Accessibility permissions
  • First OCR Run: Initial text extraction may be slower due to model loading
  • macOS Only: This server only works on macOS systems

🐛 Common Issues

"Permission denied" or "Screen recording not allowed"

  • ✅ Grant Screen Recording permission to your AI client
  • ✅ Grant Accessibility permission to your AI client
  • 🔄 Restart your AI client after granting permissions

"Command not found" or "Cannot find module"

  • ✅ Make sure you ran npm install and npm run build
  • ✅ Use the absolute path to build/index.js in your config
  • ✅ Verify Node.js is installed: node --version

"MCP server not showing up"

  • ✅ Check your configuration JSON syntax is valid
  • ✅ Restart your AI client completely
  • ✅ Try the test script: node test-server.js

"Screenshots are black or empty"

  • ✅ Grant Screen Recording permission
  • ✅ Make sure the app you're screenshotting is visible (not minimized)

🆘 Getting Help

If you're still having issues:

  1. Run the test script: node test-server.js to verify basic functionality
  2. Check the console: Look for error messages in your AI client
  3. Open an issue: Create a GitHub issue with:
    • Your macOS version
    • Your AI client (Claude Desktop, Cursor, etc.)
    • The exact error message
    • Your configuration file (with paths anonymized)

🔒 Security & Privacy

Important Security Notes

⚠️ This server has powerful capabilities and requires significant system permissions.

What this server can access:

  • Screen content: Can take screenshots of anything visible
  • Keyboard input: Can type any text or key combinations
  • Mouse control: Can click anywhere on screen
  • Window information: Can see and control application windows
  • Text recognition: Can read any text visible on screen

Security best practices:

  • 🏠 Only use in trusted environments: Don't use on shared or public computers
  • 🤝 Review AI requests: Be mindful of what you ask the AI to do
  • 🔐 Sensitive data: Avoid using when sensitive information is visible
  • 🚫 Revoke access: You can remove permissions anytime in System Settings

Privacy Notes

  • No data is sent externally by this MCP server itself
  • Your AI client (Claude Desktop, etc.) may process screenshots and other data according to its own privacy policy
  • Screenshots are temporary and not permanently stored unless you specify a save path
  • OCR processing happens locally on your machine

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

📚 Additional Resources

📖 Complete Documentation Suite

  • DOCS.md - Complete documentation index and navigation guide
  • API.md - Comprehensive API reference and technical documentation
  • PERFORMANCE.md - Performance optimization and tuning guide
  • MIGRATION.md - Migration guide and version changes

🛠️ Development Resources

💬 Community and Support

🙏 Acknowledgments


Made with ❤️ for the MCP community

Having issues? Open a GitHub issue • Want to contribute? Check our contributing guide


local-only server

The server can only run on the client's local machine because it depends on local resources.

An MCP server that allows AI tools like Claude Desktop, Claude Code, and Cursor to visually interact with macOS applications by capturing screenshots and controlling the mouse and keyboard.

