Skip to main content
Glama
ANSIBLE_COLLECTION_FIXES.md10.7 kB
# Ansible Collection Fixes Required for Connection Reuse ## Problem Summary The `tosin2013.mcp_audit` collection fails on subsequent tool calls when running multiple modules sequentially in a playbook. The error is: "Failed to connect to MCP server via stdio: unhandled errors in a TaskGroup (1 sub-exception)" ## Root Cause Analysis ### Current Implementation Issues 1. **New Process Per Module Call**: Each Ansible module creates a **new stdio connection** which spawns a **new server process**: ```python # In mcp_test_tool.py line ~228 async with client.connect(): result = await client.call_tool(tool_name, tool_arguments) ``` 2. **No Connection Reuse**: The `connect()` context manager: - Creates a new `stdio_client()` which spawns a new server process - Initializes a new `ClientSession` - Closes everything when the context exits - **No mechanism to reuse connections across module calls** 3. **Process Cleanup Race Condition**: When modules run sequentially: - Module 1: Spawns server process, connects, works, exits (process may still be cleaning up) - Module 2: Tries to spawn new server process immediately, but: - Previous process might still be running - Process cleanup might interfere - Resource contention on stdio streams ### Evidence from Code **File**: `plugins/module_utils/mcp_client.py` (lines 123-152) ```python @asynccontextmanager async def connect(self): if self.transport == "stdio": server_params = StdioServerParameters( command=self.server_command, args=self.server_args, ) async with stdio_client(server_params) as (read, write): # NEW PROCESS EACH TIME async with ClientSession(read, write) as session: await session.initialize() yield self # Connection closes when context exits ``` **File**: `plugins/modules/mcp_test_tool.py` (line ~228) ```python async def test_tool_async(client: MCPClient, tool_name: str, ...): async with client.connect(): # New connection for EVERY tool call result = await client.call_tool(tool_name, tool_arguments) ``` ## Required Fixes ### Fix 1: Connection Pooling/Reuse (Recommended) **Problem**: Each module call creates a new server process. **Solution**: Implement connection pooling with process reuse. **Changes Needed**: 1. **Create Connection Manager Singleton** (`plugins/module_utils/mcp_connection_manager.py`): ```python import asyncio import hashlib from typing import Dict, Optional from ansible_collections.mcp.audit.plugins.module_utils.mcp_client import MCPClient class MCPConnectionManager: """Manages connection pooling for MCP servers.""" _instance: Optional['MCPConnectionManager'] = None _connections: Dict[str, 'MCPClient'] = {} _locks: Dict[str, asyncio.Lock] = {} def __new__(cls): if cls._instance is None: cls._instance = super().__new__(cls) return cls._instance def _get_connection_key(self, client: MCPClient) -> str: """Generate unique key for connection based on server parameters.""" if client.transport == "stdio": key_data = f"{client.transport}:{client.server_command}:{':'.join(client.server_args)}" else: key_data = f"{client.transport}:{client.server_url}" return hashlib.md5(key_data.encode()).hexdigest() async def get_or_create_connection(self, client: MCPClient) -> 'MCPClient': """Get existing connection or create new one.""" key = self._get_connection_key(client) if key not in self._locks: self._locks[key] = asyncio.Lock() async with self._locks[key]: if key not in self._connections: # Create persistent connection connection = await client.connect_persistent() self._connections[key] = connection return self._connections[key] async def close_connection(self, client: MCPClient): """Close a specific connection.""" key = self._get_connection_key(client) if key in self._connections: await self._connections[key].close() del self._connections[key] async def close_all(self): """Close all connections.""" for connection in self._connections.values(): await connection.close() self._connections.clear() ``` 2. **Add Persistent Connection Method** to `MCPClient`: ```python # In plugins/module_utils/mcp_client.py async def connect_persistent(self): """Establish persistent connection (doesn't use context manager).""" if self.transport == "stdio": server_params = StdioServerParameters( command=self.server_command, args=self.server_args, ) # Store streams without context manager self._stdio_context = stdio_client(server_params) read, write = await self._stdio_context.__aenter__() self._read_stream = read self._write_stream = write self._session_context = ClientSession(read, write) self.session = await self._session_context.__aenter__() await self.session.initialize() return self # ... handle other transports async def close(self): """Close persistent connection.""" if self.session: await self._session_context.__aexit__(None, None, None) self.session = None if self._stdio_context: await self._stdio_context.__aexit__(None, None, None) self._stdio_context = None ``` 3. **Update Modules to Use Connection Manager**: ```python # In plugins/modules/mcp_test_tool.py from ansible_collections.mcp.audit.plugins.module_utils.mcp_connection_manager import MCPConnectionManager async def test_tool_async(client: MCPClient, tool_name: str, ...): manager = MCPConnectionManager() persistent_client = await manager.get_or_create_connection(client) # Use persistent connection instead of creating new one result = await persistent_client.call_tool(tool_name, tool_arguments) return result ``` ### Fix 2: Process Cleanup with Delays (Alternative) **Problem**: Processes not cleaned up properly between calls. **Solution**: Add proper cleanup and delays. **Changes Needed**: ```python # In plugins/module_utils/mcp_client.py @asynccontextmanager async def connect(self): try: if self.transport == "stdio": server_params = StdioServerParameters( command=self.server_command, args=self.server_args, ) async with stdio_client(server_params) as (read, write): self._read_stream = read self._write_stream = write async with ClientSession(read, write) as session: self.session = session await session.initialize() yield self finally: # Ensure proper cleanup if self.transport == "stdio": # Add small delay to allow process cleanup await asyncio.sleep(0.1) # Force cleanup of any remaining processes import gc gc.collect() ``` ### Fix 3: Better Error Handling for TaskGroup (Workaround) **Problem**: TaskGroup errors not properly handled. **Solution**: Add better error handling and retry logic. **Changes Needed**: ```python # In plugins/modules/mcp_test_tool.py async def test_tool_async(client: MCPClient, tool_name: str, ...): max_retries = 3 retry_delay = 0.5 for attempt in range(max_retries): try: async with client.connect(): result = await client.call_tool(tool_name, tool_arguments) return result except Exception as e: if "TaskGroup" in str(e) and attempt < max_retries - 1: # Wait before retry await asyncio.sleep(retry_delay * (attempt + 1)) continue raise ``` ## Recommended Implementation Strategy ### Phase 1: Immediate Fix (Quick) - Implement Fix 3 (better error handling + retry logic) - Add process cleanup delays - **Timeline**: 1-2 days ### Phase 2: Proper Fix (Long-term) - Implement Fix 1 (connection pooling) - Add connection lifecycle management - Add configuration option for connection reuse - **Timeline**: 1-2 weeks ## Configuration Options Needed Add these options to modules: ```python connection_reuse: description: Whether to reuse connections across module calls type: bool default: true connection_timeout: description: Timeout for connection reuse type: int default: 300 # 5 minutes max_connection_retries: description: Maximum retries for connection failures type: int default: 3 ``` ## Testing Requirements 1. **Sequential Module Calls**: Test multiple modules in same playbook 2. **Different Servers**: Test with different MCP server commands 3. **Connection Cleanup**: Verify processes are cleaned up properly 4. **Error Recovery**: Test retry logic and error handling 5. **Performance**: Measure connection overhead with/without pooling ## Compatibility Considerations Since this collection needs to work with **many projects**, ensure: 1. **Backward Compatibility**: Old behavior still works (connection reuse optional) 2. **Transport Agnostic**: Fixes work for stdio, SSE, and HTTP 3. **Resource Management**: Proper cleanup prevents resource leaks 4. **Error Messages**: Clear errors when connection fails ## Example Fixed Module Usage ```yaml # Before (fails on second call) - name: Test tool 1 tosin2013.mcp_audit.mcp_test_tool: tool_name: analyze_project_ecosystem # ... fails on next call - name: Test tool 2 tosin2013.mcp_audit.mcp_test_tool: tool_name: read_file # Fails with connection error # After (with connection pooling) - name: Test tool 1 tosin2013.mcp_audit.mcp_test_tool: tool_name: analyze_project_ecosystem connection_reuse: true # NEW OPTION - name: Test tool 2 tosin2013.mcp_audit.mcp_test_tool: tool_name: read_file connection_reuse: true # Reuses connection from tool 1 # Works correctly! ``` ## Summary **Critical Fix**: Implement connection pooling/reuse to prevent spawning new server processes for each module call. **Priority**: HIGH - This affects usability across all projects using the collection. **Complexity**: Medium - Requires async connection management and lifecycle handling. **Impact**: Fixes connection failures for sequential tool calls, enabling comprehensive test suites.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/tosin2013/mcp-adr-analysis-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server