# Qwen3-Coder MCP Server for Claude Code
This setup integrates Qwen3-Coder (a 30B-parameter model served through Ollama) with Claude Code via the Model Context Protocol (MCP). It is tuned for systems with 64GB of RAM.
## Features
- **Qwen3-Coder 30B**: A 30B-parameter Qwen coding model, run locally through Ollama
- **64GB RAM Optimized**: Configuration tuned for high-memory systems
- **MCP Integration**: Five specialized tools exposed to Claude Code over MCP
- **Advanced Settings**: Flash attention, optimized KV cache, and parallel processing
## Optimization Settings
The setup includes these optimizations for a system with 64GB of RAM:
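- `OLLAMA_NUM_PARALLEL=8`: Handle up to 8 parallel requests per loaded model
- `OLLAMA_MAX_LOADED_MODELS=4`: Allow up to 4 models to stay loaded at once, memory permitting
- `OLLAMA_FLASH_ATTENTION=1`: Enable flash attention, a more memory-efficient attention implementation
- `OLLAMA_KV_CACHE_TYPE=q8_0`: Quantize the KV cache to 8 bits, which uses roughly half the memory of the default `f16` cache at a small cost in precision (requires flash attention)
- `OLLAMA_KEEP_ALIVE=24h`: Keep models loaded for 24 hours after their last request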
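If you would rather launch Ollama from Node.js than from a shell script, the settings listed above can be passed as environment variables to a child process. This is a minimal sketch, not the contents of `start-qwen3-optimized.sh`; the helper names `buildOllamaEnv` and `startOllama` are hypothetical.

```javascript
// Sketch: start `ollama serve` from Node.js with the settings from this README.
const { spawn } = require("node:child_process");

// Values copied from the "Optimization Settings" list.
const OLLAMA_SETTINGS = {
  OLLAMA_NUM_PARALLEL: "8",
  OLLAMA_MAX_LOADED_MODELS: "4",
  OLLAMA_FLASH_ATTENTION: "1",
  OLLAMA_KV_CACHE_TYPE: "q8_0",
  OLLAMA_KEEP_ALIVE: "24h",
};

// Merge the settings over a base environment; the settings win on conflict.
function buildOllamaEnv(baseEnv = process.env) {
  return { ...baseEnv, ...OLLAMA_SETTINGS };
}

// Hypothetical helper: launches the server and returns the child process.
// Not called here, so loading this file does not start anything.
function startOllama() {
  return spawn("ollama", ["serve"], {
    env: buildOllamaEnv(),
    stdio: "inherit", // show Ollama's logs in the current terminal
  });
}
```

Calling `startOllama()` assumes the `ollama` binary is on your `PATH`.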
## Available Tools
### 1. `qwen3_code_review`
Reviews code for quality, bugs, and best practices.
**Parameters:**
- `code` (required): The code to review
- `language` (optional): Programming language
### 2. `qwen3_code_explain`
Provides detailed explanations of how code works.
**Parameters:**
- `code` (required): The code to explain
- `language` (optional): Programming language
### 3. `qwen3_code_generate`
Generates new code based on requirements.
**Parameters:**
- `prompt` (required): Description of what to generate
- `language` (optional): Target programming language
### 4. `qwen3_code_fix`
Fixes bugs and issues in existing code.
**Parameters:**
- `code` (required): The buggy code
- `error` (optional): Error message or description
- `language` (optional): Programming language
### 5. `qwen3_code_optimize`
Optimizes code for performance, memory, or readability.
**Parameters:**
- `code` (required): The code to optimize
- `criteria` (optional): Optimization criteria
- `language` (optional): Programming language
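To make the parameter lists concrete, here is a rough sketch of how a tool such as `qwen3_code_review` could turn its arguments into a request to Ollama's `/api/generate` endpoint. The prompt wording and the function names are assumptions for illustration; the actual logic in `qwen3-mcp-server.js` is not shown in this README and may differ.

```javascript
// Hypothetical prompt builder for `qwen3_code_review`.
// `code` is required; `language` is optional, matching the parameter list.
function buildReviewPrompt({ code, language } = {}) {
  if (typeof code !== "string" || code.length === 0) {
    throw new Error("`code` is required");
  }
  const lang = language ? ` ${language}` : "";
  return `Review the following${lang} code for quality, bugs, and best practices:\n\n${code}`;
}

// Request body for Ollama's POST /api/generate endpoint (non-streaming).
function buildOllamaRequest(prompt, model = "qwen3-coder:30b") {
  return { model, prompt, stream: false };
}

// Sends the request to a local Ollama server and returns the generated text.
// Assumes Node.js 18+ (global fetch) and Ollama's default address.
async function reviewCode(args, baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildOllamaRequest(buildReviewPrompt(args))),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.response; // generated text in a non-streaming reply
}
```

With Ollama running, `await reviewCode({ code: "x = 1", language: "python" })` would return the model's review as a string. The other four tools could follow the same pattern with a different prompt.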
## Quick Start
### 1. Start the Optimized Server
```bash
cd /Users/keith/qwencoder
./start-qwen3-optimized.sh
```
### 2. Restart Claude Code
Close and reopen Claude Code to load the MCP server configuration.
### 3. Use in Claude Code
The tools will be automatically available in your Claude Code sessions. You can use them by referencing the tool names in your conversations.
## Manual Commands
### Start Ollama with optimizations:
```bash
OLLAMA_NUM_PARALLEL=8 \
OLLAMA_MAX_LOADED_MODELS=4 \
OLLAMA_FLASH_ATTENTION=1 \
OLLAMA_KV_CACHE_TYPE=q8_0 \
OLLAMA_KEEP_ALIVE=24h \
ollama serve
```
### Test the model directly:
```bash
ollama run qwen3-coder:30b "Write a Python function to calculate factorial"
```
### Test the MCP server:
```bash
node qwen3-mcp-server.js
```
## Troubleshooting
### If Claude Code doesn't see the MCP server:
1. Check that `config.json` contains the correct path to `qwen3-mcp-server.js`
2. Restart Claude Code completely
3. Verify Ollama is running: `ollama list`
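The Ollama check can also be scripted. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally installed models, and reports whether the server responds and whether `qwen3-coder:30b` is present. It assumes Node.js 18+ (for the global `fetch`) and Ollama's default address `http://localhost:11434`; the function names are hypothetical.

```javascript
// Returns true if an /api/tags response lists the given model name.
function hasModel(tagsResponse, name) {
  const models = Array.isArray(tagsResponse?.models) ? tagsResponse.models : [];
  return models.some((m) => m.name === name);
}

// Queries the local Ollama server and resolves to a small status object.
async function checkOllama(
  model = "qwen3-coder:30b",
  baseUrl = "http://localhost:11434"
) {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);
    if (!res.ok) return { running: false, modelInstalled: false };
    const tags = await res.json();
    return { running: true, modelInstalled: hasModel(tags, model) };
  } catch {
    // Connection refused or similar: the server is not reachable.
    return { running: false, modelInstalled: false };
  }
}
```

For example, `checkOllama().then(console.log)` prints the status object; if `modelInstalled` is `false`, the model still needs to be pulled.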
### If the model is slow:
1. Ensure you have enough RAM available
2. Check that `OLLAMA_FLASH_ATTENTION=1` is set
3. Monitor system resources with Activity Monitor
### If tools aren't working:
1. Test Ollama directly: `ollama run qwen3-coder:30b "test"`
2. Check MCP server logs in Console.app
3. Verify the Node.js dependencies are installed (run `npm install` in the project directory)
## File Structure
```
/Users/keith/qwencoder/
├── qwen3-mcp-server.js # MCP server implementation
├── package.json # Node.js dependencies
├── start-qwen3-optimized.sh # Optimized startup script
└── README.md # This file
```
## Configuration Files
- **Claude Config**: `/Users/keith/Library/Application Support/Claude/config.json`
- **MCP Server**: `/Users/keith/qwencoder/qwen3-mcp-server.js`
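The contents of the Claude config are not shown in this README. As a hedged illustration, MCP servers are commonly registered under an `mcpServers` key, with a command and arguments for each server. The snippet below prints such an entry; the server name `qwen3-coder` and the helper `buildMcpConfig` are hypothetical, so compare the output with your actual config rather than copying it blindly.

```javascript
// Builds an `mcpServers` entry that launches the server script with Node.js.
// "qwen3-coder" is a hypothetical server name chosen for this example.
function buildMcpConfig(serverPath, name = "qwen3-coder") {
  return {
    mcpServers: {
      [name]: {
        command: "node",
        args: [serverPath],
      },
    },
  };
}

// Print the entry as formatted JSON for comparison with the real config file.
const config = buildMcpConfig("/Users/keith/qwencoder/qwen3-mcp-server.js");
console.log(JSON.stringify(config, null, 2));
```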
## Performance Notes
With 64GB RAM, you can:
- Keep multiple large models loaded simultaneously
- Handle up to 8 parallel requests per model (`OLLAMA_NUM_PARALLEL=8`)
- Use the 8-bit `q8_0` KV cache, which trades a small amount of precision for lower memory use
- Run for extended periods without memory issues
The Qwen3-Coder 30B model uses approximately 18GB of RAM when loaded, leaving plenty of room for other applications and additional models.
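A quick back-of-the-envelope check of the memory budget, using only the two figures quoted in this README (64GB total, about 18GB per loaded model). It deliberately ignores the operating system, other applications, and the KV cache, which grows with context length and parallel requests, so treat the results as upper bounds.

```javascript
const TOTAL_RAM_GB = 64; // system RAM assumed throughout this README
const MODEL_RAM_GB = 18; // approximate loaded size quoted above

// RAM left after loading `count` models of `modelGb` each.
// A negative result means the models do not fit.
function headroomGb(count, modelGb = MODEL_RAM_GB, totalGb = TOTAL_RAM_GB) {
  return totalGb - count * modelGb;
}

// Largest number of same-sized models that fit entirely in RAM.
function maxModelsThatFit(modelGb = MODEL_RAM_GB, totalGb = TOTAL_RAM_GB) {
  return Math.floor(totalGb / modelGb);
}

console.log(headroomGb(1));      // 46
console.log(maxModelsThatFit()); // 3
console.log(headroomGb(4));      // -8
```

By this estimate, `OLLAMA_MAX_LOADED_MODELS=4` is only reachable if at least some of the loaded models are smaller than Qwen3-Coder 30B; four models of this size would need about 72GB.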