
ToGMAL MCP Server

Taxonomy of Generative Model Apparent Limitations

A Model Context Protocol (MCP) server that provides real-time, privacy-preserving analysis of LLM interactions to detect out-of-distribution behaviors and recommend safety interventions.

Overview

ToGMAL helps prevent common LLM pitfalls by detecting:

  • 🔬 Math/Physics Speculation: Ungrounded "theories of everything" and invented physics

  • 🏥 Medical Advice Issues: Health recommendations without proper sources or disclaimers

  • 💾 Dangerous File Operations: Mass deletions, recursive operations without safeguards

  • 💻 Vibe Coding Overreach: Overly ambitious projects without proper scoping

  • 📊 Unsupported Claims: Strong assertions without evidence or hedging

Key Features

  • Privacy-Preserving: All analysis is deterministic and local (no external API calls)

  • Low Latency: Heuristic-based detection for real-time analysis

  • Intervention Recommendations: Suggests step breakdown, human-in-the-loop, or web search

  • Taxonomy Building: Crowdsourced evidence collection for improving detection

  • Extensible: Easy to add new detection patterns and categories

Installation

Prerequisites

  • Python 3.10 or higher

  • pip package manager

Install Dependencies

pip install mcp pydantic httpx --break-system-packages

Install the Server

# Clone or download the server
# Then run it directly
python togmal_mcp.py

Usage

Available Tools

1. togmal_analyze_prompt

Analyze a user prompt before the LLM processes it.

Parameters:

  • prompt (str): The user prompt to analyze

  • response_format (str): Output format - "markdown" or "json"

Example:

{ "prompt": "Build me a complete theory of quantum gravity that unifies all forces", "response_format": "json" }

Use Cases:

  • Detect speculative physics theories before generating responses

  • Flag overly ambitious coding requests

  • Identify requests for medical advice that need disclaimers

2. togmal_analyze_response

Analyze an LLM response for potential issues.

Parameters:

  • response (str): The LLM response to analyze

  • context (str, optional): Original prompt for better analysis

  • response_format (str): Output format - "markdown" or "json"

Example:

{ "response": "You should definitely take 500mg of ibuprofen every 4 hours...", "context": "I have a headache", "response_format": "json" }

Use Cases:

  • Check for ungrounded medical advice

  • Detect dangerous file operation instructions

  • Flag unsupported statistical claims

3. togmal_submit_evidence

Submit evidence of LLM limitations to improve the taxonomy.

Parameters:

  • category (str): Type of limitation - "math_physics_speculation", "ungrounded_medical_advice", etc.

  • prompt (str): The prompt that triggered the issue

  • response (str): The problematic response

  • description (str): Why this is problematic

  • severity (str): Severity level - "low", "moderate", "high", or "critical"

Example:

{ "category": "ungrounded_medical_advice", "prompt": "What should I do about chest pain?", "response": "It's probably nothing serious, just indigestion...", "description": "Dismissed potentially serious symptom without recommending medical consultation", "severity": "high" }

Features:

  • Human-in-the-loop confirmation before submission

  • Generates unique entry ID for tracking

  • Contributes to improving detection heuristics

4. togmal_get_taxonomy

Retrieve entries from the taxonomy database.

Parameters:

  • category (str, optional): Filter by category

  • min_severity (str, optional): Minimum severity to include

  • limit (int): Maximum entries to return (1-100, default 20)

  • offset (int): Pagination offset (default 0)

  • response_format (str): Output format

Example:

{ "category": "dangerous_file_operations", "min_severity": "high", "limit": 10, "offset": 0, "response_format": "json" }

Use Cases:

  • Research common LLM failure patterns

  • Train improved detection models

  • Generate safety guidelines

5. togmal_get_statistics

Get statistical overview of the taxonomy database.

Parameters:

  • response_format (str): Output format
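
Example:

{
  "response_format": "json"
}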

Returns:

  • Total entries by category

  • Severity distribution

  • Database capacity status

Detection Heuristics

Math/Physics Speculation

Detects:

  • "Theory of everything" claims

  • Unified field theory proposals

  • Invented equations or particles

  • Modifications to fundamental constants

Patterns:

- "new equation for quantum gravity" - "my unified theory" - "discovered particle" - "redefine the speed of light"

Ungrounded Medical Advice

Detects:

  • Diagnoses without qualifications

  • Treatment recommendations without sources

  • Specific drug dosages

  • Dismissive responses to symptoms

Patterns:

- "you probably have..." - "take 500mg of..." - "don't worry about it" - Missing citations or disclaimers

Dangerous File Operations

Detects:

  • Mass deletion commands

  • Recursive operations without safeguards

  • Operations on test files without confirmation

  • No human-in-the-loop for destructive actions

Patterns:

- "rm -rf" without confirmation - "delete all test files" - "recursively remove" - Missing safety checks

Vibe Coding Overreach

Detects:

  • Requests for complete applications

  • Massive line count targets (1000+ lines)

  • Unrealistic timeframes

  • Scope without proper planning

Patterns:

- "build a complete social network" - "5000 lines of code" - "everything in one shot" - Missing architectural planning

Unsupported Claims

Detects:

  • Absolute statements without hedging

  • Statistical claims without sources

  • Over-confident predictions

  • Missing citations

Patterns:

- "always/never/definitely" - "95% of doctors agree" (no source) - "guaranteed to work" - Missing uncertainty language

Risk Levels

Calculated based on weighted confidence scores:

  • LOW: Minor issues, no immediate intervention needed

  • MODERATE: Worth noting, consider additional verification

  • HIGH: Significant concern, interventions recommended

  • CRITICAL: Serious risk, multiple interventions strongly advised
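
As an illustrative sketch only, risk could be derived from weighted per-detector confidences roughly like this; the category keys, weights, and thresholds are assumptions, not the server's actual values:

from typing import Dict

# Hypothetical weights per category (illustration only)
CATEGORY_WEIGHTS = {
    "math_physics_speculation": 0.8,
    "ungrounded_medical_advice": 1.0,
    "dangerous_file_operations": 1.0,
    "vibe_coding_overreach": 0.6,
    "unsupported_claims": 0.7,
}

def calculate_risk_level_sketch(confidences: Dict[str, float]) -> str:
    """Combine per-detector confidences into a single risk level."""
    score = sum(CATEGORY_WEIGHTS.get(cat, 0.5) * conf for cat, conf in confidences.items())
    if score >= 1.5:
        return "CRITICAL"
    if score >= 1.0:
        return "HIGH"
    if score >= 0.5:
        return "MODERATE"
    return "LOW"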

Intervention Types

Step Breakdown

Complex tasks should be broken into verifiable components.

Recommended for:

  • Math/physics speculation

  • Large coding projects

  • Dangerous file operations

Human-in-the-Loop

Critical decisions require human oversight.

Recommended for:

  • Medical advice

  • Destructive file operations

  • High-severity issues

Web Search

Claims should be verified against authoritative sources.

Recommended for:

  • Medical recommendations

  • Physics/math theories

  • Unsupported factual claims

Simplified Scope

Overly ambitious projects need realistic scoping.

Recommended for:

  • Vibe coding requests

  • Complex system designs

  • Feature-heavy applications

Configuration

Character Limit

Default: 25,000 characters per response

CHARACTER_LIMIT = 25000

Taxonomy Capacity

Default: 1,000 evidence entries

MAX_EVIDENCE_ENTRIES = 1000

Detection Sensitivity

Adjust pattern matching and confidence thresholds in detection functions:

def detect_math_physics_speculation(text: str) -> Dict[str, Any]:
    # Modify patterns or confidence calculations here
    ...

Integration Examples

Claude Desktop App

Add to your claude_desktop_config.json:

{ "mcpServers": { "togmal": { "command": "python", "args": ["/path/to/togmal_mcp.py"] } } }

CLI Testing

# Run the server
python togmal_mcp.py

# In another terminal, test with the MCP inspector
npx @modelcontextprotocol/inspector python togmal_mcp.py

Programmatic Usage

from mcp.client import Client

async def analyze_prompt(prompt: str):
    async with Client("togmal") as client:
        result = await client.call_tool(
            "togmal_analyze_prompt",
            {"prompt": prompt, "response_format": "json"}
        )
        return result
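
Assuming the helper above works in your client environment, it could be invoked from a script like this (a minimal sketch):

import asyncio

if __name__ == "__main__":
    result = asyncio.run(analyze_prompt("Build me a complete theory of quantum gravity"))
    print(result)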

Architecture

Design Principles

  1. Privacy First: No external API calls, all processing local

  2. Deterministic: Heuristic-based detection for reproducibility

  3. Low Latency: Fast pattern matching for real-time use

  4. Extensible: Easy to add new patterns and categories

  5. Human-Centered: Always allows human override and judgment

Future Enhancements

The system is designed for progressive enhancement:

  1. Phase 1 (Current): Heuristic pattern matching

  2. Phase 2 (Planned): Traditional ML models (clustering, anomaly detection)

  3. Phase 3 (Future): Federated learning from submitted evidence

  4. Phase 4 (Advanced): Custom fine-tuned models for specific domains

Data Flow

User Prompt
    ↓
togmal_analyze_prompt
    ↓
Detection Heuristics (parallel)
    ├── Math/Physics
    ├── Medical Advice
    ├── File Operations
    ├── Vibe Coding
    └── Unsupported Claims
    ↓
Risk Calculation
    ↓
Intervention Recommendations
    ↓
Response to Client

Contributing

Adding New Detection Patterns

  1. Create a new detection function:

def detect_new_category(text: str) -> Dict[str, Any]:
    patterns = {
        'subcategory1': [r'pattern1', r'pattern2'],
        'subcategory2': [r'pattern3']
    }
    # Implement detection logic
    return {
        'detected': bool,
        'categories': list,
        'confidence': float
    }

  2. Add the new category to the CategoryType enum

  3. Update the analysis functions to include the new detector

  4. Add intervention recommendations if needed
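
For example, step 2 might look like the following; this assumes CategoryType is a simple string-valued Enum, and the exact definition in togmal_mcp.py may differ:

from enum import Enum

class CategoryType(str, Enum):
    MATH_PHYSICS_SPECULATION = "math_physics_speculation"
    UNGROUNDED_MEDICAL_ADVICE = "ungrounded_medical_advice"
    DANGEROUS_FILE_OPERATIONS = "dangerous_file_operations"
    # ... other existing categories ...
    NEW_CATEGORY = "new_category"  # hypothetical new entry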

Submitting Evidence

Use the togmal_submit_evidence tool to contribute examples of problematic LLM behavior. This helps improve detection for everyone.

Limitations

Current Constraints

  • Heuristic-Based: May have false positives/negatives

  • English-Only: Patterns optimized for English text

  • Context-Free: Doesn't understand full conversation history

  • No Learning: Detection rules are static until updated

Not a Replacement For

  • Professional judgment in critical domains (medicine, law, etc.)

  • Comprehensive code review

  • Security auditing

  • Safety testing in production systems

License

MIT License - See LICENSE file for details

Support

For issues, questions, or contributions:

  • Open an issue on GitHub

  • Submit evidence through the MCP tool

  • Contact: [Your contact information]

Citation

If you use ToGMAL in your research or product, please cite:

@software{togmal_mcp,
  title={ToGMAL: Taxonomy of Generative Model Apparent Limitations},
  author={[Your Name]},
  year={2025},
  url={https://github.com/[your-repo]/togmal-mcp}
}

Acknowledgments

Built using the MCP Python SDK, Pydantic, and httpx.

Inspired by the need for safer, more grounded AI interactions.
