# Metrics & Monitoring

The TrueNAS MCP Server includes comprehensive metrics collection and monitoring capabilities.

## Overview

Features:

- **Prometheus-compatible** metrics export
- **Counters, Gauges, and Histograms**
- **Automatic request tracking**
- **Custom metrics support**
- **Performance monitoring**

## Configuration

Enable metrics:

```env
ENABLE_METRICS=true
METRICS_PORT=9090
```

## Available Metrics

### Request Metrics

```
# HTTP request counter with labels
http_requests_total{endpoint="/api/v2.0/pool",method="GET",status="200"} 1523

# Request duration histogram
http_request_duration_seconds_bucket{endpoint="/api/v2.0/pool",le="0.5"} 1450
http_request_duration_seconds_sum{endpoint="/api/v2.0/pool"} 234.5
http_request_duration_seconds_count{endpoint="/api/v2.0/pool"} 1523

# Error counter
http_errors_total{endpoint="/api/v2.0/pool",method="GET",status="500"} 12
```

### Cache Metrics

```
# Cache hits
cache_hits_total{namespace="pools"} 1250

# Cache misses
cache_misses_total{namespace="pools"} 320

# Cache size gauge
cache_size 305
```

### Rate Limit Metrics

```
# Rate limit checks
rate_limit_checks_total{key="default",allowed="true"} 980
rate_limit_checks_total{key="default",allowed="false"} 20
```

### System Metrics

```
# Active connections
active_connections 15

# Process uptime
process_uptime_seconds 3456.78
```

## Using Metrics

### Automatic Tracking

Metrics are automatically collected for:

- HTTP requests
- Cache operations
- Rate limit checks

### Manual Metrics

Use the metrics collector directly:

```python
from truenas_mcp_server.metrics import get_metrics_collector

metrics = get_metrics_collector()

# Increment counter
metrics.counter("custom_operations_total").inc()

# Set gauge
metrics.gauge("queue_size").set(42)

# Observe histogram value
metrics.histogram("operation_duration_seconds").observe(1.23)
```

### Decorators

Track function execution automatically:

```python
from truenas_mcp_server.metrics import track_time, track_counter, track_errors

@track_time()
async def slow_operation():
    # Automatically tracked in histogram
    pass

@track_counter("user_creations_total")
async def create_user():
    # Counter incremented on each call
    pass

@track_errors()
async def risky_operation():
    # Tracks successes and errors separately
    pass
```

## Prometheus Integration

### Scrape Configuration

Add to your `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'truenas_mcp'
    static_configs:
      - targets: ['localhost:9090']
    scrape_interval: 15s
```

### Example Queries

**Request rate:**

```promql
rate(http_requests_total[5m])
```

**Average request duration:**

```promql
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
```

**Cache hit rate:**

```promql
rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) * 100
```

**Error rate:**

```promql
rate(http_errors_total[5m])
```

**P95 latency:**

```promql
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
```

## Grafana Dashboard

Example dashboard panels:

### Request Rate

```promql
sum(rate(http_requests_total[5m])) by (endpoint)
```

### Error Rate

```promql
sum(rate(http_errors_total[5m])) by (status)
```

### Response Time (P50, P95, P99)

```promql
histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
```

### Cache Performance

```promql
rate(cache_hits_total[5m])
rate(cache_misses_total[5m])
```
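The scrape job and the panels above assume the Prometheus text is reachable over HTTP on `METRICS_PORT` (`9090` in the configuration example). If your deployment does not already serve that endpoint, a standalone exporter might look like the minimal sketch below. It uses only `get_metrics_collector()` and `export_prometheus()` as documented on this page; the HTTP handler itself is illustrative and not part of the package.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

from truenas_mcp_server.metrics import get_metrics_collector


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the collector's Prometheus text on /metrics (illustrative only)."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Export the current metrics in Prometheus exposition format.
        body = get_metrics_collector().export_prometheus().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Match METRICS_PORT from the configuration section above.
    HTTPServer(("0.0.0.0", 9090), MetricsHandler).serve_forever()
```

Point the `targets` entry in `prometheus.yml` at whichever host and port the exporter binds to.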
## Custom Metrics

### Creating Custom Metrics

```python
from truenas_mcp_server.metrics import get_metrics_collector

metrics = get_metrics_collector()

# Counter for tracking events
operations_counter = metrics.counter(
    "dataset_operations_total",
    labels={"operation": "create", "pool": "tank"}
)
operations_counter.inc()

# Gauge for current state
storage_gauge = metrics.gauge(
    "storage_used_bytes",
    labels={"pool": "tank"}
)
storage_gauge.set(1024 * 1024 * 1024 * 500)  # 500GB

# Histogram for distributions
size_histogram = metrics.histogram(
    "dataset_size_bytes",
    labels={"pool": "tank"}
)
size_histogram.observe(1024 * 1024 * 100)  # 100MB
```

### Labels

Add labels for better filtering:

```python
# Track operations per pool
for pool in ["tank", "backup"]:
    metrics.counter("pool_operations_total", labels={"pool": pool}).inc()

# Track by user
metrics.counter("user_actions_total", labels={"user": "admin", "action": "create"}).inc()
```

## Exporting Metrics

### Prometheus Format

```python
metrics = get_metrics_collector()
prometheus_text = metrics.export_prometheus()
print(prometheus_text)
```

Output:

```
# TYPE http_requests_total counter
http_requests_total{endpoint="/api/v2.0/pool",method="GET",status="200"} 1523

# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{endpoint="/api/v2.0/pool",le="0.5"} 1450
http_request_duration_seconds_sum{endpoint="/api/v2.0/pool"} 234.5
http_request_duration_seconds_count{endpoint="/api/v2.0/pool"} 1523
```

### JSON Format

```python
import json

metrics_dict = metrics.get_all_metrics()
print(json.dumps(metrics_dict, indent=2))
```

## Best Practices

### 1. Use Appropriate Metric Types

- **Counter**: Monotonically increasing (requests, errors)
- **Gauge**: Can go up or down (temperature, queue size)
- **Histogram**: Track distributions (latency, size)

### 2. Add Meaningful Labels

```python
# Good: Specific labels
metrics.counter("api_calls", labels={"endpoint": "/pool", "method": "GET"})

# Bad: Too generic
metrics.counter("calls")
```

### 3. Don't Over-Label

Avoid high-cardinality labels (user IDs, timestamps):

```python
# Bad: Creates millions of unique metrics
metrics.counter("requests", labels={"user_id": user_id})

# Good: Use grouping
metrics.counter("requests", labels={"user_type": "admin"})
```

### 4. Reset Periodically

Reset metrics when needed:

```python
metrics = get_metrics_collector()
metrics.reset_all()
```

## Alerting

Example Prometheus alerts:

```yaml
groups:
  - name: truenas_mcp
    rules:
      - alert: HighErrorRate
        expr: rate(http_errors_total[5m]) > 0.1
        for: 5m
        annotations:
          summary: "High error rate detected"

      - alert: SlowRequests
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        annotations:
          summary: "95th percentile latency > 2s"

      - alert: LowCacheHitRate
        expr: rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) < 0.5
        for: 10m
        annotations:
          summary: "Cache hit rate below 50%"
```

## Troubleshooting

### Metrics Not Appearing

Check that metrics are enabled:

```python
from truenas_mcp_server.config import get_settings

settings = get_settings()
print(f"Metrics enabled: {settings.enable_metrics}")
```

### High Memory Usage

Reduce metric retention or use aggregation:

```python
# Reset metrics periodically
metrics.reset_all()
```

### Missing Labels

Ensure labels are consistent:

```python
# All calls must use the same label keys
metrics.counter("requests", labels={"endpoint": "/pool", "method": "GET"})
metrics.counter("requests", labels={"endpoint": "/dataset", "method": "POST"})
```
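### Quick Sanity Check

If metrics still look wrong after the checks above, exercising the collector end to end in isolation can narrow the problem down. The sketch below uses only the helpers documented on this page; the metric names and the simulated delay are placeholders, not part of the server.

```python
import asyncio
import json

from truenas_mcp_server.metrics import get_metrics_collector, track_time


@track_time()
async def sample_operation():
    # Simulated work so the histogram records a non-zero duration.
    await asyncio.sleep(0.1)


async def main():
    metrics = get_metrics_collector()

    # Record one of each metric type using the documented helpers.
    metrics.counter("sanity_checks_total", labels={"source": "local"}).inc()
    metrics.gauge("sanity_queue_size").set(3)
    await sample_operation()

    # Inspect both export formats before wiring up Prometheus.
    print(metrics.export_prometheus())
    print(json.dumps(metrics.get_all_metrics(), indent=2))


if __name__ == "__main__":
    asyncio.run(main())
```

If both exports show the expected series here, the issue is more likely in the scrape configuration than in the collector itself.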
