
Workflow: PromQL Querying

Step-by-step guide for safely querying Prometheus metrics, with counter enforcement, auto-downsampling, and query validation to prevent unsafe or unbounded queries.


When to Use

Use this workflow when:

  • Querying Prometheus metrics from an AI assistant
  • Exploring available metrics and their labels
  • Running both instant and range queries
  • Calculating latency from histogram metrics

Step-by-Step

| Step | Action | Tool / Resource | Key Parameters |
|------|--------|-----------------|----------------|
| 1 | Discover service metrics | Resource: prom://topology/services/{job}/metrics | Returns all metrics emitted by a service |
| 2 | Explore metric labels | prom_explore_labels(metric_name="<metric>") | Returns label keys and their top values |
| 3 | Validate query syntax | prom_validate_promql(query="rate(...)") | Returns {valid: true/false, error: ...} |
| 4 | Run instant query | prom_query_instant(query="rate(...)") | Point-in-time vector result |
| 5 | Run range query | prom_query_range(query="rate(...)", start=<unix>, end=<unix>) | Auto-computes step, downsamples to ~200 pts |

Guided Prompt: Use prom-query-guided for the full step-by-step flow.
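
As a rough illustration, the five steps above can be written out as an ordered list of request descriptions. This is only a sketch: the `build_query_plan` helper and the dictionary layout are hypothetical, not part of the server's API. Only the resource URI, tool names, and argument names are taken from the table.

```python
from typing import Any


def build_query_plan(
    job: str, metric: str, query: str, start: int, end: int
) -> list[dict[str, Any]]:
    """List the five workflow steps as plain request descriptions, in order.

    Hypothetical helper: it only describes the calls, it does not send them.
    """
    return [
        # Step 1: read the resource listing every metric the service emits.
        {"step": 1, "resource": f"prom://topology/services/{job}/metrics"},
        # Step 2: inspect label keys and top values for one metric.
        {"step": 2, "tool": "prom_explore_labels",
         "arguments": {"metric_name": metric}},
        # Step 3: check syntax before executing anything.
        {"step": 3, "tool": "prom_validate_promql",
         "arguments": {"query": query}},
        # Step 4: point-in-time result.
        {"step": 4, "tool": "prom_query_instant",
         "arguments": {"query": query}},
        # Step 5: range result; step is omitted so the server computes it.
        {"step": 5, "tool": "prom_query_range",
         "arguments": {"query": query, "start": start, "end": end}},
    ]


if __name__ == "__main__":
    plan = build_query_plan(
        job="checkout",
        metric="http_requests_total",
        query="rate(http_requests_total[5m])",
        start=1_700_000_000,
        end=1_700_003_600,
    )
    for request in plan:
        print(request)
```

The range query in step 5 leaves out step on purpose, so the auto-step behavior described under Safety Guardrails applies.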


Safety Guardrails

| Guardrail | Description | Override |
|-----------|-------------|----------|
| Counter Enforcement | Counters must use rate() or increase() | allow_raw_counters=true |
| Auto-Downsampling | Range queries capped at ~200 points/series | max_points_per_series param |
| Query Validation | Syntax checked before execution | Use action=validate first |
| Timeout | Default 30s query timeout | timeout param |
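
How the server detects counters is not described here. The sketch below shows one way a counter-enforcement check could work, assuming counters are recognized by the conventional `_total` suffix. The `check_counter_usage` function is hypothetical; only the `allow_raw_counters` override name comes from the table above.

```python
import re

# Assumption: counters are identified by the conventional "_total" suffix.
COUNTER_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*_total\b")
# Matches a call to rate( or increase( (but not irate().
RATE_CALL = re.compile(r"\b(?:rate|increase)\s*\(")


def check_counter_usage(
    query: str, allow_raw_counters: bool = False
) -> tuple[bool, str]:
    """Return (ok, message) for a PromQL query string.

    Heuristic only: it checks that rate() or increase() appears somewhere
    in the query, not that it wraps every counter reference.
    """
    if allow_raw_counters:
        return True, "raw counters explicitly allowed"
    counter = COUNTER_NAME.search(query)
    if counter and not RATE_CALL.search(query):
        name = counter.group(0)
        return False, f"counter {name} must be wrapped in rate() or increase()"
    return True, "ok"


if __name__ == "__main__":
    for q in ("http_requests_total", "rate(http_requests_total[5m])", "up"):
        print(q, "->", check_counter_usage(q))
```

A real implementation would more likely consult metric metadata for the metric type than rely on naming conventions.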

Auto-Step Computation

When step is omitted from a range query, the server computes it from the requested time range:

step = (end - start) / max_points_per_series

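Because the step grows with the time range, each series returns at most about max_points_per_series points (roughly 200 unless overridden), which keeps large result sets from overflowing an LLM's context window.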
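
The formula above can be worked through in a few lines. This is a minimal sketch, not the server's implementation: the `auto_step` name is hypothetical, the default of 200 is taken from the guardrail table, and rounding up to whole seconds with a 1-second floor is an assumption.

```python
import math


def auto_step(start: float, end: float, max_points_per_series: int = 200) -> int:
    """Return a step in seconds: (end - start) / max_points_per_series.

    start and end are Unix timestamps in seconds.
    """
    if end <= start:
        raise ValueError("end must be after start")
    if max_points_per_series <= 0:
        raise ValueError("max_points_per_series must be positive")
    # Assumption: round up to whole seconds and never go below 1s.
    return max(1, math.ceil((end - start) / max_points_per_series))


if __name__ == "__main__":
    now = 1_700_000_000
    print(auto_step(now - 3_600, now))    # last hour
    print(auto_step(now - 86_400, now))   # last 24 hours
```

Under these assumptions a one-hour range gets an 18-second step and a 24-hour range gets a 432-second step, so both return about 200 points per series.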


Natural Language Prompts

"What is the current request rate for http_requests_total?"
"Show me CPU usage across all pods in production over the last hour."
"Calculate p99 latency from the http_request_duration_seconds histogram."
"Show me the error rate trend for the checkout service — last 24 hours."

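For orientation, the prompts above might translate into queries like the following. These pairings are illustrative assumptions, not output from the server: the 5m rate windows, the `namespace`, `job`, and `status` label names, and the cAdvisor metric `container_cpu_usage_seconds_total` depend on how a given environment is instrumented.

```python
# Illustrative prompt -> (tool, PromQL) pairs. Label names, label values,
# and rate windows are assumptions; adjust them to the real metrics.
EXAMPLE_QUERIES = {
    "current request rate": (
        "prom_query_instant",
        "sum(rate(http_requests_total[5m]))",
    ),
    "CPU usage across production pods, last hour": (
        "prom_query_range",
        'sum by (pod) (rate(container_cpu_usage_seconds_total'
        '{namespace="production"}[5m]))',
    ),
    "p99 latency from histogram": (
        "prom_query_instant",
        "histogram_quantile(0.99, sum by (le) "
        "(rate(http_request_duration_seconds_bucket[5m])))",
    ),
    "checkout error rate trend, last 24 hours": (
        "prom_query_range",
        'sum(rate(http_requests_total{job="checkout", status=~"5.."}[5m])) '
        '/ sum(rate(http_requests_total{job="checkout"}[5m]))',
    ),
}

if __name__ == "__main__":
    for prompt, (tool, promql) in EXAMPLE_QUERIES.items():
        print(f"{prompt}\n  {tool}: {promql}")
```

Every counter in these queries is wrapped in rate(), so none of them would need the allow_raw_counters override.
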
Next Steps