Workflow: Rule Management & Simulation
Step-by-step guide for creating, validating, testing, and deploying alerting and recording rules — including autonomous Kubernetes CRD patching.
When to Use
Use this workflow when:
- Drafting new alerting rules from natural language
- Validating rule syntax before deployment
- Testing rules with synthetic scenarios or historical data
- Patching existing PrometheusRule CRDs in Kubernetes
Workflow 1: Draft, Test, Deploy
| Step | Action | Tool (example call) | Result |
|---|---|---|---|
| 1 | Draft rule | prom_draft_alert_rule(intent="alert when 5xx errors exceed 5%") | Returns a PromQL expression and a YAML rule definition |
| 2 | Check syntax | prom_check_rule_group(rules_yaml="...") | Validates the YAML structure and the PromQL syntax |
| 3 | Run unit tests | prom_run_rule_tests(rules_yaml="...", test_yaml="...") | Runs the rules against synthetic test scenarios |
| 4 | Simulate against history | prom_simulate_firing_historical(expr="...", for_duration="5m") | Shows when the expression would have fired on stored historical data |
| 5 | Apply rule | prom_upsert_rule_group(group_name="api_errors", rules=[...]) | Creates the rule group, or updates it if it already exists |
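The `for_duration` in step 4 is what separates a brief spike from an alert. As a rough mental model (not the implementation behind prom_simulate_firing_historical), the sketch below replays per-evaluation condition results through Prometheus-style pending/firing logic: the alert goes pending when the condition first holds and fires once it has held for at least `for`. It assumes a fixed evaluation interval, ignores `keep_firing_for` and missing samples, and uses a hypothetical input format of `(timestamp, condition)` pairs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FiringEpisode:
    start_s: int  # first evaluation at which the alert was "firing"
    end_s: int    # last evaluation at which it was still "firing"


def simulate_firing(samples: List[Tuple[int, bool]], for_duration_s: int) -> List[FiringEpisode]:
    """Replay per-evaluation condition results through pending/firing logic.

    samples: (unix_seconds, condition_true) pairs sorted by time, one per
    rule evaluation. This input format is hypothetical.
    """
    episodes: List[FiringEpisode] = []
    active_since: Optional[int] = None       # when the current pending/firing run began
    current: Optional[FiringEpisode] = None  # episode being extended while firing
    for ts, cond in samples:
        if not cond:
            # Condition cleared: the alert resolves and the `for` timer resets.
            active_since, current = None, None
            continue
        if active_since is None:
            active_since = ts  # alert enters "pending"
        if ts - active_since >= for_duration_s:
            if current is None:
                current = FiringEpisode(start_s=ts, end_s=ts)  # pending -> firing
                episodes.append(current)
            else:
                current.end_s = ts  # still firing; extend the episode
    return episodes


# 1-minute evaluations; the condition holds from t=0 through t=360, then clears.
samples = [(t, t <= 360) for t in range(0, 900, 60)]
print(simulate_firing(samples, for_duration_s=300))
# → [FiringEpisode(start_s=300, end_s=360)]
```

With a 1-minute interval and `for: 5m`, the condition must hold for six consecutive evaluations before the alert fires; anything shorter stays pending and never notifies. Counting episodes over a window such as the last 7 days is one crude proxy for how noisy a threshold would be.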
Workflow 2: Autonomous K8s CRD Upsert
For safely patching existing PrometheusRule CRDs without manual kubectl commands.
Step-by-Step
| Step | Action | Tool / Resource |
|---|---|---|
| 1 | Inventory loaded groups | Resource: prom://rules/groups |
| 2 | Discover CRD metadata | Resource: prom://kubernetes/prometheusrules |
| 3 | Understand target rule | prom_describe_alert_rule(group_name="...", alert_name="...") |
| 4 | Compose updated rule | prom_draft_alert_rule(intent="...") or manual edit |
| 5 | Validate syntax | prom_check_rule_group(rules_yaml="...") |
| 6 | Apply to cluster | prom_upsert_rule_group(storage_mode="k8s_crd", namespace="...") |
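Step 6 ends up modifying a PrometheusRule object (apiVersion `monitoring.coreos.com/v1`, the Prometheus Operator resource), whose `spec.groups` holds ordinary rule groups. The sketch below illustrates the kind of merge such an upsert performs: replace an alert by name within a named group, or append it. It is an illustration under assumptions, not the logic of prom_upsert_rule_group; the CRD name, namespace, and the recording-rule metric `job:http_errors:ratio5m` are hypothetical.

```python
import copy
from typing import Any, Dict

Manifest = Dict[str, Any]


def upsert_alert_in_crd(manifest: Manifest, group_name: str, rule: Dict[str, Any]) -> Manifest:
    """Return a patched copy of a PrometheusRule manifest; the input is not mutated.

    The rule replaces any rule with the same `alert` name in `group_name`,
    or is appended to that group. A missing group raises instead of being
    created, so a wrong target fails loudly.
    """
    if manifest.get("kind") != "PrometheusRule":
        raise ValueError("expected a PrometheusRule manifest")
    patched = copy.deepcopy(manifest)
    new_rule = copy.deepcopy(rule)
    for group in patched.get("spec", {}).get("groups", []):
        if group.get("name") != group_name:
            continue
        rules = group.setdefault("rules", [])
        for i, existing in enumerate(rules):
            if existing.get("alert") == new_rule["alert"]:
                rules[i] = new_rule  # replace the existing definition in place
                return patched
        rules.append(new_rule)  # new alert inside an existing group
        return patched
    meta = manifest.get("metadata", {})
    raise KeyError(f"group {group_name!r} not found in {meta.get('namespace')}/{meta.get('name')}")


# Hypothetical CRD as it might exist in the cluster.
crd = {
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "PrometheusRule",
    "metadata": {"name": "api-alerts", "namespace": "monitoring"},
    "spec": {"groups": [{"name": "api_errors", "rules": [
        {"alert": "HighErrorRate", "expr": "job:http_errors:ratio5m > 0.05", "for": "5m"},
    ]}]},
}
patched = upsert_alert_in_crd(
    crd, "api_errors",
    {"alert": "HighErrorRate", "expr": "job:http_errors:ratio5m > 0.03", "for": "5m"},
)
print(patched["spec"]["groups"][0]["rules"])
```

Refusing to create a missing group is a deliberate choice in this sketch: when the target is wrong, an error is easier to notice than a new object.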
Warning: Always cross-reference prom://rules/groups (group names) with prom://kubernetes/prometheusrules (CRD name and namespace) before calling prom_upsert_rule_group. An incorrect namespace will silently create a duplicate CRD instead of patching the existing one.
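The cross-reference check can be expressed as a small lookup that refuses to guess. The sketch below assumes the two resources have already been reduced to a list of loaded group names and a list of CRD records with `namespace`, `name`, and `groups` keys; that record shape, and the CRD names in the example, are hypothetical rather than the actual resource payloads.

```python
from typing import Any, Dict, List, Tuple


def find_owning_crd(
    group_name: str,
    loaded_groups: List[str],
    crds: List[Dict[str, Any]],
) -> Tuple[str, str]:
    """Return (namespace, name) of the single PrometheusRule defining group_name.

    loaded_groups: group names Prometheus reports as loaded.
    crds: one dict per PrometheusRule with "namespace", "name", "groups" keys
          (a hypothetical shape; adapt it to the real resource payload).
    """
    if group_name not in loaded_groups:
        raise LookupError(f"{group_name!r} is not loaded by Prometheus")
    owners = [
        (crd["namespace"], crd["name"])
        for crd in crds
        if group_name in crd.get("groups", [])
    ]
    if not owners:
        # Loaded but not owned by any CRD (for example a file-based rule):
        # there is no existing object for a k8s_crd upsert to patch.
        raise LookupError(f"{group_name!r} is loaded but no PrometheusRule defines it")
    if len(owners) > 1:
        raise LookupError(f"{group_name!r} is defined by several CRDs: {owners}")
    return owners[0]


loaded = ["alertmanager.rules", "api_errors"]
crds = [
    {"namespace": "monitoring", "name": "kube-prometheus-alertmanager", "groups": ["alertmanager.rules"]},
    {"namespace": "apps", "name": "api-alerts", "groups": ["api_errors"]},
]
print(find_owning_crd("alertmanager.rules", loaded, crds))
# → ('monitoring', 'kube-prometheus-alertmanager')
```

The returned namespace is the value to pass as `namespace` in step 6; any ambiguity stops the workflow before anything is written to the cluster.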
Natural Language Prompts
"Draft an alert for when the 5xx error rate exceeds 5% for 5 minutes."
"If I set the error rate threshold to 3%, would this alert have fired in the last 7 days?"
"Find the CRD that owns the alertmanager.rules group and update the threshold."
"Analyze the firing history of HighErrorRate — is it too noisy?"
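For reference, the first prompt above ("Draft an alert for when the 5xx error rate exceeds 5% for 5 minutes") typically maps to a ratio of 5xx request rate to total request rate, guarded by a `for` clause. The sketch below builds such a rule in the standard Prometheus rule-file layout (`groups`, `alert`, `expr`, `for`, `labels`, `annotations`). It is not the output of prom_draft_alert_rule; the metric `http_requests_total`, the `status` label, the `severity` label, and the 5m rate window are assumptions.

```python
import json


def draft_error_rate_rule(threshold: float, for_duration: str = "5m", window: str = "5m") -> dict:
    """Build a Prometheus rule-file structure for a 5xx error-ratio alert.

    Metric and label names (http_requests_total, status) are assumptions;
    substitute whatever your services actually export.
    """
    expr = (
        f'sum(rate(http_requests_total{{status=~"5.."}}[{window}]))'
        f" / sum(rate(http_requests_total[{window}])) > {threshold}"
    )
    return {
        "groups": [{
            "name": "api_errors",
            "rules": [{
                "alert": "HighErrorRate",
                "expr": expr,
                "for": for_duration,  # condition must hold this long before firing
                "labels": {"severity": "warning"},
                "annotations": {
                    "summary": f"5xx error ratio above {threshold:.0%} for {for_duration}",
                },
            }],
        }]
    }


# JSON is, for practical purposes, valid YAML, so this could be checked as rules_yaml.
print(json.dumps(draft_error_rate_rule(0.05), indent=2))
```

The 3% what-if prompt corresponds to `draft_error_rate_rule(0.03)`: only the comparison constant changes, which keeps the diff against an existing CRD small.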
Next Steps
- Troubleshooting — Diagnosing failed scrape targets
- App Onboarding — Instrumenting applications
- TSDB FinOps — Cardinality analysis and optimization