
Workflow: Rule Management & Simulation

Step-by-step guide for creating, validating, testing, and deploying alerting and recording rules — including autonomous Kubernetes CRD patching.


When to Use

Use this workflow when:

  • Drafting new alerting rules from natural language
  • Validating rule syntax before deployment
  • Testing rules with synthetic scenarios or historical data
  • Patching existing PrometheusRule CRDs in Kubernetes

Workflow 1: Draft, Test, Deploy

| Step | Action | Tool and Key Parameters | Result |
|------|--------|-------------------------|--------|
| 1 | Draft rule | prom_draft_alert_rule(intent="alert when 5xx errors exceed 5%") | Returns PromQL + YAML definition |
| 2 | Check syntax | prom_check_rule_group(rules_yaml="...") | Validates YAML and PromQL |
| 3 | Run unit tests | prom_run_rule_tests(rules_yaml="...", test_yaml="...") | Runs synthetic test scenarios |
| 4 | Simulate historical | prom_simulate_firing_historical(expr="...", for_duration="5m") | Checks against real historical data |
| 5 | Apply rule | prom_upsert_rule_group(group_name="api_errors", rules=[...]) | Creates the rule group |
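
To make step 4 more concrete, here is a minimal Python sketch of what a historical firing check has to decide: given evenly spaced samples of an error ratio, when would an alert with a `for` duration actually fire? It is a simplified model of Prometheus `for` semantics (pending, then firing once the condition has held long enough), not the implementation of prom_simulate_firing_historical. The function name `simulate_firing`, its parameters, and the sample values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def simulate_firing(
    ratios: Sequence[float],
    threshold: float,
    for_seconds: int,
    step_seconds: int,
) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) ranges in which the alert would be firing.

    `ratios` holds one error-ratio sample per evaluation step.
    """
    firing: List[Tuple[int, int]] = []
    active_since = None  # index where the condition first became true (pending)
    fire_start = None    # index where the alert moved from pending to firing
    for i, value in enumerate(ratios):
        if value > threshold:
            if active_since is None:
                active_since = i
            held_for = (i - active_since) * step_seconds
            if fire_start is None and held_for >= for_seconds:
                fire_start = i
        else:
            # Condition cleared: close any open firing range and reset.
            if fire_start is not None:
                firing.append((fire_start, i - 1))
            active_since = None
            fire_start = None
    if fire_start is not None:
        firing.append((fire_start, len(ratios) - 1))
    return firing


# Hypothetical one-minute samples of the 5xx error ratio.
samples = [0.01, 0.06, 0.07, 0.08, 0.06, 0.09, 0.10, 0.02, 0.06, 0.06, 0.01]

print(simulate_firing(samples, threshold=0.05, for_seconds=300, step_seconds=60))
print(simulate_firing(samples, threshold=0.05, for_seconds=0, step_seconds=60))
```

With a 5-minute `for` duration, only the first burst holds long enough to fire, and the second two-minute burst never leaves the pending state. Setting `for_seconds=0` shows how many more firings the same data would produce, which is the trade-off a historical simulation helps evaluate before the rule is applied.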

Workflow 2: Autonomous K8s CRD Upsert

Use this workflow to safely patch existing PrometheusRule CRDs without running manual kubectl commands.

Step-by-Step

| Step | Action | Tool / Resource |
|------|--------|-----------------|
| 1 | Inventory loaded groups | Resource: prom://rules/groups |
| 2 | Discover CRD metadata | Resource: prom://kubernetes/prometheusrules |
| 3 | Understand target rule | prom_describe_alert_rule(group_name="...", alert_name="...") |
| 4 | Compose updated rule | prom_draft_alert_rule(intent="...") or manual edit |
| 5 | Validate syntax | prom_check_rule_group(rules_yaml="...") |
| 6 | Apply to cluster | prom_upsert_rule_group(storage_mode="k8s_crd", namespace="...") |

Warning: Always cross-reference prom://rules/groups (group names) with prom://kubernetes/prometheusrules (CRD name and namespace) before calling prom_upsert_rule_group. An incorrect namespace will silently create a duplicate CRD instead of patching the existing one.
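
The cross-reference can be sketched in Python as a lookup that refuses to guess. The example assumes the CRD inventory has already been read and reduced to a list of dictionaries with `name`, `namespace`, and `groups` keys. That shape, the helper name `find_owning_crd`, and the sample CRD names are hypothetical; the actual payload of prom://kubernetes/prometheusrules may differ.

```python
from typing import Dict, List, Tuple


def find_owning_crd(group_name: str, crds: List[Dict]) -> Tuple[str, str]:
    """Return (namespace, crd_name) for the single CRD that declares group_name.

    Raises LookupError when no CRD, or more than one CRD, declares the group,
    so the caller never upserts into a guessed namespace.
    """
    matches = [crd for crd in crds if group_name in crd["groups"]]
    if not matches:
        raise LookupError(f"no PrometheusRule CRD declares group {group_name!r}")
    if len(matches) > 1:
        owners = ", ".join(f"{c['namespace']}/{c['name']}" for c in matches)
        raise LookupError(f"group {group_name!r} is declared by several CRDs: {owners}")
    return matches[0]["namespace"], matches[0]["name"]


# Hypothetical, simplified view of the CRD inventory.
crds = [
    {"name": "kube-prometheus-alertmanager", "namespace": "monitoring",
     "groups": ["alertmanager.rules"]},
    {"name": "api-rules", "namespace": "prod", "groups": ["api_errors"]},
]

namespace, crd_name = find_owning_crd("alertmanager.rules", crds)
print(namespace, crd_name)
```

Failing loudly on a missing or ambiguous owner is the point of the design: the resolved namespace is only passed on to prom_upsert_rule_group when exactly one CRD matches.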


Natural Language Prompts

"Draft an alert for when the 5xx error rate exceeds 5% for 5 minutes."
"If I set the error rate threshold to 3%, would this alert have fired in the last 7 days?"
"Find the CRD that owns the alertmanager.rules group and update the threshold."
"Analyze the firing history of HighErrorRate — is it too noisy?"

Next Steps