Workflow: On-Call Alert Triage

Step-by-step guide for triaging active alerts at the start of an on-call shift — summarize what's firing, filter by severity, understand routing, and identify misconfigurations.


When to Use

Use this workflow when:

  • Starting an on-call shift and needing a quick situational overview
  • Investigating which services are affected by active alerts
  • Checking who gets paged for specific alert types
  • Auditing whether alerts are reaching the right receivers

Journey


Step-by-Step

| Step | Action | Tool / Resource | Key Parameters |
|------|--------|-----------------|----------------|
| 1 | Discover backends | Resource: am://system/backends | Confirms connectivity and health |
| 2 | Get on-call summary | am_summarize_oncall(backend_id="default") | Severity/service breakdown with narrative |
| 3 | Filter critical alerts | am_list_alerts(backend_id="default", severity="critical") | Paginated alert list |
| 4 | Check routing | am_explain_routing(backend_id="default", labels={"alertname": "HighCPU", "service": "api", "env": "prod"}) | Matched receivers and explanation |
| 5 | Inspect alert groups | am_list_alert_groups(backend_id="default") | Alertmanager's native grouping |
| 6 | Audit default route | am_audit_default_route(backend_id="default") | Alerts hitting the fallback receiver |
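
The tool calls in Steps 2 through 6 can be sketched as an ordered plan. This is a minimal illustration, not the server's client API: the tool names and arguments are copied from the table, while `TRIAGE_STEPS`, `run_triage`, and the `call_tool` callable are hypothetical names standing in for whatever client actually invokes the tools. Step 1 is a resource read rather than a tool call, so it is left out.

```python
from typing import Any, Callable

# Tool names and arguments mirror Steps 2-6 of the table.
# Step 1 reads the am://system/backends resource and is not a tool call.
TRIAGE_STEPS: list[tuple[str, dict[str, Any]]] = [
    ("am_summarize_oncall", {"backend_id": "default"}),
    ("am_list_alerts", {"backend_id": "default", "severity": "critical"}),
    (
        "am_explain_routing",
        {
            "backend_id": "default",
            "labels": {"alertname": "HighCPU", "service": "api", "env": "prod"},
        },
    ),
    ("am_list_alert_groups", {"backend_id": "default"}),
    ("am_audit_default_route", {"backend_id": "default"}),
]


def run_triage(
    call_tool: Callable[[str, dict[str, Any]], Any],
    backend_id: str = "default",
) -> dict[str, Any]:
    """Run the triage tools in table order and collect results by tool name.

    `call_tool` is a hypothetical stand-in: it receives a tool name and an
    argument dict and returns whatever that tool returns.
    """
    results: dict[str, Any] = {}
    for name, args in TRIAGE_STEPS:
        # Override backend_id so the same plan can target another backend.
        results[name] = call_tool(name, {**args, "backend_id": backend_id})
    return results
```

Keeping the plan as plain data makes the order explicit and lets the same sequence be replayed against a different backend by changing only `backend_id`.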

Guided Prompt: Use am-alert-triage-guided for the full step-by-step flow with the AI.


Resources Used

| Resource | When | Purpose |
|----------|------|---------|
| am://system/backends | Before Step 1 | Quick overview of all backends |
| am://alerts/active | Any time | Fast snapshot without tool call |
| am://alerts/groups | Any time | Group snapshot without tool call |
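
The resource URIs above share one shape, which a short sketch can make explicit. The URIs and purposes are taken from the table; the `RESOURCES` mapping and the `split_resource` helper are hypothetical, and reading the two URI segments as a category and a name is an assumption made for illustration only.

```python
from urllib.parse import urlparse

# URIs and purposes copied from the Resources Used table.
RESOURCES = {
    "am://system/backends": "Quick overview of all backends",
    "am://alerts/active": "Fast snapshot without tool call",
    "am://alerts/groups": "Group snapshot without tool call",
}


def split_resource(uri: str) -> tuple[str, str]:
    """Split an am:// URI into its two segments (assumed: category, name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "am":
        raise ValueError(f"not an am:// resource: {uri}")
    # For am://alerts/active, netloc is "alerts" and path is "/active".
    return parsed.netloc, parsed.path.lstrip("/")


for uri, purpose in RESOURCES.items():
    category, name = split_resource(uri)
    print(f"{category:<8}{name:<10}{purpose}")
```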

Next Steps