Workflow: On-Call Alert Triage
Step-by-step guide for triaging active alerts at the start of an on-call shift — summarize what's firing, filter by severity, understand routing, and identify misconfigurations.
When to Use
Use this workflow when:
- Starting an on-call shift and needing a quick situational overview
- Investigating which services are affected by active alerts
- Checking who gets paged for specific alert types
- Auditing whether alerts are reaching the right receivers
Journey
Step-by-Step
| Step | Action | Tool / Resource | Result |
|---|---|---|---|
| 1 | Discover backends | Resource: am://system/backends | Confirms connectivity and health |
| 2 | Get on-call summary | am_summarize_oncall(backend_id="default") | Severity/service breakdown with narrative |
| 3 | Filter critical alerts | am_list_alerts(backend_id="default", severity="critical") | Paginated alert list |
| 4 | Check routing | am_explain_routing(backend_id="default", labels={"alertname": "HighCPU", "service": "api", "env": "prod"}) | Matched receivers and explanation |
| 5 | Inspect alert groups | am_list_alert_groups(backend_id="default") | Alertmanager's native grouping |
| 6 | Audit default route | am_audit_default_route(backend_id="default") | Alerts hitting the fallback receiver |
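The tool steps in the table (2 through 6) can be sketched as a fixed sequence. This is a minimal illustration, not the server's actual client API. The tool names and arguments come from the table. The `call_tool(name, arguments)` callable, the `run_triage` helper, and the `fake_call_tool` stand-in are hypothetical names introduced here so the example runs without a backend.

```python
from typing import Any, Callable

# Hypothetical client signature (an assumption): call_tool(name, arguments) -> result.
ToolCaller = Callable[[str, dict[str, Any]], Any]

# Steps 2-6 from the table, in order. Tool names and arguments are copied
# from the table; the surrounding structure is illustrative only.
TRIAGE_STEPS: list[tuple[str, dict[str, Any]]] = [
    ("am_summarize_oncall", {"backend_id": "default"}),
    ("am_list_alerts", {"backend_id": "default", "severity": "critical"}),
    (
        "am_explain_routing",
        {
            "backend_id": "default",
            "labels": {"alertname": "HighCPU", "service": "api", "env": "prod"},
        },
    ),
    ("am_list_alert_groups", {"backend_id": "default"}),
    ("am_audit_default_route", {"backend_id": "default"}),
]


def run_triage(call_tool: ToolCaller) -> dict[str, Any]:
    """Call each triage tool in table order and collect results by tool name."""
    results: dict[str, Any] = {}
    for name, arguments in TRIAGE_STEPS:
        results[name] = call_tool(name, arguments)
    return results


def fake_call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Stand-in client: echoes the request instead of contacting a backend."""
    return {"tool": name, "backend_id": arguments["backend_id"]}


if __name__ == "__main__":
    for tool_name, result in run_triage(fake_call_tool).items():
        print(tool_name, result)
```

To try this against a real session, replace `fake_call_tool` with whatever function your client uses to invoke tools. Keeping the steps in a list makes it easy to change the order or swap in a different `backend_id`.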
Guided Prompt: Use am-alert-triage-guided for the full step-by-step flow with the AI.
Resources Used
| Resource | When | Purpose |
|---|---|---|
| am://system/backends | Before Step 1 | Quick overview of all backends |
| am://alerts/active | Any time | Fast snapshot without tool call |
| am://alerts/groups | Any time | Group snapshot without tool call |
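A snapshot of active alerts can be reduced to a quick severity and service breakdown on the client side. The sketch below assumes the snapshot is a list of alert objects, each with a `labels` mapping, similar to what Alertmanager's v2 API returns. The actual payload of am://alerts/active is not specified here, so treat that shape as an assumption. `count_by_label` is a hypothetical helper and the sample data is made up.

```python
from collections import Counter
from typing import Any


def count_by_label(alerts: list[dict[str, Any]], label: str) -> Counter:
    """Count alerts by the value of one label; alerts without it count as 'unknown'."""
    return Counter(alert.get("labels", {}).get(label, "unknown") for alert in alerts)


# Made-up sample data for illustration, not output from a real backend.
sample_alerts = [
    {"labels": {"alertname": "HighCPU", "service": "api", "severity": "critical"}},
    {"labels": {"alertname": "DiskFull", "service": "db", "severity": "warning"}},
    {"labels": {"alertname": "HighLatency", "service": "api", "severity": "critical"}},
    {"labels": {"alertname": "Heartbeat"}},  # no severity or service label
]

by_severity = count_by_label(sample_alerts, "severity")
by_service = count_by_label(sample_alerts, "service")

print(by_severity.most_common())
print(by_service.most_common())
```

Counting missing labels as `unknown` keeps unlabeled alerts visible in the breakdown, which matters when auditing for misconfigurations.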
Next Steps
- Maintenance Silence — Safe silence lifecycle for planned maintenance
- Routing Audit — Inspect and simulate routing configuration
- Integration Testing — Verify notification integrations