Workflow: Troubleshooting Failed Targets
Step-by-step guide for diagnosing why a scrape target is down — using resources and tools to triage the issue systematically.
When to Use
Use this workflow when:
- A scrape target is showing as `down` in Prometheus
- You see `up{job="..."} == 0` for a job
- The AI needs to diagnose a monitoring gap
Step-by-Step
| Step | Action | Tool / Resource | Notes |
|---|---|---|---|
| 1 | Check failed targets | Resource: prom://topology/failed_targets | Aggregated view of all failed targets |
| 2 | Check up status | prom_query_instant(query="up{job='api-server'}") | Shows target health |
| 3 | Check scrape duration | prom_query_instant(query="scrape_duration_seconds{job='api-server'}") | Detect slow endpoints |
| 4 | Validate endpoint directly | prom_test_endpoint(endpoint_url="http://api-server.default:8080/metrics") | Bypasses Prometheus — direct HTTP check |
| 5 | Check cardinality | Resource: prom://tsdb/cardinality | High cardinality can cause performance issues |
Guided Prompt: Use `prom-troubleshoot-guided` for the full step-by-step flow.
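As a rough illustration of what steps 2 and 3 look for, the sketch below interprets the results of the `up` and `scrape_duration_seconds` queries. It assumes the results arrive in the JSON shape returned by the Prometheus HTTP API for instant queries; whether `prom_query_instant` returns exactly this shape is an assumption. The helper names (`down_targets`, `slow_targets`), the sample payload, and the 10-second threshold (chosen to mirror Prometheus's default `scrape_timeout`) are hypothetical and not part of this toolset.

```python
from typing import Any


def _series(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the result vector of an instant query, or raise if the query failed."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    return response["data"]["result"]


def _target(series: dict[str, Any]) -> str:
    """Format a series as "job/instance" for display."""
    labels = series["metric"]
    return f"{labels.get('job', '?')}/{labels.get('instance', '?')}"


def down_targets(up_response: dict[str, Any]) -> list[str]:
    """Targets whose `up` sample is 0 (sample values arrive as strings)."""
    return [_target(s) for s in _series(up_response) if float(s["value"][1]) == 0.0]


def slow_targets(duration_response: dict[str, Any],
                 threshold_seconds: float = 10.0) -> list[str]:
    """Targets whose last scrape took longer than the assumed threshold."""
    return [
        _target(s)
        for s in _series(duration_response)
        if float(s["value"][1]) > threshold_seconds
    ]


# Hypothetical sample payload, for illustration only.
up_sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "api-server", "instance": "10.0.0.1:8080"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "api-server", "instance": "10.0.0.2:8080"},
             "value": [1700000000.0, "0"]},
        ],
    },
}

print(down_targets(up_sample))  # ['api-server/10.0.0.2:8080']
```

A target that appears in `slow_targets` but not in `down_targets` is worth watching: it is still being scraped, but it is close to timing out.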
Common Scenarios
| Scenario | Cause | Fix |
|---|---|---|
| Connection Refused | Pod/VM not running or wrong port | Verify Deployment is healthy, check port in ServiceMonitor |
| Context Deadline Exceeded | Scrape timeout exceeded | Increase scrape_timeout or optimize the metrics endpoint |
| 401 Unauthorized | Endpoint requires authentication | Configure bearer token or basic auth in ServiceMonitor |
| High Cardinality | Too many label dimensions | Use prom_plan_relabel to drop labels |
| No metrics found | Endpoint doesn't expose Prometheus format | Use prom_test_endpoint to validate format |
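The first three rows of the table above can be approximated as a small lookup over the scrape error text. This sketch assumes the text comes from the per-target `lastError` field that Prometheus reports (for example via its `/api/v1/targets` endpoint). The substrings it matches reflect typical messages and may differ between versions, and `classify_scrape_error` is a hypothetical helper rather than a tool in this workflow.

```python
# (substring to look for, scenario name, suggested fix), taken from the table above.
SCENARIOS = [
    ("connection refused", "Connection Refused",
     "Verify the Deployment is healthy and check the port in the ServiceMonitor"),
    ("context deadline exceeded", "Context Deadline Exceeded",
     "Increase scrape_timeout or optimize the metrics endpoint"),
    ("401 unauthorized", "401 Unauthorized",
     "Configure a bearer token or basic auth in the ServiceMonitor"),
]


def classify_scrape_error(last_error: str) -> tuple[str, str]:
    """Map a scrape error message to a (scenario, suggested fix) pair.

    Matching is a case-insensitive substring search; anything unrecognized
    falls back to validating the endpoint directly.
    """
    text = last_error.lower()
    for needle, scenario, fix in SCENARIOS:
        if needle in text:
            return scenario, fix
    return "Unknown", "Validate the endpoint directly with prom_test_endpoint"


scenario, fix = classify_scrape_error(
    'Get "http://10.0.0.2:8080/metrics": context deadline exceeded'
)
print(f"{scenario}: {fix}")
```

High cardinality and missing metrics are deliberately left out, since neither shows up as a scrape error string; they need the cardinality resource and the endpoint check respectively.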
Resources for Troubleshooting
| Resource | Purpose |
|---|---|
| prom://topology/failed_targets | Quick triage of all down targets |
| prom://topology/services | Service catalog with health status |
| prom://system/backends | Backend connectivity check |
| prom://config/runtime | Verify scrape intervals and retention |
Next Steps
- App Onboarding — Instrumenting applications
- PromQL Querying — Safe query workflows
- TSDB FinOps — Cardinality analysis and optimization