A/B Testing with Experiments
Run two versions (e.g., v1 and v2) side by side and compare metrics before promoting one as the winner. Argo Rollouts Experiments create temporary ReplicaSets for each variant. See Onboarding to set up a canary rollout with an AnalysisTemplate first.
Prerequisites
| Component | Status |
|---|---|
| Kubernetes cluster | Accessible via kubectl |
| Argo Rollouts | Installed (kubectl get rollouts -A) |
| Argo Rollout MCP Server | Running (Docker recommended) |
| Onboarded Rollout | Canary strategy, onboarded in target namespace |
| AnalysisTemplate | Exists for the rollout (see Onboarding for setup) |
| Prometheus | Required for analysis-backed experiments |
Experiments create separate ReplicaSets for metric comparison. Weighted traffic routing for experiments depends on your ingress (Traefik, Istio, SMI, ALB). Traefik does not support experiment traffic routing — use experiments for metrics comparison only (experiment-as-analysis pattern).
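Under the hood, an Experiment is an ordinary Kubernetes resource. The following is a rough sketch of what the created object looks like, using the names from this guide; the empty `template:` blocks stand in for the pod specs the server copies from the resolved stable and canary ReplicaSets (abbreviated here):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: hello-world-ab-test
  namespace: default
spec:
  duration: 30m
  templates:
    - name: baseline
      replicas: 1
      template: {}   # pod spec copied from the stable ReplicaSet
    - name: candidate
      replicas: 1
      template: {}   # pod spec copied from the canary ReplicaSet
  analyses:
    - name: success-rate
      templateName: hello-world-analysis
```

Both templates get their own temporary ReplicaSet for the lifetime of the experiment; when the experiment completes, those ReplicaSets are scaled down.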
Environment Setup
Verify Rollout and AnalysisTemplate
```shell
kubectl get rollout hello-world -n default
kubectl get analysistemplate -n default
```
Ensure Canary Deployment in Progress
For experiments that use specRef: "stable" and specRef: "canary", the rollout must have a canary deployment in progress. Trigger a new version first:
"Deploy hello-world:v2 to the hello-world rollout in default."
Then create the experiment while the canary is running.
specRef resolution: When using specRef in templates, the MCP server resolves stable/canary (or active/preview for blue-green) from the Rollout's ReplicaSets. You must pass rollout_name so the server can fetch the correct pod templates.
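The resolution step can be illustrated with a short sketch, assuming the Rollout status carries the `stableRS` and `currentPodHash` pod-template hashes that Argo Rollouts records for the stable and latest ReplicaSets. The function below mirrors what the server is described as doing; it is an illustration, not the server's actual code:

```python
def resolve_spec_refs(rollout_status: dict) -> dict:
    """Map specRef names to ReplicaSet pod-template hashes.

    rollout_status is the Rollout's .status dict; stableRS and
    currentPodHash identify the stable and latest ReplicaSets.
    """
    hashes = {
        "stable": rollout_status["stableRS"],
        "canary": rollout_status["currentPodHash"],
    }
    if hashes["stable"] == hashes["canary"]:
        # No canary in progress: stable and latest point at the same
        # ReplicaSet, so specRef "canary" has nothing to resolve to.
        raise ValueError("rollout has no canary deployment in progress")
    return hashes
```

This is also why step 1 (deploying a new image) must happen first: until a canary is rolling out, both hashes point at the same ReplicaSet.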
Start MCP Server (Docker — recommended)
```shell
docker run --rm -it \
  -p 8768:8768 \
  -v ~/.kube:/app/.kube:ro \
  -e K8S_KUBECONFIG=/app/.kube/config \
  talkopsai/argo-rollout-mcp-server:latest
```
Resource for Monitoring
| Resource | Purpose |
|---|---|
| argorollout://experiments/default/hello-world-ab-test/status | Experiment phase, template statuses, analysis results |
Experiment Phases
| Phase | Meaning |
|---|---|
| Pending | Experiment created, pods not yet ready |
| Running | Both templates running, analysis in progress |
| Successful | All analyses passed, experiment complete |
| Failed | Analysis failed or pods unhealthy |
| Error | Configuration issue |
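When scripting around these phases (for example, deciding when to stop polling), only three of them are terminal. A minimal helper using the phase strings above:

```python
# Phases after which the experiment will not change state again.
TERMINAL_PHASES = {"Successful", "Failed", "Error"}

def is_terminal(phase: str) -> bool:
    """True once the experiment has finished, for better or worse."""
    return phase in TERMINAL_PHASES
```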
Create Experiment Workflow
Run two versions side-by-side for metric comparison. Templates reference the rollout's stable and canary ReplicaSets.
| Step | Action | Prompt / Tool |
|---|---|---|
| 1 | Trigger canary (if not already) | "Deploy hello-world:v2 to the hello-world rollout in default." |
| 2 | Create experiment | "Create an A/B test experiment called hello-world-ab-test in default — run baseline (stable) and candidate (canary) side by side for 30 minutes." |
| 3 | Or use tool | argo_create_experiment(name="hello-world-ab-test", namespace="default", templates=[{"name": "baseline", "specRef": "stable"}, {"name": "candidate", "specRef": "canary"}], duration="30m", rollout_name="hello-world", analyses=[{"name": "success-rate", "templateName": "hello-world-analysis"}]) |
Notes:
- rollout_name is required when templates use specRef — the server resolves stable/canary from the Rollout's ReplicaSets.
- Replace hello-world-analysis with your actual AnalysisTemplate name from onboarding.
Monitor Experiment Workflow
| Step | Action | Prompt / Tool |
|---|---|---|
| 1 | Poll status | "What is the status of the hello-world-ab-test experiment in default?" |
| 2 | Or fetch resource | argorollout://experiments/default/hello-world-ab-test/status |
| 3 | Review | Check template_statuses and analysis_runs in response |
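The polling step can be sketched as a small loop. `fetch_status` here is a hypothetical stand-in for however your MCP client reads the status resource above; it just needs to return a dict with a "phase" key:

```python
import time

def wait_for_experiment(fetch_status, interval_s=30, max_polls=120):
    """Poll until the experiment reaches a terminal phase.

    fetch_status: zero-arg callable returning the experiment status
    dict (at minimum a "phase" key). Returns the final status dict,
    or raises TimeoutError if the polling budget runs out.
    """
    terminal = {"Successful", "Failed", "Error"}
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("phase") in terminal:
            return status
        time.sleep(interval_s)
    raise TimeoutError("experiment did not finish within the polling budget")
```

With the defaults shown (30 s between polls, 120 polls), the budget comfortably covers a 30-minute experiment plus pod startup time.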
Promote Candidate Workflow (If Candidate Wins)
| Step | Action | Prompt / Tool |
|---|---|---|
| 1 | Promote candidate | "Deploy hello-world:v2 to the hello-world rollout in default." (if not already) + "Fully promote the hello-world rollout in default." |
| 2 | Or use tool | argo_update_rollout(update_type='image', name="hello-world", new_image="hello-world:v2", namespace="default") then argo_manage_rollout_lifecycle(action='promote_full', ...) |
| 3 | Delete experiment | "Delete the hello-world-ab-test experiment in default." |
Clean Up Workflow (If Baseline Wins)
| Step | Action | Prompt / Tool |
|---|---|---|
| 1 | Abort rollout | "Abort the hello-world rollout in default." |
| 2 | Delete experiment | "Delete the hello-world-ab-test experiment in default." |
| 3 | Or use tool | argo_delete_experiment(name="hello-world-ab-test", namespace="default") |
Inconclusive — Extend Duration or Manual Review
| Step | Action | Prompt / Tool |
|---|---|---|
| 1 | Review analysis | Fetch argorollout://experiments/default/hello-world-ab-test/status |
| 2 | Manual decision | Extend experiment duration (re-create with longer duration) or manually promote/abort based on metrics |
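The manual decision usually reduces to comparing the two success rates against a margin, with a guard against deciding on too few samples. A hedged sketch of that logic (the thresholds are illustrative defaults, not part of the MCP server or the AnalysisTemplate):

```python
def ab_decision(baseline_ok, baseline_total, candidate_ok, candidate_total,
                min_samples=1000, margin=0.01):
    """Return "promote", "abort", or "inconclusive".

    *_ok / *_total are successful and total request counts, e.g. taken
    from the Prometheus queries behind the AnalysisTemplate.
    """
    if baseline_total < min_samples or candidate_total < min_samples:
        # Too little traffic to trust either rate: extend the
        # experiment duration instead of deciding now.
        return "inconclusive"
    base_rate = baseline_ok / baseline_total
    cand_rate = candidate_ok / candidate_total
    if cand_rate >= base_rate + margin:
        return "promote"   # candidate clearly better
    if cand_rate <= base_rate - margin:
        return "abort"     # baseline wins; keep stable
    return "inconclusive"  # within the margin: re-run longer
```

For production use you would replace the fixed margin with a proper statistical test, but the shape of the decision is the same.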
Natural Language Prompts
Copy-paste these prompts into your MCP client (Cursor, Claude, etc.).
Create Experiment
"Create an A/B test experiment called hello-world-ab-test in default — run baseline (stable) and candidate (canary) side by side for 30 minutes. Use rollout hello-world for specRef resolution."
"Start an Argo Experiment named hello-world-ab-test in default with two templates: baseline (stable spec) and candidate (canary spec), running for 1 hour. Resolve specRef from rollout hello-world."
When using specRef, the MCP client must pass rollout_name="hello-world" to argo_create_experiment so the server can resolve stable/canary from the Rollout's ReplicaSets.
Monitor
"What is the status of the hello-world-ab-test experiment in default?"
"Show me the analysis results for the hello-world-ab-test experiment in default."
Clean Up
"Delete the hello-world-ab-test experiment in default — the baseline won, keeping stable."
"Clean up the hello-world-ab-test experiment in default."
Quick Reference: Tool → Prompt Mapping
| Tool | Example Prompt |
|---|---|
| argo_create_experiment | "Create an A/B test experiment called hello-world-ab-test in default — run baseline and candidate side by side for 30 minutes." |
| argo_delete_experiment | "Delete the hello-world-ab-test experiment in default." |
| argorollout://experiments/default/hello-world-ab-test/status | "What is the status of the hello-world-ab-test experiment in default?" |
| argo_update_rollout (image) | "Deploy hello-world:v2 to the hello-world rollout in default." (promote candidate) |
| argo_manage_rollout_lifecycle (abort) | "Abort the hello-world rollout in default." (keep baseline) |
Troubleshooting
| Issue | Check |
|---|---|
| Experiment stuck in Pending | Verify rollout has canary in progress. Check pod status: kubectl get pods -n default -l app=hello-world |
| AnalysisTemplate not found | Ensure AnalysisTemplate exists and templateName in analyses matches. See Onboarding. |
| specRef "stable"/"canary" invalid | Pass rollout_name when using specRef (required). Ensure the Rollout has a canary deployment in progress — trigger argo_update_rollout(update_type='image') first. |
| Traefik traffic routing | Traefik does not support experiment traffic routing. Use experiments for metrics comparison only; configure ingress separately for user-facing A/B tests. |
| Experiment Failed | Check AnalysisRun status: kubectl get analysisruns -n default. Verify Prometheus queries return data. |
Next Steps
- Onboarding — Set up canary rollout with AnalysisTemplate
- Deployment Strategies — Promote winner to production
- Emergency Abort — Rollback when baseline wins
- Tools — argo_create_experiment, argo_delete_experiment
- Examples — Quick reference and prompts