Manages application lifecycle through GitOps, progressive delivery, and edge routing — with ArgoCD, Argo Rollouts, and Traefik sub-agents.
The App Operator coordinates three specialized sub-agents, each connecting to its own MCP server over a just-in-time (JIT) connection. Together they cover ArgoCD application onboarding, zero-downtime canary rollouts, and Traefik traffic management.
Architecture
All three sub-agents follow the same operational pattern:
- READ-ONLY requests → Query Fast-Path (single tool call, no planning)
- STATE-MODIFYING requests → Full Phased Workflow (mandatory HITL approval)
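The split above can be sketched as a small dispatcher. This is a minimal illustration, not the operator's actual classifier: the `Path` enum, the `route` function, and the contents of `READ_ONLY_TOOLS` are assumptions made for the example.

```python
from enum import Enum


class Path(Enum):
    QUERY_FAST_PATH = "query_fast_path"
    PHASED_WORKFLOW = "phased_workflow"


# Hypothetical subset of read-only tool names; how the real operator
# classifies requests is not described in this document.
READ_ONLY_TOOLS = {"list_applications", "get_sync_status", "get_application_diff"}


def route(tool_name: str) -> Path:
    """Send known read-only tools to the fast path, everything else to HITL."""
    if tool_name in READ_ONLY_TOOLS:
        return Path.QUERY_FAST_PATH
    # Unknown tools are treated as state-modifying, so they always
    # pass through the phased workflow and its approval gate.
    return Path.PHASED_WORKFLOW
```

Defaulting unknown tools to the phased workflow keeps the failure mode safe: a misclassified read costs one extra approval, while a misclassified write would skip one.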
ArgoCD Sub-Agent (argocd-onboarder)
Orchestrates ArgoCD GitOps operations covering Projects, Repositories, and Applications lifecycle.
Read-Only Operations (Fast-Path)
| Query Type | Tool |
|---|---|
| List applications | list_applications |
| Application details / YAML | get_application_details |
| Application events | get_application_events |
| Application logs | get_application_logs |
| Sync status | get_sync_status |
| Application diff | get_application_diff |
| List repositories | list_repositories |
| Get repository | get_repository |
| List projects | list_projects |
| Get project | get_project |
MCP Resource URIs
| Resource | URI |
|---|---|
| Applications by cluster | argocd://applications/{cluster} |
| Application metrics | argocd://application-metrics/{cluster}/{app} |
| Sync operations | argocd://sync-operations/{cluster} |
| Deployment events | argocd://deployment-events/{cluster} |
| Cluster health | argocd://cluster-health/{cluster} |
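As a rough illustration of how a client might fill in these URI templates, the sketch below uses plain string formatting. The `resource_uri` helper and the example values `prod` and `checkout` are hypothetical; only the templates themselves come from the table.

```python
# URI templates copied from the table above.
ARGOCD_RESOURCES = {
    "applications": "argocd://applications/{cluster}",
    "application-metrics": "argocd://application-metrics/{cluster}/{app}",
    "sync-operations": "argocd://sync-operations/{cluster}",
    "deployment-events": "argocd://deployment-events/{cluster}",
    "cluster-health": "argocd://cluster-health/{cluster}",
}


def resource_uri(resource: str, **params: str) -> str:
    """Fill a template's placeholders; raises KeyError if one is missing."""
    return ARGOCD_RESOURCES[resource].format(**params)


# Hypothetical cluster and application names, for illustration only.
uri = resource_uri("application-metrics", cluster="prod", app="checkout")
```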
State-Modifying Workflows
| Operation | HITL Phase | Key Actions |
|---|---|---|
| Create/Update Application | action_plan_review | Show action, app, project, namespace, source, revision, impact |
| Sync Application | action_plan_review | Run get_application_diff first, show diff in plan |
| Delete Application | deletion_plan_review | Show target type, name, cascade setting, impact |
| Onboard Repository | action_plan_review | Supports HTTPS and SSH, validates connectivity |
Idempotency Rules
| Before creating... | First check with... | If exists... |
|---|---|---|
| Application | get_application_details or list_applications | Use update_application instead |
| Project | get_project | Use create_project with updated spec (upserts) |
| Repository | get_repository or list_repositories | Skip — already registered |
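The check-then-act rules in this table can be expressed as a small decision function. This is a sketch under assumptions: `next_tool` is a hypothetical helper, and the tool names `create_application` and `onboard_repository` are placeholders, since the document names only `update_application` and `create_project`.

```python
from typing import Optional


def next_tool(kind: str, exists: bool) -> Optional[str]:
    """Return the tool to call after the existence check, or None to skip."""
    if kind == "application":
        # create_application is an assumed name for the create path.
        return "update_application" if exists else "create_application"
    if kind == "project":
        # create_project upserts, so it is used in both cases.
        return "create_project"
    if kind == "repository":
        # onboard_repository is an assumed name; an existing repo is skipped.
        return None if exists else "onboard_repository"
    raise ValueError(f"unknown resource kind: {kind}")
```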
Argo Rollouts Sub-Agent (argo-rollouts-onboarder)
Orchestrates progressive delivery — migrating Deployments to Rollouts, executing canary/blue-green strategies, managing Prometheus AnalysisTemplates, and running A/B experiments.
Read-Only Operations (Fast-Path)
| Query Type | Resource URI |
|---|---|
| List all Rollouts | argorollout://rollouts/list |
| Rollout live status | argorollout://rollouts/{ns}/{name}/detail |
| Health summary (cluster) | argorollout://health/summary |
| Deep health analysis | argorollout://health/{ns}/{name}/details |
| Prometheus metrics | argorollout://metrics/{ns}/{svc}/summary |
| Prometheus connectivity | argorollout://metrics/prometheus/status |
| Rollout revision history | argorollout://history/{ns}/{deployment} |
| Global audit trail | argorollout://history/all |
| Cluster readiness | argorollout://cluster/health |
| Namespace discovery | argorollout://cluster/namespaces |
| Experiment status | argorollout://experiments/{ns}/{name}/status |
State-Modifying Workflows
| Operation | HITL Phase | Key Actions |
|---|---|---|
| Migration (Deployment → Rollout) | migration_plan_review | Run validate_deployment_ready first, generate YAML preview with apply=False |
| Image Update (trigger rollout) | deployment_plan_review | Show current → new image, strategy steps, analysis config |
| Lifecycle Actions (promote/abort) | lifecycle_action_review | Show rollout phase, traffic weights, impact |
| Delete | deletion_plan_review | Target type, name, namespace, impact |
| Operation | Correct Tool | Never Use |
|---|---|---|
| Update image on existing rollout | argo_update_rollout | argo_manage_legacy_deployment |
| Promote / abort / pause / resume | argo_manage_rollout_lifecycle | argo_manage_legacy_deployment |
| Migrate Deployment → Rollout | convert_deployment_to_rollout | — |
| Post-migration legacy cleanup | argo_manage_legacy_deployment | — |
workloadRef Migration Checklist
After convert_deployment_to_rollout, two follow-up steps are mandatory:
- generate_argocd_ignore_differences → User adds the output to the ArgoCD Application CR (prevents false OutOfSync)
- argo_manage_legacy_deployment(action='generate_scale_down_manifest') → User commits the manifest to Git (prevents duplicate pods)
| Traffic Weight | Behavior |
|---|---|
| ≤ 50% + healthy AnalysisRun | Promote autonomously, narrate step progression |
| ≥ 50% | Pause, present metrics, require explicit approval |
| promote_full | Always requires explicit approval |
| Inconclusive AnalysisRun | Check health + Prometheus. Transient → resume. Persistent → abort |
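The promotion rules in this table can be read as a decision function. The sketch below is one possible reading, not the sub-agent's actual logic: the `Decision` enum, the `decide` signature, and the `persistent_failure` flag are assumptions. The table lists exactly 50% in both weight rows, so the sketch takes the conservative reading and requires approval at 50%. Cases the table does not cover, such as an unhealthy AnalysisRun below 50%, also fall through to approval.

```python
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"
    REQUIRE_APPROVAL = "require_approval"
    RESUME = "resume"
    ABORT = "abort"


def decide(
    weight: int,
    analysis: str,
    *,
    promote_full: bool = False,
    persistent_failure: bool = False,
) -> Decision:
    """Map traffic weight (percent) and AnalysisRun state to a next step."""
    if promote_full:
        return Decision.REQUIRE_APPROVAL
    if analysis == "inconclusive":
        # Transient issues resume the rollout; persistent ones abort it.
        return Decision.ABORT if persistent_failure else Decision.RESUME
    if weight < 50 and analysis == "healthy":
        return Decision.PROMOTE
    # 50% and above, or anything the table does not cover.
    return Decision.REQUIRE_APPROVAL
```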
Traefik Sub-Agent (traefik-edge-router)
Manages Kubernetes edge traffic — weighted canary routing, middleware configuration, traffic mirroring, NGINX-to-Traefik migration, and TCP routing.
Read-Only Operations (Fast-Path)
| Query Type | Resource URI |
|---|---|
| List all TraefikServices | traefik://traffic/routes/list |
| Route distribution / YAML | traefik://traffic/{ns}/{route}/distribution |
| Service metrics | traefik://metrics/{ns}/{svc}/summary |
| Prometheus connectivity | traefik://metrics/prometheus/status |
| Active anomalies | traefik://anomalies/detected |
| Historical anomalies | traefik://anomalies/history/{ns} |
| NGINX Ingress scan | traefik://migration/nginx-ingress-scan |
| NGINX annotation analysis | traefik://migration/nginx-ingress-analyze |
| Migration progress | traefik://migration/nginx-to-traefik |
State-Modifying Workflows
| Operation | HITL Phase | Key Actions |
|---|---|---|
| Weight Shift | traffic_shift_review | Read current distribution first, show current → proposed weights |
| NGINX Migration | migration_plan_review | action=generate first + analyze breaking annotations, show YAML preview |
| Middleware (rate limit, circuit breaker, auth) | middleware_plan_review | Show middleware type, config, attached route |
| Delete/Revert | deletion_plan_review | Target type, name, namespace, impact |
Safety Rules
| Rule | Description |
|---|---|
| Generate-before-apply | action=generate → show YAML → confirm → action=apply |
| Traffic mirroring | Zero user impact but consumes cluster resources. >50% mirror → warn |
| TCP routing | No weight-based rollback. Confirm service availability. TLS passthrough: check ACME interception |
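The generate-before-apply rule can be sketched as a two-step wrapper. This is an illustration only: `generate_then_apply`, `run_tool`, and `confirm` are hypothetical names, with `run_tool` standing in for whichever Traefik tool accepts the `action` parameter.

```python
from typing import Callable, Optional


def generate_then_apply(
    run_tool: Callable[[str], str],
    confirm: Callable[[str], bool],
) -> Optional[str]:
    """Generate a YAML preview, then apply only after confirmation."""
    preview = run_tool("generate")   # action=generate: no cluster changes
    if not confirm(preview):         # show the YAML and wait for a decision
        return None                  # rejected: apply is never called
    return run_tool("apply")         # action=apply: the write happens here
```

Keeping the confirmation inside the wrapper means there is no code path that reaches `apply` without a preview having been shown first.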
Key Capabilities
- Weighted Canary Routing — Progressive traffic shifting between service versions
- Middleware Management — Rate limits, circuit breakers, TLS termination, IP allowlists, authentication
- Traffic Mirroring / Shadow Launch — Send copy of live traffic to canary without affecting users
- NGINX-to-Traefik Migration — Automatically translate NGINX Ingress annotations to native Traefik CRDs
- TCP Routing — Direct TCP routing for databases (PostgreSQL, Redis) with TLS passthrough
- Anomaly Detection — Monitor traffic patterns and detect anomalies across namespaces
Skills Reference
| Skill Directory | Sub-Agent | Purpose |
|---|---|---|
| app-operator/argocd-gitops | argocd-onboarder | ArgoCD workflow patterns, project/repo/app lifecycle |
| app-operator/argo-rollouts-gitops | argo-rollouts-onboarder | Migration workflows, canary/blue-green patterns, AnalysisTemplate references |
| app-operator/traefik-edge-routing | traefik-edge-router | Routing patterns, middleware CRDs, migration workflows |
MCP Integration
| MCP Server | Package | Transport | Used By |
|---|---|---|---|
| argocd_mcp_server | argocd-mcp-server | stdio | argocd-onboarder |
| argo_rollout_mcp_server | argo-rollout-mcp-server | stdio | argo-rollouts-onboarder |
| traefik_mcp_server | traefik-mcp-server | stdio | traefik-edge-router |
Cross-Domain Integration
The App Operator frequently collaborates with other domains:
| Scenario | Source | Target | Context Passed |
|---|---|---|---|
| "Check pod status after ArgoCD sync" | App Operator | K8s Operator | App name, namespace, sync result |
| "What alerts fired after the rollout?" | App Operator | Observability | Rollout name, namespace, timeline |
| "Deploy the chart I just generated" | Helm Operator | App Operator | Chart path, values, target namespace |
When the App Operator determines a request is outside its scope (e.g., raw Kubernetes resource inspection), it returns a structured handoff signal that the Supervisor routes to the correct domain.
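The document does not define the shape of the handoff signal. As a rough sketch, it could be a small record carrying the target domain plus the context listed in the table above. The `Handoff` class, its field names, and the `k8s-operator` identifier are all assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    """Hypothetical handoff payload returned to the Supervisor."""
    target_domain: str
    reason: str
    context: dict = field(default_factory=dict)


def handoff_after_sync(app: str, namespace: str, sync_result: str) -> Handoff:
    # Context fields mirror the "Check pod status after ArgoCD sync" row.
    return Handoff(
        target_domain="k8s-operator",  # assumed identifier for the K8s Operator
        reason="pod status inspection is outside the App Operator's scope",
        context={"app": app, "namespace": namespace, "sync_result": sync_result},
    )


# Illustrative values only.
signal = asdict(handoff_after_sync("checkout", "shop", "Synced"))
```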
Safety & Governance
HITL Gates
Every state-modifying operation requires explicit user approval via request_human_input. The HumanInTheLoopMiddleware provides a mechanical safety backstop — even if the LLM tries to skip approval, the middleware forces a hard stop before any write operation.
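The backstop idea can be illustrated with a guard that sits between the model and tool execution. This sketch does not use or reproduce the real HumanInTheLoopMiddleware API: `guarded_call`, `ApprovalRequired`, and the `WRITE_TOOLS` subset are hypothetical stand-ins.

```python
from typing import Any, Callable


class ApprovalRequired(Exception):
    """Raised when a write tool is invoked without recorded approval."""


# Illustrative subset of state-modifying tools named in this document.
WRITE_TOOLS = {"update_application", "argo_update_rollout", "convert_deployment_to_rollout"}


def guarded_call(
    tool_name: str,
    args: dict,
    approved: bool,
    execute: Callable[[str, dict], Any],
) -> Any:
    """Run the tool, but hard-stop writes that lack approval."""
    if tool_name in WRITE_TOOLS and not approved:
        # The check is mechanical: it does not depend on the LLM
        # having remembered to request human input first.
        raise ApprovalRequired(f"{tool_name} requires human approval")
    return execute(tool_name, args)
```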
[PLAN-LOCKED] Execution
When the coordinator has already obtained approval, sub-agents receive [PLAN-LOCKED] and execute the pre-approved parameters directly without re-planning.
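A sub-agent might detect the marker with a check like the one below. How the marker is actually delivered is not specified here; this sketch assumes it arrives as a prefix on the task text, and `should_replan` is a hypothetical helper.

```python
PLAN_LOCKED = "[PLAN-LOCKED]"


def should_replan(task: str) -> bool:
    """Return False when the task carries the pre-approval marker."""
    # Assumption: the marker is a prefix on the task text.
    return not task.lstrip().startswith(PLAN_LOCKED)
```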
Rejection Protocol
If the user rejects a plan:
- Sub-agent does not retry with modified parameters
- Returns to coordinator for re-engagement
- Maximum 2 plan presentations per request before asking user to rephrase
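The rejection rules above can be sketched as a small counter check. The function name and the returned labels are hypothetical; only the limit of two presentations comes from the list.

```python
MAX_PLAN_PRESENTATIONS = 2


def after_rejection(presentations_so_far: int) -> str:
    """Decide what happens once the user rejects a plan."""
    if presentations_so_far >= MAX_PLAN_PRESENTATIONS:
        # Limit reached: stop presenting plans and ask for a rephrase.
        return "ask_user_to_rephrase"
    # The sub-agent never retries on its own; control goes back
    # to the coordinator, which re-engages the user.
    return "return_to_coordinator"
```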