
🔄 App Operator

Manages application lifecycle through GitOps, progressive delivery, and edge routing — with ArgoCD, Argo Rollouts, and Traefik sub-agents.

The App Operator coordinates three specialized sub-agents, each connecting to its own MCP server through just-in-time (JIT) connections. Together, they handle everything from ArgoCD application onboarding through zero-downtime canary rollouts to Traefik traffic management.


Architecture

All three sub-agents follow the same operational pattern:

  • READ-ONLY requests → Query Fast-Path (single tool call, no planning)
  • STATE-MODIFYING requests → Full Phased Workflow (mandatory HITL approval)
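The split between the two paths can be sketched as a small router. This is a minimal illustration, assuming a request maps to a single tool name: the read-only names come from the ArgoCD table on this page, while `Route`, `route_request`, and the step labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Read-only ArgoCD tools listed on this page; the routing code itself is hypothetical.
READ_ONLY_TOOLS = frozenset({
    "list_applications", "get_application_details", "get_application_events",
    "get_application_logs", "get_sync_status", "get_application_diff",
    "list_repositories", "get_repository", "list_projects", "get_project",
})


@dataclass
class Route:
    path: str                 # "fast_path" or "phased"
    requires_hitl: bool       # True when a human must approve before execution
    steps: List[str] = field(default_factory=list)


def route_request(tool_name: str) -> Route:
    """Send read-only tools to a single call; plan and gate everything else."""
    if tool_name in READ_ONLY_TOOLS:
        # Query fast-path: one tool call, no planning, no approval gate.
        return Route("fast_path", False, [tool_name])
    # Full phased workflow: build a plan, get human approval, then execute.
    return Route("phased", True, ["plan", "request_human_input", tool_name])
```

Unknown tool names fall through to the phased path, so a tool missing from the allowlist fails safe (it gets an approval gate) rather than silently skipping one.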

ArgoCD Sub-Agent (argocd-onboarder)

Orchestrates ArgoCD GitOps operations covering Projects, Repositories, and Applications lifecycle.

Read-Only Operations (Fast-Path)

Query Type | Tool
--- | ---
List applications | list_applications
Application details / YAML | get_application_details
Application events | get_application_events
Application logs | get_application_logs
Sync status | get_sync_status
Application diff | get_application_diff
List repositories | list_repositories
Get repository | get_repository
List projects | list_projects
Get project | get_project

MCP Resource URIs

Resource | URI
--- | ---
Applications by cluster | argocd://applications/{cluster}
Application metrics | argocd://application-metrics/{cluster}/{app}
Sync operations | argocd://sync-operations/{cluster}
Deployment events | argocd://deployment-events/{cluster}
Cluster health | argocd://cluster-health/{cluster}
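The URIs above are templates whose `{placeholders}` are filled per request. A minimal expansion helper might look like this; the templates are copied from the table, while `expand` and the choice to percent-encode values are assumptions, not the MCP server's own code.

```python
import re
from urllib.parse import quote

# URI templates from the table above; the helper below is a hypothetical sketch.
ARGOCD_RESOURCES = {
    "applications": "argocd://applications/{cluster}",
    "application_metrics": "argocd://application-metrics/{cluster}/{app}",
    "sync_operations": "argocd://sync-operations/{cluster}",
    "deployment_events": "argocd://deployment-events/{cluster}",
    "cluster_health": "argocd://cluster-health/{cluster}",
}


def expand(template: str, **params: str) -> str:
    """Fill every {placeholder}, failing loudly on missing or unexpected names."""
    names = set(re.findall(r"\{(\w+)\}", template))
    missing = names - set(params)
    extra = set(params) - names
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    # Percent-encode each value so a stray "/" cannot add a path segment.
    return template.format(**{k: quote(v, safe="") for k, v in params.items()})
```

For example, `expand(ARGOCD_RESOURCES["application_metrics"], cluster="prod", app="web")` yields `argocd://application-metrics/prod/web`.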

State-Modifying Workflows

Operation | HITL Phase | Key Actions
--- | --- | ---
Create/Update Application | action_plan_review | Show action, app, project, namespace, source, revision, impact
Sync Application | action_plan_review | Run get_application_diff first, show diff in plan
Delete Application | deletion_plan_review | Show target type, name, cascade setting, impact
Onboard Repository | action_plan_review | Supports HTTPS and SSH, validates connectivity

Idempotency Rules

Before creating | First check with | If it already exists
--- | --- | ---
Application | get_application_details or list_applications | Use update_application instead
Project | get_project | Use create_project with updated spec (upserts)
Repository | get_repository or list_repositories | Skip (already registered)
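The check-then-act rules can be sketched against an in-memory stand-in for the MCP tools. `get_application_details`, `update_application`, and `get_repository` are named on this page; `create_application`, `create_repository`, and the `FakeArgoCD` class are hypothetical placeholders used only to make the flow runnable.

```python
from typing import Dict, List, Optional


class FakeArgoCD:
    """In-memory stand-in for the ArgoCD MCP tools (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.apps: Dict[str, dict] = {}
        self.repos: Dict[str, dict] = {}
        self.calls: List[str] = []

    def get_application_details(self, name: str) -> Optional[dict]:
        self.calls.append("get_application_details")
        return self.apps.get(name)

    def create_application(self, name: str, spec: dict) -> None:  # hypothetical name
        self.calls.append("create_application")
        self.apps[name] = spec

    def update_application(self, name: str, spec: dict) -> None:
        self.calls.append("update_application")
        self.apps[name] = spec

    def get_repository(self, url: str) -> Optional[dict]:
        self.calls.append("get_repository")
        return self.repos.get(url)

    def create_repository(self, url: str, spec: dict) -> None:  # hypothetical name
        self.calls.append("create_repository")
        self.repos[url] = spec


def ensure_application(client, name: str, spec: dict) -> str:
    """Check first; update an existing Application instead of creating a duplicate."""
    if client.get_application_details(name) is None:
        client.create_application(name, spec)
        return "created"
    client.update_application(name, spec)
    return "updated"


def ensure_repository(client, url: str, spec: dict) -> str:
    """Check first; skip a repository that is already registered."""
    if client.get_repository(url) is not None:
        return "skipped"
    client.create_repository(url, spec)
    return "created"
```

Running either helper twice converges on the same state, which is the point of checking before creating.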

Argo Rollouts Sub-Agent (argo-rollouts-onboarder)

Orchestrates progressive delivery — migrating Deployments to Rollouts, executing canary/blue-green strategies, managing Prometheus AnalysisTemplates, and running A/B experiments.

Read-Only Operations (Fast-Path)

Query Type | Resource URI
--- | ---
List all Rollouts | argorollout://rollouts/list
Rollout live status | argorollout://rollouts/{ns}/{name}/detail
Health summary (cluster) | argorollout://health/summary
Deep health analysis | argorollout://health/{ns}/{name}/details
Prometheus metrics | argorollout://metrics/{ns}/{svc}/summary
Prometheus connectivity | argorollout://metrics/prometheus/status
Rollout revision history | argorollout://history/{ns}/{deployment}
Global audit trail | argorollout://history/all
Cluster readiness | argorollout://cluster/health
Namespace discovery | argorollout://cluster/namespaces
Experiment status | argorollout://experiments/{ns}/{name}/status

State-Modifying Workflows

Operation | HITL Phase | Key Actions
--- | --- | ---
Migration (Deployment → Rollout) | migration_plan_review | Run validate_deployment_ready first, generate YAML preview with apply=False
Image Update (trigger rollout) | deployment_plan_review | Show current → new image, strategy steps, analysis config
Lifecycle Actions (promote/abort) | lifecycle_action_review | Show rollout phase, traffic weights, impact
Delete | deletion_plan_review | Target type, name, namespace, impact

Tool Routing

Operation | Correct Tool | Never Use
--- | --- | ---
Update image on existing rollout | argo_update_rollout | argo_manage_legacy_deployment
Promote / abort / pause / resume | argo_manage_rollout_lifecycle | argo_manage_legacy_deployment
Migrate Deployment → Rollout | convert_deployment_to_rollout |
Post-migration legacy cleanup | argo_manage_legacy_deployment |
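The routing table can be enforced as a guard that runs before any tool call. The tool names come from the table; the operation keys (`update_image`, `lifecycle`, and so on) and `check_tool` are hypothetical labels chosen for this sketch.

```python
from typing import Dict, Set

# Correct tool per operation, from the routing table (operation keys are hypothetical).
CORRECT_TOOL: Dict[str, str] = {
    "update_image": "argo_update_rollout",
    "lifecycle": "argo_manage_rollout_lifecycle",
    "migrate": "convert_deployment_to_rollout",
    "legacy_cleanup": "argo_manage_legacy_deployment",
}

# Explicit "Never Use" entries from the table.
NEVER_USE: Dict[str, Set[str]] = {
    "update_image": {"argo_manage_legacy_deployment"},
    "lifecycle": {"argo_manage_legacy_deployment"},
}


def check_tool(operation: str, tool: str) -> str:
    """Return the tool if the table allows it; raise otherwise."""
    if tool in NEVER_USE.get(operation, set()):
        raise PermissionError(f"{tool} must never be used for {operation}")
    expected = CORRECT_TOOL[operation]
    if tool != expected:
        raise ValueError(f"{operation} should use {expected}, not {tool}")
    return tool
```

Keeping the forbidden pairs separate from the general mismatch lets the agent report "this tool is banned here" differently from "wrong tool picked".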

workloadRef Migration Checklist

After convert_deployment_to_rollout, two follow-up steps are mandatory:

  1. generate_argocd_ignore_differences → the user adds the output to the ArgoCD Application CR (prevents a false OutOfSync status)
  2. argo_manage_legacy_deployment(action='generate_scale_down_manifest') → the user commits the manifest to Git (prevents duplicate pods)
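The two follow-up artifacts can be pictured as plain data. This sketch assumes, without confirmation from the tools themselves, that the false OutOfSync comes from the Deployment's replica count drifting after migration; the `ignoreDifferences` keys (`group`, `kind`, `name`, `namespace`, `jsonPointers`) follow ArgoCD's Application schema, while both helper functions are hypothetical and may not match what `generate_argocd_ignore_differences` or `generate_scale_down_manifest` actually emit.

```python
def ignore_differences_entry(deployment: str, namespace: str) -> dict:
    """One Application.spec.ignoreDifferences item for the referenced Deployment.

    Assumption: the drift ArgoCD reports is the Deployment's replica count.
    """
    return {
        "group": "apps",
        "kind": "Deployment",
        "name": deployment,
        "namespace": namespace,
        "jsonPointers": ["/spec/replicas"],
    }


def scale_down_fields(deployment: str, namespace: str) -> dict:
    """The fields that change in the Git-tracked Deployment manifest.

    A real manifest keeps its selector and pod template; only replicas drops to 0,
    so ArgoCD stops recreating the legacy pods next to the Rollout's pods.
    """
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": deployment, "namespace": namespace},
        "spec": {"replicas": 0},
    }
```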

Autonomous Promotion (Canary)

Condition | Behavior
--- | ---
< 50% traffic weight + healthy AnalysisRun | Promote autonomously, narrate step progression
≥ 50% traffic weight | Pause, present metrics, require explicit approval
promote_full | Always requires explicit approval
Inconclusive AnalysisRun | Check health + Prometheus. Transient → resume. Persistent → abort
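The promotion policy reduces to a small decision function. This sketch treats a weight of exactly 50% as requiring approval (the stricter reading of the boundary) and counts only the AnalysisRun phase `Successful` as healthy; the `transient` flag stands in for the health and Prometheus check and, like the function and enum names, is hypothetical.

```python
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"            # advance to the next canary step autonomously
    REQUIRE_APPROVAL = "approval"  # pause, present metrics, wait for the user
    RESUME = "resume"              # inconclusive result judged transient
    ABORT = "abort"                # inconclusive result judged persistent


def promotion_decision(weight: int, analysis_phase: str,
                       promote_full: bool = False, transient: bool = False) -> Decision:
    """Apply the canary promotion policy to one decision point."""
    if promote_full:
        return Decision.REQUIRE_APPROVAL       # full promotion is never autonomous
    if analysis_phase == "Inconclusive":
        return Decision.RESUME if transient else Decision.ABORT
    if weight >= 50:
        return Decision.REQUIRE_APPROVAL       # high-exposure steps need a human
    if analysis_phase == "Successful":
        return Decision.PROMOTE
    return Decision.REQUIRE_APPROVAL           # anything else: do not promote alone
```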

Traefik Sub-Agent (traefik-edge-router)

Manages Kubernetes edge traffic — weighted canary routing, middleware configuration, traffic mirroring, NGINX-to-Traefik migration, and TCP routing.

Read-Only Operations (Fast-Path)

Query Type | Resource URI
--- | ---
List all TraefikServices | traefik://traffic/routes/list
Route distribution / YAML | traefik://traffic/{ns}/{route}/distribution
Service metrics | traefik://metrics/{ns}/{svc}/summary
Prometheus connectivity | traefik://metrics/prometheus/status
Active anomalies | traefik://anomalies/detected
Historical anomalies | traefik://anomalies/history/{ns}
NGINX Ingress scan | traefik://migration/nginx-ingress-scan
NGINX annotation analysis | traefik://migration/nginx-ingress-analyze
Migration progress | traefik://migration/nginx-to-traefik

State-Modifying Workflows

Operation | HITL Phase | Key Actions
--- | --- | ---
Weight Shift | traffic_shift_review | Read current distribution first, show current → proposed weights
NGINX Migration | migration_plan_review | Run action=generate first and analyze breaking annotations, show YAML preview
Middleware (rate limit, circuit breaker, auth) | middleware_plan_review | Show middleware type, config, attached route
Delete/Revert | deletion_plan_review | Target type, name, namespace, impact
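For the weight-shift review, current and proposed weights are easier to judge as percentages. Traefik weights are relative integers, so this sketch normalizes them before rendering the "current → proposed" lines; `to_percent` and `shift_preview` are hypothetical helpers, not part of the Traefik MCP server.

```python
from typing import Dict, List


def to_percent(weights: Dict[str, int]) -> Dict[str, float]:
    """Convert relative Traefik service weights into traffic percentages."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one service needs a positive weight")
    return {svc: round(100 * w / total, 1) for svc, w in weights.items()}


def shift_preview(current: Dict[str, int], proposed: Dict[str, int]) -> List[str]:
    """Render 'service: current% -> proposed%' lines for the review plan."""
    cur, new = to_percent(current), to_percent(proposed)
    return [
        f"{svc}: {cur.get(svc, 0.0)}% -> {new.get(svc, 0.0)}%"
        for svc in sorted(set(cur) | set(new))
    ]
```

For example, shifting `{"stable": 90, "canary": 10}` to `{"stable": 75, "canary": 25}` renders `canary: 10.0% -> 25.0%` and `stable: 90.0% -> 75.0%`.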

Safety Rules

Rule | Description
--- | ---
Generate-before-apply | action=generate → show YAML → confirm → action=apply
Traffic mirroring | No user impact, but consumes cluster resources. Mirroring more than 50% of traffic → warn
TCP routing | No weight-based rollback. Confirm service availability. For TLS passthrough, check for ACME interception
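The generate-before-apply rule and the mirroring threshold can be enforced in code rather than left to the model. In this sketch, `GenerateBeforeApply` and `mirror_warning` are hypothetical; the gate only allows applying the exact YAML the user confirmed, and any new preview resets the confirmation.

```python
from typing import Optional


class GenerateBeforeApply:
    """Hypothetical gate: action=generate -> show YAML -> confirm -> action=apply."""

    def __init__(self) -> None:
        self._previewed: Optional[str] = None
        self._confirmed = False

    def generate(self, yaml_text: str) -> str:
        # A new preview invalidates any earlier confirmation.
        self._previewed = yaml_text
        self._confirmed = False
        return yaml_text

    def confirm(self) -> None:
        if self._previewed is None:
            raise RuntimeError("nothing has been generated yet")
        self._confirmed = True

    def apply(self, yaml_text: str) -> str:
        if not self._confirmed or yaml_text != self._previewed:
            raise RuntimeError("apply must use the exact YAML the user confirmed")
        return "applied"


def mirror_warning(percent: float) -> Optional[str]:
    """Return a warning when more than 50% of live traffic would be mirrored."""
    if percent > 50:
        return f"Mirroring {percent}% of live traffic consumes significant cluster resources."
    return None
```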

Key Capabilities

  • Weighted Canary Routing — Progressive traffic shifting between service versions
  • Middleware Management — Rate limits, circuit breakers, TLS termination, IP allowlists, authentication
  • Traffic Mirroring / Shadow Launch — Send a copy of live traffic to the canary without affecting users
  • NGINX-to-Traefik Migration — Automatically translate NGINX Ingress annotations to native Traefik CRDs
  • TCP Routing — Direct TCP routing for databases (PostgreSQL, Redis) with TLS passthrough
  • Anomaly Detection — Monitor traffic patterns and detect anomalies across namespaces
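Weighted canary routing and traffic mirroring both map onto Traefik's `TraefikService` CRD, which has `weighted` and `mirroring` spec sections. The sketch below builds those manifests as dictionaries. The field layout follows Traefik's CRD, while the helper names, the service names in the usage, and the choice of the `traefik.io/v1alpha1` group (older Traefik v2 releases used `traefik.containo.us/v1alpha1`) are assumptions about the target cluster.

```python
API_VERSION = "traefik.io/v1alpha1"  # assumption: Traefik v3-era CRD group


def weighted_service(name: str, namespace: str, stable: str, canary: str,
                     port: int, canary_weight: int) -> dict:
    """TraefikService splitting traffic between a stable and a canary Service."""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    return {
        "apiVersion": API_VERSION,
        "kind": "TraefikService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"weighted": {"services": [
            {"name": stable, "port": port, "weight": 100 - canary_weight},
            {"name": canary, "port": port, "weight": canary_weight},
        ]}},
    }


def mirroring_service(name: str, namespace: str, primary: str, shadow: str,
                      port: int, percent: int) -> dict:
    """TraefikService that serves users from primary and copies traffic to shadow."""
    return {
        "apiVersion": API_VERSION,
        "kind": "TraefikService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"mirroring": {
            "name": primary,
            "port": port,
            "mirrors": [{"name": shadow, "port": port, "percent": percent}],
        }},
    }
```

Only the primary Service's responses reach users; mirrored requests are sent to the shadow Service and their responses are discarded, which is why mirroring has no user impact but still costs capacity.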

Skills Reference

Skill Directory | Sub-Agent | Purpose
--- | --- | ---
app-operator/argocd-gitops | argocd-onboarder | ArgoCD workflow patterns, project/repo/app lifecycle
app-operator/argo-rollouts-gitops | argo-rollouts-onboarder | Migration workflows, canary/blue-green patterns, AnalysisTemplate references
app-operator/traefik-edge-routing | traefik-edge-router | Routing patterns, middleware CRDs, migration workflows

MCP Integration

MCP Server | Package | Transport | Used By
--- | --- | --- | ---
argocd_mcp_server | argocd-mcp-server | stdio | argocd-onboarder
argo_rollout_mcp_server | argo-rollout-mcp-server | stdio | argo-rollouts-onboarder
traefik_mcp_server | traefik-mcp-server | stdio | traefik-edge-router
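Just-in-time connection means a sub-agent's MCP server is only started when that sub-agent first needs it. A minimal registry sketch follows; the package and transport values come from the table, while `ServerSpec`, `JitConnections`, and the injected `connect` factory are hypothetical (a real factory would launch the stdio server process).

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ServerSpec:
    package: str
    transport: str  # "stdio" for all three servers in the table


# Values from the MCP Integration table; the registry below is a hypothetical sketch.
SERVERS: Dict[str, ServerSpec] = {
    "argocd-onboarder": ServerSpec("argocd-mcp-server", "stdio"),
    "argo-rollouts-onboarder": ServerSpec("argo-rollout-mcp-server", "stdio"),
    "traefik-edge-router": ServerSpec("traefik-mcp-server", "stdio"),
}


class JitConnections:
    """Open an MCP connection on first use by a sub-agent, then reuse it."""

    def __init__(self, connect: Callable[[ServerSpec], object]) -> None:
        self._connect = connect
        self._open: Dict[str, object] = {}

    def get(self, sub_agent: str) -> object:
        if sub_agent not in self._open:
            self._open[sub_agent] = self._connect(SERVERS[sub_agent])
        return self._open[sub_agent]

    def close_all(self) -> None:
        # Close any connection object that exposes close(), then forget it.
        for conn in self._open.values():
            close = getattr(conn, "close", None)
            if callable(close):
                close()
        self._open.clear()
```

A request that only touches Traefik never starts the ArgoCD or Argo Rollouts servers.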

Cross-Domain Integration

The App Operator frequently collaborates with other domains:

Scenario | Source | Target | Context Passed
--- | --- | --- | ---
"Check pod status after ArgoCD sync" | App Operator | K8s Operator | App name, namespace, sync result
"What alerts fired after the rollout?" | App Operator | Observability | Rollout name, namespace, timeline
"Deploy the chart I just generated" | Helm Operator | App Operator | Chart path, values, target namespace

When the App Operator determines a request is outside its scope (e.g., raw Kubernetes resource inspection), it returns a structured handoff signal that the Supervisor routes to the correct domain.
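The handoff signal can be thought of as a small structured record. The exact schema is not documented here, so `Handoff` and its fields are hypothetical; the context keys mirror the first row of the cross-domain table (app name, namespace, sync result).

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Handoff:
    """Hypothetical shape of the structured handoff the Supervisor routes on."""
    source: str
    target: str
    reason: str
    context: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


def post_sync_handoff(app: str, namespace: str, sync_result: str) -> Handoff:
    # Pod inspection is raw Kubernetes work, so it belongs to the K8s Operator.
    return Handoff(
        source="App Operator",
        target="K8s Operator",
        reason="raw Kubernetes resource inspection is outside App Operator scope",
        context={"app": app, "namespace": namespace, "sync_result": sync_result},
    )
```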


Safety & Governance

HITL Gates

Every state-modifying operation requires explicit user approval via request_human_input. The HumanInTheLoopMiddleware provides a mechanical safety backstop — even if the LLM tries to skip approval, the middleware forces a hard stop before any write operation.
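The backstop works because the check lives in code on the execution path, not in the prompt. The wrapper below is a generic sketch of that idea and does not reproduce the `HumanInTheLoopMiddleware` API; `hitl_gate`, `ApprovalRequired`, and the `ask_human` callback are hypothetical.

```python
from typing import Any, Callable, Dict, Set

Executor = Callable[[str, Dict[str, Any]], Any]


class ApprovalRequired(Exception):
    """Raised when a write tool runs without a human approval."""


def hitl_gate(write_tools: Set[str],
              ask_human: Callable[[str, Dict[str, Any]], bool]) -> Callable[[Executor], Executor]:
    """Wrap a tool executor so every write tool stops for approval first."""
    def wrap(execute: Executor) -> Executor:
        def guarded(tool: str, args: Dict[str, Any]) -> Any:
            # Runs regardless of what the model decided, so it cannot be skipped.
            if tool in write_tools and not ask_human(tool, args):
                raise ApprovalRequired(f"{tool} was not approved")
            return execute(tool, args)
        return guarded
    return wrap
```

Read-only tools pass straight through, which keeps the fast-path free of approval prompts.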

[PLAN-LOCKED] Execution

When the coordinator has already obtained approval, sub-agents receive [PLAN-LOCKED] and execute the pre-approved parameters directly without re-planning.
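A sub-agent's entry point can branch on the marker before doing any planning. This sketch assumes, purely for illustration, that a locked task is the `[PLAN-LOCKED]` marker followed by the approved parameters as JSON; `handle_task` and its `plan`/`execute` callbacks are hypothetical.

```python
import json
from typing import Any, Callable, Dict

PLAN_LOCKED = "[PLAN-LOCKED]"


def handle_task(message: str,
                plan: Callable[[str], Dict[str, Any]],
                execute: Callable[[Dict[str, Any]], Any]) -> Any:
    """Execute pre-approved parameters directly; otherwise plan first.

    Assumption: locked messages carry the approved parameters as JSON after the marker.
    """
    if message.startswith(PLAN_LOCKED):
        params = json.loads(message[len(PLAN_LOCKED):])
        return execute(params)  # already approved: no re-planning, no second prompt
    # Normal path: plan() builds the plan and obtains HITL approval.
    return execute(plan(message))
```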

Rejection Protocol

If the user rejects a plan:

  • The sub-agent does not retry with modified parameters
  • It returns to the coordinator for re-engagement
  • A maximum of 2 plan presentations per request, after which the user is asked to rephrase
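The rejection protocol amounts to a per-request counter plus a fixed response to rejection. `PlanSession` and its return labels are hypothetical names for this sketch; the limit of 2 presentations comes from the rule above.

```python
MAX_PRESENTATIONS = 2


class PlanSession:
    """Hypothetical per-request tracker for the rejection protocol."""

    def __init__(self) -> None:
        self.presentations = 0

    def present(self) -> str:
        # After two plans have been shown, stop and ask the user to rephrase.
        if self.presentations >= MAX_PRESENTATIONS:
            return "ask_user_to_rephrase"
        self.presentations += 1
        return "show_plan"

    def on_rejection(self) -> str:
        # Never retry with self-modified parameters; hand control back instead.
        return "return_to_coordinator"
```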