Meet the Kubernetes Agent

Welcome to k8s-autopilot — a stateful, multi-agent AI system that orchestrates Kubernetes deployments, manages progressive GitOps delivery, and safely debugs your cluster through conversation.

We designed k8s-autopilot to feel less like a rigid script and more like a senior DevOps colleague. Whether you need to generate a complex Helm chart, execute a zero-downtime canary rollout, triage firing alerts at 3 AM, or debug a crashing pod — the agent handles the heavy lifting while keeping you firmly in control through mandatory Human-in-the-Loop approval gates.

Why we built this

Managing Kubernetes at scale is tough. Junior engineers hit a steep learning curve, while senior architects drown in repetitive work: following runbooks, troubleshooting YAML indentation errors, orchestrating canary rollouts, and context-switching between kubectl, Argo dashboards, Helm releases, and Prometheus metrics.

We wanted to fix this by combining the reasoning power of Large Language Models (LLMs) with the strict reliability of tools you already trust — delivered through a conversational interface that actually understands your cluster's context.

With k8s-autopilot, you get:

4 specialized domains covering Helm, ArgoCD/Rollouts/Traefik, Kubernetes ops, and Observability
13 sub-agents each with deep expertise in their respective tools
8 MCP server integrations providing standardized tool access
Human-in-the-Loop safety at every state-modifying operation
Self-healing — if a generation fails validation, the agent catches it, reads the error log, and fixes its own YAML dynamically
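
The self-healing behavior in the last item can be pictured as a bounded generate, validate, retry loop. The following is a minimal sketch of that idea, not the project's implementation: `generate_until_valid`, its parameters, and the stub functions are hypothetical names, and the real agent would call an LLM and a validator such as a Helm lint step where the stubs are.

```python
from typing import Callable, Optional, Tuple


def generate_until_valid(
    generate: Callable[[Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Regenerate a manifest until it validates, feeding each error back in.

    `generate` receives the previous validation error (None on the first try).
    `validate` returns None on success or an error message on failure.
    Both are hypothetical stand-ins for the agent's LLM call and validator.
    """
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(error)
        error = validate(candidate)
        if error is None:
            return candidate, attempt
    raise RuntimeError(
        f"validation still failing after {max_attempts} attempts: {error}"
    )


# Stub generator: produces a bad value first, then a fixed one once it sees an error.
def fake_generate(error: Optional[str]) -> str:
    return "replicas: 2" if error else "replicas: two"


# Stub validator: accepts only an integer replica count.
def fake_validate(manifest: str) -> Optional[str]:
    value = manifest.split(":", 1)[1].strip()
    return None if value.isdigit() else f"replicas must be an integer, got {value!r}"


manifest, attempts = generate_until_valid(fake_generate, fake_validate)
```

Capping the number of attempts keeps a persistently invalid generation from looping forever and surfaces the last error to the operator instead.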

How it works under the hood

The architecture uses a Supervisor → Coordinator → Sub-agent hierarchy powered by LangGraph. The Supervisor acts as a pure router, delegating to four domain-specific coordinators that each manage their own team of specialized sub-agents.
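
To make the routing role of the Supervisor concrete, here is a rough sketch in plain Python. It is an illustration only and does not use LangGraph: the keyword table, the coordinator identifiers, and the `route` function are all hypothetical, and the real Supervisor presumably relies on an LLM rather than keyword matching to pick a coordinator.

```python
from dataclasses import dataclass

# Hypothetical keyword table, one entry per domain coordinator.
# The identifiers below are illustrative, not the project's real names.
DOMAIN_KEYWORDS = {
    "helm_operator": ("helm", "chart"),
    "app_operator": ("argocd", "rollout", "canary", "blue-green", "traefik"),
    "k8s_operator": ("pod", "scale", "rbac", "namespace"),
    "observability": ("prometheus", "alert", "promql", "silence"),
}


@dataclass
class RoutingDecision:
    coordinator: str  # which domain coordinator receives the request
    request: str      # the user's request, passed through unchanged


def route(request: str, default: str = "k8s_operator") -> RoutingDecision:
    """Pick a coordinator for the request; the router itself does no other work."""
    text = request.lower()
    for coordinator, keywords in DOMAIN_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return RoutingDecision(coordinator, request)
    return RoutingDecision(default, request)


decision = route("Run a canary rollout for the payments service")
```

The point of the sketch is the separation of concerns: the router only decides where a request goes, and each coordinator owns the tools and sub-agents for its domain.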

When you ask the system to "Deploy the checkout API with zero downtime," the Supervisor routes to the App Operator, which reads your cluster state via MCP, generates a workloadRef migration plan, and waits for your explicit HITL approval before touching a single resource.
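
The approval gate described above can be sketched as a function that refuses to apply any step until a human accepts the plan. This is a simplified illustration under assumed names: `Plan`, `execute_with_approval`, and the callbacks are hypothetical and stand in for the agent's plan object, its approval prompt, and its MCP tool calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Plan:
    summary: str                                     # human-readable description shown for approval
    steps: List[str] = field(default_factory=list)   # state-modifying actions, in order


def execute_with_approval(
    plan: Plan,
    approve: Callable[[Plan], bool],
    apply_step: Callable[[str], None],
) -> bool:
    """Apply the plan only if the approver accepts it; otherwise change nothing."""
    if not approve(plan):
        return False
    for step in plan.steps:
        apply_step(step)
    return True


# Usage with stub callbacks: a rejected plan leaves the cluster untouched.
applied: List[str] = []
plan = Plan(
    summary="Migrate checkout API to a canary rollout",
    steps=["create rollout", "shift 10% of traffic"],
)
rejected = execute_with_approval(plan, lambda p: False, applied.append)
applied_after_reject = list(applied)
accepted = execute_with_approval(plan, lambda p: True, applied.append)
```

Keeping the read-only planning phase separate from the apply phase is what lets the gate sit at a single, auditable point.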

Key capabilities at a glance

Domain	What it Does	Key Workflows
📦 Helm Operator	Chart generation, validation, live operations, GitHub persistence	Create chart → Validate → Approve → Commit to GitHub
🔄 App Operator	ArgoCD GitOps, progressive delivery, edge routing	Canary rollouts, blue-green, NGINX→Traefik migration
☸️ K8s Operator	Cluster operations, pod debugging, scaling, RBAC	Root cause analysis, ephemeral debug pods, multi-cluster
🔭 Observability	Prometheus monitoring, Alertmanager alerting	PromQL queries, exporter lifecycle, silence management

Getting Started

The easiest way to take k8s-autopilot for a spin is via Docker Compose.

Quick Start

# Create docker-compose.yml and .env (see Configuration page for details)
docker compose up -d

# k8s-autopilot Agent running at http://localhost:10102
# TalkOps UI running at http://localhost:8080

Open http://localhost:8080 and start talking to the orchestrator.

From Source

git clone https://github.com/talkops-ai/k8s-autopilot.git
cd k8s-autopilot

# Create a Python 3.12 virtual environment (requires uv to be installed)
uv venv --python=3.12
source .venv/bin/activate

# Install dependencies
uv pip install -e .

# Create .env and configure API keys
cp .env.example .env

# Start the A2A server
k8s-autopilot --host localhost --port 10102

The agent is model-agnostic — you can use OpenAI, Anthropic, or Google Gemini by setting LLM_PROVIDER in your .env file. You can even route different tiers to different models (e.g., a fast model for the Supervisor and a reasoning model for coordinators).
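
One way to picture per-tier model routing is a lookup that prefers a tier-specific setting and falls back to the global one. The sketch below is an assumption-heavy illustration: only `LLM_PROVIDER` comes from this page, while `LLM_MODEL`, the `<TIER>_LLM_PROVIDER` and `<TIER>_LLM_MODEL` names, the provider values, and the `resolve_llm` function are hypothetical. Check the Configuration page for the variables the agent actually reads.

```python
from typing import Dict, Mapping


def resolve_llm(tier: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Resolve provider and model for a tier, falling back to global settings.

    Variable names other than LLM_PROVIDER are hypothetical placeholders.
    """
    prefix = tier.upper()
    provider = env.get(f"{prefix}_LLM_PROVIDER") or env.get("LLM_PROVIDER")
    if not provider:
        raise KeyError("LLM_PROVIDER is not set")
    model = env.get(f"{prefix}_LLM_MODEL") or env.get("LLM_MODEL", "")
    return {"tier": tier, "provider": provider, "model": model}


# Example environment: a fast default model, with a reasoning model for coordinators.
# In a real deployment you would pass os.environ instead of this dict.
example_env = {
    "LLM_PROVIDER": "openai",
    "LLM_MODEL": "fast-model",
    "COORDINATOR_LLM_PROVIDER": "anthropic",
    "COORDINATOR_LLM_MODEL": "reasoning-model",
}
supervisor_llm = resolve_llm("supervisor", example_env)
coordinator_llm = resolve_llm("coordinator", example_env)
```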

What's next?

Explore the rest of the documentation:

Components — Deep dive into the Supervisor, coordinators, state management, and middleware architecture
Capabilities — Per-domain breakdown of all 13 sub-agents and their workflows
Configuration — Environment variables, Docker Compose, and LLM model configuration
Examples — Real-world scenarios across all four domains
Troubleshooting — Common issues and debugging guides
