Architecture & Components
The AWS Orchestrator is built on the Deep Agent pattern — a multi-tier hierarchy of agents, MCP servers, and HITL gates powered by LangGraph. This page covers every architectural component in the system.
1. 🎯 Supervisor Agent
The Supervisor Agent is a pure router. It parses user intent and either delegates to the TF Coordinator or handles non-infrastructure requests (greetings, out-of-scope questions) directly.
Routing Table
| Request Type | Tool | Target |
|---|---|---|
| Terraform / AWS infrastructure | transfer_to_terraform | TF Coordinator (Deep Agent) |
| Greetings / out-of-scope / clarification | request_human_input | User |
The Supervisor never generates Terraform code, runs terraform commands, or interacts with GitHub. Its sole job is accurate intent classification and delegation.
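The routing contract can be sketched as a small lookup. This is a minimal illustration that assumes the LLM has already classified the request into one of the labels below. The Intent enum, its labels, and the route function are hypothetical; only the tool and target names come from the routing table.

```python
from enum import Enum


class Intent(Enum):
    # Hypothetical labels produced by the Supervisor's LLM classification step.
    INFRASTRUCTURE = "infrastructure"
    GREETING = "greeting"
    OUT_OF_SCOPE = "out_of_scope"
    CLARIFICATION = "clarification"


# (tool, target) pairs taken from the routing table above.
ROUTES = {
    Intent.INFRASTRUCTURE: ("transfer_to_terraform", "TF Coordinator"),
    Intent.GREETING: ("request_human_input", "User"),
    Intent.OUT_OF_SCOPE: ("request_human_input", "User"),
    Intent.CLARIFICATION: ("request_human_input", "User"),
}


def route(intent: Intent) -> tuple[str, str]:
    """Return the (tool, target) pair for an already-classified intent.

    The Supervisor only delegates; it never generates Terraform itself.
    """
    return ROUTES[intent]


print(route(Intent.INFRASTRUCTURE))  # ('transfer_to_terraform', 'TF Coordinator')
```

Keeping the table as data makes the "pure router" property easy to check: every path ends in a tool call, never in code generation.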
2. 🧠 TF Coordinator (Deep Agent)
The TF Coordinator is the brain of the system — a full LangGraph deep agent that orchestrates the entire module lifecycle. It manages the virtual filesystem, skills, memory, HITL gates, and sub-agent delegation.
Two Workflows
The coordinator supports two primary workflows, selected based on user intent:
Coordinator Tools
| Tool | Purpose |
|---|---|
| sync_workspace | Materialises virtual /workspace/ files to real disk before validation |
| request_user_input | Generic HITL gate: pause and ask the user anything (commit approval, next steps, clarification) |
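What sync_workspace does can be sketched as follows. This is a hypothetical illustration, not the tool's actual signature: it assumes the virtual files arrive as a path-to-content dict, writes only entries under /workspace/ to a real directory, and rejects paths that would escape that directory.

```python
import tempfile
from pathlib import Path, PurePosixPath

VIRTUAL_PREFIX = PurePosixPath("/workspace")


def sync_workspace(virtual_files: dict[str, str], root: Path) -> list[Path]:
    """Write every virtual /workspace/ file under `root`; return the paths written.

    Hypothetical sketch: the real tool's arguments and return value may differ.
    """
    root = root.resolve()
    written = []
    for vpath, content in virtual_files.items():
        p = PurePosixPath(vpath)
        # Only /workspace/ is materialised; /skills/ and /memories/ stay virtual.
        if VIRTUAL_PREFIX not in p.parents:
            continue
        target = (root / p.relative_to(VIRTUAL_PREFIX)).resolve()
        # Refuse paths that escape the root, e.g. "/workspace/../escape.tf".
        if root not in target.parents:
            raise ValueError(f"path escapes workspace root: {vpath}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    written = sync_workspace(
        {
            "/workspace/main.tf": 'resource "aws_s3_bucket" "this" {}\n',
            "/skills/s3-module-generator/SKILL.md": "...",
        },
        Path(tmp),
    )
    print([p.name for p in written])  # ['main.tf']
```

Materialising only before validation keeps the Terraform CLI, which needs real files, decoupled from the in-state virtual filesystem the agents write to.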
State Transforms
The coordinator implements a three-way state bridge between the Supervisor and the deep agent:
| Transform | Direction | Purpose |
|---|---|---|
| input_transform | Supervisor → Deep Agent | Seeds the virtual filesystem with skills and memory files |
| build_context | Supervisor → Runtime Config | Merges env vars + session state + caller context into TFCoordinatorContext |
| output_transform | Deep Agent → Supervisor | Extracts final message, status, and synced file paths |
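The build_context merge can be sketched as a layered lookup. The precedence used here (env vars < session state < caller context) is an assumption, as are the three example fields on TFCoordinatorContext and the convention that each field's env var is its upper-cased name; the real context type is not documented on this page.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass
class TFCoordinatorContext:
    # Hypothetical fields for illustration only.
    github_repo: str = ""
    aws_region: str = ""
    environment: str = ""


def build_context(
    session_state: Mapping[str, str],
    caller_context: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> TFCoordinatorContext:
    """Merge three sources; later sources win (env < session state < caller context)."""
    values = {}
    for f in fields(TFCoordinatorContext):
        env_key = f.name.upper()  # assumed convention, e.g. AWS_REGION
        for source, key in ((env, env_key), (session_state, f.name), (caller_context, f.name)):
            if source.get(key):
                values[f.name] = source[key]
    return TFCoordinatorContext(**values)
```

Passing env explicitly (instead of always reading os.environ) keeps the merge easy to test.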
3. 🔬 Planner Subgraph (tf-planner)
Before any code is generated, the planner runs a 3-phase research pipeline as a compiled LangGraph subgraph. This is what separates AWS Orchestrator from naive code generation — the system researches the service first.
Phase 1: Requirements Analyzer
Extracts infrastructure requirements from the user's request:
- Discovers specific AWS services involved
- Maps resource attributes and dependencies
- If critical attributes are missing (region, environment), triggers HITL clarification
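The clarification trigger in Phase 1 reduces to a completeness check. A minimal sketch, assuming requirements are extracted into a dict; the function names and return shapes are hypothetical, and only region and environment come from the text above.

```python
CRITICAL_ATTRIBUTES = ("region", "environment")  # the two examples named in Phase 1


def missing_critical(requirements: dict) -> list[str]:
    """Return the critical attributes the user has not supplied."""
    return [a for a in CRITICAL_ATTRIBUTES if not requirements.get(a)]


def next_step(requirements: dict) -> dict:
    missing = missing_critical(requirements)
    if missing:
        # Pause the pipeline and ask the user (HITL) before Phase 2 runs.
        return {
            "action": "request_clarification",
            "questions": [f"Which {a} should this module target?" for a in missing],
        }
    return {"action": "continue", "phase": "security"}
```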
Phase 2: Security & Best Practices
Evaluates the request against security standards:
- CIS benchmarks and compliance (SOC 2, HIPAA)
- Encryption requirements (SSE with KMS, TLS)
- Least-privilege IAM policies
- VPC flow logs, access controls
- Tagging standards
Phase 3: Execution Planner
Creates a detailed module specification and writes it as skill files:
- Determines the file set (main.tf, iam.tf, policies.tf, etc.)
- Defines variable schemas, output schemas, and HCL resource patterns
- Writes everything to /skills/{service}-module-generator/SKILL.md + references/
- The downstream generator follows this blueprint exactly
If the planner writes skills successfully (output contains "Skills written for"), the tf-skill-builder is automatically skipped — saving compute and time.
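The skip rule can be sketched as a string check on the planner's output. Only the "Skills written for" marker comes from this page; the subagents_to_run helper and the order of the remaining sub-agents are assumptions for illustration.

```python
SKILLS_MARKER = "Skills written for"


def subagents_to_run(planner_output: str) -> list[str]:
    """Return the generation sub-agents still needed after tf-planner finishes (hypothetical order)."""
    pipeline = ["tf-skill-builder", "tf-generator", "tf-validator"]
    if SKILLS_MARKER in planner_output:
        # The planner already produced the skill blueprint, so building it again is wasted work.
        pipeline.remove("tf-skill-builder")
    return pipeline
```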
4. ⚙️ Sub-Agents
The system uses 7 sub-agents, each with a narrow scope to prevent context leakage:
| Sub-Agent | Purpose | Connection | MCP Server |
|---|---|---|---|
| tf-planner | 3-phase research pipeline (req analysis → security → execution planning) | Compiled Subgraph | Terraform MCP |
| tf-skill-builder | Generates SKILL.md + references for new AWS services | Static Dict | None |
| tf-generator | Writes .tf files following skill blueprint exactly | Static Dict | None |
| tf-validator | Runs terraform init, fmt -check, validate in sandbox | Static Dict | None |
| tf-updater | Fetches existing modules from GitHub, applies surgical edits | JIT MCP | GitHub MCP |
| update-planner | Analyses existing modules for targeted change planning (read-only) | JIT MCP | GitHub MCP |
| github-agent | Commits files to GitHub via MCP tools (never uses shell git) | JIT MCP | GitHub MCP |
Connection Types
| Type | Description | Used By |
|---|---|---|
| Compiled Subgraph | Full LangGraph subgraph with its own internal supervisor and 3-phase pipeline | tf-planner |
| Static Dict | Simple dict spec — uses virtual filesystem tools only (read, write, ls, execute) | tf-skill-builder, tf-generator, tf-validator |
| JIT MCP | CompiledSubAgent wrapper that opens GitHub MCP connection lazily, closes after execution | tf-updater, github-agent, update-planner |
JIT MCP Pattern
Sub-agents that interact with GitHub use a Just-In-Time connection pattern. Instead of holding the MCP connection open for the entire session, each sub-agent opens its connection only when its node executes and closes it immediately afterwards.
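The JIT lifecycle maps naturally onto an async context manager. This sketch uses a hypothetical FakeMCPSession in place of a real MCP client, so the connect, close, and call_tool methods are illustrative rather than any library's API; what it demonstrates is that the connection exists only inside the sub-agent's node and is released even if that node raises.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeMCPSession:
    """Stand-in for a real GitHub MCP client session (hypothetical)."""

    def __init__(self) -> None:
        self.open = False

    async def connect(self) -> None:
        self.open = True

    async def close(self) -> None:
        self.open = False

    async def call_tool(self, name: str, args: dict) -> str:
        assert self.open, "session must be open"
        return f"{name} ok"


@asynccontextmanager
async def github_mcp_session():
    session = FakeMCPSession()
    await session.connect()
    try:
        yield session
    finally:
        # Always release the connection, even if the sub-agent raised.
        await session.close()


async def run_jit_subagent(path: str) -> tuple[str, FakeMCPSession]:
    # The connection is opened only when this node executes...
    async with github_mcp_session() as session:
        result = await session.call_tool("get_file_contents", {"path": path})
    # ...and is already closed when control returns to the coordinator.
    return result, session


result, session = asyncio.run(run_jit_subagent("modules/s3/main.tf"))
print(result, session.open)  # get_file_contents ok False
```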
The github-agent and tf-updater additionally get a FilesystemMiddleware attached (via include_filesystem=True) so they can read generated .tf files from the real disk before committing or editing.
5. 🔌 MCP Integrations
AWS Orchestrator connects to 2 MCP servers that provide real-time external data:
Terraform Registry MCP Server
| Aspect | Detail |
|---|---|
| Package | terraform-mcp-server (HashiCorp official) |
| Used by | tf-planner (Requirements Analyzer + Execution Planner) |
| Purpose | Queries the live Terraform Registry for latest provider schemas, module version constraints, resource arguments, and required inputs |
| Why it matters | The planner doesn't guess provider configs from training data — it fetches the real, current documentation and writes it into skill blueprints |
GitHub Copilot MCP Server
| Aspect | Detail |
|---|---|
| URL | https://api.githubcopilot.com/mcp/ (configured via GITHUB_MCP_URL) |
| Used by | github-agent, tf-updater, update-planner |
| Key tools | create_or_update_file, get_file_contents, list_directory_contents |
| Why it matters | Commits code via API endpoints instead of brittle shell git commands. For updates, reads existing module structure directly from GitHub before making surgical edits |
6. 🛡️ Middleware Safety Stack
The TF Coordinator runs behind a configurable middleware stack that prevents runaway loops and ensures graceful failure:
| Middleware | Purpose | Default Limit | Env Override |
|---|---|---|---|
| ToolCallLimitMiddleware (write_file) | Prevents infinite write-retry loops | 20 calls | TF_WRITE_FILE_RUN_LIMIT |
| ToolCallLimitMiddleware (global) | Caps total tool calls per invocation | 60 calls | TF_GLOBAL_TOOL_RUN_LIMIT |
| ModelCallLimitMiddleware | Caps excessive LLM calls | 40 calls | TF_MODEL_CALL_RUN_LIMIT |
| ToolRetryMiddleware | Auto-retries transient tool failures | Disabled | TF_ENABLE_TOOL_RETRY=true |
The write_file guard is particularly important — if the generator enters a write-fail-retry loop, the middleware forces it to stop and report the error after 20 attempts rather than burning through tokens.
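Resolving the limits from the environment can be sketched as below. The env var names and defaults come from the table; MiddlewareLimits, load_limits, and CallCounter are hypothetical helpers, and CallCounter is a toy illustration of a per-run limit, not the actual middleware implementation.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MiddlewareLimits:
    write_file_run_limit: int
    global_tool_run_limit: int
    model_call_run_limit: int
    tool_retry_enabled: bool


def load_limits(env: Mapping[str, str] = os.environ) -> MiddlewareLimits:
    """Read overrides from env vars, falling back to the documented defaults."""
    return MiddlewareLimits(
        write_file_run_limit=int(env.get("TF_WRITE_FILE_RUN_LIMIT", 20)),
        global_tool_run_limit=int(env.get("TF_GLOBAL_TOOL_RUN_LIMIT", 60)),
        model_call_run_limit=int(env.get("TF_MODEL_CALL_RUN_LIMIT", 40)),
        tool_retry_enabled=env.get("TF_ENABLE_TOOL_RETRY", "false").lower() == "true",
    )


class RunLimitExceeded(Exception):
    pass


class CallCounter:
    """Toy per-run counter: allows `limit` calls, then stops the run."""

    def __init__(self, limit: int) -> None:
        self.limit, self.count = limit, 0

    def record(self) -> None:
        self.count += 1
        if self.count > self.limit:
            raise RunLimitExceeded(f"limit of {self.limit} calls reached")
```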
7. 📚 Skills, Memory & Virtual Filesystem
AWS Orchestrator maintains persistence and context awareness using a multi-layered virtual filesystem:
Virtual Filesystem Routes
| Virtual Path | Backend | Purpose |
|---|---|---|
| /skills/ | StateBackend (LangGraph state) | Per-service skill blueprints created by the planner |
| /memories/ | StoreBackend (InMemoryStore, org-scoped) | Persistent governance files and operational memory |
| /workspace/ | FilesystemBackend (real disk via sync_workspace) | Generated Terraform module files |
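Route selection can be sketched as longest-prefix matching on the virtual path. The prefixes and backend names come from the table; backend_for and the fallback to StateBackend for unrouted paths are assumptions for illustration.

```python
VFS_ROUTES = {
    "/skills/": "StateBackend",
    "/memories/": "StoreBackend",
    "/workspace/": "FilesystemBackend",
}


def backend_for(path: str, default: str = "StateBackend") -> str:
    """Pick the backend whose route prefix is the longest match for `path`."""
    matches = [prefix for prefix in VFS_ROUTES if path.startswith(prefix)]
    if not matches:
        return default  # assumed fallback for paths outside the three routes
    return VFS_ROUTES[max(matches, key=len)]
```

Longest-prefix matching keeps the rule correct if a nested route (for example a sub-path of /memories/) is added later.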
Skills (/skills/)
Each skill lives in its own directory. The base skills cover each workflow:
skills/
├── tf-module-generator/ # General generation patterns
├── tf-module-updater/ # Update workflow rules
├── tf-module-validator/ # Validation workflow + error rules
├── tf-skill-builder/ # How to create new skills
├── github-committer/ # Commit workflow via MCP
└── update-planner/ # Module analysis patterns
When the planner runs for a new service (e.g., EKS), it creates a service-specific skill at /skills/eks-module-generator/ containing:
- SKILL.md: YAML frontmatter + step-by-step workflow instructions
- references/resource-patterns.md: HCL patterns for the service
- references/variables-schema.md: variable definitions
- references/outputs-schema.md: output definitions
If a skill already exists and its provider version is current, the planner is skipped entirely. This makes repeated generations for the same service significantly faster.
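The per-service layout and the planner-skip check can be sketched together. The file paths follow the layout listed above; how the skill's provider version is recorded (for example in the SKILL.md frontmatter) is an assumption, so it is passed in as an argument. skill_files, parse_version, and should_skip_planner are hypothetical helpers.

```python
from pathlib import PurePosixPath
from typing import Optional

REFERENCE_FILES = ("resource-patterns.md", "variables-schema.md", "outputs-schema.md")


def skill_files(service: str) -> list[str]:
    """Paths the planner writes for one service, e.g. /skills/eks-module-generator/..."""
    root = PurePosixPath("/skills") / f"{service}-module-generator"
    return [str(root / "SKILL.md")] + [str(root / "references" / name) for name in REFERENCE_FILES]


def parse_version(v: str) -> tuple[int, ...]:
    return tuple(int(part) for part in v.split("."))


def should_skip_planner(
    existing: dict[str, str],
    service: str,
    skill_provider_version: Optional[str],
    current_provider_version: str,
) -> bool:
    """Skip only when every skill file exists and its recorded provider version is current."""
    if skill_provider_version is None:
        return False
    if not all(path in existing for path in skill_files(service)):
        return False
    return parse_version(skill_provider_version) >= parse_version(current_provider_version)
```

Comparing parsed version tuples rather than raw strings avoids treating "5.9.0" as newer than "5.40.0".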
Memory (/memories/)
The coordinator maintains persistent memory across sessions:
| File | Purpose | Update Frequency |
|---|---|---|
| AGENTS.md | Memory index: what files exist and reading rules | Rarely |
| hitl-policies.md | When to pause and ask the human (mandatory + optional gates) | Updated by agent when new policies are learned |
| org-standards.md | Your org's Terraform conventions (tags, naming, providers) | Admin-managed |
| module-index.md | Where modules live in GitHub repos (for update flows) | After every successful commit |
| user-preferences.md | Per-user or per-team preferences | As discovered from conversations |
| failure-log.md | Validation or deployment failures to avoid repeating | After failures |
| learned-patterns.md | Patterns the agent should reuse across sessions | As discovered |
Memory Reading Rules
The coordinator reads memory files in a fixed priority order:
- Always read: AGENTS.md (memory index), hitl-policies.md (governance)
- Read at session start: org-standards.md (org conventions)
- Read for updates: module-index.md (only when updating existing modules)
- Read on demand: all other memory files, as needed
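The reading rules can be sketched as a function that builds the ordered load list. The file names and tiers come from the rules above; memory_files_to_load and its parameters are hypothetical.

```python
ALWAYS = ("AGENTS.md", "hitl-policies.md")
AT_START = ("org-standards.md",)
FOR_UPDATES = ("module-index.md",)


def memory_files_to_load(is_update: bool, on_demand: tuple[str, ...] = ()) -> list[str]:
    """Return /memories/ paths in priority order for the current task."""
    files = [*ALWAYS, *AT_START]
    if is_update:
        files += FOR_UPDATES  # module-index.md is only needed for update flows
    # On-demand files (failure-log.md, learned-patterns.md, ...) come last, without duplicates.
    files += [f for f in on_demand if f not in files]
    return ["/memories/" + f for f in files]
```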
8. 🛠 Tech Stack
| Component | Technology | Purpose |
|---|---|---|
| Agent Framework | deepagents / LangGraph | State machine, orchestration, sub-graph routing |
| LLM Interface | LangChain Core | Tool execution, message schemas |
| IaC Platform | Terraform (AWS provider) | Infrastructure as Code |
| Tools/Integrations | Model Context Protocol (MCP) | Standardized protocol for Terraform Registry + GitHub |
| User Interface | A2UI / TalkOps A2A | Real-time streaming, HITL approval cards |
| Validation | Terraform CLI (init, fmt, validate) | Sandbox validation in local environment |
| Runtime | Python 3.12+ | Core agent backend |
| Infrastructure | Docker / uv / Uvicorn / Starlette | Containerization, package management, and ASGI serving |
LLM Configuration
The agent uses a three-tier LLM configuration — different models for different cognitive jobs:
| Tier | Config Key | Default | Used By |
|---|---|---|---|
| Standard | LLM_MODEL | gemini-3.1-flash-lite-preview | Validator, routing — fast and cheap for yes/no decisions |
| Higher | LLM_HIGHER_MODEL | gemini-3.1-pro-preview | Planner, Supervisor — better reasoning for research |
| Deep Agent | LLM_DEEPAGENT_MODEL | gemini-3.1-pro-preview | Coordinator, Generator — full capability for code generation |
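Tier resolution can be sketched as a two-step lookup: role to tier, then tier to env var with its default. The config keys, defaults, and role assignments come from the table; model_for and the role labels are hypothetical.

```python
import os
from typing import Mapping

# tier -> (config key, default model), as documented above
TIERS = {
    "standard": ("LLM_MODEL", "gemini-3.1-flash-lite-preview"),
    "higher": ("LLM_HIGHER_MODEL", "gemini-3.1-pro-preview"),
    "deepagent": ("LLM_DEEPAGENT_MODEL", "gemini-3.1-pro-preview"),
}

# role -> tier, following the "Used By" column
ROLE_TIER = {
    "validator": "standard",
    "planner": "higher",
    "supervisor": "higher",
    "coordinator": "deepagent",
    "generator": "deepagent",
}


def model_for(role: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the model name for a role, honouring any env override for its tier."""
    key, default = TIERS[ROLE_TIER[role]]
    return env.get(key, default)
```

Because overrides apply per tier, setting LLM_HIGHER_MODEL changes the planner and Supervisor together without touching the generator.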