Architecture & Components

The AWS Orchestrator is built on the Deep Agent pattern — a multi-tier hierarchy of agents, MCP servers, and HITL gates powered by LangGraph. This page covers every architectural component in the system.

1. 🎯 Supervisor Agent

The Supervisor Agent is a pure router. It parses user intent and either delegates to the TF Coordinator or handles non-infrastructure requests (greetings, out-of-scope questions) directly.

Routing Table

Request Type	Tool	Target
Terraform / AWS infrastructure	`transfer_to_terraform`	TF Coordinator (Deep Agent)
Greetings / out-of-scope / clarification	`request_human_input`	User

The Supervisor never generates Terraform code, runs terraform commands, or interacts with GitHub. Its sole job is accurate intent classification and delegation.
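Once intent is known, the routing table reduces to a fixed lookup. The sketch below assumes classification has already happened (in the real system an LLM performs it). The `Intent` enum and the `route()` helper are hypothetical names used only for illustration. The tool names and targets come from the table above.

```python
from enum import Enum


class Intent(Enum):
    """Hypothetical intent labels; the real supervisor classifies intent with an LLM."""
    INFRASTRUCTURE = "infrastructure"
    GREETING = "greeting"
    OUT_OF_SCOPE = "out_of_scope"
    CLARIFICATION = "clarification"


# Mirrors the routing table: only infrastructure requests leave the supervisor.
ROUTES: dict[Intent, tuple[str, str]] = {
    Intent.INFRASTRUCTURE: ("transfer_to_terraform", "TF Coordinator"),
    Intent.GREETING: ("request_human_input", "User"),
    Intent.OUT_OF_SCOPE: ("request_human_input", "User"),
    Intent.CLARIFICATION: ("request_human_input", "User"),
}


def route(intent: Intent) -> tuple[str, str]:
    """Return (tool_name, target). The supervisor delegates and never generates code."""
    return ROUTES[intent]
```

Keeping the supervisor to a pure lookup like this makes its behaviour easy to test in isolation. All Terraform reasoning stays inside the coordinator.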

2. 🧠 TF Coordinator (Deep Agent)

The TF Coordinator is the brain of the system — a full LangGraph deep agent that orchestrates the entire module lifecycle. It manages the virtual filesystem, skills, memory, HITL gates, and sub-agent delegation.

Two Workflows

The coordinator supports two primary workflows, selected based on user intent: generating a new module from scratch, and updating an existing module stored in GitHub.

Coordinator Tools

Tool	Purpose
`sync_workspace`	Materialises virtual `/workspace/` files to real disk before validation
`request_user_input`	Generic HITL gate — pause and ask the user anything (commit approval, next steps, clarification)
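`sync_workspace` bridges virtual state and the real disk so that the Terraform CLI can see the generated files. The following is a minimal sketch of that idea. It assumes the virtual filesystem can be viewed as a plain `dict` that maps virtual paths to file contents. The function body and the `WORKSPACE_PREFIX` constant are illustrative and are not the project's implementation.

```python
from pathlib import Path

WORKSPACE_PREFIX = "/workspace/"


def sync_workspace(virtual_files: dict[str, str], root: Path) -> list[Path]:
    """Write virtual /workspace/ files to real disk under `root`; return the written paths."""
    root = root.resolve()
    written: list[Path] = []
    for vpath, content in sorted(virtual_files.items()):
        if not vpath.startswith(WORKSPACE_PREFIX):
            continue  # /skills/ and /memories/ stay virtual; terraform only needs module files
        target = (root / vpath[len(WORKSPACE_PREFIX):]).resolve()
        # Refuse paths such as "/workspace/../x" that would land outside the sandbox root.
        if not target.is_relative_to(root):
            raise ValueError(f"virtual path escapes the workspace: {vpath}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        written.append(target)
    return written
```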

State Transforms

The coordinator implements a three-way state bridge between the Supervisor and the deep agent:

Transform	Direction	Purpose
`input_transform`	Supervisor → Deep Agent	Seeds the virtual filesystem with skills and memory files
`build_context`	Supervisor → Runtime Config	Merges env vars + session state + caller context into `TFCoordinatorContext`
`output_transform`	Deep Agent → Supervisor	Extracts final message, status, and synced file paths

3. 🔬 Planner Subgraph (`tf-planner`)

Before any code is generated, the planner runs a 3-phase research pipeline as a compiled LangGraph subgraph. This is what separates AWS Orchestrator from naive code generation — the system researches the service first.

Phase 1: Requirements Analyzer

Extracts infrastructure requirements from the user's request:

Discovers specific AWS services involved
Maps resource attributes and dependencies
Triggers HITL clarification when critical attributes (region, environment) are missing

Phase 2: Security & Best Practices

Evaluates the request against security standards:

CIS benchmarks and compliance (SOC 2, HIPAA)
Encryption requirements (SSE with KMS, TLS)
Least-privilege IAM policies
VPC flow logs, access controls
Tagging standards

Phase 3: Execution Planner

Creates a detailed module specification and writes it as skill files:

Determines the file set (main.tf, iam.tf, policies.tf, etc.)
Defines variable schemas, output schemas, and HCL resource patterns
Writes everything to /skills/{service}-module-generator/SKILL.md + references/
The downstream generator follows this blueprint exactly

info

If the planner writes skills successfully (its output contains "Skills written for"), the tf-skill-builder is skipped automatically, saving compute and time.
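The skip rule comes down to a substring check on the planner's output. In this sketch, only the marker string comes from the text above. The helper name is hypothetical.

```python
SKILLS_WRITTEN_MARKER = "Skills written for"  # marker documented for the planner output


def should_run_skill_builder(planner_output: str) -> bool:
    """False when the planner already wrote skills, so tf-skill-builder can be skipped."""
    return SKILLS_WRITTEN_MARKER not in planner_output
```

A string marker is simple, but it couples the coordinator to the planner's exact wording. If that phrasing ever changes, the skill builder will run again even when it isn't needed.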

4. ⚙️ Sub-Agents

The system uses 7 sub-agents, each with a narrow scope to prevent context leakage:

Sub-Agent	Purpose	Connection	MCP Server
`tf-planner`	3-phase research pipeline (req analysis → security → execution planning)	Compiled Subgraph	Terraform MCP
`tf-skill-builder`	Generates SKILL.md + references for new AWS services	Static Dict	—
`tf-generator`	Writes `.tf` files following skill blueprint exactly	Static Dict	—
`tf-validator`	Runs `terraform init`, `fmt -check`, `validate` in sandbox	Static Dict	—
`tf-updater`	Fetches existing modules from GitHub, applies surgical edits	JIT MCP	GitHub MCP
`update-planner`	Analyses existing modules for targeted change planning (read-only)	JIT MCP	GitHub MCP
`github-agent`	Commits files to GitHub via MCP tools (never uses shell `git`)	JIT MCP	GitHub MCP

Connection Types

Type	Description	Used By
Compiled Subgraph	Full LangGraph subgraph with its own internal supervisor and 3-phase pipeline	`tf-planner`
Static Dict	Simple dict spec — uses virtual filesystem tools only (read, write, ls, execute)	`tf-skill-builder`, `tf-generator`, `tf-validator`
JIT MCP	`CompiledSubAgent` wrapper that opens GitHub MCP connection lazily, closes after execution	`tf-updater`, `github-agent`, `update-planner`

JIT MCP Pattern

Sub-agents that interact with GitHub use a Just-In-Time connection pattern. Instead of holding the MCP connection open for the entire session, each sub-agent opens its connection only when its node is executed and closes it immediately afterwards.

The `github-agent` and `tf-updater` additionally get a `FilesystemMiddleware` attached (via `include_filesystem=True`) so they can read generated `.tf` files from the real disk before committing or editing.
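The JIT pattern maps naturally onto an async context manager. The connection opens on entry and is guaranteed to close on exit, even when the sub-agent fails. In this sketch, `FakeMCPSession` is a hypothetical stand-in for a real MCP client session. `jit_mcp()` and `run_github_agent()` are also illustrative names, not the project's code.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeMCPSession:
    """Hypothetical stand-in for a real MCP client session."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.is_open = False

    async def connect(self) -> None:
        self.is_open = True

    async def close(self) -> None:
        self.is_open = False

    async def call_tool(self, name: str, args: dict) -> str:
        if not self.is_open:
            raise RuntimeError("session is closed")
        return f"{name} ok"


@asynccontextmanager
async def jit_mcp(url: str):
    session = FakeMCPSession(url)
    await session.connect()  # opened only when the sub-agent node actually runs
    try:
        yield session
    finally:
        await session.close()  # always released, even if the sub-agent raises


async def run_github_agent(url: str) -> tuple[str, FakeMCPSession]:
    async with jit_mcp(url) as session:
        result = await session.call_tool("create_or_update_file", {"path": "main.tf"})
    return result, session
```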

5. 🔌 MCP Integrations

AWS Orchestrator connects to 2 MCP servers that provide real-time external data:

Terraform Registry MCP Server

Aspect	Detail
Package	`terraform-mcp-server` (HashiCorp official)
Used by	`tf-planner` (Requirements Analyzer + Execution Planner)
Purpose	Queries the live Terraform Registry for latest provider schemas, module version constraints, resource arguments, and required inputs
Why it matters	The planner doesn't guess provider configs from training data — it fetches the real, current documentation and writes it into skill blueprints

GitHub Copilot MCP Server

Aspect	Detail
URL	`https://api.githubcopilot.com/mcp/` (configured via `GITHUB_MCP_URL`)
Used by	`github-agent`, `tf-updater`, `update-planner`
Key tools	`create_or_update_file`, `get_file_contents`, `list_directory_contents`
Why it matters	Commits code via API endpoints instead of brittle shell `git` commands. For updates, reads existing module structure directly from GitHub before making surgical edits

6. 🛡️ Middleware Safety Stack

The TF Coordinator runs behind a configurable middleware stack that prevents runaway loops and ensures graceful failure:

Middleware	Purpose	Default Limit	Env Override
`ToolCallLimitMiddleware` (write_file)	Prevents infinite write-retry loops	20 calls	`TF_WRITE_FILE_RUN_LIMIT`
`ToolCallLimitMiddleware` (global)	Caps total tool calls per invocation	60 calls	`TF_GLOBAL_TOOL_RUN_LIMIT`
`ModelCallLimitMiddleware`	Caps excessive LLM calls	40 calls	`TF_MODEL_CALL_RUN_LIMIT`
`ToolRetryMiddleware`	Auto-retries transient tool failures	Disabled	`TF_ENABLE_TOOL_RETRY=true`

The `write_file` guard is particularly important: if the generator enters a write-fail-retry loop, the middleware forces it to stop and report the error once it reaches the limit (20 calls by default) rather than burning through tokens.

7. 📚 Skills, Memory & Virtual Filesystem

AWS Orchestrator maintains persistence and context awareness using a multi-layered virtual filesystem:

Virtual Filesystem Routes

Virtual Path	Backend	Purpose
`/skills/`	`StateBackend` (LangGraph state)	Per-service skill blueprints created by the planner
`/memories/`	`StoreBackend` (InMemoryStore, org-scoped)	Persistent governance files and operational memory
`/workspace/`	`FilesystemBackend` (real disk via `sync_workspace`)	Generated Terraform module files
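The routes table works like prefix-based dispatch: each virtual path is served by whichever backend owns its prefix. In this minimal sketch, the backend class names are plain strings used as stand-ins. `backend_for()` is a hypothetical helper and does not reflect the framework's actual router.

```python
ROUTES = {
    "/skills/": "StateBackend",
    "/memories/": "StoreBackend",
    "/workspace/": "FilesystemBackend",
}


def backend_for(path: str) -> str:
    """Pick the backend whose prefix matches; the longest prefix wins if routes ever nest."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        raise ValueError(f"no backend routed for {path!r}")
    return ROUTES[max(matches, key=len)]
```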

Skills (`/skills/`)

The built-in skills cover each workflow step, and every AWS service the planner researches gets its own skill directory alongside them:

skills/
├── tf-module-generator/         # General generation patterns
├── tf-module-updater/           # Update workflow rules
├── tf-module-validator/         # Validation workflow + error rules
├── tf-skill-builder/            # How to create new skills
├── github-committer/            # Commit workflow via MCP
└── update-planner/              # Module analysis patterns

When the planner runs for a new service (e.g., EKS), it creates a service-specific skill at /skills/eks-module-generator/ containing:

SKILL.md — YAML frontmatter + step-by-step workflow instructions
references/resource-patterns.md — HCL patterns for the service
references/variables-schema.md — Variable definitions
references/outputs-schema.md — Output definitions
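The layout above can be generated as a set of virtual paths. The following sketch scaffolds that layout for a service. The `name` and `description` frontmatter keys are an assumption about the SKILL.md schema, and the bodies are placeholders for what the planner would actually write. `skill_files()` is a hypothetical helper.

```python
REFERENCE_FILES = ("resource-patterns.md", "variables-schema.md", "outputs-schema.md")


def skill_files(service: str, description: str) -> dict[str, str]:
    """Return virtual path -> initial content for a new service skill (placeholders only)."""
    root = f"/skills/{service}-module-generator"
    # Assumed frontmatter keys; the real SKILL.md schema may carry more fields.
    skill_md = "\n".join([
        "---",
        f"name: {service}-module-generator",
        f"description: {description}",
        "---",
        "",
        "<step-by-step workflow written by the planner>",
        "",
    ])
    files = {f"{root}/SKILL.md": skill_md}
    for name in REFERENCE_FILES:
        files[f"{root}/references/{name}"] = ""  # HCL patterns / schemas filled in later
    return files
```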

tip

If a skill already exists and its provider version is current, the planner is skipped entirely. This makes repeated generations for the same service significantly faster.

Memory (`/memories/`)

The coordinator maintains persistent memory across sessions:

File	Purpose	Update Frequency
`AGENTS.md`	Memory index — what files exist and reading rules	Rarely
`hitl-policies.md`	When to pause and ask the human (mandatory + optional gates)	Updated by agent when new policies are learned
`org-standards.md`	Your org's Terraform conventions (tags, naming, providers)	Admin-managed
`module-index.md`	Where modules live in GitHub repos (for update flows)	After every successful commit
`user-preferences.md`	Per-user or per-team preferences	As discovered from conversations
`failure-log.md`	Validation or deployment failures to avoid repeating	After failures
`learned-patterns.md`	Patterns the agent should reuse across sessions	As discovered

Memory Reading Rules

The coordinator reads memory files in a fixed priority order:

Always read: AGENTS.md (memory index), hitl-policies.md (governance)
Read at start: org-standards.md (org conventions)
Read for updates: module-index.md (only when updating existing modules)
Read on demand: Others as needed
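The reading rules can be expressed as a function that builds an ordered read list for a session. Only the file names and their tiers come from the rules above. The function name and the `on_demand` parameter are hypothetical.

```python
ALWAYS = ["AGENTS.md", "hitl-policies.md"]
AT_START = ["org-standards.md"]
UPDATE_ONLY = ["module-index.md"]


def memory_files_to_read(is_update: bool, on_demand: tuple[str, ...] = ()) -> list[str]:
    """Ordered /memories/ paths: index and governance first, on-demand files last."""
    files = ALWAYS + AT_START + (UPDATE_ONLY if is_update else [])
    # On-demand files come last and are never read twice.
    files += [name for name in on_demand if name not in files]
    return [f"/memories/{name}" for name in files]
```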

8. 🛠 Tech Stack

Component	Technology	Purpose
Agent Framework	`deepagents` / LangGraph	State machine, orchestration, sub-graph routing
LLM Interface	LangChain Core	Tool execution, message schemas
IaC Platform	Terraform (AWS provider)	Infrastructure as Code
Tools/Integrations	Model Context Protocol (MCP)	Standardized protocol for Terraform Registry + GitHub
User Interface	A2UI / TalkOps A2A	Real-time streaming, HITL approval cards
Validation	Terraform CLI (`init`, `fmt`, `validate`)	Sandbox validation in local environment
Runtime	Python 3.12+	Core agent backend
Infrastructure	Docker / uv / Uvicorn / Starlette	Containerization, package management, and ASGI web serving

LLM Configuration

The agent uses a three-tier LLM configuration — different models for different cognitive jobs:

Tier	Config Key	Default	Used By
Standard	`LLM_MODEL`	`gemini-3.1-flash-lite-preview`	Validator, routing — fast and cheap for yes/no decisions
Higher	`LLM_HIGHER_MODEL`	`gemini-3.1-pro-preview`	Planner, Supervisor — better reasoning for research
Deep Agent	`LLM_DEEPAGENT_MODEL`	`gemini-3.1-pro-preview`	Coordinator, Generator — full capability for code generation
