
Architecture & Components

The AWS Orchestrator is built on the Deep Agent pattern — a multi-tier hierarchy of agents, MCP servers, and HITL gates powered by LangGraph. This page covers every architectural component in the system.


1. 🎯 Supervisor Agent

The Supervisor Agent is a pure router. It parses user intent and either delegates to the TF Coordinator or handles non-infrastructure requests (greetings, out-of-scope questions) directly.

Routing Table

| Request Type | Tool | Target |
|---|---|---|
| Terraform / AWS infrastructure | transfer_to_terraform | TF Coordinator (Deep Agent) |
| Greetings / out-of-scope / clarification | request_human_input | User |

The Supervisor never generates Terraform code, runs terraform commands, or interacts with GitHub. Its sole job is accurate intent classification and delegation.
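
The routing table amounts to a two-way dispatch on a classified intent. The sketch below is a minimal illustration that assumes an LLM has already labeled the request. The `Intent` enum, `Route` dataclass, `ROUTES` map, and `route()` function are hypothetical names, not the project's code; only the two tool names come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    # Hypothetical intent labels; in the real system an LLM performs the classification.
    INFRASTRUCTURE = "infrastructure"
    GREETING = "greeting"
    OUT_OF_SCOPE = "out_of_scope"
    CLARIFICATION = "clarification"


@dataclass(frozen=True)
class Route:
    tool: str    # tool the Supervisor calls
    target: str  # who receives control next


# Mirrors the routing table: only infrastructure requests reach the TF Coordinator.
ROUTES = {
    Intent.INFRASTRUCTURE: Route("transfer_to_terraform", "TF Coordinator"),
    Intent.GREETING: Route("request_human_input", "User"),
    Intent.OUT_OF_SCOPE: Route("request_human_input", "User"),
    Intent.CLARIFICATION: Route("request_human_input", "User"),
}


def route(intent: Intent) -> Route:
    """Return the single tool call the Supervisor makes for a classified intent."""
    return ROUTES[intent]


if __name__ == "__main__":
    print(route(Intent.INFRASTRUCTURE).tool)  # transfer_to_terraform
```

In this sketch the Supervisor's entire tool set is two names, so it has no Terraform or GitHub tool it could call even if classification went wrong.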


2. 🧠 TF Coordinator (Deep Agent)

The TF Coordinator is the brain of the system — a full LangGraph deep agent that orchestrates the entire module lifecycle. It manages the virtual filesystem, skills, memory, HITL gates, and sub-agent delegation.

Two Workflows

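The coordinator supports two primary workflows, selected based on user intent:

  • Create: research, generate, validate, and commit a new Terraform module (tf-planner, tf-skill-builder, tf-generator, tf-validator, github-agent)
  • Update: analyse an existing module in GitHub and apply surgical edits (update-planner, tf-updater, github-agent)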

Coordinator Tools

| Tool | Purpose |
|---|---|
| sync_workspace | Materialises virtual /workspace/ files to real disk before validation |
| request_user_input | Generic HITL gate: pause and ask the user anything (commit approval, next steps, clarification) |
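
Conceptually, sync_workspace copies each virtual /workspace/ file to a real directory so the Terraform CLI can read it. A minimal sketch, assuming the virtual files are available as a path-to-content dict; the function signature and the path-traversal guard are assumptions, not the project's implementation.

```python
from pathlib import Path

WORKSPACE_PREFIX = "/workspace/"


def sync_workspace(virtual_files: dict[str, str], disk_root: Path) -> list[Path]:
    """Write every /workspace/ entry of a virtual filesystem under disk_root.

    Hypothetical signature. Returns the real paths written, sorted for stable output.
    """
    root = disk_root.resolve()
    written: list[Path] = []
    for vpath, content in virtual_files.items():
        if not vpath.startswith(WORKSPACE_PREFIX):
            continue  # /skills/ and /memories/ stay virtual
        relative = vpath[len(WORKSPACE_PREFIX):]
        target = (root / relative).resolve()
        # Refuse paths such as /workspace/../evil.tf that escape the sync root.
        if not target.is_relative_to(root):
            raise ValueError(f"path escapes workspace: {vpath}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return sorted(written)
```

Skipping non-workspace paths keeps skills and memory out of the Terraform working directory, so `terraform init` and `validate` only ever see module files.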

State Transforms

The coordinator implements a three-way state bridge between the Supervisor and the deep agent:

| Transform | Direction | Purpose |
|---|---|---|
| input_transform | Supervisor → Deep Agent | Seeds the virtual filesystem with skills and memory files |
| build_context | Supervisor → Runtime Config | Merges env vars + session state + caller context into TFCoordinatorContext |
| output_transform | Deep Agent → Supervisor | Extracts final message, status, and synced file paths |
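
The three transforms can be pictured as pure functions over plain dicts. In this sketch the state keys (`messages`, `files`, `status`), the `TFCoordinatorContext` fields, and the precedence order in `build_context` are all assumptions; only the three function names and the context class name come from the table.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TFCoordinatorContext:
    # Hypothetical fields; the real context class may carry different settings.
    github_repo: str = ""
    aws_region: str = ""
    extra: dict[str, Any] = field(default_factory=dict)


def input_transform(supervisor_state: dict, skills: dict[str, str], memories: dict[str, str]) -> dict:
    """Supervisor -> deep agent: seed the virtual filesystem with skills and memory files."""
    files = {f"/skills/{p}": c for p, c in skills.items()}
    files.update({f"/memories/{p}": c for p, c in memories.items()})
    return {"messages": list(supervisor_state.get("messages", [])), "files": files}


def build_context(env: dict[str, str], session: dict[str, Any], caller: dict[str, Any]) -> TFCoordinatorContext:
    """Merge env vars, session state, and caller context (later sources win in this sketch)."""
    merged: dict[str, Any] = {**env, **session, **caller}
    return TFCoordinatorContext(
        github_repo=merged.pop("github_repo", ""),
        aws_region=merged.pop("aws_region", ""),
        extra=merged,
    )


def output_transform(agent_state: dict) -> dict:
    """Deep agent -> Supervisor: final message, status, and synced file paths."""
    messages = agent_state.get("messages", [])
    return {
        "final_message": messages[-1] if messages else "",
        "status": agent_state.get("status", "unknown"),
        # Assumption: synced files are the /workspace/ entries of the virtual filesystem.
        "synced_files": sorted(p for p in agent_state.get("files", {}) if p.startswith("/workspace/")),
    }
```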

3. 🔬 Planner Subgraph (tf-planner)

Before any code is generated, the planner runs a 3-phase research pipeline as a compiled LangGraph subgraph. This is what separates AWS Orchestrator from naive code generation — the system researches the service first.

Phase 1: Requirements Analyzer

Extracts infrastructure requirements from the user's request:

  • Discovers specific AWS services involved
  • Maps resource attributes and dependencies
  • If critical attributes are missing (region, environment), triggers HITL clarification

Phase 2: Security & Best Practices

Evaluates the request against security standards:

  • CIS benchmarks and compliance (SOC 2, HIPAA)
  • Encryption requirements (SSE with KMS, TLS)
  • Least-privilege IAM policies
  • VPC flow logs, access controls
  • Tagging standards

Phase 3: Execution Planner

Creates a detailed module specification and writes it as skill files:

  • Determines the file set (main.tf, iam.tf, policies.tf, etc.)
  • Defines variable schemas, output schemas, and HCL resource patterns
  • Writes everything to /skills/{service}-module-generator/SKILL.md + references/
  • The downstream generator follows this blueprint exactly
Note: If the planner writes skills successfully (its output contains "Skills written for"), the tf-skill-builder is automatically skipped, saving compute and time.
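
The skip rule is a plain substring check on the planner's output. A minimal sketch; the marker string comes from the text, while `should_run_skill_builder`, `downstream_steps`, and the step list are hypothetical.

```python
SKILLS_MARKER = "Skills written for"  # marker string quoted in the docs


def should_run_skill_builder(planner_output: str) -> bool:
    """tf-skill-builder runs only when the planner did not report written skills."""
    return SKILLS_MARKER not in planner_output


def downstream_steps(planner_output: str) -> list[str]:
    # Hypothetical step list for the create workflow after planning.
    steps = ["tf-generator", "tf-validator"]
    if should_run_skill_builder(planner_output):
        steps.insert(0, "tf-skill-builder")
    return steps
```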


4. ⚙️ Sub-Agents

The system uses 7 sub-agents, each with a narrow scope to prevent context leakage:

| Sub-Agent | Purpose | Connection | MCP Server |
|---|---|---|---|
| tf-planner | 3-phase research pipeline (req analysis → security → execution planning) | Compiled Subgraph | Terraform MCP |
| tf-skill-builder | Generates SKILL.md + references for new AWS services | Static Dict | None |
| tf-generator | Writes .tf files following skill blueprint exactly | Static Dict | None |
| tf-validator | Runs terraform init, fmt -check, validate in sandbox | Static Dict | None |
| tf-updater | Fetches existing modules from GitHub, applies surgical edits | JIT MCP | GitHub MCP |
| update-planner | Analyses existing modules for targeted change planning (read-only) | JIT MCP | GitHub MCP |
| github-agent | Commits files to GitHub via MCP tools (never uses shell git) | JIT MCP | GitHub MCP |

Connection Types

| Type | Description | Used By |
|---|---|---|
| Compiled Subgraph | Full LangGraph subgraph with its own internal supervisor and 3-phase pipeline | tf-planner |
| Static Dict | Simple dict spec; uses virtual filesystem tools only (read, write, ls, execute) | tf-skill-builder, tf-generator, tf-validator |
| JIT MCP | CompiledSubAgent wrapper that opens GitHub MCP connection lazily, closes after execution | tf-updater, github-agent, update-planner |

JIT MCP Pattern

Sub-agents that interact with GitHub use a Just-In-Time connection pattern. Instead of holding the MCP connection open for the entire session, each sub-agent opens its connection only when its node is executed and closes it immediately afterwards.
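
The JIT lifecycle can be sketched with an async context manager that connects on entry and always disconnects on exit. `FakeMCPClient` (with its `connect`, `close`, and `call_tool` methods), `jit_connection`, and `run_github_subagent` are stand-ins invented for this example; a real implementation would use an MCP client library against GITHUB_MCP_URL instead.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


class FakeMCPClient:
    """Stand-in for a real MCP client session (hypothetical API)."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.open = False

    async def connect(self) -> None:
        self.open = True

    async def close(self) -> None:
        self.open = False

    async def call_tool(self, name: str, args: dict) -> str:
        if not self.open:
            raise RuntimeError("connection is closed")
        return f"{name} ok"


@asynccontextmanager
async def jit_connection(url: str) -> AsyncIterator[FakeMCPClient]:
    """Open the connection only when the node runs; close it even if the node fails."""
    client = FakeMCPClient(url)
    await client.connect()
    try:
        yield client
    finally:
        await client.close()


async def run_github_subagent(url: str, path: str) -> tuple[str, FakeMCPClient]:
    # Hypothetical node body: one tool call, then the connection is released.
    async with jit_connection(url) as client:
        result = await client.call_tool("get_file_contents", {"path": path})
    return result, client


if __name__ == "__main__":
    result, client = asyncio.run(run_github_subagent("https://api.githubcopilot.com/mcp/", "main.tf"))
    print(result, client.open)  # get_file_contents ok False
```

The `try`/`finally` is the important part of the pattern: a failing tool call still releases the connection, so no sub-agent leaves a GitHub session open after its node finishes.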

The github-agent and tf-updater additionally get a FilesystemMiddleware attached (via include_filesystem=True) so they can read generated .tf files from the real disk before committing or editing.


5. 🔌 MCP Integrations

AWS Orchestrator connects to 2 MCP servers that provide real-time external data:

Terraform Registry MCP Server

| Aspect | Detail |
|---|---|
| Package | terraform-mcp-server (HashiCorp official) |
| Used by | tf-planner (Requirements Analyzer + Execution Planner) |
| Purpose | Queries the live Terraform Registry for latest provider schemas, module version constraints, resource arguments, and required inputs |
| Why it matters | The planner doesn't guess provider configs from training data; it fetches the real, current documentation and writes it into skill blueprints |

GitHub Copilot MCP Server

| Aspect | Detail |
|---|---|
| URL | https://api.githubcopilot.com/mcp/ (configured via GITHUB_MCP_URL) |
| Used by | github-agent, tf-updater, update-planner |
| Key tools | create_or_update_file, get_file_contents, list_directory_contents |
| Why it matters | Commits code via API endpoints instead of brittle shell git commands. For updates, reads existing module structure directly from GitHub before making surgical edits |

6. 🛡️ Middleware Safety Stack

The TF Coordinator runs behind a configurable middleware stack that prevents runaway loops and ensures graceful failure:

| Middleware | Purpose | Default Limit | Env Override |
|---|---|---|---|
| ToolCallLimitMiddleware (write_file) | Prevents infinite write-retry loops | 20 calls | TF_WRITE_FILE_RUN_LIMIT |
| ToolCallLimitMiddleware (global) | Caps total tool calls per invocation | 60 calls | TF_GLOBAL_TOOL_RUN_LIMIT |
| ModelCallLimitMiddleware | Caps excessive LLM calls | 40 calls | TF_MODEL_CALL_RUN_LIMIT |
| ToolRetryMiddleware | Auto-retries transient tool failures | Disabled | TF_ENABLE_TOOL_RETRY=true |

The write_file guard is particularly important: if the generator enters a write-fail-retry loop, the middleware forces it to stop and report the error once the write_file limit (20 calls by default) is reached, rather than burning through tokens.


7. 📚 Skills, Memory & Virtual Filesystem

AWS Orchestrator maintains persistence and context awareness using a multi-layered virtual filesystem:

Virtual Filesystem Routes

| Virtual Path | Backend | Purpose |
|---|---|---|
| /skills/ | StateBackend (LangGraph state) | Per-service skill blueprints created by the planner |
| /memories/ | StoreBackend (InMemoryStore, org-scoped) | Persistent governance files and operational memory |
| /workspace/ | FilesystemBackend (real disk via sync_workspace) | Generated Terraform module files |
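
Routing by path prefix is the core of this layout: each virtual path goes to the backend that owns its prefix. A minimal sketch with in-memory dicts standing in for the three backends; `DictBackend`, `Router`, and the `read`/`write` methods are hypothetical and are not the deepagents backend API.

```python
class DictBackend:
    """Stand-in for StateBackend / StoreBackend / FilesystemBackend."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.files: dict[str, str] = {}


class Router:
    def __init__(self, routes: dict[str, DictBackend]) -> None:
        # Longest prefix first, so a more specific mount wins over a shorter one.
        self.routes = sorted(routes.items(), key=lambda kv: len(kv[0]), reverse=True)

    def _resolve(self, path: str) -> tuple[DictBackend, str]:
        for prefix, backend in self.routes:
            if path.startswith(prefix):
                return backend, path[len(prefix):]
        raise KeyError(f"no backend mounted for {path}")

    def write(self, path: str, content: str) -> str:
        backend, key = self._resolve(path)
        backend.files[key] = content
        return backend.label

    def read(self, path: str) -> str:
        backend, key = self._resolve(path)
        return backend.files[key]


vfs = Router({
    "/skills/": DictBackend("state"),
    "/memories/": DictBackend("store"),
    "/workspace/": DictBackend("disk"),
})
```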

Skills (/skills/)

The skills directory ships with one skill per workflow; service-specific skills are added alongside these:

skills/
├── tf-module-generator/ # General generation patterns
├── tf-module-updater/ # Update workflow rules
├── tf-module-validator/ # Validation workflow + error rules
├── tf-skill-builder/ # How to create new skills
├── github-committer/ # Commit workflow via MCP
└── update-planner/ # Module analysis patterns

When the planner runs for a new service (e.g., EKS), it creates a service-specific skill at /skills/eks-module-generator/ containing:

  • SKILL.md — YAML frontmatter + step-by-step workflow instructions
  • references/resource-patterns.md — HCL patterns for the service
  • references/variables-schema.md — Variable definitions
  • references/outputs-schema.md — Output definitions
Tip: If a skill already exists and its provider version is current, the planner is skipped entirely. This makes repeated generations for the same service significantly faster.
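
Both the per-service layout and the planner skip rule fit in a few lines. In this sketch the file names come from the list, while the version comparison (a numeric tuple compare of the provider version recorded for an existing skill against the latest registry version) and all function names are assumptions.

```python
from pathlib import PurePosixPath

REFERENCE_FILES = ("resource-patterns.md", "variables-schema.md", "outputs-schema.md")


def skill_paths(service: str) -> list[str]:
    """Files the planner writes for one service, e.g. under /skills/eks-module-generator/."""
    root = PurePosixPath("/skills") / f"{service}-module-generator"
    paths = [root / "SKILL.md"] + [root / "references" / name for name in REFERENCE_FILES]
    return [str(p) for p in paths]


def version_tuple(version: str) -> tuple[int, ...]:
    # Assumes plain dotted numeric versions such as "5.82.0".
    return tuple(int(part) for part in version.split("."))


def should_run_planner(existing: dict[str, str], service: str, latest_provider: str) -> bool:
    """Skip planning when a skill exists and was built for the current provider version."""
    recorded = existing.get(service)  # hypothetical map: service -> provider version of its skill
    if recorded is None:
        return True
    return version_tuple(recorded) < version_tuple(latest_provider)
```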

Memory (/memories/)

The coordinator maintains persistent memory across sessions:

| File | Purpose | Update Frequency |
|---|---|---|
| AGENTS.md | Memory index: what files exist and reading rules | Rarely |
| hitl-policies.md | When to pause and ask the human (mandatory + optional gates) | Updated by agent when new policies are learned |
| org-standards.md | Your org's Terraform conventions (tags, naming, providers) | Admin-managed |
| module-index.md | Where modules live in GitHub repos (for update flows) | After every successful commit |
| user-preferences.md | Per-user or per-team preferences | As discovered from conversations |
| failure-log.md | Validation or deployment failures to avoid repeating | After failures |
| learned-patterns.md | Patterns the agent should reuse across sessions | As discovered |

Memory Reading Rules

The coordinator reads memory files in a fixed priority order: the first two tiers at session start, the rest only when needed:

  1. Always read: AGENTS.md (memory index), hitl-policies.md (governance)
  2. Read at start: org-standards.md (org conventions)
  3. Read for updates: module-index.md (only when updating existing modules)
  4. Read on demand: Others as needed

8. 🛠 Tech Stack

| Component | Technology | Purpose |
|---|---|---|
| Agent Framework | deepagents / LangGraph | State machine, orchestration, sub-graph routing |
| LLM Interface | LangChain Core | Tool execution, message schemas |
| IaC Platform | Terraform (AWS provider) | Infrastructure as Code |
| Tools/Integrations | Model Context Protocol (MCP) | Standardized protocol for Terraform Registry + GitHub |
| User Interface | A2UI / TalkOps A2A | Real-time streaming, HITL approval cards |
| Validation | Terraform CLI (init, fmt, validate) | Sandbox validation in local environment |
| Runtime | Python 3.12+ | Core agent backend |
| Infrastructure | Docker / uv / Uvicorn / Starlette | Containerization and package management |

LLM Configuration

The agent uses a three-tier LLM configuration, assigning different models to different cognitive jobs:

| Tier | Config Key | Default | Used By |
|---|---|---|---|
| Standard | LLM_MODEL | gemini-3.1-flash-lite-preview | Validator, routing (fast and cheap for yes/no decisions) |
| Higher | LLM_HIGHER_MODEL | gemini-3.1-pro-preview | Planner, Supervisor (better reasoning for research) |
| Deep Agent | LLM_DEEPAGENT_MODEL | gemini-3.1-pro-preview | Coordinator, Generator (full capability for code generation) |