
AWS Orchestrator Troubleshooting

Common issues and fixes when running the AWS Orchestrator Agent.


1. tf-validator fails with "Directory not found at workspace/terraform_modules/..."

Symptom: The validation agent crashes immediately after the sync_workspace_to_disk utility runs.
Cause: The paths inside the Docker runtime do not match your host mount configuration. The generators write to a virtual filesystem rooted at /workspace/, while the validator reads the synced files from the physical disk. When the container path and the host mount disagree, the validator looks in a directory that does not exist.
Fix: Make sure docker-compose.yml mounts the host directory ./workspace at /app/workspace (a ./workspace:/app/workspace entry under the service's volumes key) and that .env sets TERRAFORM_WORKSPACE=./workspace/terraform_modules.
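
A quick preflight check can catch this mismatch before the container starts. The sketch below is hypothetical: env_value and check_workspace are not part of the AWS Orchestrator Agent, and it assumes .env sits in the docker-compose project directory so that relative paths in it resolve against that directory.

```shell
# Hypothetical preflight helpers (not part of the AWS Orchestrator Agent).
# Assumes .env lives in the docker-compose project directory, so relative
# paths in it resolve against that directory.

# Print the value assigned to KEY in an .env-style file (last one wins),
# with any quote characters and carriage returns removed.
env_value() {
  sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1 | tr -d "\"'\r"
}

# Check that TERRAFORM_WORKSPACE has the expected value and that the
# directory exists on the host side of the bind mount.
check_workspace() {
  env_file=${1:-.env}
  expected=./workspace/terraform_modules
  actual=$(env_value "$env_file" TERRAFORM_WORKSPACE)
  if [ "$actual" != "$expected" ]; then
    echo "TERRAFORM_WORKSPACE is '$actual', expected '$expected'" >&2
    return 1
  fi
  if [ ! -d "$(dirname "$env_file")/$expected" ]; then
    echo "host directory $expected does not exist" >&2
    return 1
  fi
  echo "ok"
}
```

Run it as check_workspace .env from the project directory. Once the container is up, docker exec aws-orchestrator ls /app/workspace/terraform_modules should list the same files, assuming the container is named aws-orchestrator as in the log command at the end of this page.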


2. Generator retries in a loop until it times out

Symptom: The console shows a repeating cycle of tf-validator errors followed by tf-generator rewrites until the run times out.
Cause: The validation feedback loop relies on LLM_DEEPAGENT_MODEL to parse long, complex stderr output from the validator and to fix errors that depend on one another. Flash or lite models often cannot resolve these intersecting dependency errors, so each rewrite fails validation again.
Fix: You are most likely using a flash or lite model for the deep agent tier. Set LLM_DEEPAGENT_MODEL in .env to a higher-capacity model (e.g., gemini-3.1-pro-preview).
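
To catch this before starting a long run, you can check the configured model name. The sketch below is a hypothetical helper (check_deepagent_model is not part of the agent) that uses a simple substring heuristic: it only flags model ids containing "flash" or "lite" and cannot judge a model's actual capability.

```shell
# Hypothetical check (not part of the agent): flag LLM_DEEPAGENT_MODEL
# values whose id contains "flash" or "lite" (case-insensitive). This is a
# name-based heuristic only; it says nothing about real model capability.
check_deepagent_model() {
  model=$1
  lower=$(printf '%s' "$model" | tr '[:upper:]' '[:lower:]')
  case $lower in
    "")
      echo "LLM_DEEPAGENT_MODEL is not set" >&2
      return 1 ;;
    *flash*|*lite*)
      echo "LLM_DEEPAGENT_MODEL='$model' looks like a flash/lite tier" >&2
      return 1 ;;
  esac
  echo "ok"
}
```

For example, check_deepagent_model "$(sed -n 's/^LLM_DEEPAGENT_MODEL=//p' .env)" reads the value straight from .env.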


3. "I'm sorry, I cannot fulfill out-of-scope requests."

Symptom: The request fails immediately, and no sub-agents start.
Cause: The supervisor, following SUPERVISOR_PROMPT, could not find an AWS or Terraform intent in your prompt, so it routed the request to request_human_input instead of delegating it.
Fix: Be explicit. Start your prompt with an actionable intent such as "Generate Terraform code for..." or "Update the AWS module...", and use cloud-specific terms such as AWS service and resource names.


4. GitHub Agent Commit Authorization Failure (401 or Not Found)

Symptom: The JIT (Just-In-Time) MCP subagent builds the GitHub payload, but GitHub rejects the final push.
Cause: The subagent attached FilesystemMiddleware to a git interaction, but the push was rejected because your personal access token (PAT) lacks the required scope or the target repository did not exist on GitHub before the run started. A 401 usually points to a missing or invalid token; Not Found usually means the repository does not exist or the token cannot access it.
Fix:

  1. Verify that GITHUB_PERSONAL_ACCESS_TOKEN is set in .env.
  2. In your GitHub user or organization developer settings, confirm the token has read/write repository access (the repo scope).
  3. Create the target repository manually before starting an agent PR generation run.
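
You can check both conditions from a shell before a run. The sketch below is a hypothetical helper (has_repo_scope is not part of the agent) that reads HTTP response headers and looks for repo in GitHub's X-OAuth-Scopes header. GitHub sends that header for classic PATs; fine-grained tokens do not report scopes this way, so this check applies to classic tokens only.

```shell
# Hypothetical helper (not part of the agent): read HTTP response headers
# on stdin and succeed only if GitHub's X-OAuth-Scopes header lists "repo".
# Classic PATs report scopes in this header; fine-grained tokens do not.
has_repo_scope() {
  # Drop CRs, keep only the header value, and remove spaces: "repo,workflow"
  scopes=$(tr -d '\r' | grep -i '^x-oauth-scopes:' | cut -d: -f2- | tr -d ' ')
  # Wrap in commas so "repo" matches as a whole entry, not "public_repo"
  case ",$scopes," in
    *,repo,*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage, where OWNER/REPO is a placeholder for your target repository:

curl -sI -H "Authorization: Bearer $GITHUB_PERSONAL_ACCESS_TOKEN" https://api.github.com/user | has_repo_scope && echo "repo scope present"
curl -s -o /dev/null -w '%{http_code}\n' -H "Authorization: Bearer $GITHUB_PERSONAL_ACCESS_TOKEN" https://api.github.com/repos/OWNER/REPO

The second command prints 200 if the repository exists and the token can see it, and 404 if it does not exist or the token cannot access it.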

5. UI Streaming Breaks Mid-Generation

Symptom: The A2UI frontend in TalkOps UI stops updating, even though the backend Docker logs show the agent is still running.
Cause: The A2A stream briefly disconnected during a large subagent tool_call transition.
Fix: Do not restart the container or the backend. Refresh the page in your browser instead. The AgentResponse stream resumes from the state the LangGraph checkpointer last saved.


Accessing Logs for Bug Reporting

If you run into framework issues not covered above, fetch the backend runtime logs with Docker and attach them to your bug report:

docker logs aws-orchestrator --tail 200
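
Saving the logs to a file is easier than copying them from the terminal. The sketch below is a hypothetical helper (save_logs is not part of the project) that assumes the container name aws-orchestrator from the command above and uses the standard docker logs flags --tail and --timestamps.

```shell
# Hypothetical helper (not part of the project): save the last N log lines
# of a container, with timestamps, to a file for a bug report.
save_logs() {
  container=${1:-aws-orchestrator}
  lines=${2:-200}
  out=${3:-$container.log}
  # For containers without a TTY, docker logs replays the container's
  # stderr on its own stderr, so 2>&1 is needed to capture both streams.
  docker logs --tail "$lines" --timestamps "$container" > "$out" 2>&1
  echo "$out"
}
```

Running save_logs with no arguments writes aws-orchestrator.log in the current directory and prints its name.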