Long-Horizon Task Execution
Enable agents to work autonomously on complex, multi-day development tasks with automated verification, code review, and supervisor orchestration.
Overview
Long-horizon task execution combines:
- Status Files: Structured task definitions with milestones and verification criteria
- Implementation Agent: Primary coding agent executing milestone work
- Code Review Agents: Specialized reviewers validating quality at milestone boundaries
- Supervisor Agent: Orchestrator that merges changes and decides next steps
- Anti-Halting Detection: Automatic recovery from stuck processes
Status Files
Status files (.status.org or .status.md) define your development plan with clear milestones, deliverables, and verification criteria.
Structure
#+TITLE: Feature Name Status
* Document
:PROPERTIES:
:next_steps: Current next steps
:last_updated: 2026-01-20
:current_milestone: M1
:END:
* Introduction
:PROPERTIES:
:initiative_goals: High-level goals
:related_specs: Links to specs
:END:
Brief context and scope.
* Milestones
** M0: Initial Setup
:PROPERTIES:
:status: completed
:END:
*** Deliverables
- [x] Create project structure
- [x] Add dependencies
*** Verification
- test_name: test_project_builds
  file: tests/setup_test.rs
  description: Verifies project compiles
  status: passing
** M1: Core Implementation
:PROPERTIES:
:status: in_progress
:END:
*** Deliverables
- [ ] Implement main feature
- [ ] Add error handling
*** Verification
- test_name: test_main_feature
  description: Verifies feature works
  status: pending
Key Properties
| Property | Description |
|---|---|
| :next_steps: | Current actionable next steps (updated each session) |
| :last_updated: | Date of last update |
| :current_milestone: | ID of milestone being worked on |
| :status: | planned, in_progress, completed, or blocked |
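To make the property layout concrete, here is a minimal Python sketch that collects the `:key: value` pairs from each `:PROPERTIES:` drawer of a `.status.org` file. It is illustrative only: `read_properties` is a hypothetical helper, not part of the `ah` tooling, and it assumes one property per line as in the structure shown above.

```python
import re

# Matches lines such as ":status: in_progress" inside a property drawer.
PROPERTY_RE = re.compile(r"^:([A-Za-z_]+):\s*(.*)$")


def read_properties(text):
    """Return {heading: {property: value}} for every heading that has a drawer."""
    props = {}
    heading = None
    in_drawer = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("*"):
            # Org headings start with one or more asterisks.
            heading = line.lstrip("*").strip()
        elif line == ":PROPERTIES:":
            in_drawer = True
            props.setdefault(heading, {})
        elif line == ":END:":
            in_drawer = False
        elif in_drawer:
            match = PROPERTY_RE.match(line)
            if match:
                props[heading][match.group(1)] = match.group(2).strip()
    return props
```

With the example file above, `read_properties(text)["Document"]["current_milestone"]` would give the ID of the milestone in progress.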
Running Long-Horizon Tasks
Basic Usage
ah task create --agent claude --prompt "Implement the authentication feature following the status file at ./auth.status.org. Update checkboxes as you complete each item."
The agent reads the status file, works on deliverables, and updates progress as it goes.
Execution Loop
The long-horizon loop follows this pattern:
┌─────────────────────────────────────────────────────────────┐
│ STEP 1: Launch Implementation Agent                         │
│   - Reads status file, implements next milestone            │
│   - Updates checkboxes as deliverables complete             │
│   - Signals completion by updating milestone status         │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ STEP 2: Launch Coordinator Agent                            │
│   - Examines changes and logs                               │
│   - Detects abandonment (agent deviated due to blockers)    │
│   - Selects appropriate review agents                       │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ STEP 3: Launch Review Agents                                │
│   - Each reviewer validates in isolated workspace           │
│   - Reviewers can fix issues directly                       │
│   - Results merged after all reviewers complete             │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ STEP 4: Launch Supervisor Agent                             │
│   - Merges fixes from reviewers                             │
│   - Runs test suite to verify integrity                     │
│   - Decides: CONTINUE, NEW_REVIEW, COMPLETE, or STOP        │
└─────────────────────────────────────────────────────────────┘
Code Review Agents
Specialized review agents validate code at milestone boundaries:
| Agent | Focus | Blocking |
|---|---|---|
| Security | Vulnerabilities, injection, auth bypass | Yes |
| Test Integrity | Test cheating, weakened assertions | Yes |
| Goal Adherence | Milestone requirements match | Yes |
| Architecture | Module boundaries, cycles | Warn |
| Performance | Algorithms, data structures | Conditional |
| Idioms | Project style, patterns | No |
Review Policy
- Fixed Issues: Automatically merged if tests pass
- Unfixable Blocking Issues: Implementation agent must revise
- Warnings: Logged but don’t block milestone advancement
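The three policy rules can be read as a small decision function. The sketch below is one possible interpretation, not the tool's actual logic: `Finding`, `apply_policy`, and the action names are hypothetical, and sending work back for revision when merged fixes break the tests is an assumption (the policy above only covers the case where tests pass).

```python
from dataclasses import dataclass


@dataclass
class Finding:
    reviewer: str
    blocking: bool  # whether this reviewer is a blocking one
    fixed: bool     # True if the reviewer fixed the issue in its workspace


def apply_policy(findings, tests_pass):
    """Return (action, warnings) for one milestone review."""
    # Unfixed non-blocking findings are logged as warnings only.
    warnings = [f.reviewer for f in findings if not f.blocking and not f.fixed]
    if any(f.blocking and not f.fixed for f in findings):
        # Unfixable blocking issue: the implementation agent must revise.
        return "revise", warnings
    if any(f.fixed for f in findings):
        # Fixes are merged only if tests pass (failure path is an assumption).
        return ("merge_fixes" if tests_pass else "revise"), warnings
    return "advance", warnings
```

For example, a fixed security finding plus an unfixed idioms finding yields `("merge_fixes", ["idioms"])` when tests pass.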
Custom Review Agents
Define project-specific reviewers in .agents/reviewers/:
# .agents/config.toml
[[review_agents]]
name = "migrations"
model = "sonnet"
blocking = true
prompt_file = ".agents/reviewers/migrations.md"
trigger = "path:migrations/**"
[[review_agents]]
name = "api_contracts"
model = "haiku"
blocking = false
prompt_file = ".agents/reviewers/api_contracts.md"
trigger = "always"
Anti-Halting Detection
The system automatically detects and recovers from stuck processes:
Detection Types
| Type | Detection | Recovery |
|---|---|---|
| Interactive Prompt | Process waiting for stdin | Background process, provide interaction commands |
| Network Timeout | Blocked on connect/recv | Terminate, suggest retry with timeout |
| Busy Loop | High CPU, identical stacks | Terminate, provide diagnostic info |
| Deadlock | Circular wait chain | Terminate, show thread dump |
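As a rough illustration of how the table maps signals to recoveries, the following sketch classifies a single observation of a process. Everything in it is hypothetical: the `Observation` fields, the `classify` function, the check order, and the 90% CPU threshold are assumptions chosen for the example, not the detector's real inputs or tuning.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    waiting_on_stdin: bool = False
    blocked_on_socket: bool = False   # blocked in connect/recv
    cpu_percent: float = 0.0
    identical_stacks: bool = False    # repeated stack samples are identical
    circular_wait: bool = False       # threads wait on each other in a cycle


# Recovery actions, paraphrased from the table above.
RECOVERY = {
    "interactive_prompt": "background the process and provide interaction commands",
    "network_timeout": "terminate and suggest a retry with a timeout",
    "busy_loop": "terminate and provide diagnostic info",
    "deadlock": "terminate and show a thread dump",
}


def classify(obs, cpu_threshold=90.0):
    """Return (type, recovery), or None if the process does not look stuck."""
    if obs.circular_wait:
        kind = "deadlock"
    elif obs.waiting_on_stdin:
        kind = "interactive_prompt"
    elif obs.blocked_on_socket:
        kind = "network_timeout"
    elif obs.cpu_percent >= cpu_threshold and obs.identical_stacks:
        kind = "busy_loop"
    else:
        return None
    return kind, RECOVERY[kind]
```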
Interactive Process Handling
When an interactive prompt is detected, the process is moved to the background:
[AH-ANTI-HALT] Process moved to background - waiting for input
COMMAND: npm init
COMMAND ID: cmd-456
You can interact with it using:
ah agent pty-snapshot cmd-456 # View current state
ah agent send-keys cmd-456 "y" --enter # Send input
ah agent kill cmd-456 # Terminate
RECOMMENDED ALTERNATIVES:
1. Use non-interactive mode: npm init -y
2. Set CI environment: CI=true npm init
Research Integration
When agents encounter blocking technical challenges, a Research Agent can be launched:
- The Coordinator detects that the implementation agent abandoned its goals because of blockers
- A Research Agent is spawned with web search capabilities
- The Research Agent forks a session from the point of divergence
- It injects targeted help to unblock the implementation agent
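The execution loop and the research branch described above can be pictured as a small driver. This is a sketch under stated assumptions, not the supervisor's implementation: the five callables are hypothetical stand-ins for launching agents, `max_rounds` is an invented safety bound, and treating NEW_REVIEW as "repeat coordination and review without new implementation work" is an interpretation.

```python
from enum import Enum


class Decision(Enum):
    CONTINUE = "CONTINUE"
    NEW_REVIEW = "NEW_REVIEW"
    COMPLETE = "COMPLETE"
    STOP = "STOP"


def run_loop(implement, coordinate, review, supervise, research, max_rounds=10):
    """Drive the loop until the supervisor returns COMPLETE or STOP."""
    history = []
    review_only = False
    for _ in range(max_rounds):
        if not review_only:
            implement()                              # Step 1
        abandoned, reviewers = coordinate()          # Step 2
        if abandoned:
            # Blocked agent: launch research, then retry the implementation.
            research()
            history.append("RESEARCH")
            review_only = False
            continue
        results = [review(name) for name in reviewers]  # Step 3
        decision = supervise(results)                    # Step 4
        history.append(decision.name)
        if decision in (Decision.COMPLETE, Decision.STOP):
            break
        # Assumption: NEW_REVIEW skips Step 1 on the next round.
        review_only = decision is Decision.NEW_REVIEW
    return history
```

Passing plain functions keeps the sketch testable with stubs; a real orchestrator would launch agent sessions at each step instead.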
Configuration
# .agents/config.toml
[supervisor]
# Enable long-horizon mode
enabled = true
[supervisor.implementation]
# Default agent for implementation work
default_agent = "claude"
model = "sonnet"
max_session_duration = "2h"
[code_review]
# Run reviews in parallel (requires AgentFS)
parallel = true
# Standard reviewers to include
reviewers = ["security", "test-integrity", "goal-adherence"]
[anti_halting]
enabled = true
[anti_halting.timeouts]
base_timeout = "5m"
max_extended_timeout = "30m"
[anti_halting.interactive_background]
enabled = true
timeout = "5m"
Best Practices
Writing Good Status Files
- Clear Deliverables: Each deliverable should be small and testable
- Concrete Verification: Every milestone needs automated verification
- Realistic Milestones: Break complex work into 2-4 hour chunks
- Update Frequently: Agents should update status after each deliverable
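The "Concrete Verification" guideline can be checked mechanically. The sketch below lists milestones that have no `test_name` entry; `milestones_missing_verification` is a hypothetical helper, and it assumes milestone headings look like `** M1: ...` and verification entries start with `- test_name:`, as in the example status file.

```python
import re

# Level-two org headings such as "** M1: Core Implementation".
MILESTONE_RE = re.compile(r"^\*\* (M\d+):")


def milestones_missing_verification(text):
    """Return IDs of milestones that declare no test_name entry."""
    missing = []
    current = None
    has_test = False
    for raw in text.splitlines():
        line = raw.strip()
        match = MILESTONE_RE.match(line)
        if match:
            # Close out the previous milestone before starting a new one.
            if current is not None and not has_test:
                missing.append(current)
            current, has_test = match.group(1), False
        elif line.startswith("- test_name:"):
            has_test = True
    if current is not None and not has_test:
        missing.append(current)
    return missing
```

Running a check like this before launching a task is one way to catch milestones that an agent could mark complete without any automated evidence.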
Session Workflow
At the end of each session, agents must:
- Update :next_steps: with what should be done next
- Update :last_updated: to the current date
- Update verification statuses after running tests
- Add any new outstanding tasks discovered
- Update milestone status if blocked or completed
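The first two items of this checklist are simple text edits, sketched below. `end_of_session_update` is a hypothetical helper, and it assumes both properties sit at the start of a line in the first property drawer of the file, as in the example structure.

```python
import re
from datetime import date


def end_of_session_update(text, next_steps, today=None):
    """Rewrite the first :next_steps: and :last_updated: lines of a status file."""
    today = today or date.today().isoformat()
    # Callables are used as replacements so backslashes in next_steps stay literal.
    text = re.sub(r"(?m)^:next_steps:.*$",
                  lambda _: f":next_steps: {next_steps}", text, count=1)
    text = re.sub(r"(?m)^:last_updated:.*$",
                  lambda _: f":last_updated: {today}", text, count=1)
    return text
```

The remaining items (verification statuses, new tasks, milestone status) depend on test results and judgment, so they are left to the agent in this sketch.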