Long-Horizon Task Execution

Enable agents to work autonomously on complex, multi-day development tasks with automated verification, code review, and supervisor orchestration.

Overview

Long-horizon task execution combines:

Status Files: Structured task definitions with milestones and verification criteria
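Implementation Agent: Primary coding agent executing milestone work
Coordinator Agent: Examines changes and logs after each milestone and selects the review agents to run
Code Review Agents: Specialized reviewers validating quality at milestone boundaries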
Supervisor Agent: Orchestrator that merges changes and decides next steps
Anti-Halting Detection: Automatic recovery from stuck processes

Status Files

Status files (.status.org or .status.md) define your development plan with clear milestones, deliverables, and verification criteria.

Structure


#+TITLE: Feature Name Status

* Document
  :PROPERTIES:
  :next_steps: Current next steps
  :last_updated: 2026-01-20
  :current_milestone: M1
  :END:

* Introduction
  :PROPERTIES:
  :initiative_goals: High-level goals
  :related_specs: Links to specs
  :END:

  Brief context and scope.

* Milestones

** M0: Initial Setup
   :PROPERTIES:
   :status: completed
   :END:

*** Deliverables
    - [x] Create project structure
    - [x] Add dependencies

*** Verification
    - test_name: test_project_builds
      file: tests/setup_test.rs
      description: Verifies project compiles
      status: passing

** M1: Core Implementation
   :PROPERTIES:
   :status: in_progress
   :END:

*** Deliverables
    - [ ] Implement main feature
    - [ ] Add error handling

*** Verification
    - test_name: test_main_feature
      description: Verifies feature works
      status: pending

Key Properties

Property	Description
`:next_steps:`	Current actionable next steps (updated each session)
`:last_updated:`	Date of last update
`:current_milestone:`	ID of milestone being worked on
`:status:`	Milestone status: `planned`, `in_progress`, `completed`, or `blocked`

Running Long-Horizon Tasks

Basic Usage


ah task create --agent claude --prompt "Implement the authentication feature following the status file at ./auth.status.org. Update checkboxes as you complete each item."

The agent reads the status file, works on deliverables, and updates progress as it goes.
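The prompt asks the agent to update checkboxes as it completes each item. Suppose the agent, or a wrapper around it, edits the file programmatically. In that case, a scoped update such as the hypothetical `mark_deliverable_done` below avoids ticking a deliverable with the same name in a different milestone. It also fails loudly when the item is not found:

```python
def mark_deliverable_done(text: str, milestone_id: str, deliverable: str) -> str:
    """Flip "- [ ] <deliverable>" to "- [x] <deliverable>" within one milestone only.

    Raises ValueError when the unchecked item is missing, so a mistyped
    deliverable name fails loudly instead of being silently ignored.
    """
    lines = text.splitlines(keepends=True)
    in_milestone = False
    for i, line in enumerate(lines):
        if line.startswith("** "):  # a milestone headline such as "** M1: ..."
            in_milestone = line.startswith(f"** {milestone_id}:")
        elif line.startswith("* "):  # a new top-level section ends the milestone
            in_milestone = False
        elif in_milestone and line.strip() == f"- [ ] {deliverable}":
            lines[i] = line.replace("- [ ]", "- [x]", 1)
            return "".join(lines)
    raise ValueError(f"unchecked deliverable {deliverable!r} not found in {milestone_id}")
```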

Execution Loop

The long-horizon loop follows this pattern:


┌─────────────────────────────────────────────────────────────┐
│  STEP 1: Launch Implementation Agent                        │
│  - Reads status file, implements next milestone             │
│  - Updates checkboxes as deliverables complete              │
│  - Signals completion by updating milestone status          │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  STEP 2: Launch Coordinator Agent                           │
│  - Examines changes and logs                                │
│  - Detects abandonment (agent deviated due to blockers)     │
│  - Selects appropriate review agents                        │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  STEP 3: Launch Review Agents                               │
│  - Each reviewer validates in isolated workspace            │
│  - Reviewers can fix issues directly                        │
│  - Results merged after all reviewers complete              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  STEP 4: Launch Supervisor Agent                            │
│  - Merges fixes from reviewers                              │
│  - Runs test suite to verify integrity                      │
│  - Decides: CONTINUE, NEW_REVIEW, COMPLETE, or STOP         │
└─────────────────────────────────────────────────────────────┘
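The diagram does not say what each supervisor decision triggers next. The Python sketch below encodes one plausible reading, which is an assumption rather than documented behavior:

- CONTINUE starts another implementation round.
- NEW_REVIEW repeats steps 2 to 4 on the existing changes.
- COMPLETE and STOP both end the loop.

The four callables stand in for launching the respective agents. The `max_rounds` guard is an added safety valve that does not appear in the original description.

```python
from enum import Enum
from typing import Callable, List


class Decision(Enum):
    CONTINUE = "CONTINUE"      # assumed: run the implementation agent again
    NEW_REVIEW = "NEW_REVIEW"  # assumed: re-review without new implementation work
    COMPLETE = "COMPLETE"      # assumed: all milestones done, exit successfully
    STOP = "STOP"              # assumed: halt and hand control back to a human


def run_long_horizon(
    implement: Callable[[], None],
    coordinate: Callable[[], List[str]],
    review: Callable[[List[str]], None],
    supervise: Callable[[], Decision],
    max_rounds: int = 20,
) -> Decision:
    """Drive the four-step loop until the supervisor returns COMPLETE or STOP."""
    decision = Decision.CONTINUE
    for _ in range(max_rounds):
        if decision is Decision.CONTINUE:
            implement()                # STEP 1: implementation agent
        reviewers = coordinate()       # STEP 2: coordinator picks reviewers
        review(reviewers)              # STEP 3: reviewers run in isolation
        decision = supervise()         # STEP 4: merge, run tests, decide
        if decision in (Decision.COMPLETE, Decision.STOP):
            return decision
    return Decision.STOP               # too many rounds without a terminal decision
```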

Code Review Agents

Specialized review agents validate code at milestone boundaries:

Agent	Focus	Blocking
Security	Vulnerabilities, injection, auth bypass	Yes
Test Integrity	Test cheating, weakened assertions	Yes
Goal Adherence	Milestone requirements match	Yes
Architecture	Module boundaries, cycles	Warn
Performance	Algorithms, data structures	Conditional
Idioms	Project style, patterns	No

Review Policy

Fixed Issues: Automatically merged if tests pass
Unfixable Blocking Issues: Implementation agent must revise
Warnings: Logged but don’t block milestone advancement
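The three policy rules can be read as a small decision function. In this sketch, the `Finding` fields and the returned labels are hypothetical. A reviewer rated Conditional in the table above is assumed to have been resolved to `blocking=True` or `False` before the policy runs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Finding:
    reviewer: str   # e.g. "security", "idioms"
    summary: str
    blocking: bool  # resolved from the Blocking column before the policy runs
    fixed: bool     # True if the reviewer already fixed it in its own workspace


def apply_review_policy(findings: List[Finding], tests_pass: bool) -> Tuple[str, List[str]]:
    """Return (outcome, warnings) where outcome is "revise", "merge_fixes", or "advance"."""
    # Warnings: non-blocking issues nobody fixed are logged but never block.
    warnings = [f.summary for f in findings if not f.blocking and not f.fixed]
    # Unfixable blocking issues: the implementation agent must revise.
    if any(f.blocking and not f.fixed for f in findings):
        return "revise", warnings
    # Fixed issues: merged automatically, but only if the test suite passes.
    if any(f.fixed for f in findings):
        # Assumption: fixes that break the tests send the milestone back for revision.
        return ("merge_fixes" if tests_pass else "revise"), warnings
    return "advance", warnings
```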

Custom Review Agents

Define project-specific reviewers in .agents/config.toml, with their prompt files under .agents/reviewers/:


# .agents/config.toml
[[review_agents]]
name = "migrations"
model = "sonnet"
blocking = true
prompt_file = ".agents/reviewers/migrations.md"
trigger = "path:migrations/**"

[[review_agents]]
name = "api_contracts"
model = "haiku"
blocking = false
prompt_file = ".agents/reviewers/api_contracts.md"
trigger = "always"
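Only two trigger forms appear in the example: `always` and `path:<glob>`. If we assume that `path:` fires when any changed file matches the glob, reviewer selection reduces to glob matching over the list of changed files. The dictionaries below mirror the two `[[review_agents]]` tables as a TOML parser would return them. `is_triggered` and `select_reviewers` are illustrative names, not harness APIs.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# The two [[review_agents]] entries above, as a TOML parser would return them.
REVIEW_AGENTS: List[Dict] = [
    {"name": "migrations", "model": "sonnet", "blocking": True,
     "prompt_file": ".agents/reviewers/migrations.md", "trigger": "path:migrations/**"},
    {"name": "api_contracts", "model": "haiku", "blocking": False,
     "prompt_file": ".agents/reviewers/api_contracts.md", "trigger": "always"},
]


def is_triggered(trigger: str, changed_files: List[str]) -> bool:
    """Assumed semantics: "always" fires every time; "path:<glob>" fires on any matching file."""
    if trigger == "always":
        return True
    if trigger.startswith("path:"):
        pattern = trigger[len("path:"):]
        # fnmatch's "*" also matches "/", so "migrations/**" covers nested files too.
        return any(fnmatchcase(path, pattern) for path in changed_files)
    raise ValueError(f"unsupported trigger: {trigger!r}")


def select_reviewers(agents: List[Dict], changed_files: List[str]) -> List[str]:
    """Names of the custom reviewers that should run for this set of changes."""
    return [a["name"] for a in agents if is_triggered(a["trigger"], changed_files)]
```

Python's `fnmatch` gives no special meaning to `**`. This sketch still works because a single `*` already crosses directory separators. A stricter matcher would be needed if `*` must stay within one directory.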

Anti-Halting Detection

The system automatically detects and recovers from stuck processes:

Detection Types

Type	Detection	Recovery
Interactive Prompt	Process waiting for stdin	Move process to background, provide interaction commands
Network Timeout	Blocked on connect/recv	Terminate, suggest retry with timeout
Busy Loop	High CPU, identical stacks	Terminate, provide diagnostic info
Deadlock	Circular wait chain	Terminate, show thread dump
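The detection table implies a classifier over a few observations of a stalled process. The sketch below assumes a hypothetical `ProcessSnapshot` whose fields are gathered elsewhere, for example from `/proc` or a debugger. The CPU threshold and the order of the checks are also assumptions. The deadlock row is the most algorithmic: a circular wait chain is a cycle in the wait-for graph. `has_wait_cycle` finds such a cycle by following each chain until it revisits a thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProcessSnapshot:
    """Hypothetical observations about a process that has stopped making progress."""
    reading_stdin: bool = False
    blocked_syscall: Optional[str] = None                     # e.g. "connect", "recv"
    cpu_percent: float = 0.0
    stack_samples: List[str] = field(default_factory=list)    # repeated stack captures
    waits_for: Dict[str, str] = field(default_factory=dict)   # thread -> thread it waits on


def has_wait_cycle(waits_for: Dict[str, str]) -> bool:
    """Detect a circular wait chain; each thread waits on at most one other thread."""
    for start in waits_for:
        seen = set()
        node = start
        while node in waits_for:
            if node in seen:
                return True
            seen.add(node)
            node = waits_for[node]
    return False


def classify_stall(s: ProcessSnapshot, busy_cpu: float = 90.0) -> str:
    """Map a snapshot to one of the four detection types, or "unknown"."""
    if s.reading_stdin:
        return "interactive_prompt"
    if s.blocked_syscall in ("connect", "recv"):
        return "network_timeout"
    if has_wait_cycle(s.waits_for):
        return "deadlock"
    if s.cpu_percent >= busy_cpu and len(s.stack_samples) >= 2 and len(set(s.stack_samples)) == 1:
        return "busy_loop"  # high CPU and every stack sample identical
    return "unknown"
```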

Interactive Process Handling

When an interactive prompt is detected, the process is moved to the background:


[AH-ANTI-HALT] Process moved to background - waiting for input

COMMAND: npm init
COMMAND ID: cmd-456

You can interact with it using:
  ah agent pty-snapshot cmd-456      # View current state
  ah agent send-keys cmd-456 "y" --enter  # Send input
  ah agent kill cmd-456              # Terminate

RECOMMENDED ALTERNATIVES:
  1. Use non-interactive mode: npm init -y
  2. Set CI environment: CI=true npm init
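The recommended alternatives point to a broader technique: run commands so they can never wait on a keyboard. The Python sketch below illustrates that idea and is not a description of how `ah` itself works. It closes stdin, sets `CI=true`, and bounds the run with a timeout. The function name `run_non_interactive` is hypothetical.

```python
import os
import subprocess
from typing import List


def run_non_interactive(cmd: List[str], timeout_s: float = 300.0) -> subprocess.CompletedProcess:
    """Run a command so that it cannot block waiting for keyboard input.

    - stdin comes from the null device, so a prompt reads end-of-file immediately;
    - CI=true is set, which many CLI tools check before showing prompts;
    - a timeout (300 s here, matching the 5m timeout in the example config) bounds the run.
    Raises subprocess.TimeoutExpired if the command outlives the timeout.
    """
    env = {**os.environ, "CI": "true"}
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        env=env,
        timeout=timeout_s,
    )
```

With stdin at end-of-file, most prompts fail fast instead of hanging. The timeout covers tools that spin or retry instead of exiting.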

Research Integration

When agents encounter blocking technical challenges, a Research Agent can be launched:

The Coordinator detects that the agent abandoned its goals because of blockers
A Research Agent is spawned with web search capabilities
The Research Agent creates a forked session from the point of divergence
It injects targeted help to unstick the implementation agent
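The fork-and-inject step is easier to picture if a session is modeled as a list of messages. This is a hypothetical data model (`Message`, `fork_with_guidance`). It assumes the Coordinator has already located the index at which the agent diverged from its goal. The original history is left untouched, so the abandoned attempt remains available for inspection.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    role: str      # "user", "assistant", or "tool"
    content: str


def fork_with_guidance(history: List[Message], divergence_index: int, guidance: str) -> List[Message]:
    """Branch the transcript just before the divergence, then append the research notes."""
    if not 0 <= divergence_index <= len(history):
        raise IndexError("divergence_index outside the session history")
    forked = list(history[:divergence_index])  # a copy: the original list is not modified
    forked.append(Message("user", f"Research notes for the blocker you hit:\n{guidance}"))
    return forked
```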

Configuration


# .agents/config.toml

[supervisor]
# Enable long-horizon mode
enabled = true

[supervisor.implementation]
# Default agent for implementation work
default_agent = "claude"
model = "sonnet"
max_session_duration = "2h"

[code_review]
# Run reviews in parallel (requires AgentFS)
parallel = true

# Standard reviewers to include
reviewers = ["security", "test-integrity", "goal-adherence"]

[anti_halting]
enabled = true

[anti_halting.timeouts]
base_timeout = "5m"
max_extended_timeout = "30m"

[anti_halting.interactive_background]
enabled = true
timeout = "5m"

Best Practices

Writing Good Status Files

Clear Deliverables: Each deliverable should be small and testable
Concrete Verification: Every milestone needs automated verification
Realistic Milestones: Break complex work into 2-4 hour chunks
Update Frequently: Agents should update status after each deliverable
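The "Concrete Verification" rule is easy to check mechanically. Below is a minimal linter, assuming the layout from the Structure example: `** <ID>:` milestone headlines, each with a `*** Verification` subheading containing `- test_name:` entries. `lint_status_file` is a hypothetical helper.

```python
import re
from typing import Dict, List

MILESTONE_RE = re.compile(r"^\*\* (\w+):")  # "** M1: ..." but not "*** Verification"


def lint_status_file(text: str) -> List[str]:
    """Flag milestones whose Verification section lists no tests."""
    tests: Dict[str, int] = {}
    current, in_verification = None, False
    for line in text.splitlines():
        if (m := MILESTONE_RE.match(line)):
            current, in_verification = m.group(1), False
            tests[current] = 0
        elif line.startswith("*** "):  # Deliverables, Verification, ...
            in_verification = line.strip() == "*** Verification"
        elif line.startswith("* "):    # a new top-level section
            current, in_verification = None, False
        elif current and in_verification and line.strip().startswith("- test_name:"):
            tests[current] += 1
    return [f"{mid}: no automated verification listed" for mid, n in tests.items() if n == 0]
```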

Session Workflow

At the end of each session, agents must:

Update :next_steps: with what should be done next
Update :last_updated: to the current date
Update verification statuses after running tests
Add any newly discovered outstanding tasks
Update milestone status if blocked or completed
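The first two session-end updates touch the `* Document` property drawer. Below is a sketch of making those updates in place. It assumes single-line property values and the drawer layout shown under Structure; `update_document_properties` is a hypothetical name. Properties with the same names in other sections are deliberately left alone.

```python
import datetime
import re


def update_document_properties(text: str, next_steps: str, today: datetime.date) -> str:
    """Rewrite :next_steps: and :last_updated: inside the "* Document" drawer only."""
    if "\n" in next_steps:
        raise ValueError("Org property values must fit on one line")
    values = {"next_steps": next_steps, "last_updated": today.isoformat()}
    lines = text.split("\n")  # split("\n") keeps a trailing empty item, so join restores it
    in_document = False
    for i, line in enumerate(lines):
        if line.startswith("* "):  # entering a new top-level section
            in_document = line.strip() == "* Document"
            continue
        if in_document:
            m = re.match(r"^(\s*):(\w+):", line)
            if m and m.group(2) in values:
                lines[i] = f"{m.group(1)}:{m.group(2)}: {values[m.group(2)]}"
    return "\n".join(lines)
```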