Multi-OS Testing

Validate builds and tests across multiple operating systems in parallel using Agent Harbor’s leader/follower architecture.

Overview

Multi-OS testing enables agents to run tests on Linux, macOS, and Windows simultaneously:

Leader: Primary executor (typically Linux) that owns filesystem snapshots and initiates sync
Followers: Secondary executors that mirror the leader via high-speed file sync
Sync Fence: Ensures all followers have identical file state before test execution
run-everywhere: Execute commands across all platforms and aggregate results

Architecture


┌─────────────────────────────────────────────────────────────┐
│  Agent works on Leader (Linux)                              │
│  - Edits files                                              │
│  - Creates filesystem snapshots                             │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  Sync Fence                                                 │
│  - Snapshot captured                                        │
│  - Changes synced to all followers via Mutagen              │
└─────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
┌───────────────────┐ ┌───────────────┐ ┌───────────────────┐
│  Linux Follower   │ │ macOS Follower│ │ Windows Follower  │
│  pytest -q        │ │ pytest -q     │ │ pytest -q         │
└───────────────────┘ └───────────────┘ └───────────────────┘
              │               │               │
              └───────────────┼───────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  Results aggregated and returned to agent                   │
└─────────────────────────────────────────────────────────────┘

Fleet Configuration

Define your fleet of executors in the configuration:


# .agents/config.toml
 
[fleet]
name = "cross-platform"
 
[[fleet.members]]
name = "linux-leader"
role = "leader"
host = "linux.internal"
ssh_user = "agent"
 
[[fleet.members]]
name = "macos-dev"
role = "follower"
host = "mac.internal"
ssh_user = "agent"
tags = { os = "macos", arch = "arm64" }
 
[[fleet.members]]
name = "windows-build"
role = "follower"
host = "win.internal"
ssh_user = "agent"
tags = { os = "windows", arch = "x64" }
 
[[fleet.members]]
name = "linux-arm"
role = "follower"
host = "arm.internal"
ssh_user = "agent"
tags = { os = "linux", arch = "arm64" }

Running Cross-Platform Tasks

Basic Usage


ah task create --agent claude --fleet cross-platform --prompt "Add feature X and ensure tests pass on all platforms"

Manual Cross-Platform Commands

Run commands on specific platforms from the leader:


# Run on all followers
ah agent followers run -- npm test
 
# Run only on Windows
ah agent followers run --tag os=windows -- npm test
 
# Run only on ARM platforms
ah agent followers run --tag arch=arm64 -- cargo test
 
# Run on a specific host
ah agent followers run --host macos-dev -- swift test

Execution Workflow

1. Agent Edits Files (Leader)

The agent makes changes on the leader workspace.

2. Sync Fence

Before running cross-platform tests, the agent calls:


ah agent followers sync-fence

This:

Creates a filesystem snapshot on the leader
Syncs all changes to followers via Mutagen
Waits until all followers confirm sync completion

3. Run Tests Everywhere


ah agent followers run -- pytest -q

This:

Executes the command on all followers in parallel
Adapts the command for each OS (shell, paths, quoting)
Aggregates exit codes and output
Returns combined results to the agent

4. Fetch Build Artifacts

Collect platform-specific build outputs:


ah agent followers slurp \
  --fileset-data "name=binaries&include=dist/*&dest=./artifacts"

Command Adaptation

Agent Harbor automatically adapts commands for each platform:

Platform	Shell	Path Translation	Example
Linux	bash -lc	Native	`cd /workspaces/proj && pytest`
macOS	zsh -lc	FSKit mount	`cd /Volumes/ah-overlays/proj && pytest`
Windows	PowerShell	Drive mapping	`Set-Location S:\; pytest`

Targeting and Selectors

By Tag


# Run on all Windows hosts
ah agent followers run --tag os=windows -- npm test
 
# Run on ARM64 macOS
ah agent followers run --tag os=macos --tag arch=arm64 -- swift test

By Host


ah agent followers run --host win-12 -- dotnet test

Run Everywhere


ah agent followers run --all -- make test

Test Sharding

Accelerate test execution by distributing tests across followers:


ah agent followers run \
  --command "pytest tests/unit/" \
  --command "pytest tests/integration/" \
  --command "pytest tests/e2e/"

The scheduler:

Builds a job queue from commands
Assigns tasks round-robin across executors
Each command runs on exactly one executor per OS
Results collected after all complete

Fetching Artifacts

Single File Set


ah agent followers slurp \
  --fileset-data "name=binaries&include=dist/*&dest=./artifacts"

Multiple File Sets


ah agent followers slurp \
  --fileset-data "name=windows-exe&include=target/release/*.exe&required=true" \
  --fileset-data "name=mac-app&include=target/release/*.app/**&required=true" \
  --fileset-data "name=linux-bin&include=target/release/myapp&required=true" \
  --dest ./release-artifacts

From Configuration


# .agents/config.toml
[filesets.release]
include = ["target/release/*"]
exclude = ["*.d", "*.rlib"]
required = true


ah agent followers slurp --fileset release --dest ./release

Connectivity

Transport Layers

Control Plane: QUIC connections between executors and access points
Data Plane: SSH tunneled via HTTP CONNECT (or direct if reachable)
Sync: Mutagen over SSH for file synchronization

Connection Flow

Executors register with access point via ah agent enroll
Leader establishes persistent SSH connections to followers
Mutagen project started for file sync
Commands executed over SSH

Firewall Considerations

If followers aren’t directly reachable:

Use HTTP CONNECT via the access point
For hybrid fleets, client relays between access points

Time-Travel Integration

Multi-OS testing integrates with Agent Harbor’s time-travel features:

sync-fence creates a snapshot linked to a SessionMoment
Seeking to that moment restores the leader snapshot
Followers are re-synced by issuing a fence before re-execution

This enables:

Reproducing cross-platform failures
Debugging platform-specific issues from any point in time

Failure Handling

Partial Host Failure

If some hosts fail:

Results aggregated with per-host status
Non-zero exit code if any host fails
Detailed logs available per host

Sync Timeout

If sync fence times out:

Lagging followers reported
Suggestions to narrow selectors
Option to proceed with available hosts

Connectivity Issues

If a follower becomes unreachable:

Health probes detect degradation
Automatic re-route attempts
Host excluded if unrecoverable

Configuration


# .agents/config.toml
 
[fleet]
name = "production"
 
[fleet.sync]
# Patterns to ignore during sync
ignores = ["node_modules", ".venv", "target", "build"]
 
# Sync timeout
timeout = "60s"
 
[fleet.connectivity]
# SSH connection settings
ssh_connect_timeout = "5s"
ssh_keepalive = "30s"
 
# Health check interval
health_probe_interval = "30s"
 
# Consecutive failures before marking degraded
failure_threshold = 3
 
[fleet.execution]
# Maximum concurrent commands per host
concurrency = 1
 
# Command timeout
timeout = "10m"
 
# Retry on transient SSH errors
ssh_retry = true