Multi-OS Testing
Validate builds and tests across multiple operating systems in parallel using Agent Harbor’s leader/follower architecture.
Overview
Multi-OS testing lets agents run tests on Linux, macOS, and Windows simultaneously. It is built from four components:
- Leader: Primary executor (typically Linux) that owns filesystem snapshots and initiates sync
- Followers: Secondary executors that mirror the leader via high-speed file sync
- Sync Fence: Ensures all followers have identical file state before test execution
- run-everywhere: Execute commands across all platforms and aggregate results
Architecture
┌───────────────────────────────────────────────────────────┐
│ Agent works on Leader (Linux)                             │
│ - Edits files                                             │
│ - Creates filesystem snapshots                            │
└───────────────────────────────────────────────────────────┘
                              │
                              ▼
┌───────────────────────────────────────────────────────────┐
│ Sync Fence                                                │
│ - Snapshot captured                                       │
│ - Changes synced to all followers via Mutagen             │
└───────────────────────────────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
┌───────────────────┐ ┌───────────────┐ ┌───────────────────┐
│ Linux Follower    │ │ macOS Follower│ │ Windows Follower  │
│ pytest -q         │ │ pytest -q     │ │ pytest -q         │
└───────────────────┘ └───────────────┘ └───────────────────┘
          │                   │                   │
          └───────────────────┼───────────────────┘
                              ▼
┌───────────────────────────────────────────────────────────┐
│ Results aggregated and returned to agent                  │
└───────────────────────────────────────────────────────────┘
Fleet Configuration
Define your fleet of executors in the configuration:
# .agents/config.toml
[fleet]
name = "cross-platform"
[[fleet.members]]
name = "linux-leader"
role = "leader"
host = "linux.internal"
ssh_user = "agent"
[[fleet.members]]
name = "macos-dev"
role = "follower"
host = "mac.internal"
ssh_user = "agent"
tags = { os = "macos", arch = "arm64" }
[[fleet.members]]
name = "windows-build"
role = "follower"
host = "win.internal"
ssh_user = "agent"
tags = { os = "windows", arch = "x64" }
[[fleet.members]]
name = "linux-arm"
role = "follower"
host = "arm.internal"
ssh_user = "agent"
tags = { os = "linux", arch = "arm64" }
Running Cross-Platform Tasks
Basic Usage
ah task create --agent claude --fleet cross-platform --prompt "Add feature X and ensure tests pass on all platforms"
Manual Cross-Platform Commands
Run commands on specific platforms from the leader:
# Run on all followers
ah agent followers run -- npm test
# Run only on Windows
ah agent followers run --tag os=windows -- npm test
# Run only on ARM platforms
ah agent followers run --tag arch=arm64 -- cargo test
# Run on a specific host
ah agent followers run --host macos-dev -- swift test
Execution Workflow
1. Agent Edits Files (Leader)
The agent makes changes on the leader workspace.
2. Sync Fence
Before running cross-platform tests, the agent calls:
ah agent followers sync-fence
This:
- Creates a filesystem snapshot on the leader
- Syncs all changes to followers via Mutagen
- Waits until all followers confirm sync completion
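The waiting step of the fence can be pictured as a polling loop with a deadline. The following is a minimal sketch of that idea only, not Agent Harbor's implementation: `wait_for_fence`, the `is_synced` callback, and the polling interval are hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Iterable


def wait_for_fence(
    followers: Iterable[str],
    is_synced: Callable[[str], bool],
    timeout_s: float = 60.0,
    poll_interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll until every follower confirms sync or the timeout expires.

    Returns the sorted names of lagging followers (empty on success).
    `is_synced` is a hypothetical callback standing in for whatever
    confirmation the sync layer provides.
    """
    pending = set(followers)
    deadline = clock() + timeout_s
    while pending:
        # Keep only the followers that have not confirmed yet.
        pending = {name for name in pending if not is_synced(name)}
        if not pending or clock() >= deadline:
            break
        sleep(poll_interval_s)
    return sorted(pending)
```

An empty return value means the fence passed; a non-empty list names the followers that would be reported as lagging.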
3. Run Tests Everywhere
ah agent followers run -- pytest -q
This:
- Executes the command on all followers in parallel
- Adapts the command for each OS (shell, paths, quoting)
- Aggregates exit codes and output
- Returns combined results to the agent
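The aggregation step can be sketched as folding per-host results into one exit code and one report. This is an illustrative model under assumptions: `HostResult`, `aggregate`, and the report layout are hypothetical, and the only rule taken from this page is that the combined exit code is non-zero if any host fails.

```python
from dataclasses import dataclass


@dataclass
class HostResult:
    host: str
    exit_code: int
    output: str


def aggregate(results: list[HostResult]) -> tuple[int, str]:
    """Combine per-host results into one exit code and one text report."""
    # The overall code is non-zero as soon as any single host failed.
    overall = 0 if all(r.exit_code == 0 for r in results) else 1
    lines = []
    for r in sorted(results, key=lambda r: r.host):
        status = "ok" if r.exit_code == 0 else f"failed ({r.exit_code})"
        lines.append(f"[{r.host}] {status}")
        # Indent each host's output under its status line.
        lines.extend(f"  {line}" for line in r.output.splitlines())
    return overall, "\n".join(lines)
```

Keeping the per-host status next to the combined code lets an agent see at a glance which platform broke without rerunning anything.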
4. Fetch Build Artifacts
Collect platform-specific build outputs:
ah agent followers slurp \
  --fileset-data "name=binaries&include=dist/*&dest=./artifacts"
Command Adaptation
Agent Harbor automatically adapts commands for each platform:
| Platform | Shell | Path Translation | Example |
|---|---|---|---|
| Linux | bash -lc | Native | cd /workspaces/proj && pytest |
| macOS | zsh -lc | FSKit mount | cd /Volumes/ah-overlays/proj && pytest |
| Windows | PowerShell | Drive mapping | Set-Location S:\; pytest |
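To make the table concrete, here is a rough sketch of how a command could be wrapped per platform. It is an assumption-laden illustration, not Agent Harbor's adapter: `adapt_command` is a hypothetical helper, the `powershell -NoProfile -Command` invocation is one plausible way to launch PowerShell, and the sketch quotes the Windows path even though the table shows it unquoted.

```python
import shlex


def adapt_command(os_name: str, workdir: str, command: str) -> list[str]:
    """Return an argv list that runs `command` inside `workdir` on `os_name`."""
    if os_name == "linux":
        return ["bash", "-lc", f"cd {shlex.quote(workdir)} && {command}"]
    if os_name == "macos":
        return ["zsh", "-lc", f"cd {shlex.quote(workdir)} && {command}"]
    if os_name == "windows":
        # PowerShell single-quoted strings escape a quote by doubling it.
        quoted = "'" + workdir.replace("'", "''") + "'"
        return ["powershell", "-NoProfile", "-Command",
                f"Set-Location {quoted}; {command}"]
    raise ValueError(f"unsupported OS: {os_name!r}")


print(adapt_command("linux", "/workspaces/proj", "pytest"))
print(adapt_command("macos", "/Volumes/ah-overlays/proj", "pytest"))
print(adapt_command("windows", "S:\\", "pytest"))
```

The working directories are the ones from the table; only the shell wrapper and the path quoting differ between platforms.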
Targeting and Selectors
By Tag
# Run on all Windows hosts
ah agent followers run --tag os=windows -- npm test
# Run on ARM64 macOS
ah agent followers run --tag os=macos --tag arch=arm64 -- swift test
By Host
ah agent followers run --host win-12 -- dotnet test
Run Everywhere
ah agent followers run --all -- make test
Test Sharding
Accelerate test execution by distributing tests across followers:
ah agent followers run \
  --command "pytest tests/unit/" \
  --command "pytest tests/integration/" \
  --command "pytest tests/e2e/"
The scheduler:
- Builds a job queue from the commands
- Assigns jobs round-robin across executors
- Runs each command on exactly one executor per OS
- Collects results after all commands complete
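One way to read the scheduling rules above is that executors are grouped by OS and the job queue is dealt round-robin within each group, so every command runs once per OS. The sketch below models that reading; `shard` and the executor names `linux-a` and `linux-b` are hypothetical, and the real scheduler may order or balance jobs differently.

```python
from collections import defaultdict
from itertools import cycle


def shard(commands: list[str], executors: dict[str, str]) -> dict[str, list[str]]:
    """Assign commands to executors.

    `executors` maps executor name -> OS. The result maps executor
    name -> the commands it should run.
    """
    by_os: dict[str, list[str]] = defaultdict(list)
    for name, os_name in executors.items():
        by_os[os_name].append(name)

    plan: dict[str, list[str]] = {name: [] for name in executors}
    for names in by_os.values():
        # Within one OS group, each command lands on exactly one executor.
        for command, name in zip(commands, cycle(names)):
            plan[name].append(command)
    return plan


plan = shard(
    ["pytest tests/unit/", "pytest tests/integration/", "pytest tests/e2e/"],
    {"linux-a": "linux", "linux-b": "linux", "windows-build": "windows"},
)
for executor, jobs in plan.items():
    print(executor, jobs)
```

With two Linux executors the three commands are split between them, while the single Windows executor runs all three.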
Fetching Artifacts
Single File Set
ah agent followers slurp \
  --fileset-data "name=binaries&include=dist/*&dest=./artifacts"
Multiple File Sets
ah agent followers slurp \
  --fileset-data "name=windows-exe&include=target/release/*.exe&required=true" \
  --fileset-data "name=mac-app&include=target/release/*.app/**&required=true" \
  --fileset-data "name=linux-bin&include=target/release/myapp&required=true" \
  --dest ./release-artifacts
From Configuration
# .agents/config.toml
[filesets.release]
include = ["target/release/*"]
exclude = ["*.d", "*.rlib"]
required = true
ah agent followers slurp --fileset release --dest ./release
Connectivity
Transport Layers
- Control Plane: QUIC connections between executors and access points
- Data Plane: SSH tunneled via HTTP CONNECT (or direct if reachable)
- Sync: Mutagen over SSH for file synchronization
Connection Flow
- Executors register with the access point via ah agent enroll
- Leader establishes persistent SSH connections to followers
- Mutagen project is started for file sync
- Commands are executed over SSH
Firewall Considerations
If followers aren’t directly reachable:
- Use HTTP CONNECT via the access point
- For hybrid fleets, the client relays traffic between access points
Time-Travel Integration
Multi-OS testing integrates with Agent Harbor’s time-travel features:
- sync-fence creates a snapshot linked to a SessionMoment
- Seeking to that moment restores the leader snapshot
- Followers are re-synced by issuing a fence before re-execution
This enables:
- Reproducing cross-platform failures
- Debugging platform-specific issues from any point in time
Failure Handling
Partial Host Failure
If some hosts fail:
- Results aggregated with per-host status
- Non-zero exit code if any host fails
- Detailed logs available per host
Sync Timeout
If sync fence times out:
- Lagging followers reported
- Suggestions to narrow selectors
- Option to proceed with available hosts
Connectivity Issues
If a follower becomes unreachable:
- Health probes detect degradation
- Automatic re-route attempts
- Host excluded if unrecoverable
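Degradation detection of this kind is commonly implemented by counting consecutive probe failures against a threshold, which matches the failure_threshold setting in the configuration. The sketch below shows that pattern only; `HealthTracker` is a hypothetical class, and resetting the counter on a successful probe is an assumption drawn from the word "consecutive".

```python
class HealthTracker:
    """Track consecutive probe failures per host."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self._failures: dict[str, int] = {}

    def record(self, host: str, probe_ok: bool) -> str:
        """Record one probe result and return the host's current status."""
        if probe_ok:
            # A successful probe resets the consecutive-failure count.
            self._failures[host] = 0
        else:
            self._failures[host] = self._failures.get(host, 0) + 1
        return self.status(host)

    def status(self, host: str) -> str:
        failures = self._failures.get(host, 0)
        return "degraded" if failures >= self.failure_threshold else "healthy"
```

A host flips to "degraded" only on the third failed probe in a row and returns to "healthy" after the next success.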
Configuration
# .agents/config.toml
[fleet]
name = "production"
[fleet.sync]
# Patterns to ignore during sync
ignores = ["node_modules", ".venv", "target", "build"]
# Sync timeout
timeout = "60s"
[fleet.connectivity]
# SSH connection settings
ssh_connect_timeout = "5s"
ssh_keepalive = "30s"
# Health check interval
health_probe_interval = "30s"
# Consecutive failures before marking degraded
failure_threshold = 3
[fleet.execution]
# Maximum concurrent commands per host
concurrency = 1
# Command timeout
timeout = "10m"
# Retry on transient SSH errors
ssh_retry = true