Skip to Content
GuidesWorkflowsMulti-OS Testing

Multi-OS Testing

Validate builds and tests across multiple operating systems in parallel using Agent Harbor’s leader/follower architecture.

Overview

Multi-OS testing enables agents to run tests on Linux, macOS, and Windows simultaneously:

  • Leader: Primary executor (typically Linux) that owns filesystem snapshots and initiates sync
  • Followers: Secondary executors that mirror the leader via high-speed file sync
  • Sync Fence: Ensures all followers have identical file state before test execution
  • run-everywhere: Execute commands across all platforms and aggregate results

Architecture

┌─────────────────────────────────────────────────────────────┐ │ Agent works on Leader (Linux) │ │ - Edits files │ │ - Creates filesystem snapshots │ └─────────────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────────┐ │ Sync Fence │ │ - Snapshot captured │ │ - Changes synced to all followers via Mutagen │ └─────────────────────────────────────────────────────────────┘ ┌───────────────┼───────────────┐ ▼ ▼ ▼ ┌───────────────────┐ ┌───────────────┐ ┌───────────────────┐ │ Linux Follower │ │ macOS Follower│ │ Windows Follower │ │ pytest -q │ │ pytest -q │ │ pytest -q │ └───────────────────┘ └───────────────┘ └───────────────────┘ │ │ │ └───────────────┼───────────────┘ ┌─────────────────────────────────────────────────────────────┐ │ Results aggregated and returned to agent │ └─────────────────────────────────────────────────────────────┘

Fleet Configuration

Define your fleet of executors in the configuration:

# .agents/config.toml [fleet] name = "cross-platform" [[fleet.members]] name = "linux-leader" role = "leader" host = "linux.internal" ssh_user = "agent" [[fleet.members]] name = "macos-dev" role = "follower" host = "mac.internal" ssh_user = "agent" tags = { os = "macos", arch = "arm64" } [[fleet.members]] name = "windows-build" role = "follower" host = "win.internal" ssh_user = "agent" tags = { os = "windows", arch = "x64" } [[fleet.members]] name = "linux-arm" role = "follower" host = "arm.internal" ssh_user = "agent" tags = { os = "linux", arch = "arm64" }

Running Cross-Platform Tasks

Basic Usage

ah task create --agent claude --fleet cross-platform --prompt "Add feature X and ensure tests pass on all platforms"

Manual Cross-Platform Commands

Run commands on specific platforms from the leader:

# Run on all followers ah agent followers run -- npm test # Run only on Windows ah agent followers run --tag os=windows -- npm test # Run only on ARM platforms ah agent followers run --tag arch=arm64 -- cargo test # Run on a specific host ah agent followers run --host macos-dev -- swift test

Execution Workflow

1. Agent Edits Files (Leader)

The agent makes changes on the leader workspace.

2. Sync Fence

Before running cross-platform tests, the agent calls:

ah agent followers sync-fence

This:

  1. Creates a filesystem snapshot on the leader
  2. Syncs all changes to followers via Mutagen
  3. Waits until all followers confirm sync completion

3. Run Tests Everywhere

ah agent followers run -- pytest -q

This:

  1. Executes the command on all followers in parallel
  2. Adapts the command for each OS (shell, paths, quoting)
  3. Aggregates exit codes and output
  4. Returns combined results to the agent

4. Fetch Build Artifacts

Collect platform-specific build outputs:

ah agent followers slurp \ --fileset-data "name=binaries&include=dist/*&dest=./artifacts"

Command Adaptation

Agent Harbor automatically adapts commands for each platform:

PlatformShellPath TranslationExample
Linuxbash -lcNativecd /workspaces/proj && pytest
macOSzsh -lcFSKit mountcd /Volumes/ah-overlays/proj && pytest
WindowsPowerShellDrive mappingSet-Location S:\; pytest

Targeting and Selectors

By Tag

# Run on all Windows hosts ah agent followers run --tag os=windows -- npm test # Run on ARM64 macOS ah agent followers run --tag os=macos --tag arch=arm64 -- swift test

By Host

ah agent followers run --host win-12 -- dotnet test

Run Everywhere

ah agent followers run --all -- make test

Test Sharding

Accelerate test execution by distributing tests across followers:

ah agent followers run \ --command "pytest tests/unit/" \ --command "pytest tests/integration/" \ --command "pytest tests/e2e/"

The scheduler:

  1. Builds a job queue from commands
  2. Assigns tasks round-robin across executors
  3. Each command runs on exactly one executor per OS
  4. Results collected after all complete

Fetching Artifacts

Single File Set

ah agent followers slurp \ --fileset-data "name=binaries&include=dist/*&dest=./artifacts"

Multiple File Sets

ah agent followers slurp \ --fileset-data "name=windows-exe&include=target/release/*.exe&required=true" \ --fileset-data "name=mac-app&include=target/release/*.app/**&required=true" \ --fileset-data "name=linux-bin&include=target/release/myapp&required=true" \ --dest ./release-artifacts

From Configuration

# .agents/config.toml [filesets.release] include = ["target/release/*"] exclude = ["*.d", "*.rlib"] required = true
ah agent followers slurp --fileset release --dest ./release

Connectivity

Transport Layers

  • Control Plane: QUIC connections between executors and access points
  • Data Plane: SSH tunneled via HTTP CONNECT (or direct if reachable)
  • Sync: Mutagen over SSH for file synchronization

Connection Flow

  1. Executors register with access point via ah agent enroll
  2. Leader establishes persistent SSH connections to followers
  3. Mutagen project started for file sync
  4. Commands executed over SSH

Firewall Considerations

If followers aren’t directly reachable:

  • Use HTTP CONNECT via the access point
  • For hybrid fleets, client relays between access points

Time-Travel Integration

Multi-OS testing integrates with Agent Harbor’s time-travel features:

  1. sync-fence creates a snapshot linked to a SessionMoment
  2. Seeking to that moment restores the leader snapshot
  3. Followers are re-synced by issuing a fence before re-execution

This enables:

  • Reproducing cross-platform failures
  • Debugging platform-specific issues from any point in time

Failure Handling

Partial Host Failure

If some hosts fail:

  • Results aggregated with per-host status
  • Non-zero exit code if any host fails
  • Detailed logs available per host

Sync Timeout

If sync fence times out:

  • Lagging followers reported
  • Suggestions to narrow selectors
  • Option to proceed with available hosts

Connectivity Issues

If a follower becomes unreachable:

  • Health probes detect degradation
  • Automatic re-route attempts
  • Host excluded if unrecoverable

Configuration

# .agents/config.toml [fleet] name = "production" [fleet.sync] # Patterns to ignore during sync ignores = ["node_modules", ".venv", "target", "build"] # Sync timeout timeout = "60s" [fleet.connectivity] # SSH connection settings ssh_connect_timeout = "5s" ssh_keepalive = "30s" # Health check interval health_probe_interval = "30s" # Consecutive failures before marking degraded failure_threshold = 3 [fleet.execution] # Maximum concurrent commands per host concurrency = 1 # Command timeout timeout = "10m" # Retry on transient SSH errors ssh_retry = true