Enrollment Troubleshooting Guide

This guide covers common issues encountered during executor enrollment and their solutions.

Quick Diagnostics

Check Enrollment Status

# View executor enrollment state
ah agent status
# Check access point logs
journalctl -u ah-access-point -f
# Check executor logs
journalctl -u ah-executor -f

Verify Connectivity

# Test network connectivity to access point
nc -zv access-point.example.com 4433
# Test TLS handshake
openssl s_client -connect access-point.example.com:4433 -showcerts

Inspect Certificates

# View certificate details
openssl x509 -in cert.pem -noout -text
# Check certificate dates
openssl x509 -in cert.pem -noout -dates
# Verify certificate chain
openssl verify -CAfile ca.pem cert.pem

Connection Errors

Connection Refused

error: connection refused to access-point.example.com:4433

Possible Causes:

  1. Access point not running
  2. Firewall blocking port 4433
  3. Incorrect address or port

Solutions:

# Check if access point is running
systemctl status ah-access-point
# Check listening ports
ss -tlnp | grep 4433
# Test firewall
sudo iptables -L -n | grep 4433
# Open firewall (Linux)
sudo firewall-cmd --add-port=4433/tcp --permanent
sudo firewall-cmd --reload

Connection Timeout

error: connection timed out after 30s

Possible Causes:

  1. Network routing issues
  2. Intermediate firewall dropping packets
  3. DNS resolution problems

Solutions:

# Test DNS resolution
dig access-point.example.com
# Test routing
traceroute access-point.example.com
# Try connecting by IP
ah agent enroll --remote-server https://192.168.1.100:4433 ...
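
The refused and timed-out cases above can be told apart without the ah CLI. The following is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command. probe_tcp is a hypothetical helper name, not part of Agent Harbor.

```shell
#!/usr/bin/env bash
# probe_tcp HOST PORT [LIMIT_SECONDS]
# Prints "open", "timeout", or "failed" (hypothetical helper, not part of ah).
probe_tcp() {
  local host="$1" port="$2" limit="${3:-5}" rc=0
  # bash opens a TCP connection when a redirection targets /dev/tcp/HOST/PORT.
  timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null || rc=$?
  case $rc in
    0)   echo open ;;
    124) echo timeout ;;  # timeout(1) exits with 124 when the limit is reached
    *)   echo failed ;;   # immediate failure: refused, unreachable, or DNS error
  esac
}
```

For example, `probe_tcp access-point.example.com 4433 5` printing "failed" points to the Connection Refused causes, while "timeout" points to routing or a firewall that silently drops packets.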

TLS Handshake Failed

error: tls handshake failed: certificate verify failed

Possible Causes:

  1. CA certificate mismatch
  2. Certificate expired
  3. Certificate not yet valid (clock skew)

Solutions:

# Check system time
date
timedatectl status
# Sync time
sudo systemctl start systemd-timesyncd
# Verify CA matches server certificate
openssl verify -CAfile ca.pem server-cert.pem
# Check certificate dates
openssl x509 -in server-cert.pem -noout -dates

Certificate Errors

Certificate Expired

error: certificate has expired or is not yet valid

Diagnosis:

# Check certificate expiry
openssl x509 -in cert.pem -noout -enddate
# Compare with current time
date -u
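
The error message covers two different situations: a certificate past its end date and a certificate whose start date is still in the future (usually clock skew). A minimal sketch that separates them is shown here. validity_state and cert_validity are hypothetical helper names, and cert_validity assumes GNU date, which can parse the date format printed by OpenSSL.

```shell
#!/usr/bin/env bash
# validity_state NOW NOT_BEFORE NOT_AFTER (all in Unix epoch seconds)
validity_state() {
  local now="$1" not_before="$2" not_after="$3"
  if [ "$now" -lt "$not_before" ]; then
    echo not-yet-valid   # local clock is behind, or the certificate is post-dated
  elif [ "$now" -gt "$not_after" ]; then
    echo expired
  else
    echo valid
  fi
}

# cert_validity CERT_PEM: classify a certificate against the local clock.
# Assumption: GNU date (the -d option) is available.
cert_validity() {
  local cert="$1" start end
  start=$(openssl x509 -in "$cert" -noout -startdate) || return 1
  end=$(openssl x509 -in "$cert" -noout -enddate) || return 1
  start=${start#notBefore=}   # strip the "notBefore=" label
  end=${end#notAfter=}        # strip the "notAfter=" label
  validity_state "$(date -u +%s)" "$(date -u -d "$start" +%s)" "$(date -u -d "$end" +%s)"
}
```

Running `cert_validity cert.pem` and getting "not-yet-valid" suggests fixing the system time rather than reissuing the certificate.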

Solutions:

  1. Files provider: Generate new certificates

    # See files provider guide for certificate generation
  2. SPIFFE provider: Check SPIRE agent

    spire-agent healthcheck -socketPath /run/spire/agent.sock
  3. Vault provider: Check Vault connectivity

    vault token lookup

Certificate Chain Incomplete

error: unable to get local issuer certificate

Diagnosis:

# View certificate chain
openssl s_client -connect access-point:4433 -showcerts
# Check CA file contents
openssl x509 -in ca.pem -noout -subject -issuer

Solutions:

# Concatenate intermediate and root CAs
cat intermediate-ca.pem root-ca.pem > ca-chain.pem
# Use complete chain
ah agent enroll --ca ca-chain.pem ...
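
Note that openssl x509 reads only the first certificate in a file, so it cannot confirm that a concatenated bundle is complete. The following sketch counts the certificates in a bundle and prints the subject and issuer of each one. Both function names are hypothetical helpers.

```shell
#!/usr/bin/env bash
# pem_cert_count FILE: number of certificates in a PEM bundle.
pem_cert_count() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# print_chain FILE: subject and issuer of every certificate in the bundle.
# Wraps the bundle in a PKCS#7 structure so that all entries are printed.
print_chain() {
  openssl crl2pkcs7 -nocrl -certfile "$1" | openssl pkcs7 -print_certs -noout
}
```

For the bundle built above, `pem_cert_count ca-chain.pem` should print 2, and in the `print_chain ca-chain.pem` output the issuer of the intermediate should equal the subject of the root.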

Wrong Key for Certificate

error: private key does not match certificate

Diagnosis:

# Compare certificate and key modulus (RSA keys only)
openssl x509 -in cert.pem -noout -modulus | md5sum
openssl rsa -in key.pem -noout -modulus | md5sum
# The two hashes should match

Solution: Regenerate certificate and key pair together.
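
The modulus comparison applies to RSA keys only. A sketch that also works for other key types compares the public key embedded in the certificate with the public key derived from the private key. key_matches_cert is a hypothetical helper name, and the sketch assumes an unencrypted private key (an encrypted one would prompt for its passphrase).

```shell
#!/usr/bin/env bash
# key_matches_cert CERT_PEM KEY_PEM
# Exit 0: key matches; exit 1: mismatch; exit 2: a file could not be read.
key_matches_cert() {
  local cert="$1" key="$2" cert_pub key_pub
  # Public key carried by the certificate, as PEM text.
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey) || return 2
  # Public key derived from the private key, as PEM text.
  key_pub=$(openssl pkey -in "$key" -pubout) || return 2
  [ "$cert_pub" = "$key_pub" ]
}
```

Usage: `key_matches_cert cert.pem key.pem && echo match || echo mismatch`.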

Certificate SAN Mismatch

error: certificate SAN does not match expected pattern

Diagnosis:

# View certificate SANs
openssl x509 -in cert.pem -noout -text | grep -A1 "Subject Alternative Name"

Solutions:

  1. Regenerate certificate with correct SANs
  2. Update access point --executor-san-uri-prefix to match
  3. For SPIFFE, ensure registration entry uses correct SPIFFE ID
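
To compare a certificate against the prefix configured with --executor-san-uri-prefix, the URI entries can be pulled out of the openssl text output. This is a minimal sketch that assumes the default OpenSSL text layout (entries such as "URI:..., DNS:..." on the line after the extension name). san_uris and san_has_uri_prefix are hypothetical helper names, and how the access point itself performs the match is not shown in this guide.

```shell
#!/usr/bin/env bash
# san_uris: read `openssl x509 -noout -text` output on stdin,
# print one URI SAN per line.
san_uris() {
  grep -A1 'Subject Alternative Name' \
    | tail -n +2 \
    | tr ',' '\n' \
    | sed -n 's/^[[:space:]]*URI://p'
}

# san_has_uri_prefix PREFIX: succeed if any URI SAN on stdin starts with PREFIX.
san_has_uri_prefix() {
  san_uris | awk -v p="$1" 'index($0, p) == 1 { found = 1 } END { exit !found }'
}
```

Usage: `openssl x509 -in cert.pem -noout -text | san_has_uri_prefix "spiffe://example.org/ah/agent/"`.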

Identity Provider Errors

Files Provider

Permission Denied

error: permission denied reading /etc/ah/key.pem

Solution:

# Fix ownership and permissions
sudo chown agent-harbor:agent-harbor /etc/ah/*.pem
# The glob matches key.pem as well as names such as executor-key.pem
sudo chmod 600 /etc/ah/*key.pem
sudo chmod 644 /etc/ah/cert.pem /etc/ah/ca.pem
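
After changing permissions, a quick check can confirm that no key file is readable by group or others. This is a sketch assuming GNU stat (the -c option). key_perms_ok is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# key_perms_ok FILE: succeed when group and others have no access to FILE.
# Assumption: GNU stat, where -c '%a' prints the octal mode (for example 600).
key_perms_ok() {
  local mode
  mode=$(stat -c '%a' "$1") || return 2
  case $mode in
    *00) return 0 ;;  # owner-only modes such as 600 or 400
    *)   return 1 ;;
  esac
}
```

Usage: `for f in /etc/ah/*key.pem; do key_perms_ok "$f" || echo "too permissive: $f"; done`.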

File Not Found

error: no such file: /etc/ah/cert.pem

Solution: Verify paths and file existence:

ls -la /etc/ah/

SPIFFE Provider

No SVID Issued

error: no identity issued

Diagnosis:

# Check agent health
spire-agent healthcheck -socketPath /run/spire/agent.sock
# List available SVIDs
spire-agent api fetch x509 -socketPath /run/spire/agent.sock -write /tmp/svid
# Check registration entries
spire-server entry show -socketPath /run/spire/server.sock

Solutions:

  1. Create registration entry:

    spire-server entry create \
      -socketPath /run/spire/server.sock \
      -parentID "spiffe://example.org/spire/agent/join_token/agent-1" \
      -spiffeID "spiffe://example.org/ah/agent/executor-1" \
      -selector "unix:user:executor"
  2. Fix selector mismatch:

    # Check process UID/GID
    id
    # Verify selector matches
    spire-server entry show -socketPath /run/spire/server.sock | grep selector

SPIFFE Socket Not Found

error: failed to connect to Workload API: /run/spire/agent.sock: no such file

Solutions:

# Check SPIRE agent is running
systemctl status spire-agent
# Verify socket path
ls -la /run/spire/
# Check socket permissions
stat /run/spire/agent.sock

SPIFFE ID Mismatch

error: server SPIFFE ID mismatch: expected spiffe://example.org/ah/serve, got spiffe://other.org/ah/serve

Solutions:

  1. Verify --expected-server-id matches access point’s SPIFFE ID
  2. Check trust domain configuration on both sides
  3. Verify registration entry for access point
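
In the example error the paths are identical and only the trust domain differs (example.org versus other.org). A SPIFFE ID has the form spiffe://trust-domain/path, so the trust domains can be extracted and compared directly. The sketch below uses two hypothetical helper names.

```shell
#!/usr/bin/env bash
# spiffe_trust_domain ID: print the trust domain of a SPIFFE ID,
# that is, the part between "spiffe://" and the first "/".
spiffe_trust_domain() {
  local rest
  case $1 in
    spiffe://*) rest=${1#spiffe://} ;;
    *) return 1 ;;            # not a SPIFFE ID
  esac
  echo "${rest%%/*}"
}

# same_trust_domain ID_A ID_B: succeed when both IDs share a trust domain.
same_trust_domain() {
  local a b
  a=$(spiffe_trust_domain "$1") || return 2
  b=$(spiffe_trust_domain "$2") || return 2
  [ "$a" = "$b" ]
}
```

If `same_trust_domain` fails for the expected and actual server IDs, the fix belongs in the trust domain configuration rather than in the registration entry path.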

Vault Provider

Authentication Failed

error: vault authentication failed: permission denied

Diagnosis:

# Test Vault AppRole authentication (assumes the default auth/approle mount path)
vault write auth/approle/login \
  role_id="$VAULT_ROLE_ID" \
  secret_id="$VAULT_SECRET_ID"

Solutions:

  1. Verify role ID and secret ID are correct
  2. Check secret ID hasn’t expired
  3. Verify AppRole is enabled: vault auth list

PKI Issue Failed

error: failed to issue certificate: 1 error occurred: * common name not allowed

Diagnosis:

# Check PKI role configuration
vault read pki_int/roles/executor

Solutions:

# Update allowed domains
vault write pki_int/roles/executor \
  allowed_domains="executor.example.com,internal.example.com" \
  allow_subdomains=true
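
One detail worth checking if the error persists after this update: in Vault PKI roles, allow_subdomains covers names below an allowed domain, while requesting the allowed domain itself additionally requires allow_bare_domains. The sketch below is a simplified approximation of that rule for reasoning about a role. It ignores wildcards, glob patterns, and the other role options, and cn_allowed is a hypothetical helper, not a Vault command.

```shell
#!/usr/bin/env bash
# cn_allowed CN ALLOW_SUBDOMAINS ALLOW_BARE_DOMAINS DOMAIN...
# Simplified model of the PKI role check; pass "true" or "false" for the flags.
cn_allowed() {
  local cn="$1" allow_subdomains="$2" allow_bare="$3" domain
  shift 3
  for domain in "$@"; do
    # The domain itself is accepted only with allow_bare_domains.
    if [ "$allow_bare" = true ] && [ "$cn" = "$domain" ]; then
      return 0
    fi
    # Names ending in ".DOMAIN" are accepted with allow_subdomains.
    if [ "$allow_subdomains" = true ]; then
      case $cn in
        *."$domain") return 0 ;;
      esac
    fi
  done
  return 1
}
```

Under this model, `cn_allowed worker-1.executor.example.com true false executor.example.com` succeeds, while `cn_allowed executor.example.com true false executor.example.com` fails until the third argument is true.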

Vault Sealed

error: vault is sealed

Solution: Unseal Vault:

vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>

mTLS Errors

Client Certificate Required

error: client certificate required

Diagnosis: The access point requires a client certificate, but the executor isn’t providing one.

Solutions:

  1. Verify executor identity provider is configured
  2. Check certificate is being loaded:
    ah agent enroll --identity files --cert cert.pem --key key.pem --ca ca.pem ...

Client Certificate Rejected

error: client certificate rejected: certificate signed by unknown authority

Solutions:

  1. Access point must trust executor’s CA:

    ah agent access-point --ca /path/to/executor-ca.pem ...
  2. Or use same CA for both access point and executors

Server Certificate Rejected

error: x509: certificate signed by unknown authority

Solutions:

  1. Executor must trust access point’s CA:

    ah agent enroll --ca /path/to/access-point-ca.pem ...
  2. For SPIFFE, the CA is provided by SPIRE automatically

Rotation Issues

Rotation Not Happening

Diagnosis:

# Check certificate expiry
ah agent status --show-cert
# Check for rotation logs
journalctl -u ah-executor | grep -i "rotat\|renew"

Solutions:

  1. Files provider: Ensure file watching is working

    # Trigger inotify event
    touch /etc/ah/cert.pem
  2. SPIFFE provider: Check SPIRE agent health

    spire-agent healthcheck -socketPath /run/spire/agent.sock
  3. Vault provider: Check Vault token is valid

    vault token lookup

Connection Drops During Rotation

Possible Causes:

  1. Rotation happens too late (near expiry)
  2. Server doesn’t accept new certificate

Solutions:

  1. Configure earlier rotation threshold:

    ah agent enroll --vault-renewal-threshold 0.5 ... # Renew at 50% TTL
  2. Use longer certificate TTLs to allow more time for rotation
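
To see how much retry time a threshold leaves, the renewal moment can be computed from the certificate lifetime. This sketch assumes the threshold is the fraction of the lifetime that has elapsed when renewal starts, which matches the "Renew at 50% TTL" comment. For values other than 0.5, confirm against the ah documentation whether the fraction counts elapsed or remaining time. renew_at is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# renew_at ISSUED_EPOCH EXPIRES_EPOCH THRESHOLD
# Prints the epoch second at which THRESHOLD (a fraction between 0 and 1)
# of the certificate lifetime has elapsed.
renew_at() {
  awk -v s="$1" -v e="$2" -v t="$3" 'BEGIN { printf "%.0f\n", s + t * (e - s) }'
}
```

For a certificate issued at second 0 with a one-hour lifetime, `renew_at 0 3600 0.5` prints 1800: renewal starts after 30 minutes, which leaves 30 minutes for retries before expiry.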

Debugging Tools

Enable Debug Logging

# Access point
ah agent access-point --log-level debug ...
# Executor
ah agent enroll --log-level debug ...
# Or via environment
export RUST_LOG=ah_identity_provider=debug,ah_cli=debug

Capture TLS Traffic

# Capture with tcpdump
sudo tcpdump -i any -w enrollment.pcap port 4433
# Analyze with Wireshark
wireshark enrollment.pcap

Test Certificate Chain

# Full certificate validation
openssl s_client -connect access-point:4433 \
  -cert executor.pem \
  -key executor-key.pem \
  -CAfile ca.pem \
  -verify_return_error

SPIRE Debugging

# Agent debug info (files are written into the /tmp/svid directory)
mkdir -p /tmp/svid
spire-agent api fetch x509 \
  -socketPath /run/spire/agent.sock \
  -write /tmp/svid
# Detailed SVID info
openssl x509 -in /tmp/svid/svid.0.pem -noout -text
# Server-side registration check
spire-server entry show -socketPath /run/spire/server.sock
# Agent list
spire-server agent list -socketPath /run/spire/server.sock

Common Patterns

Development Setup Failing

For quick local development:

# Use dev identity (self-signed)
ah agent access-point --fleet-listen 127.0.0.1:4433
# Executor with dev identity
ah agent enroll --remote-server https://127.0.0.1:4433 --name test

Production Checklist

Before deploying to production, verify:

  • Certificates are issued by a trusted CA
  • Certificate TTLs are appropriately short (hours, not days)
  • Certificate rotation is working
  • Access point validates executor certificate SANs
  • Firewall allows port 4433
  • Audit logging is enabled
  • Monitoring alerts are configured for certificate expiry

Getting Help

If you’re still experiencing issues:

  1. Collect debug logs from both access point and executor
  2. Capture certificate details with openssl x509 -noout -text
  3. Note the exact error message
  4. Open an issue at https://github.com/schelling-point-labs/agent-harbor/issues 

Next Steps