Enrollment Troubleshooting Guide
This guide covers common issues encountered during executor enrollment and their solutions.
Quick Diagnostics
Check Enrollment Status
# View executor enrollment state
ah agent status
# Check access point logs
journalctl -u ah-access-point -f
# Check executor logs
journalctl -u ah-executor -f
Verify Connectivity
# Test network connectivity to access point
nc -zv access-point.example.com 4433
# Test TLS handshake
openssl s_client -connect access-point.example.com:4433 -showcerts
Inspect Certificates
# View certificate details
openssl x509 -in cert.pem -noout -text
# Check certificate dates
openssl x509 -in cert.pem -noout -dates
# Verify certificate chain
openssl verify -CAfile ca.pem cert.pem
Connection Errors
Connection Refused
error: connection refused to access-point.example.com:4433
Possible Causes:
- Access point not running
- Firewall blocking port 4433
- Incorrect address or port
Solutions:
# Check if access point is running
systemctl status ah-access-point
# Check listening ports
ss -tlnp | grep 4433
# Test firewall
sudo iptables -L -n | grep 4433
# Open firewall (Linux)
sudo firewall-cmd --add-port=4433/tcp --permanent
sudo firewall-cmd --reload
Connection Timeout
error: connection timed out after 30s
Possible Causes:
- Network routing issues
- Intermediate firewall dropping packets
- DNS resolution problems
Solutions:
# Test DNS resolution
dig access-point.example.com
# Test routing
traceroute access-point.example.com
# Try connecting by IP
ah agent enroll --remote-server https://192.168.1.100:4433 ...
TLS Handshake Failed
error: tls handshake failed: certificate verify failed
Possible Causes:
- CA certificate mismatch
- Certificate expired
- Certificate not yet valid (clock skew)
Solutions:
# Check system time
date
timedatectl status
# Sync time
sudo systemctl start systemd-timesyncd
# Verify CA matches server certificate
openssl verify -CAfile ca.pem server-cert.pem
# Check certificate dates
openssl x509 -in server-cert.pem -noout -dates
Certificate Errors
Certificate Expired
error: certificate has expired or is not yet valid
Diagnosis:
# Check certificate expiry
openssl x509 -in cert.pem -noout -enddate
# Compare with current time
date -u
Solutions:
- Files provider: Generate new certificates
  # See files provider guide for certificate generation
- SPIFFE provider: Check SPIRE agent
  spire-agent healthcheck -socketPath /run/spire/agent.sock
- Vault provider: Check Vault connectivity
  vault token lookup
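The expiry check can also be scripted so that it warns before a certificate lapses rather than after. The following is a minimal sketch, assuming OpenSSL is installed. The function name `cert_expires_within` and the one-hour window are illustrative choices and not part of the `ah` CLI.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ah CLI): succeeds if the certificate
# in $1 expires within the next $2 seconds, or has already expired.
# It also succeeds if the file cannot be read or parsed, so treat a hit as
# "needs attention" rather than as proof of expiry.
cert_expires_within() {
    cert_file=$1
    window_seconds=$2
    # `openssl x509 -checkend N` exits 0 if the certificate is still valid
    # N seconds from now, and non-zero otherwise.
    if openssl x509 -in "$cert_file" -noout -checkend "$window_seconds" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Example: warn if cert.pem expires within the next hour.
if [ -f cert.pem ] && cert_expires_within cert.pem 3600; then
    echo "cert.pem expires within the next hour (or is already expired)"
fi
```

Running a check like this from a timer gives early warning when rotation has silently stopped.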
Certificate Chain Incomplete
error: unable to get local issuer certificate
Diagnosis:
# View certificate chain
openssl s_client -connect access-point:4433 -showcerts
# Check CA file contents
openssl x509 -in ca.pem -noout -subject -issuer
Solutions:
# Concatenate intermediate and root CAs
cat intermediate-ca.pem root-ca.pem > ca-chain.pem
# Use complete chain
ah agent enroll --ca ca-chain.pem ...
Wrong Key for Certificate
error: private key does not match certificate
Diagnosis:
# Compare certificate and key modulus (RSA keys only)
openssl x509 -in cert.pem -noout -modulus | md5sum
openssl rsa -in key.pem -noout -modulus | md5sum
# These should match
Solution: Regenerate certificate and key pair together.
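The modulus comparison only works for RSA keys. A variant that compares the public keys instead also works for EC and Ed25519 keys. This is a minimal sketch, assuming OpenSSL is installed. The function name `cert_matches_key` is hypothetical and not part of the `ah` CLI.

```shell
#!/bin/sh
# Hypothetical helper: compares the public key embedded in the certificate ($1)
# with the public key derived from the private key ($2).
# Returns 0 on match, 1 on mismatch, 2 if either file cannot be read or parsed.
cert_matches_key() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 2
    key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 2
    [ "$cert_pub" = "$key_pub" ]
}

# Example: check cert.pem against key.pem in the current directory.
if [ -f cert.pem ] && [ -f key.pem ]; then
    rc=0
    cert_matches_key cert.pem key.pem || rc=$?
    case $rc in
        0) echo "key matches certificate" ;;
        1) echo "key does not match certificate" ;;
        *) echo "could not read certificate or key" ;;
    esac
fi
```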
Certificate SAN Mismatch
error: certificate SAN does not match expected pattern
Diagnosis:
# View certificate SANs
openssl x509 -in cert.pem -noout -text | grep -A1 "Subject Alternative Name"
Solutions:
- Regenerate certificate with correct SANs
- Update access point --executor-san-uri-prefix to match
- For SPIFFE, ensure registration entry uses correct SPIFFE ID
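To compare a certificate's URI SANs against the prefix the access point expects, the check can be sketched as below. Both function names are hypothetical, and the prefix `spiffe://example.org/ah/agent/` is only an example value taken from the SPIFFE IDs used elsewhere in this guide. How the access point actually matches the prefix is an assumption here (a plain string prefix).

```shell
#!/bin/sh
# Hypothetical helper: reads `openssl x509 -noout -text` output on stdin and
# prints each URI-type SAN, one per line.
list_san_uris() {
    grep -A1 "Subject Alternative Name" \
        | tr ',' '\n' \
        | sed -n 's/^[[:space:]]*URI:\(.*\)$/\1/p'
}

# Hypothetical helper: reads URIs on stdin (one per line) and succeeds if at
# least one of them starts with the prefix given in $1.
san_uri_has_prefix() {
    prefix=$1
    while IFS= read -r uri; do
        case $uri in
            "$prefix"*) return 0 ;;
        esac
    done
    return 1
}

# Example: check cert.pem against an example prefix.
if [ -f cert.pem ]; then
    if openssl x509 -in cert.pem -noout -text | list_san_uris \
        | san_uri_has_prefix "spiffe://example.org/ah/agent/"; then
        echo "found a URI SAN with the expected prefix"
    else
        echo "no URI SAN with the expected prefix"
    fi
fi
```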
Identity Provider Errors
Files Provider
Permission Denied
error: permission denied reading /etc/ah/key.pem
Solution:
# Fix permissions
sudo chown agent-harbor:agent-harbor /etc/ah/*.pem
# Restrict private keys (the glob also matches /etc/ah/key.pem)
sudo chmod 600 /etc/ah/*key.pem
sudo chmod 644 /etc/ah/cert.pem /etc/ah/ca.pem
File Not Found
error: no such file: /etc/ah/cert.pem
Solution: Verify paths and file existence:
ls -la /etc/ah/
SPIFFE Provider
No SVID Issued
error: no identity issued
Diagnosis:
# Check agent health
spire-agent healthcheck -socketPath /run/spire/agent.sock
# List available SVIDs
spire-agent api fetch x509 -socketPath /run/spire/agent.sock -write /tmp/svid
# Check registration entries
spire-server entry show -socketPath /run/spire/server.sock
Solutions:
- Create registration entry:
  spire-server entry create \
    -socketPath /run/spire/server.sock \
    -parentID "spiffe://example.org/spire/agent/join_token/agent-1" \
    -spiffeID "spiffe://example.org/ah/agent/executor-1" \
    -selector "unix:user:executor"
- Fix selector mismatch:
  # Check process UID/GID
  id
  # Verify selector matches (case-insensitive, since the field label may be capitalized)
  spire-server entry show -socketPath /run/spire/server.sock | grep -i selector
SPIFFE Socket Not Found
error: failed to connect to Workload API: /run/spire/agent.sock: no such file
Solutions:
# Check SPIRE agent is running
systemctl status spire-agent
# Verify socket path
ls -la /run/spire/
# Check socket permissions
stat /run/spire/agent.sock
SPIFFE ID Mismatch
error: server SPIFFE ID mismatch: expected spiffe://example.org/ah/serve, got spiffe://other.org/ah/serve
Solutions:
- Verify --expected-server-id matches access point’s SPIFFE ID
- Check trust domain configuration on both sides
- Verify registration entry for access point
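When reading a mismatch error, it helps to separate a trust domain mismatch from a path mismatch, since they point to different fixes. The sketch below extracts the trust domain from each ID using plain POSIX parameter expansion. The function name `spiffe_trust_domain` is hypothetical, and the two IDs are copied from the example error message.

```shell
#!/bin/sh
# Hypothetical helper: prints the trust domain of a SPIFFE ID, i.e. the part
# between "spiffe://" and the next "/". Fails if the scheme is not spiffe://.
spiffe_trust_domain() {
    case $1 in
        spiffe://*) ;;
        *) return 1 ;;
    esac
    rest=${1#spiffe://}
    printf '%s\n' "${rest%%/*}"
}

# IDs taken from the example error message.
expected="spiffe://example.org/ah/serve"
actual="spiffe://other.org/ah/serve"

if [ "$(spiffe_trust_domain "$expected")" != "$(spiffe_trust_domain "$actual")" ]; then
    # Different trust domains: check trust domain configuration on both sides.
    echo "trust domain mismatch: $(spiffe_trust_domain "$expected") vs $(spiffe_trust_domain "$actual")"
elif [ "$expected" != "$actual" ]; then
    # Same trust domain, different path: check the registration entry.
    echo "same trust domain, different path"
fi
```

For the IDs above this reports a trust domain mismatch (`example.org` vs `other.org`).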
Vault Provider
Authentication Failed
error: vault authentication failed: permission denied
Diagnosis:
# Test Vault authentication
vault write auth/approle/login \
  role_id="$VAULT_ROLE_ID" \
  secret_id="$VAULT_SECRET_ID"
Solutions:
- Verify role ID and secret ID are correct
- Check secret ID hasn’t expired
- Verify AppRole is enabled:
vault auth list
PKI Issue Failed
error: failed to issue certificate: 1 error occurred:
  * common name not allowed
Diagnosis:
# Check PKI role configuration
vault read pki_int/roles/executor
Solutions:
# Update allowed domains
vault write pki_int/roles/executor \
allowed_domains="executor.example.com,internal.example.com" \
allow_subdomains=true
Vault Sealed
error: vault is sealed
Solution: Unseal Vault:
vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>
mTLS Errors
Client Certificate Required
error: client certificate required
Diagnosis: The access point requires a client certificate, but the executor isn’t providing one.
Solutions:
- Verify executor identity provider is configured
- Check certificate is being loaded:
ah agent enroll --identity files --cert cert.pem --key key.pem --ca ca.pem ...
Client Certificate Rejected
error: client certificate rejected: certificate signed by unknown authority
Solutions:
- Access point must trust executor’s CA:
  ah agent access-point --ca /path/to/executor-ca.pem ...
- Or use same CA for both access point and executors
Server Certificate Rejected
error: x509: certificate signed by unknown authority
Solutions:
- Executor must trust access point’s CA:
  ah agent enroll --ca /path/to/access-point-ca.pem ...
- For SPIFFE, the CA is provided by SPIRE automatically
Rotation Issues
Rotation Not Happening
Diagnosis:
# Check certificate expiry
ah agent status --show-cert
# Check for rotation logs
journalctl -u ah-executor | grep -i "rotat\|renew"
Solutions:
- Files provider: Ensure file watching is working
  # Trigger inotify event
  touch /etc/ah/cert.pem
- SPIFFE provider: Check SPIRE agent health
  spire-agent healthcheck -socketPath /run/spire/agent.sock
- Vault provider: Check Vault token is valid
  vault token lookup
Connection Drops During Rotation
Possible Causes:
- Rotation happens too late (near expiry)
- Server doesn’t accept new certificate
Solutions:
- Configure earlier rotation threshold:
  ah agent enroll --vault-renewal-threshold 0.5 ...  # Renew at 50% TTL
- Use longer certificate TTLs to allow more time for rotation
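To see what a renewal threshold means in wall-clock terms, the arithmetic can be sketched as follows. This assumes, based on the "Renew at 50% TTL" comment, that the threshold is the fraction of the certificate lifetime after which renewal is attempted. The function name `renewal_due_at` is hypothetical, and the threshold is passed as a percentage because POSIX shell arithmetic is integer-only.

```shell
#!/bin/sh
# Hypothetical helper: given the issue time and expiry time (Unix epoch
# seconds) and a threshold in percent, prints the epoch second at which
# renewal becomes due under the assumption stated above.
renewal_due_at() {
    not_before=$1
    not_after=$2
    threshold_percent=$3
    ttl=$((not_after - not_before))
    echo $((not_before + ttl * threshold_percent / 100))
}

# A one-hour certificate issued at t=0 with a 50% threshold is renewed at
# t=1800, leaving 1800 seconds to retry before it expires.
renewal_due_at 0 3600 50
```

With a short TTL, a late threshold leaves very little time for retries, which is why either an earlier threshold or a longer TTL helps.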
Debugging Tools
Enable Debug Logging
# Access point
ah agent access-point --log-level debug ...
# Executor
ah agent enroll --log-level debug ...
# Or via environment
export RUST_LOG=ah_identity_provider=debug,ah_cli=debug
Capture TLS Traffic
# Capture with tcpdump
sudo tcpdump -i any -w enrollment.pcap port 4433
# Analyze with Wireshark
wireshark enrollment.pcap
Test Certificate Chain
# Full certificate validation
openssl s_client -connect access-point:4433 \
-cert executor.pem \
-key executor-key.pem \
-CAfile ca.pem \
-verify_return_error
SPIRE Debugging
# Fetch the SVID and write it to a directory (the directory must exist)
mkdir -p /tmp/svid
spire-agent api fetch x509 \
-socketPath /run/spire/agent.sock \
-write /tmp/svid
# Detailed SVID info
openssl x509 -in /tmp/svid/svid.0.pem -noout -text
# Server-side registration check
spire-server entry show -socketPath /run/spire/server.sock
# Agent list
spire-server agent list -socketPath /run/spire/server.sock
Common Patterns
Development Setup Failing
For quick local development:
# Use dev identity (self-signed)
ah agent access-point --fleet-listen 127.0.0.1:4433
# Executor with dev identity
ah agent enroll --remote-server https://127.0.0.1:4433 --name test
Production Checklist
Before deploying to production, verify:
- Certificates are issued by trusted CA
- Certificate TTLs are appropriately short (hours, not days)
- Certificate rotation is working
- Access point validates executor certificate SANs
- Firewall allows port 4433
- Audit logging is enabled
- Monitoring alerts for certificate expiry
Getting Help
If you’re still experiencing issues:
- Collect debug logs from both access point and executor
- Capture certificate details with openssl x509 -noout -text
- Note the exact error message
- Open an issue at https://github.com/schelling-point-labs/agent-harbor/issues
Next Steps
- Files Provider Guide - Manual PKI setup
- SPIFFE Deployment Guide - SPIRE configuration
- Vault Integration Guide - Enterprise PKI