Skip to content

Agent

Reference for the nefia-agent binary, configuration, and lifecycle management.

The nefia-agent binary runs on target PCs and establishes a WireGuard VPN tunnel back to the operator. Once enrolled, it maintains a persistent connection, forwards SSH traffic, and monitors its own health automatically.

Supported Platforms

OSArchitectureService Manager
macOS 15+amd64, arm64launchd
Linux (Ubuntu 22.04+, Debian 12+, Fedora 38+)amd64, arm64systemd
Windows 11+amd64, arm64Scheduled task via schtasks

Installation

bash
curl -fsSL https://www.nefia.ai/install-agent.sh | sh -s -- --token '<INVITE_TOKEN>'

Downloads the latest binary, verifies its SHA-256 checksum, installs to /usr/local/bin/nefia-agent, enrolls the agent, and registers a system service.

Agent Binary Commands

The nefia-agent binary provides the following subcommands:

CommandDescription
nefia-agent enroll --token <TOKEN>Enroll with an operator hub using an invite token
nefia-agent run --config <PATH>Run the agent in the foreground (for debugging or service invocation)
nefia-agent install-service --config <PATH>Install the agent as a system service
nefia-agent uninstall-serviceRemove the system service
nefia-agent statusShow agent health status from the last health report
nefia-agent wol --mac <MAC>Send a Wake-on-LAN magic packet from the local network
nefia-agent --versionPrint version, commit, and build date, then exit

Global Flags

FlagDefaultDescription
--config--Path to agent config file
--log-levelinfoLog level: debug, info, warn, error

Enrollment

Enrollment pairs the agent with an operator using a single-use, HMAC-SHA256 signed invite token. For cloud relay enrollment, agent tokens are issued via a 2-phase bootstrap code exchange to prevent token replay if the enrollment response is intercepted.

Enrollment Flow

1

Generate an invite token on the operator PC:

bash
nefia vpn invite --name staging-server --os macos --stun

When logged in (nefia login), this also creates a cloud enrollment session with an expected_host_id parameter that binds the session to a specific agent.

2

Run the installer with the token on the target PC:

bash
curl -fsSL https://www.nefia.ai/install-agent.sh | sh -s -- --token '<INVITE_TOKEN>'
nefia-agent enroll --token eyJh... --install --yes

Connecting to operator at 203.0.113.10:19820... Direct enrollment succeeded. Validating invite token... OK Key exchange... OK VPN tunnel established 10.99.0.5 SSH host key registered SHA256:bE4f...kQ9w Enrollment successful! Config written to /Users/admin/.nefia/agent.yaml Service installed.

2-Phase Bootstrap Code Exchange (Cloud Relay)

When enrollment uses the cloud relay, agent tokens are not issued directly. Instead, the operator receives a short-lived bootstrap code and exchanges it for the actual agent token in a separate API call. This happens automatically within the operator CLI.

1

Phase 1 -- Enrollment completes. The agent submits its WireGuard public key and endpoint to the cloud relay. When the operator polls the enrollment session result (GET /api/enrollment/sessions/[nonce]/result), the server returns a bootstrap code (HS256-signed JWT, 5-minute TTL) instead of the agent token. The bootstrap code embeds the session nonce, team ID, and host ID.

2

Phase 2 -- Bootstrap exchange. The operator calls POST /api/agent-tokens/bootstrap/exchange with the bootstrap code and nonce. The server verifies the JWT signature, checks that the host_id in the token matches the enrollment session's agentHostID, marks the session as consumed, and returns the actual agent token. This exchange can only succeed once -- replaying the same bootstrap code returns an error.

Security properties:

  • Short-lived: The bootstrap code expires after 5 minutes, limiting the replay window.
  • Single-use: The enrollment session transitions from completed to consumed atomically. A second exchange attempt is rejected.
  • Host-bound: The expected_host_id set during session creation must match the agent that actually completed enrollment, preventing a different agent from claiming the token.
  • Transparent to users: nefia-agent enroll and the operator CLI handle both phases automatically. The user experience is unchanged.

Enroll Flags

FlagDescription
--tokenInvite token string (mutually exclusive with --token-file)
--token-filePath to a file containing the invite token
--out-dirDirectory to write the generated agent config
--ssh-addrLocal SSH address to forward to (default: 127.0.0.1:22)
--forceOverwrite existing agent config
--yesSkip interactive confirmation
--installAutomatically install the system service after enrollment

Cloud Relay Fallback

When the agent cannot reach the operator directly (NAT, CGNAT, firewall), enrollment automatically falls back to the cloud relay:

plaintext
Agent → HTTPS → nefia.ai (relay) ← HTTPS ← Operator

The fallback is transparent -- nefia-agent enroll first tries a direct TCP connection (10 second timeout), and if that fails, uses the cloud relay via nefia.ai. Only WireGuard public keys and metadata are exchanged through the relay; private keys never leave the device.

nefia-agent enroll --token eyJh... (cloud relay)

Connecting to operator at 203.0.113.10:19820... Direct connection failed. Trying cloud relay... Cloud relay enrollment succeeded. VPN tunnel established 10.99.0.5

Enrollment complete.

Enrollment Error Messages

When enrollment fails, the agent receives a categorized error message to help diagnose the issue:

ErrorCauseResolution
token has expiredThe invite token's TTL has elapsedGenerate a new token with nefia vpn reinvite
token has already been used (nonce reused)The token was already used for enrollmentGenerate a fresh token -- each token is single-use
invalid token signatureThe token was corrupted or tampered withCopy the full token string again from the operator
validation failedOther validation errorCheck the operator logs for details
enrollment nonce must not be emptyThe token's nonce field is missing or emptyGenerate a new token -- this indicates a corrupted or hand-crafted token
too many enrollment requestsIP-based rate limiting triggeredWait 30 seconds and retry
Invalid bootstrap codeThe bootstrap code JWT is expired, tampered with, or the host ID does not matchRe-run enrollment -- the bootstrap code has a 5-minute TTL
Bootstrap code has already been exchangedThe bootstrap code was already used to obtain the agent tokenGenerate a new invite token and re-enroll
Enrollment session is missing agent_host_idThe agent did not report its host ID during enrollmentCheck agent logs for enrollment errors and retry

Enrollment Response Size Limit

Enrollment API responses are limited to 1 MB (MaxEnrollResponseBytes). Responses exceeding this limit are rejected to prevent memory exhaustion from malicious or malformed relay responses.

Rate Limiter Behavior

The per-IP rate limiter uses deterministic eviction when the limiter table is full. It evicts the oldest entry by lastSeen timestamp, preferring entries that still have available tokens over those that are currently rate-limited. This ensures that active, well-behaved clients are not evicted in favor of rate-limited ones.

Configuration

The agent stores its YAML configuration at:

  • All platforms: ~/.nefia/agent.yaml (i.e. $HOME/.nefia/agent.yaml on macOS/Linux, %USERPROFILE%\.nefia\agent.yaml on Windows)
FieldTypeDefaultDescription
private_keystring--WireGuard private key (base64-encoded, or keyring: for OS keyring storage)
listen_portint51820UDP listen port for WireGuard
addressstring--VPN IP with CIDR notation (e.g. 10.99.0.2/24)
dnsstring[]--DNS servers reachable inside VPN (e.g. ["10.99.0.1"])
ssh_addrstring127.0.0.1:22Local SSH address to forward tunnel connections to
peersPeerEntry[]--Array of operator peer entries (see below)
update_server_urlstring--Remote update server URL (empty = no remote updates)
agent_tokenstring--API token for cloud communication (from enrollment). Supports keyring:agent-token prefix for OS keyring storage.
agent_token_expires_atstring--RFC 3339 expiry time for the agent token (e.g., 2026-06-01T00:00:00Z).
host_idstring--Host ID from enrollment, used for endpoint reporting.
stun_serversstring[]--Custom STUN servers for endpoint discovery (overrides defaults). Format: host:port.
endpoint_refreshduration--STUN refresh interval (e.g., 5m). Empty or 0 disables periodic endpoint refresh.
cloud_api_basestring--Cloud API base URL for endpoint reporting and token rotation. Must be https://.
derp_serversDERPServer[]--DERP relay servers for relay-first connectivity. Propagated from operator config during enrollment.
derp_auto_selectbooleantrueEnable RTT-based automatic DERP server selection when multiple relays are configured.
derp_probe_intervalduration5mDERP RTT probe interval. Must be at least 30s.
beaconsobject--Agent-side event monitoring configuration. See below.

Beacon Configuration

The beacons section configures agent-side event monitoring. Events are sent to the operator's reactor via TCP.

yaml
beacons:
  disk_usage:
    enabled: true
    interval: "5m"
    warning_percent: 80
    critical_percent: 90
    paths:
      - "/"
      - "/home"
  service_check:
    enabled: true
    interval: "30s"
    services:
      - nginx
      - postgresql
  scripts:
    - name: "custom-check"
      command: "/opt/scripts/check.sh"
      interval: "10m"
BeaconFieldsDescription
disk_usageenabled, interval, warning_percent, critical_percent, pathsMonitor disk usage on specified mount points.
service_checkenabled, interval, servicesMonitor systemd service status (Linux only).
scriptsname, command, intervalRun custom scripts. Exit 0 = OK, 1 = warning, 2+ = critical.

Peer Entry Fields

Each entry in the peers array configures a WireGuard peer (operator):

FieldDescription
public_keyBase64-encoded WireGuard public key of the operator
endpointUDP endpoint (ip:port); empty for incoming-only peers
vpn_addrPeer's VPN IP address (e.g. 10.99.0.1)

Config Validation

On startup, the agent runs validateAgentConfig() to check all required fields:

  • private_key: Must be present (plaintext base64 or keyring: reference).
  • address: Must be a valid CIDR notation (e.g., 10.99.0.2/24).
  • peers: At least one peer must be configured with a valid public key and VPN address.
  • listen_port: Must be a valid port number (0--65535).
  • derp_probe_interval: When set, must parse as a Go duration and be at least 30 seconds.

Each validation failure produces a clear error message. The agent exits immediately on validation failure.

Agent Token

Each enrolled agent receives a unique API token for cloud communication (endpoint reporting, token rotation). This token is separate from the operator's OAuth tokens and the WireGuard key pair.

Token Generation

Agent tokens are 256-bit values generated with crypto.randomBytes(32). Only the SHA-256 hash is stored server-side as an ApiToken record -- the plaintext token never persists on the server.

For direct enrollment, the operator generates the token locally and delivers it to the agent over the authenticated WireGuard tunnel.

For cloud relay enrollment, the token is issued via a 2-phase bootstrap code exchange (see 2-Phase Bootstrap Code Exchange). The server issues a short-lived bootstrap code (JWT, 5-minute TTL) when enrollment completes. The operator exchanges this bootstrap code for the actual agent token via POST /api/agent-tokens/bootstrap/exchange. The token is then forwarded to the agent over the VPN tunnel.

Token Storage

The agent stores its token in the OS keyring:

PlatformKeyring Backend
macOSKeychain (nefia-agent-token)
LinuxSecret Service API (GNOME Keyring, KWallet)
WindowsCredential Manager (nefia-agent-token)

When no keyring is available (headless servers, containers), the token is encrypted with AES-256-GCM and saved to a fallback file alongside the agent config.

Automatic Rotation

Agent tokens have a 90-day TTL. The agent checks the remaining TTL every 24 hours. When the TTL drops below 7 days, the agent initiates rotation:

  1. Calls POST /api/agent-tokens/rotate with its current token.
  2. The server verifies the current token, generates a new token, and revokes the old one atomically.
  3. The agent stores the new token in the keyring, replacing the previous value.

If rotation fails (network error, server unavailable), the agent retries on the next 24-hour cycle. The existing token remains valid until its TTL expires.

Endpoint Reporting

The agent periodically reports its public endpoint to the cloud API so operators can reach it even when IP addresses change:

  • API: POST /api/hosts/[hostId]/endpoint (rate limited to 10 requests/minute)
  • Trigger: Endpoint is reported after STUN discovery during startup and whenever the network monitor detects an IP change.
  • Fields: The agent sends its current public endpoint, local endpoint (for hairpin NAT), and NAT type classification.

The host_id field in the agent config (set during enrollment) identifies which host record to update.

Log Rotation

The agent supports built-in log rotation via the following flags on nefia-agent run:

FlagTypeDefaultDescription
--log-filestringplatform defaultPath to the log file
--log-max-sizeint50Maximum log file size in MB before rotation
--log-max-backupsint5Number of rotated log files to retain
--log-max-ageint30Maximum days to retain rotated log files

Default log file locations:

PlatformPath
macOS~/Library/Logs/nefia/agent.log
Linux$XDG_STATE_HOME/nefia/agent.log (default: ~/.local/state/nefia/agent.log)
Windows%LOCALAPPDATA%\nefia\logs\agent.log

When the log file exceeds --log-max-size, it is rotated and compressed. Old files are pruned based on --log-max-backups and --log-max-age.

Run Command

Start the agent in the foreground:

bash
nefia-agent run --config <PATH> [flags]
FlagDescription
--strict-permissionsRefuse to start if config file has world-readable permissions (mode & 0o077 != 0) and contains a plaintext private key. Default: true. Use this in production to enforce secure key storage.
--auto-update-intervalInterval for checking binary changes and auto-restarting (0 = disabled, e.g. 5m).
--log-filePath to the log file (see Log Rotation).
--log-max-sizeMaximum log file size in MB before rotation (default: 50).
--log-max-backupsNumber of rotated log files to retain (default: 5).
--log-max-ageMaximum days to retain rotated log files (default: 30).

When registered as a system service, the service manager invokes nefia-agent run automatically.

On Windows, the agent runs an icacls check at startup and warns if Everyone or BUILTIN\Users has read access to the config file. On Unix, it checks file permissions (mode & 0o077) and warns if the config is readable by other users.

Service Management

Installing the Service

Register the agent as a system service so it starts on boot:

bash
nefia-agent install-service --config <PATH>

Uninstalling the Service

Remove the system service:

bash
nefia-agent uninstall-service [--yes]

The --yes flag skips the interactive confirmation prompt.

Platform Details

Registers a LaunchAgent plist at ~/Library/LaunchAgents/com.nefia.agent.plist. The agent runs as a user-level service (no root required). Logs are written to ~/Library/Logs/nefia/nefia-agent.{stdout,stderr}.log.

bash
launchctl print gui/$(id -u)/com.nefia.agent

Resource Limits

ConstantValueDescription
maxAgentConns100Maximum concurrent forwarded connections. New connections are rejected when the limit is reached.
drainTimeout30sMaximum time to wait for active connections to finish during graceful shutdown.
defaultMTU1420MTU used for the WireGuard netstack TUN device.

SSH Host Key Verification (TOFU)

The agent's SSH host key is registered automatically during enrollment. The operator uses Trust On First Use (TOFU) semantics:

  • First connection: The host key is accepted and persisted to the operator's known_hosts file.
  • Subsequent connections: The key must match. A changed key causes connection rejection with a MITM warning.
  • Concurrent access: A RWMutex protects the known_hosts file, allowing concurrent reads while serializing writes.

This is safe because VPN connections are already authenticated via WireGuard key pairs.

Hairpin NAT Fallback

During enrollment, the agent calls DiscoverLocalEndpoint() to detect its LAN IP address. This local endpoint is reported to the operator as a fallback for hairpin NAT scenarios where the public endpoint is unreachable from the same LAN.

The agent validates the discovered local IP against the public endpoint:

  • If no suitable local IP is found, the field is left empty.
  • If the local IP matches the public endpoint IP, the field is left empty (no fallback needed).
  • Otherwise, the LAN address is stored in host.vpn.local_endpoint in the operator's config.

When establishing a VPN connection, the operator tries the public endpoint first. If unreachable, it falls back to the local endpoint for same-LAN connectivity.

Lifecycle

1

Service start -- the system service manager launches nefia-agent run.

2

VPN connect -- establishes a WireGuard tunnel using the stored keypair and endpoint.

3

SSH forward -- begins forwarding VPN connections to the local SSH server (capped at 100 concurrent connections). The tunnel forwarding logic detects short writes (io.ErrShortWrite) during data relay, preventing silent data corruption when the destination connection cannot accept a full write.

4

Health monitor loop -- continuously checks tunnel health. On failure, rebuilds with exponential backoff (5s initial, 5m cap, 2x multiplier).

Health Monitoring and Auto-Update

The agent runs two background loops after connecting:

Health checks (watchdog) run every 60 seconds when healthy, and every 15 seconds when unhealthy (adaptive interval). If the VPN tunnel degrades, it is torn down and rebuilt with exponential backoff (5s initial, 5m cap, 2x multiplier). If all peers are stale for 2 consecutive monitoring cycles, a full tunnel rebuild is triggered. After 2 consecutive rebuild failures, the watchdog enters a 30-minute cooldown period before attempting again (it no longer gives up permanently). If peer status check fails (IPC error), all peers are treated as stale to trigger rebuild investigation. Results are reported to the operator via nefia vpn status, which shows a STALE indicator for peers with no recent handshake.

Network Monitor Integration

The agent runs a network monitor that watches for IP address changes on local interfaces (checked every 5 seconds). When a network change is detected:

  1. Added and removed IP addresses are logged
  2. The watchdog receives an immediate rebuild signal
  3. The tunnel is rebuilt without waiting for the next health check cycle

This provides near-instant recovery after Wi-Fi reconnection, network switches, or VPN toggles on the host system.

Auto-updates are disabled by default (--auto-update-interval 0). To enable them, pass a non-zero duration such as --auto-update-interval 6h to nefia-agent run. When enabled, the agent periodically checks for new releases, downloads the binary, verifies its Ed25519 signature, replaces it atomically, and restarts the service.

Live Config Reload (Unix)

On Unix systems (macOS and Linux), sending SIGHUP to the agent process triggers a config reload:

bash
kill -HUP $(pgrep nefia-agent)

Useful for applying key rotation or endpoint changes without service downtime. If the reload fails (e.g., invalid YAML or missing fields), the error is recorded in the health report's last_config_error field, making it easy to diagnose issues remotely via nefia-agent status or by reading agent-health.json.

SSH Log Suppression

SSH connection state changes are logged only on state transitions (reachable to unreachable and vice versa) to prevent log flooding.

Startup Summary

On startup, the agent logs a summary including peer count, auto-update status, config path, and keyring status.

Health Status

Check the agent's health from the command line:

bash
nefia-agent status

Reads agent-health.json from the config directory and displays the current health report.

nefia-agent status

Health Status: ok VPN Connected: true Uptime: 72h22m0s Version: v1.3.0 Rebuild Count: 2 Last Rebuild: 2026-02-28T08:12:00Z Goroutines: 18

If a config reload error has occurred, a Config Error line is shown as well.

Health Report Fields

The agent-health.json file contains the following fields:

FieldDescription
statusHealth status string (ok, degraded, unhealthy).
vpn_connectedWhether the VPN tunnel is currently connected.
uptime_secondsSeconds since the agent process started.
versionRunning agent version string.
rebuild_countTotal number of tunnel rebuilds since startup.
last_rebuildTimestamp of the most recent tunnel rebuild.
goroutinesCurrent number of Go goroutines (useful for leak detection).
last_config_errorLast SIGHUP config reload error message. Empty when the last reload succeeded.

Remote Management (Operator CLI)

The operator CLI provides two subcommands for managing agents on remote hosts:

nefia agent version

Show the agent version on remote hosts:

bash
nefia agent version [--target <selector>]

Defaults to --target all if no target is specified.

nefia agent version --target staging-server

HOST VERSION COMPATIBLE MESSAGE staging-server v1.3.0 yes up to date

nefia agent upgrade

Upgrade nefia-agent on remote hosts:

bash
nefia agent upgrade --target <selector> [--agent-dir <path>]
FlagDescription
--targetRequired. Host or group selector for the upgrade target.
--agent-dirDirectory containing agent binaries to upload.
nefia agent upgrade --target staging-server

Agent upgrade results: staging-server: v1.2.0 -> v1.3.0

Succeeded: 1, Failed: 0

Installation

Install the operator CLI and nefia-agent on all supported platforms.

VPN Setup Guide

Configure NAT traversal, diagnostics, and key rotation.