Agent

Reference for the nefia-agent binary, configuration, and lifecycle management.

The nefia-agent binary runs on target PCs and establishes a WireGuard VPN tunnel back to the operator. Once enrolled, it maintains a persistent connection, forwards SSH traffic, and monitors its own health automatically.

Supported Platforms

OS	Architecture	Service Manager
macOS 15+	amd64, arm64	launchd
Linux (Ubuntu 22.04+, Debian 12+, Fedora 38+)	amd64, arm64	systemd
Windows 11+	amd64, arm64	Scheduled task via schtasks

Installation

bash

curl -fsSL https://www.nefia.ai/install-agent.sh | sh -s -- --token '<INVITE_TOKEN>'

Downloads the latest binary, verifies its SHA-256 checksum, installs to /usr/local/bin/nefia-agent, enrolls the agent, and registers a system service.

Agent Binary Commands

The nefia-agent binary provides the following subcommands:

Command	Description
`nefia-agent enroll --token <TOKEN>`	Enroll with an operator hub using an invite token
`nefia-agent run --config <PATH>`	Run the agent in the foreground (for debugging or service invocation)
`nefia-agent install-service --config <PATH>`	Install the agent as a system service
`nefia-agent uninstall-service`	Remove the system service
`nefia-agent status`	Show agent health status from the last health report
`nefia-agent wol --mac <MAC>`	Send a Wake-on-LAN magic packet from the local network
`nefia-agent --version`	Print version, commit, and build date, then exit

Global Flags

Flag	Default	Description
`--config`	--	Path to agent config file
`--log-level`	`info`	Log level: debug, info, warn, error

Enrollment

Enrollment pairs the agent with an operator using a single-use, HMAC-SHA256 signed invite token. For cloud relay enrollment, agent tokens are issued via a 2-phase bootstrap code exchange to prevent token replay if the enrollment response is intercepted.

Enrollment Flow

Generate an invite token on the operator PC:

bash

nefia vpn invite --name staging-server --os macos --stun

When logged in (nefia login), this also creates a cloud enrollment session with an expected_host_id parameter that binds the session to a specific agent.

Run the installer with the token on the target PC:

bash

curl -fsSL https://www.nefia.ai/install-agent.sh | sh -s -- --token '<INVITE_TOKEN>'

nefia-agent enroll --token eyJh... --install --yes

Connecting to operator at 203.0.113.10:19820... Direct enrollment succeeded. Validating invite token... OK Key exchange... OK VPN tunnel established 10.99.0.5 SSH host key registered SHA256:bE4f...kQ9w Enrollment successful! Config written to /Users/admin/.nefia/agent.yaml Service installed.

2-Phase Bootstrap Code Exchange (Cloud Relay)

When enrollment uses the cloud relay, agent tokens are not issued directly. Instead, the operator receives a short-lived bootstrap code and exchanges it for the actual agent token in a separate API call. This happens automatically within the operator CLI.

Phase 1 -- Enrollment completes. The agent submits its WireGuard public key and endpoint to the cloud relay. When the operator polls the enrollment session result (GET /api/enrollment/sessions/[nonce]/result), the server returns a bootstrap code (HS256-signed JWT, 5-minute TTL) instead of the agent token. The bootstrap code embeds the session nonce, team ID, and host ID.

Phase 2 -- Bootstrap exchange. The operator calls POST /api/agent-tokens/bootstrap/exchange with the bootstrap code and nonce. The server verifies the JWT signature, checks that the host_id in the token matches the enrollment session's agentHostID, marks the session as consumed, and returns the actual agent token. This exchange can only succeed once -- replaying the same bootstrap code returns an error.

Security properties:

Short-lived: The bootstrap code expires after 5 minutes, limiting the replay window.
Single-use: The enrollment session transitions from completed to consumed atomically. A second exchange attempt is rejected.
Host-bound: The expected_host_id set during session creation must match the agent that actually completed enrollment, preventing a different agent from claiming the token.
Transparent to users: nefia-agent enroll and the operator CLI handle both phases automatically. The user experience is unchanged.

Enroll Flags

Flag	Description
`--token`	Invite token string (mutually exclusive with `--token-file`)
`--token-file`	Path to a file containing the invite token
`--out-dir`	Directory to write the generated agent config
`--ssh-addr`	Local SSH address to forward to (default: `127.0.0.1:22`)
`--force`	Overwrite existing agent config
`--yes`	Skip interactive confirmation
`--install`	Automatically install the system service after enrollment

Cloud Relay Fallback

When the agent cannot reach the operator directly (NAT, CGNAT, firewall), enrollment automatically falls back to the cloud relay:

plaintext

Agent → HTTPS → nefia.ai (relay) ← HTTPS ← Operator

The fallback is transparent -- nefia-agent enroll first tries a direct TCP connection (10 second timeout), and if that fails, uses the cloud relay via nefia.ai. Only WireGuard public keys and metadata are exchanged through the relay; private keys never leave the device.

nefia-agent enroll --token eyJh... (cloud relay)

Connecting to operator at 203.0.113.10:19820... Direct connection failed. Trying cloud relay... Cloud relay enrollment succeeded. VPN tunnel established 10.99.0.5

Enrollment complete.

Enrollment Error Messages

When enrollment fails, the agent receives a categorized error message to help diagnose the issue:

Error	Cause	Resolution
`token has expired`	The invite token's TTL has elapsed	Generate a new token with `nefia vpn reinvite`
`token has already been used (nonce reused)`	The token was already used for enrollment	Generate a fresh token -- each token is single-use
`invalid token signature`	The token was corrupted or tampered with	Copy the full token string again from the operator
`validation failed`	Other validation error	Check the operator logs for details
`enrollment nonce must not be empty`	The token's nonce field is missing or empty	Generate a new token -- this indicates a corrupted or hand-crafted token
`too many enrollment requests`	IP-based rate limiting triggered	Wait 30 seconds and retry
`Invalid bootstrap code`	The bootstrap code JWT is expired, tampered with, or the host ID does not match	Re-run enrollment -- the bootstrap code has a 5-minute TTL
`Bootstrap code has already been exchanged`	The bootstrap code was already used to obtain the agent token	Generate a new invite token and re-enroll
`Enrollment session is missing agent_host_id`	The agent did not report its host ID during enrollment	Check agent logs for enrollment errors and retry

Enrollment Response Size Limit

Enrollment API responses are limited to 1 MB (MaxEnrollResponseBytes). Responses exceeding this limit are rejected to prevent memory exhaustion from malicious or malformed relay responses.

Rate Limiter Behavior

The per-IP rate limiter uses deterministic eviction when the limiter table is full. It evicts the oldest entry by lastSeen timestamp, preferring entries that still have available tokens over those that are currently rate-limited. This ensures that active, well-behaved clients are not evicted in favor of rate-limited ones.

Configuration

The agent stores its YAML configuration at:

All platforms: ~/.nefia/agent.yaml (i.e. $HOME/.nefia/agent.yaml on macOS/Linux, %USERPROFILE%\.nefia\agent.yaml on Windows)

Field	Type	Default	Description
`private_key`	string	--	WireGuard private key (base64-encoded, or `keyring:` for OS keyring storage)
`listen_port`	int	`51820`	UDP listen port for WireGuard
`address`	string	--	VPN IP with CIDR notation (e.g. `10.99.0.2/24`)
`dns`	string[]	--	DNS servers reachable inside VPN (e.g. `["10.99.0.1"]`)
`ssh_addr`	string	`127.0.0.1:22`	Local SSH address to forward tunnel connections to
`peers`	PeerEntry[]	--	Array of operator peer entries (see below)
`update_server_url`	string	--	Remote update server URL (empty = no remote updates)
`agent_token`	string	--	API token for cloud communication (from enrollment). Supports `keyring:agent-token` prefix for OS keyring storage.
`agent_token_expires_at`	string	--	RFC 3339 expiry time for the agent token (e.g., `2026-06-01T00:00:00Z`).
`host_id`	string	--	Host ID from enrollment, used for endpoint reporting.
`stun_servers`	string[]	--	Custom STUN servers for endpoint discovery (overrides defaults). Format: `host:port`.
`endpoint_refresh`	duration	--	STUN refresh interval (e.g., `5m`). Empty or `0` disables periodic endpoint refresh.
`cloud_api_base`	string	--	Cloud API base URL for endpoint reporting and token rotation. Must be `https://`.
`derp_servers`	DERPServer[]	--	DERP relay servers for relay-first connectivity. Propagated from operator config during enrollment.
`derp_auto_select`	boolean	`true`	Enable RTT-based automatic DERP server selection when multiple relays are configured.
`derp_probe_interval`	duration	`5m`	DERP RTT probe interval. Must be at least `30s`.
`beacons`	object	--	Agent-side event monitoring configuration. See below.

Beacon Configuration

The beacons section configures agent-side event monitoring. Events are sent to the operator's reactor via TCP.

yaml

beacons:
  disk_usage:
    enabled: true
    interval: "5m"
    warning_percent: 80
    critical_percent: 90
    paths:
      - "/"
      - "/home"
  service_check:
    enabled: true
    interval: "30s"
    services:
      - nginx
      - postgresql
  scripts:
    - name: "custom-check"
      command: "/opt/scripts/check.sh"
      interval: "10m"

Beacon	Fields	Description
`disk_usage`	`enabled`, `interval`, `warning_percent`, `critical_percent`, `paths`	Monitor disk usage on specified mount points.
`service_check`	`enabled`, `interval`, `services`	Monitor systemd service status (Linux only).
`scripts`	`name`, `command`, `interval`	Run custom scripts. Exit 0 = OK, 1 = warning, 2+ = critical.

Peer Entry Fields

Each entry in the peers array configures a WireGuard peer (operator):

Field	Description
`public_key`	Base64-encoded WireGuard public key of the operator
`endpoint`	UDP endpoint (`ip:port`); empty for incoming-only peers
`vpn_addr`	Peer's VPN IP address (e.g. `10.99.0.1`)

Config Validation

On startup, the agent runs validateAgentConfig() to check all required fields:

private_key: Must be present (plaintext base64 or keyring: reference).
address: Must be a valid CIDR notation (e.g., 10.99.0.2/24).
peers: At least one peer must be configured with a valid public key and VPN address.
listen_port: Must be a valid port number (0--65535).
derp_probe_interval: When set, must parse as a Go duration and be at least 30 seconds.

Each validation failure produces a clear error message. The agent exits immediately on validation failure.

Agent Token

Each enrolled agent receives a unique API token for cloud communication (endpoint reporting, token rotation). This token is separate from the operator's OAuth tokens and the WireGuard key pair.

Token Generation

Agent tokens are 256-bit values generated with crypto.randomBytes(32). Only the SHA-256 hash is stored server-side as an ApiToken record -- the plaintext token never persists on the server.

For direct enrollment, the operator generates the token locally and delivers it to the agent over the authenticated WireGuard tunnel.

For cloud relay enrollment, the token is issued via a 2-phase bootstrap code exchange (see 2-Phase Bootstrap Code Exchange). The server issues a short-lived bootstrap code (JWT, 5-minute TTL) when enrollment completes. The operator exchanges this bootstrap code for the actual agent token via POST /api/agent-tokens/bootstrap/exchange. The token is then forwarded to the agent over the VPN tunnel.

Token Storage

The agent stores its token in the OS keyring:

Platform	Keyring Backend
macOS	Keychain (`nefia-agent-token`)
Linux	Secret Service API (GNOME Keyring, KWallet)
Windows	Credential Manager (`nefia-agent-token`)

When no keyring is available (headless servers, containers), the token is encrypted with AES-256-GCM and saved to a fallback file alongside the agent config.

Automatic Rotation

Agent tokens have a 90-day TTL. The agent checks the remaining TTL every 24 hours. When the TTL drops below 7 days, the agent initiates rotation:

Calls POST /api/agent-tokens/rotate with its current token.
The server verifies the current token, generates a new token, and revokes the old one atomically.
The agent stores the new token in the keyring, replacing the previous value.

If rotation fails (network error, server unavailable), the agent retries on the next 24-hour cycle. The existing token remains valid until its TTL expires.

Endpoint Reporting

The agent periodically reports its public endpoint to the cloud API so operators can reach it even when IP addresses change:

API: POST /api/hosts/[hostId]/endpoint (rate limited to 10 requests/minute)
Trigger: Endpoint is reported after STUN discovery during startup and whenever the network monitor detects an IP change.
Fields: The agent sends its current public endpoint, local endpoint (for hairpin NAT), and NAT type classification.

The host_id field in the agent config (set during enrollment) identifies which host record to update.

Log Rotation

The agent supports built-in log rotation via the following flags on nefia-agent run:

Flag	Type	Default	Description
`--log-file`	string	platform default	Path to the log file
`--log-max-size`	int	`50`	Maximum log file size in MB before rotation
`--log-max-backups`	int	`5`	Number of rotated log files to retain
`--log-max-age`	int	`30`	Maximum days to retain rotated log files

Default log file locations:

Platform	Path
macOS	`~/Library/Logs/nefia/agent.log`
Linux	`$XDG_STATE_HOME/nefia/agent.log` (default: `~/.local/state/nefia/agent.log`)
Windows	`%LOCALAPPDATA%\nefia\logs\agent.log`

When the log file exceeds --log-max-size, it is rotated and compressed. Old files are pruned based on --log-max-backups and --log-max-age.

Run Command

Start the agent in the foreground:

bash

nefia-agent run --config <PATH> [flags]

Flag	Description
`--strict-permissions`	Refuse to start if config file has world-readable permissions (mode & 0o077 != 0) and contains a plaintext private key. Default: `true`. Use this in production to enforce secure key storage.
`--auto-update-interval`	Interval for checking binary changes and auto-restarting (0 = disabled, e.g. `5m`).
`--log-file`	Path to the log file (see Log Rotation).
`--log-max-size`	Maximum log file size in MB before rotation (default: 50).
`--log-max-backups`	Number of rotated log files to retain (default: 5).
`--log-max-age`	Maximum days to retain rotated log files (default: 30).

When registered as a system service, the service manager invokes nefia-agent run automatically.

On Windows, the agent runs an icacls check at startup and warns if Everyone or BUILTIN\Users has read access to the config file. On Unix, it checks file permissions (mode & 0o077) and warns if the config is readable by other users.

Service Management

Installing the Service

bash

nefia-agent install-service --config <PATH>

Uninstalling the Service

Remove the system service:

bash

nefia-agent uninstall-service [--yes]

The --yes flag skips the interactive confirmation prompt.

Platform Details

Registers a LaunchAgent plist at ~/Library/LaunchAgents/com.nefia.agent.plist. The agent runs as a user-level service (no root required). Logs are written to ~/Library/Logs/nefia/nefia-agent.{stdout,stderr}.log.

bash

launchctl print gui/$(id -u)/com.nefia.agent

Resource Limits

Constant	Value	Description
`maxAgentConns`	100	Maximum concurrent forwarded connections. New connections are rejected when the limit is reached.
`drainTimeout`	30s	Maximum time to wait for active connections to finish during graceful shutdown.
`defaultMTU`	1420	MTU used for the WireGuard netstack TUN device.

SSH Host Key Verification (TOFU)

The agent's SSH host key is registered automatically during enrollment. The operator uses Trust On First Use (TOFU) semantics:

First connection: The host key is accepted and persisted to the operator's known_hosts file.
Subsequent connections: The key must match. A changed key causes connection rejection with a MITM warning.
Concurrent access: A RWMutex protects the known_hosts file, allowing concurrent reads while serializing writes.

This is safe because VPN connections are already authenticated via WireGuard key pairs.

Hairpin NAT Fallback

During enrollment, the agent calls DiscoverLocalEndpoint() to detect its LAN IP address. This local endpoint is reported to the operator as a fallback for hairpin NAT scenarios where the public endpoint is unreachable from the same LAN.

The agent validates the discovered local IP against the public endpoint:

If no suitable local IP is found, the field is left empty.
If the local IP matches the public endpoint IP, the field is left empty (no fallback needed).
Otherwise, the LAN address is stored in host.vpn.local_endpoint in the operator's config.

When establishing a VPN connection, the operator tries the public endpoint first. If unreachable, it falls back to the local endpoint for same-LAN connectivity.

Lifecycle

Service start -- the system service manager launches nefia-agent run.

VPN connect -- establishes a WireGuard tunnel using the stored keypair and endpoint.

SSH forward -- begins forwarding VPN connections to the local SSH server (capped at 100 concurrent connections). The tunnel forwarding logic detects short writes (io.ErrShortWrite) during data relay, preventing silent data corruption when the destination connection cannot accept a full write.

Health monitor loop -- continuously checks tunnel health. On failure, rebuilds with exponential backoff (5s initial, 5m cap, 2x multiplier).

Health Monitoring and Auto-Update

The agent runs two background loops after connecting:

Health checks (watchdog) run every 60 seconds when healthy, and every 15 seconds when unhealthy (adaptive interval). If the VPN tunnel degrades, it is torn down and rebuilt with exponential backoff (5s initial, 5m cap, 2x multiplier). If all peers are stale for 2 consecutive monitoring cycles, a full tunnel rebuild is triggered. After 2 consecutive rebuild failures, the watchdog enters a 30-minute cooldown period before attempting again (it no longer gives up permanently). If peer status check fails (IPC error), all peers are treated as stale to trigger rebuild investigation. Results are reported to the operator via nefia vpn status, which shows a STALE indicator for peers with no recent handshake.

Network Monitor Integration

The agent runs a network monitor that watches for IP address changes on local interfaces (checked every 5 seconds). When a network change is detected:

Added and removed IP addresses are logged
The watchdog receives an immediate rebuild signal
The tunnel is rebuilt without waiting for the next health check cycle

This provides near-instant recovery after Wi-Fi reconnection, network switches, or VPN toggles on the host system.

Auto-updates are disabled by default (--auto-update-interval 0). To enable them, pass a non-zero duration such as --auto-update-interval 6h to nefia-agent run. When enabled, the agent periodically checks for new releases, downloads the binary, verifies its Ed25519 signature, replaces it atomically, and restarts the service.

Live Config Reload (Unix)

On Unix systems (macOS and Linux), sending SIGHUP to the agent process triggers a config reload:

bash

kill -HUP $(pgrep nefia-agent)

Useful for applying key rotation or endpoint changes without service downtime. If the reload fails (e.g., invalid YAML or missing fields), the error is recorded in the health report's last_config_error field, making it easy to diagnose issues remotely via nefia-agent status or by reading agent-health.json.

nefia-agent status

Reads agent-health.json from the config directory and displays the current health report.

nefia-agent status

Health Status: ok VPN Connected: true Uptime: 72h22m0s Version: v1.3.0 Rebuild Count: 2 Last Rebuild: 2026-02-28T08:12:00Z Goroutines: 18

If a config reload error has occurred, a Config Error line is shown as well.

Health Report Fields

The agent-health.json file contains the following fields:

Field	Description
`status`	Health status string (`ok`, `degraded`, `unhealthy`).
`vpn_connected`	Whether the VPN tunnel is currently connected.
`uptime_seconds`	Seconds since the agent process started.
`version`	Running agent version string.
`rebuild_count`	Total number of tunnel rebuilds since startup.
`last_rebuild`	Timestamp of the most recent tunnel rebuild.
`goroutines`	Current number of Go goroutines (useful for leak detection).
`last_config_error`	Last SIGHUP config reload error message. Empty when the last reload succeeded.

Remote Management (Operator CLI)

The operator CLI provides two subcommands for managing agents on remote hosts:

nefia agent version

Show the agent version on remote hosts:

bash

nefia agent version [--target <selector>]

Defaults to --target all if no target is specified.

nefia agent version --target staging-server

HOST VERSION COMPATIBLE MESSAGE staging-server v1.3.0 yes up to date

nefia agent upgrade

Upgrade nefia-agent on remote hosts:

bash

nefia agent upgrade --target <selector> [--agent-dir <path>]

Flag	Description
`--target`	Required. Host or group selector for the upgrade target.
`--agent-dir`	Directory containing agent binaries to upload.

nefia agent upgrade --target staging-server

Agent upgrade results: staging-server: v1.2.0 -> v1.3.0

Succeeded: 1, Failed: 0

Installation

Install the operator CLI and nefia-agent on all supported platforms.

VPN Setup Guide

Configure NAT traversal, diagnostics, and key rotation.