
Security Overview

Understand Nefia's defense-in-depth security architecture and threat model.

Nefia is designed with a defense-in-depth security model. Every connection traverses multiple independent security layers, so a breach in any single layer does not compromise the system as a whole.

Security Layers

Nefia enforces five distinct security layers for every operation:

1. WireGuard VPN Tunnels

All traffic between the operator and target PCs is encrypted using WireGuard tunnels with Curve25519 key exchange and ChaCha20-Poly1305 authenticated encryption. No ports are exposed on target machines — connections are only reachable through the VPN.

2. Daemon Runtime over mTLS

Inside the WireGuard tunnel, the operator daemon (nefia-operatord) speaks a typed gRPC runtime to each target's nefia-agent. The runtime channel is mutually authenticated with TLS 1.3 client certificates issued by the operator's internal CA, providing a second independent layer of encryption and authentication on top of WireGuard.

The runtime layer is further hardened with:

  • Typed RPC surface: Exec, file, process, and system operations are exposed as discrete gRPC methods rather than a free-form shell channel. Every call carries a host ID, request ID, and operator identity that the daemon validates before dispatch.
  • mTLS with TLS 1.3: Both the operator daemon and the agent present client certificates issued by the operator's internal CA. Clients without a valid certificate are rejected at the TLS handshake, with no application-level fallback.
  • File size enforcement: File-plane reads enforce files.max_file_size_bytes even when the caller does not specify a limit, preventing out-of-memory conditions from unbounded reads.

The legacy nefia ssh interactive command still tunnels OpenSSH through the agent for terminal access, but every other operation (exec, file transfer, sessions, facts collection, MCP tools) runs over the typed gRPC runtime.

3. Policy Engine

A regex-based policy engine enforces command and file path guardrails. Operators define allow/deny rules that apply equally to human CLI users and AI agents connected via the MCP server.

4. Audit Logging

Every command execution, file operation, and session event is recorded in an append-only JSONL log. Entries are chained using SHA-256 hashes and protected with HMAC-SHA256 signatures. File handles are reused on a per-day basis. If a write or sync fails, the entry is rolled back so the hash chain stays intact.
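To make the tamper-evidence idea concrete, here is a small sketch of a hash-chained, HMAC-signed log. The `Entry` fields and the exact bytes being hashed are hypothetical, since the real log schema is not described here; the point is only that each entry commits to its predecessor, so editing one entry breaks verification from that point on.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Entry is a hypothetical audit record; the real schema is not shown here.
type Entry struct {
	Event    string `json:"event"`
	PrevHash string `json:"prev_hash"` // hash of the previous entry
	Hash     string `json:"hash"`      // SHA-256 over prev_hash and event
	MAC      string `json:"mac"`       // HMAC-SHA256 over hash
}

func digest(event, prev string) string {
	sum := sha256.Sum256([]byte(prev + "\n" + event))
	return hex.EncodeToString(sum[:])
}

func sign(key []byte, hash string) string {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(hash))
	return hex.EncodeToString(m.Sum(nil))
}

// Append links a new entry to the tail of the chain.
func Append(chain []Entry, key []byte, event string) []Entry {
	prev := ""
	if n := len(chain); n > 0 {
		prev = chain[n-1].Hash
	}
	h := digest(event, prev)
	return append(chain, Entry{Event: event, PrevHash: prev, Hash: h, MAC: sign(key, h)})
}

// Verify returns the index of the first bad entry, or -1 if the chain is intact.
func Verify(chain []Entry, key []byte) int {
	prev := ""
	for i, e := range chain {
		okMAC := hmac.Equal([]byte(e.MAC), []byte(sign(key, e.Hash)))
		if e.PrevHash != prev || e.Hash != digest(e.Event, prev) || !okMAC {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	key := []byte("demo-key-kept-outside-the-log")
	var chain []Entry
	chain = Append(chain, key, "exec: uptime")
	chain = Append(chain, key, "file: read /etc/hosts")
	line, _ := json.Marshal(chain[1]) // one JSONL line
	fmt.Println(string(line))
	chain[0].Event = "exec: whoami" // simulate tampering
	fmt.Println("first bad entry:", Verify(chain, key))
}
```

The hash chain alone detects edits only if the attacker cannot recompute it; the HMAC, keyed with a secret stored outside the log, is what prevents a consistent rewrite.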

5. Authentication

Operators authenticate via OAuth with tokens stored in the OS keyring (macOS Keychain, Linux Secret Service, Windows Credential Manager). VPN enrollment uses HMAC-SHA256 signed tokens with single-use nonces to prevent replay attacks.

Threat Model

Nefia is designed to protect against the following threats:

| Threat | Mitigation |
| --- | --- |
| Network eavesdropping | WireGuard encryption + mTLS inner runtime layer |
| Unauthorized access to targets | VPN enrollment + mTLS client certificate validation |
| Lateral movement between targets | Star topology: no target-to-target traffic |
| Destructive commands | Policy engine deny rules (anchor enforcement in enforce/warn modes) |
| AI agent misuse | MCP policy enforcement with same guardrails |
| Audit log tampering | SHA-256 hash chain + HMAC-SHA256 signatures (key stored outside audit logs) |
| Credential theft | OS keyring storage + AES-256-GCM fallback |
| Enrollment token replay | Single-use nonces with HMAC-SHA256 signatures |
| Agent token theft | OS keyring storage + AES-256-GCM fallback + 90-day TTL auto-rotation |
| Ping reflection attack | Nonce binding with constant-time comparison |
| Relay abuse / DDoS | DERP per-IP rate limiting (5 req/s, burst 10) + ACL key allowlist |
| Cloud API downgrade | HTTPS enforcement: http:// URLs rejected at config load |
| TURN protocol downgrade | TLS 1.3 minimum enforced on all TURN connections |
| Agent resource exhaustion | Semaphore-based 100 concurrent connection limit |
| Hairpin NAT IP leak | DiscoverLocalEndpoint validates against public endpoint |
| Per-IP enrollment abuse | Per-IP rate limiting on enrollment endpoints |
| Cloud relay data leakage | No tokens stored server-side; nonce as session secret; unified error responses |
| ReDoS via crafted input | Policy engine input length limit (8,192 bytes, unconditional deny) |
| Path traversal | Session sandboxing with symlink resolution |
| Unauthorized team access | Team role enforcement (Owner/Admin/Member RBAC) with membership verification |
| VPN address collision (concurrent enrollment) | Exclusive file locking with non-blocking retry + stale PID detection (Windows) |

Star Topology

Nefia uses a hub-and-spoke VPN topology where the operator PC is the hub and each target PC is a spoke:

Target A ──── Operator (Hub) ──── Target B
                    │
                 Target C

Targets cannot communicate with each other directly. All traffic must pass through the operator, which means a compromised target cannot pivot to attack other targets on the VPN.

Userspace WireGuard

Nefia runs WireGuard entirely in userspace via gVisor netstack. This means:

  • No root or admin privileges required — the VPN runs as a regular user process
  • Reduced kernel attack surface — no kernel modules to load or maintain
  • Portable across platforms — same implementation on macOS, Linux, and Windows

VPN Enrollment Security

Target PCs are enrolled using cryptographically signed invite tokens:

1. The operator generates an invite token with nefia vpn invite. The token is signed with HMAC-SHA256 and includes a single-use nonce.

2. The target agent receives the token and validates the HMAC signature against the operator's shared secret.

3. A WireGuard key exchange is performed over a temporary channel (direct TCP or cloud relay). The nonce is consumed and cannot be reused.

4. The VPN tunnel is established and the agent's mTLS client certificate is issued so the daemon runtime channel can be opened.
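The signing and single-use properties of the invite token can be sketched as follows. This is a simplification under stated assumptions: the `Issuer` type, the `<nonce>.<hmac>` token layout, and the in-memory nonce set are all hypothetical, and the sketch folds signature validation and nonce consumption into one place even though the flow above splits them between agent and operator.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
	"sync"
)

// Issuer is a hypothetical stand-in for the operator side of enrollment.
type Issuer struct {
	secret []byte
	mu     sync.Mutex
	unused map[string]bool // nonces issued but not yet consumed
}

func NewIssuer(secret []byte) *Issuer {
	return &Issuer{secret: secret, unused: make(map[string]bool)}
}

func (i *Issuer) mac(nonce string) string {
	m := hmac.New(sha256.New, i.secret)
	m.Write([]byte(nonce))
	return hex.EncodeToString(m.Sum(nil))
}

// Invite returns a token of the form "<nonce>.<hmac>" with a 128-bit random nonce.
func (i *Issuer) Invite() (string, error) {
	raw := make([]byte, 16)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	nonce := hex.EncodeToString(raw)
	i.mu.Lock()
	i.unused[nonce] = true
	i.mu.Unlock()
	return nonce + "." + i.mac(nonce), nil
}

// Redeem checks the signature, then consumes the nonce so a replay fails.
func (i *Issuer) Redeem(token string) error {
	nonce, sig, ok := strings.Cut(token, ".")
	if !ok || !hmac.Equal([]byte(sig), []byte(i.mac(nonce))) {
		return errors.New("invalid signature")
	}
	i.mu.Lock()
	defer i.mu.Unlock()
	if !i.unused[nonce] {
		return errors.New("nonce already consumed or unknown")
	}
	delete(i.unused, nonce)
	return nil
}

func main() {
	iss := NewIssuer([]byte("operator-secret"))
	tok, err := iss.Invite()
	if err != nil {
		panic(err)
	}
	fmt.Println("first use:", iss.Redeem(tok))
	fmt.Println("replay:   ", iss.Redeem(tok))
}
```

Checking and deleting the nonce under the same lock is what keeps two concurrent redemptions of one token from both succeeding.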

Cloud Relay Security

When agents enroll via the cloud relay (nefia.ai), the following security measures apply:

  • No invite tokens stored server-side — The cloud relay stores only the operator's WireGuard public key, VPN address, and endpoint metadata. The full HMAC-signed invite token is never sent to or stored on the server.
  • Nonce as session secret — The 128-bit cryptographically random nonce serves as the cloud relay session identifier. Only the operator and the agent (via the invite token) know this value.
  • Unified error responses — Unauthenticated endpoints return identical 404 responses for missing, expired, and consumed sessions, preventing nonce state enumeration.
  • Atomic state transitions — The completed → consumed transition uses atomic database operations to prevent two concurrent polls from both reading the enrollment data.
  • Audit logging — Both session creation (operator-side) and agent completion are recorded as audit events with IP addresses.
  • Dead-end prevention — If no direct endpoint is available and the cloud relay session creation fails, the invite is blocked entirely rather than creating an unusable token.

DERP Relay Security

When direct peer-to-peer connections are not possible (symmetric NAT, CGNAT), traffic is relayed through a DERP (Designated Encrypted Relay for Packets) server. The DERP server enforces multiple security layers:

  • Per-IP rate limiting: Each client IP is limited to 5 requests per second with a burst of 10. Exceeding this limit returns HTTP 429 responses, preventing abuse and resource exhaustion.
  • Access control list (ACL): The --allowed-keys-file flag restricts relay access to a file of authorized WireGuard public keys. Only listed keys can establish relay sessions.
  • TLS transport: DERP connections use WebSocket, with TLS termination expected at the load balancer level. The wss:// scheme is recommended for production deployments. The ws:// scheme is permitted for local development and testing. The server enforces a ReadHeaderTimeout of 10 seconds to prevent slowloris attacks.
  • Authenticated health endpoint: The /healthz endpoint exposes only basic liveness information by default. Full metrics (connected clients, relay statistics) require a Bearer token configured via the --metrics-token flag.
  • Graceful shutdown: On termination, the server sends StatusGoingAway to all connected clients, allowing them to fail over to an alternative relay without connection errors.

Agent Token Security

In addition to WireGuard keys, each enrolled agent receives a unique API token for cloud communication (endpoint reporting, token rotation):

  • Generation: A 256-bit token is generated using crypto.randomBytes(32) during enrollment. Only the SHA-256 hash is stored server-side as an ApiToken record — the plaintext token never persists on the server.
  • Storage: The agent stores its token in the OS keyring (macOS Keychain, Linux Secret Service, Windows Credential Manager). When no keyring is available, the token is encrypted with AES-256-GCM and written to a fallback file.
  • Automatic rotation: Tokens have a 90-day TTL. When the remaining TTL drops below 7 days, the agent initiates rotation automatically on a 24-hour check cycle via POST /api/agent-tokens/rotate. The old token is revoked atomically when the new token is issued. Rotation failures trigger exponential backoff (5-minute base, 1-hour cap) to avoid overwhelming the server.
  • Ping nonce binding: VPN ping responses are verified using subtle.ConstantTimeCompare to prevent reflection attacks where an attacker replays a legitimate ping response.

HTTPS Enforcement

All communication with the Nefia cloud API requires HTTPS. The validateCloudAPIBase() function rejects any http:// URL at configuration load time, ensuring credentials and tokens are never transmitted over unencrypted connections.
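A check of this kind might look like the sketch below. The function name is taken from the text, but the body is an illustrative guess rather than the actual implementation; in particular, it rejects every scheme other than https, which is stricter than the documented rejection of http:// URLs.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validateCloudAPIBase is named after the function mentioned above; this body
// is an illustrative guess, not the actual implementation.
func validateCloudAPIBase(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse cloud API base: %w", err)
	}
	if u.Scheme != "https" {
		return errors.New("cloud API base must use https://")
	}
	if u.Host == "" {
		return errors.New("cloud API base must include a host")
	}
	return nil
}

func main() {
	for _, base := range []string{"https://nefia.ai", "http://nefia.ai"} {
		fmt.Printf("%s -> %v\n", base, validateCloudAPIBase(base))
	}
}
```

Running the check at configuration load time means a misconfigured URL fails fast, before any credential is attached to a request.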

HTTP Response and Payload Size Limits

All HTTP responses and payloads are size-limited to prevent memory exhaustion from oversized or malicious data:

| Constant | Limit | Scope |
| --- | --- | --- |
| MaxAPIResponseBytes | 1 MB | Standard API responses |
| MaxEnrollResponseBytes | 1 MB | Enrollment API responses |
| MaxWebhookPayloadBytes | 1 MB | Outgoing SIEM/webhook payloads |
| MaxInlineFileBytes | 10 MB | Inline file content via MCP |

These limits are centralized in internal/limits/ and enforced at every HTTP boundary.

TURN TLS Hardening

When TURN relay servers are used for NAT traversal, Nefia enforces TLS 1.3 as the minimum version. This prevents downgrade attacks and ensures only modern cipher suites are negotiated.

Nonce Audit Logging

Enrollment nonce events are recorded in the audit log with structured fields for forensic analysis:

  • Nonce prefix: The first 8 characters of the nonce (sufficient for correlation without exposing the full value)
  • Source endpoint: The endpoint (IP:port) of the enrolling agent
  • Host ID: The target host identifier

This enables operators to trace enrollment attempts and detect suspicious patterns without logging sensitive nonce material.

Agent Connection Limits

The nefia-agent enforces a semaphore-based limit of 100 concurrent forwarded connections. When this limit is reached, new incoming connections are rejected immediately to prevent resource exhaustion. This protects target PCs from being overwhelmed by misconfigured or malicious operators.
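In Go, a counting semaphore with immediate rejection is often built from a buffered channel and a non-blocking send. The sketch below shows that pattern; the `connLimiter` type is hypothetical and is not claimed to be the agent's actual code.

```go
package main

import "fmt"

// connLimiter is a hypothetical counting semaphore built on a buffered channel.
type connLimiter chan struct{}

func newConnLimiter(max int) connLimiter { return make(connLimiter, max) }

// TryAcquire takes a slot without blocking; false means the limit is reached.
func (l connLimiter) TryAcquire() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot once the forwarded connection has closed.
func (l connLimiter) Release() { <-l }

func main() {
	lim := newConnLimiter(100)
	accepted := 0
	for i := 0; i < 101; i++ {
		if lim.TryAcquire() {
			accepted++
		}
	}
	fmt.Println("accepted:", accepted)
	lim.Release()
	fmt.Println("slot reusable:", lim.TryAcquire())
}
```

The `default` branch is what makes rejection immediate: a blocking send would instead queue the connection until a slot frees up.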

The agent's health watchdog triggers a rebuild after 2 consecutive health check failures. After 5 consecutive rebuild attempts it enters a 30-minute cooldown to avoid excessive resource consumption, then resumes monitoring and rebuild attempts.

Connection Pool Reliability

The operator daemon maintains a per-host gRPC channel pool for the runtime layer. Channels are reused across sequential operations and recycled when an idle sweep, health check, or shutdown closes them. Dead channels are detected by health probes and replaced before being handed back to callers.

VPN peer dial mutex entries are cleaned up on tunnel rebuild only. Entries are intentionally retained during peer removal because concurrent DialTCPWithFallback calls may still hold a reference to the mutex.

Alerting

Nefia includes a webhook-based alerting system that notifies operators of security-relevant events in real time. Alerts are dispatched asynchronously and do not block the triggering operation.

The alerting system monitors the following event categories:

  • exec_failure — Command execution failures across one or more hosts
  • vpn_peer_unhealthy — VPN peer handshake staleness detected by the health monitor
  • circuit_breaker_open — Daemon runtime circuit breaker tripped for a host after consecutive failures
  • policy_rebuild_failed — Policy engine failed to rebuild after a config hot-reload
  • enrollment_complete — A host finished enrollment successfully
  • host_online / host_offline — A host changed reachability state
  • key_rotation — WireGuard key rotation completed
  • config_change — Configuration was saved or reloaded with changes
  • playbook_complete / playbook_failed — A playbook run finished successfully or failed
  • queue_executed / queue_failed — An offline queued command succeeded or failed
  • host_revoked / host_revoke_all — One host, or all hosts, had VPN access revoked
  • vpn_path_switch — VPN multipath failover occurred (active path switched between direct and relay)

Alerts support two delivery formats: Slack (Block Kit) and generic JSON. Per-event-type cooldown (default: 5 minutes) prevents alert flooding. Failed webhook deliveries are retried up to 3 times with exponential backoff (1s base, 10s per-attempt timeout).

See the alerts configuration for setup instructions.

Observability

Nefia provides two complementary observability layers:

  • OpenTelemetry tracing: Distributed traces exported via OTLP HTTP. A TraceHandler automatically injects trace_id and span_id into all structured log output, enabling log-to-trace correlation. When the OTLP endpoint is unreachable, a noop tracer is installed silently — startup is never blocked.
  • Prometheus metrics: A dedicated HTTP server exposes /metrics (Prometheus scrape), /healthz (liveness), and /readyz (readiness) endpoints. Metrics cover command execution, connection pool size, filesystem operations, session lifecycle, VPN peer health, and playbook runs, both as counters and as duration histograms.

Both subsystems are disabled by default and add zero overhead when disabled. See the telemetry configuration for setup instructions.

MCP Policy Enforcement

When AI agents connect via the MCP server (nefia mcp serve), they are subject to the same policy engine rules as human operators. There is no separate privilege model — the deny rules, path restrictions, and RBAC roles apply identically.

This ensures that an AI agent cannot bypass guardrails that a human operator would be subject to, regardless of how the agent formulates its requests.

The MCP approval workflow is controlled by mcp.approval. When enabled, nefia mcp serve exposes approval tools and pauses matching tool calls until they are approved, denied, or timed out.

Domain errors from MCP tools are returned as isError: true content per the MCP specification, allowing AI agents to see error details and adjust their behavior.

Workspace Sandboxing

File operations are confined to their workspace or session root. Path traversal attempts, including those using symlinks that resolve outside the sandbox, are rejected at the daemon file plane before any I/O occurs. Policy-denied paths are also rejected at the daemon layer before touching the target. See Workspaces and Sessions for details.

Process Runtime Security

The agent-native process runtime listens on port 7222, bound exclusively to the VPN-local address. It is never exposed on LAN or public interfaces.

  • Authentication — Each connection performs an HMAC-SHA256 nonce handshake derived from the existing VPN trust relationship. Unauthenticated connections are rejected before any process metadata is revealed.
  • Policy enforcement — proc.spawn, proc.stdin, proc.signal, and proc.delete are subject to the operator-side policy engine. Role-based rules can selectively allow or deny specific process operations.
  • Audit logging — All process operations (proc_spawn, proc_attach, proc_stdin, proc_signal, proc_wait, proc_exit) are recorded in the append-only audit log with command, args, cwd, process ID, PID, and exit code.
  • Approval workflow — When MCP approval is enabled, destructive process operations require human approval before execution.

Code Signing

Release binaries are signed at multiple levels to ensure integrity and trust:

| Platform | Signing Method | Verification |
| --- | --- | --- |
| macOS | Apple Developer ID + Notarization | Gatekeeper (automatic) |
| Windows | Authenticode (SHA-256) | SmartScreen / Windows Defender |
| All | Ed25519 detached signature (.sig) | Verified by nefia-agent on startup and auto-update |

macOS signing and notarization ensure Gatekeeper allows the binary to run without warnings. Windows Authenticode signing ensures SmartScreen does not flag the binary as untrusted. The Ed25519 signatures provide a platform-independent integrity check that does not depend on any certificate authority.
