Security Overview

Understand Nefia's defense-in-depth security architecture and threat model.

Nefia is designed with a defense-in-depth security model. Every connection traverses multiple independent security layers, so a breach in any single layer does not compromise the system as a whole.

Security Layers

Nefia enforces five distinct security layers for every operation:

1. WireGuard VPN

All traffic between the operator and target PCs is encrypted using WireGuard tunnels with Curve25519 key exchange and ChaCha20-Poly1305 authenticated encryption. No ports are exposed on target machines; their services are reachable only through the VPN.

2. SSH Inside VPN

SSH runs inside the WireGuard tunnel, providing a second layer of encryption. To reduce double-encryption overhead, Nefia negotiates lightweight ciphers for the inner SSH session while maintaining full authentication and integrity guarantees.

The SSH layer is further hardened with:

Hardened key exchange: Only modern algorithms are offered — curve25519-sha256, curve25519-sha256@libssh.org, and ecdh-sha2-nistp256. Legacy Diffie-Hellman groups are excluded.
Hardened MACs: Encrypt-then-MAC variants are preferred (hmac-sha2-256-etm@openssh.com), with hmac-sha2-256 as a fallback.
Trust On First Use (TOFU): The host key of an unknown host reached over the VPN is accepted automatically and persisted to known_hosts. If a known host's key later changes, the connection is rejected to prevent man-in-the-middle attacks, and the key change event is logged at WARN level for security monitoring. Concurrent access to the known-hosts store is guarded by a read-write mutex (RWMutex).
File size enforcement: SFTP reads enforce ssh.max_file_size_bytes even when the caller does not specify a limit, preventing out-of-memory conditions from unbounded reads.

3. Policy Engine

A regex-based policy engine enforces command and file path guardrails. Operators define allow/deny rules that apply equally to human CLI users and AI agents connected via the MCP server.

4. Audit Logging

Every command execution, file operation, and session event is recorded in an append-only JSONL log. Entries are chained using SHA-256 hashes and protected with HMAC-SHA256 signatures. File handles are reused within each day instead of being reopened for every entry, and if a write or sync fails, the hash chain state is rolled back so the chain stays intact.
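The following is a minimal sketch of a SHA-256 hash chain with HMAC-SHA256 signatures, assuming each entry's hash is SHA-256(previous hash || event) and the HMAC is computed over that hash. The Entry fields, the hashing layout, and the key are hypothetical; Nefia's actual record format, file handling, and rollback logic are not shown.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Entry is a hypothetical audit record; field names are illustrative.
type Entry struct {
	Event    string `json:"event"`
	PrevHash string `json:"prev_hash"`
	Hash     string `json:"hash"`
	HMAC     string `json:"hmac"`
}

// chainHash links an event to its predecessor: SHA-256(prevHash || event).
func chainHash(prevHash, event string) string {
	sum := sha256.Sum256([]byte(prevHash + event))
	return hex.EncodeToString(sum[:])
}

// sign authenticates a hash with a key kept outside the audit log.
func sign(key []byte, hash string) string {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(hash))
	return hex.EncodeToString(m.Sum(nil))
}

// Append links a new entry to the last one and signs it.
func Append(entries []Entry, key []byte, event string) []Entry {
	prev := ""
	if len(entries) > 0 {
		prev = entries[len(entries)-1].Hash
	}
	h := chainHash(prev, event)
	return append(entries, Entry{Event: event, PrevHash: prev, Hash: h, HMAC: sign(key, h)})
}

// Verify recomputes every link and signature and returns the index of the
// first invalid entry, or -1 if the whole chain checks out.
func Verify(entries []Entry, key []byte) int {
	prev := ""
	for i, e := range entries {
		if e.PrevHash != prev || e.Hash != chainHash(prev, e.Event) ||
			!hmac.Equal([]byte(e.HMAC), []byte(sign(key, e.Hash))) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	key := []byte("audit-hmac-key") // illustrative; stored outside the log in practice
	var entries []Entry
	entries = Append(entries, key, "exec host=a cmd=uptime")
	entries = Append(entries, key, "fs.read host=a path=/etc/hosts")
	for _, e := range entries {
		line, _ := json.Marshal(e) // one JSON object per line (JSONL)
		fmt.Println(string(line))
	}
	entries[0].Event = "tampered"
	fmt.Println("first invalid entry:", Verify(entries, key))
}
```

Editing any event breaks its own hash, and rewriting the hash as well breaks the HMAC unless the attacker also holds the key, which is why the key lives outside the log.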

5. Authentication

Operators authenticate via OAuth with tokens stored in the OS keyring (macOS Keychain, Linux Secret Service, Windows Credential Manager). VPN enrollment uses HMAC-SHA256 signed tokens with single-use nonces to prevent replay attacks.

Threat Model

Nefia is designed to protect against the following threats:

Threat	Mitigation
Network eavesdropping	WireGuard encryption + SSH inner layer
Unauthorized access to targets	VPN enrollment + SSH public key auth
Lateral movement between targets	Star topology — no target-to-target traffic
Destructive commands	Policy engine deny rules (anchor enforcement in enforce/warn modes)
AI agent misuse	MCP policy enforcement with same guardrails
Audit log tampering	SHA-256 hash chain + HMAC-SHA256 signatures (key stored outside audit logs)
Credential theft	OS keyring storage + AES-256-GCM fallback
Enrollment token replay	Single-use nonces with HMAC-SHA256 signatures
Agent token theft	OS keyring storage + AES-256-GCM fallback + 90-day TTL auto-rotation
Ping reflection attack	Nonce binding with constant-time comparison
Relay abuse / DDoS	DERP per-IP rate limiting (5 req/s, burst 10) + ACL key allowlist
Cloud API downgrade	HTTPS enforcement — http:// URLs rejected at config load
TURN protocol downgrade	TLS 1.3 minimum enforced on all TURN connections
Agent resource exhaustion	Semaphore-based 100 concurrent connection limit
Hairpin NAT IP leak	DiscoverLocalEndpoint validates against public endpoint
Per-IP enrollment abuse	Per-IP rate limiting on enrollment endpoints
Cloud relay data leakage	No tokens stored server-side; nonce as session secret; unified error responses
ReDoS via crafted input	Policy engine input length limit (8,192 bytes, unconditional deny)
Path traversal	Session sandboxing with symlink resolution
Unauthorized team access	Team role enforcement (Owner/Admin/Member RBAC) with membership verification
VPN address collision (concurrent enrollment)	Exclusive file locking with non-blocking retry + stale PID detection (Windows)

Star Topology

Nefia uses a hub-and-spoke VPN topology where the operator PC is the hub and each target PC is a spoke:

```plaintext
Target A ──── Operator (Hub) ──── Target B
                   │
              Target C
```

Targets cannot communicate with each other directly. All traffic must pass through the operator, which means a compromised target cannot pivot to attack other targets on the VPN.

Userspace WireGuard

Nefia runs WireGuard entirely in userspace via gVisor netstack. This means:

No root or admin privileges required — the VPN runs as a regular user process
Reduced kernel attack surface — no kernel modules to load or maintain
Portable across platforms — same implementation on macOS, Linux, and Windows

VPN Enrollment Security

Target PCs are enrolled using cryptographically signed invite tokens:

1. The operator generates an invite token with nefia vpn invite. The token is signed with HMAC-SHA256 and includes a single-use nonce.
2. The target agent receives the token and validates the HMAC signature against the operator's shared secret.
3. A WireGuard key exchange is performed over a temporary channel (direct TCP or cloud relay). The nonce is consumed and cannot be reused.
4. The VPN tunnel is established and the target's SSH host key is registered automatically.

Cloud Relay Security

When agents enroll via the cloud relay (nefia.ai), the following security measures apply:

No invite tokens stored server-side — The cloud relay stores only the operator's WireGuard public key, VPN address, and endpoint metadata. The full HMAC-signed invite token is never sent to or stored on the server.
Nonce as session secret — The 128-bit cryptographically random nonce serves as the cloud relay session identifier. Only the operator and the agent (via the invite token) know this value.
Unified error responses — Unauthenticated endpoints return identical 404 responses for missing, expired, and consumed sessions, preventing nonce state enumeration.
Atomic state transitions — The completed → consumed transition uses atomic database operations to prevent two concurrent polls from both reading the enrollment data.
Audit logging — Both session creation (operator-side) and agent completion are recorded as audit events with IP addresses.
Dead-end prevention — If no direct endpoint is available and the cloud relay session creation fails, the invite is blocked entirely rather than creating an unusable token.

DERP Relay Security

When direct peer-to-peer connections are not possible (symmetric NAT, CGNAT), traffic is relayed through a DERP (Designated Encrypted Relay for Packets) server. The DERP server enforces multiple security layers:

Per-IP rate limiting: Each client IP is limited to 5 requests per second with a burst of 10. Exceeding this limit returns HTTP 429 responses, preventing abuse and resource exhaustion.
Access control list (ACL): The --allowed-keys-file flag restricts relay access to a file of authorized WireGuard public keys. Only listed keys can establish relay sessions.
TLS transport: DERP connections use WebSocket, with TLS termination expected at the load balancer level. The wss:// scheme is recommended for production deployments. The ws:// scheme is permitted for local development and testing. The server enforces a ReadHeaderTimeout of 10 seconds to prevent slowloris attacks.
Authenticated health endpoint: The /healthz endpoint exposes only basic liveness information by default. Full metrics (connected clients, relay statistics) require a Bearer token configured via the --metrics-token flag.
Graceful shutdown: On termination, the server sends StatusGoingAway to all connected clients, allowing them to fail over to an alternative relay without connection errors.

Agent Token Security

In addition to WireGuard keys, each enrolled agent receives a unique API token for cloud communication (endpoint reporting, token rotation):

Generation: A 256-bit token is generated using crypto.randomBytes(32) during enrollment. Only the SHA-256 hash is stored server-side as an ApiToken record — the plaintext token never persists on the server.
Storage: The agent stores its token in the OS keyring (macOS Keychain, Linux Secret Service, Windows Credential Manager). When no keyring is available, the token is encrypted with AES-256-GCM and written to a fallback file.
Automatic rotation: Tokens have a 90-day TTL. When the remaining TTL drops below 7 days, the agent initiates rotation automatically on a 24-hour check cycle via POST /api/agent-tokens/rotate. The old token is revoked atomically when the new token is issued. Rotation failures trigger exponential backoff (5-minute base, 1-hour cap) to avoid overwhelming the server.
Ping nonce binding: VPN ping responses are verified using subtle.ConstantTimeCompare to prevent reflection attacks where an attacker replays a legitimate ping response.

HTTPS Enforcement

All communication with the Nefia cloud API requires HTTPS. The validateCloudAPIBase() function rejects any http:// URL at configuration load time, ensuring credentials and tokens are never transmitted over unencrypted connections.
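A check of this kind can be written with net/url. The sketch below is only a guess at what validateCloudAPIBase() does; its signature, error messages, and the extra empty-host check are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateCloudAPIBase is a sketch of the configuration-time check:
// only https:// URLs with a host are accepted.
func validateCloudAPIBase(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid cloud API URL: %w", err)
	}
	if u.Scheme != "https" {
		return fmt.Errorf("cloud API URL must use https, got %q", u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("cloud API URL %q has no host", raw)
	}
	return nil
}

func main() {
	for _, s := range []string{"https://api.example.com", "http://api.example.com", "https://"} {
		fmt.Printf("%-26s %v\n", s, validateCloudAPIBase(s))
	}
}
```

Failing at configuration load means a mistyped URL surfaces before any token is sent, rather than on the first request.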

HTTP Response and Payload Size Limits

All HTTP responses and payloads are size-limited to prevent memory exhaustion from oversized or malicious data:

Constant	Limit	Scope
`MaxAPIResponseBytes`	1 MB	Standard API responses
`MaxEnrollResponseBytes`	1 MB	Enrollment API responses
`MaxWebhookPayloadBytes`	1 MB	Outgoing SIEM/webhook payloads
`MaxInlineFileBytes`	10 MB	Inline file content via MCP

These limits are centralized in internal/limits/ and enforced at every HTTP boundary.

TURN TLS Hardening

When TURN relay servers are used for NAT traversal, Nefia enforces TLS 1.3 as the minimum version. This prevents downgrade attacks and ensures only modern cipher suites are negotiated.

Nonce Audit Logging

Enrollment nonce events are recorded in the audit log with structured fields for forensic analysis:

Nonce prefix: The first 8 characters of the nonce (sufficient for correlation without exposing the full value)
Source endpoint: The endpoint (IP:port) of the enrolling agent
Host ID: The target host identifier

This enables operators to trace enrollment attempts and detect suspicious patterns without logging sensitive nonce material.
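A structured log call carrying these three fields might look like the sketch below, written with log/slog. The attribute keys (nonce_prefix, source_endpoint, host_id), the message text, and the sample values are hypothetical.

```go
package main

import (
	"log/slog"
	"os"
)

// noncePrefix keeps only the first 8 characters, enough for correlation.
func noncePrefix(nonce string) string {
	if len(nonce) <= 8 {
		return nonce
	}
	return nonce[:8]
}

// logEnrollment records an enrollment nonce event without the full nonce.
func logEnrollment(logger *slog.Logger, nonce, sourceEndpoint, hostID string) {
	logger.Info("enrollment nonce consumed",
		slog.String("nonce_prefix", noncePrefix(nonce)),
		slog.String("source_endpoint", sourceEndpoint),
		slog.String("host_id", hostID),
	)
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	logEnrollment(logger, "3f9a1c7e5b2d4f6a8c0e1b3d5f7a9c2e", "198.51.100.23:51820", "host-a")
}
```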

Agent Connection Limits

The nefia-agent enforces a semaphore-based limit of 100 concurrent forwarded connections. When this limit is reached, new incoming connections are rejected immediately to prevent resource exhaustion. This protects target PCs from being overwhelmed by misconfigured or malicious operators.
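A semaphore with immediate rejection is idiomatically a buffered channel plus a non-blocking send. The sketch below assumes that design; connLimiter and handle are hypothetical names, not the agent's actual code.

```go
package main

import (
	"fmt"
	"net"
)

const maxForwarded = 100

// connLimiter is a counting semaphore built on a buffered channel.
type connLimiter chan struct{}

// tryAcquire never blocks: it returns false at once when all slots are taken.
func (c connLimiter) tryAcquire() bool {
	select {
	case c <- struct{}{}:
		return true
	default:
		return false
	}
}

func (c connLimiter) release() { <-c }

// handle is a hypothetical accept-loop step: reject at capacity, else serve.
func handle(lim connLimiter, conn net.Conn, serve func(net.Conn)) {
	if !lim.tryAcquire() {
		conn.Close() // limit reached: reject immediately
		return
	}
	go func() {
		defer lim.release()
		defer conn.Close()
		serve(conn)
	}()
}

func main() {
	lim := make(connLimiter, maxForwarded)
	acquired := 0
	for i := 0; i < 105; i++ {
		if lim.tryAcquire() {
			acquired++
		}
	}
	fmt.Println("acquired:", acquired) // 100
}
```

Rejecting instead of queueing keeps memory and goroutine counts bounded even if a peer opens connections faster than they finish.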

The agent's health watchdog triggers a rebuild after 2 consecutive health check failures. After 5 consecutive rebuild attempts, it enters a 30-minute cooldown, preventing excessive resource consumption. When the cooldown ends, it resumes monitoring and rebuild attempts.

Connection Pool Reliability

SSH connection pool entries use sync.Once for close operations, preventing double-close issues from concurrent access. Health checks ensure dead connection pool entries are always closed, preventing resource leaks.

VPN peer dial mutex entries are cleaned up on tunnel rebuild only. Entries are intentionally retained during peer removal because concurrent DialTCPWithFallback calls may still hold a reference to the mutex.

Alerting

Nefia includes a webhook-based alerting system that notifies operators of security-relevant events in real time. Alerts are dispatched asynchronously and do not block the triggering operation.

The alerting system monitors the following event categories:

exec_failure — Command execution failures across one or more hosts
vpn_peer_unhealthy — VPN peer handshake staleness detected by the health monitor
circuit_breaker_open — SSH circuit breaker tripped for a host after consecutive failures
policy_rebuild_failed — Policy engine failed to rebuild after a config hot-reload
enrollment_complete — A host finished enrollment successfully
host_online / host_offline — A host changed reachability state
key_rotation — WireGuard key rotation completed
config_change — Configuration was saved or reloaded with changes
playbook_complete / playbook_failed — A playbook run finished successfully or failed
queue_executed / queue_failed — An offline queued command succeeded or failed
host_revoked / host_revoke_all — One host, or all hosts, had VPN access revoked
vpn_path_switch — VPN multipath failover occurred (active path switched between direct and relay)

Alerts support two delivery formats: Slack (Block Kit) and generic JSON. Per-event-type cooldown (default: 5 minutes) prevents alert flooding. Failed webhook deliveries are retried up to 3 times with exponential backoff (1s base, 10s per-attempt timeout).

See the alerts configuration for setup instructions.

Observability

Nefia provides two complementary observability layers:

OpenTelemetry tracing: Distributed traces exported via OTLP HTTP. A TraceHandler automatically injects trace_id and span_id into all structured log output, enabling log-to-trace correlation. When the OTLP endpoint is unreachable, a noop tracer is installed silently — startup is never blocked.
Prometheus metrics: A dedicated HTTP server exposes /metrics (Prometheus scrape), /healthz (liveness), and /readyz (readiness) endpoints. Metrics cover command execution, SSH dial attempts, connection pool size, filesystem operations, session lifecycle, VPN peer health, and playbook runs — both as counters and as duration histograms.

Both subsystems are disabled by default and add zero overhead when disabled. See the telemetry configuration for setup instructions.

MCP Policy Enforcement

When AI agents connect via the MCP server (nefia mcp serve), they are subject to the same policy engine rules as human operators. There is no separate privilege model — the deny rules, path restrictions, and RBAC roles apply identically.

This ensures that an AI agent cannot bypass guardrails that a human operator would be subject to, regardless of how the agent formulates its requests.

The MCP approval workflow is controlled by mcp.approval. When enabled, nefia mcp serve exposes approval tools and pauses matching tool calls until they are approved, denied, or timed out.

Domain errors from MCP tools are returned as isError: true content per the MCP specification, allowing AI agents to see error details and adjust their behavior.

Session Sandboxing

File operation sessions are confined to a root directory. Path traversal attempts — including those using symlinks that resolve outside the sandbox — are rejected before any I/O occurs. See Sessions for details.

Code Signing

Release binaries are signed at multiple levels to ensure integrity and trust:

Platform	Signing Method	Verification
macOS	Apple Developer ID + Notarization	Gatekeeper (automatic)
Windows	Authenticode (SHA-256)	SmartScreen / Windows Defender
All	Ed25519 detached signature (`.sig`)	Verified by nefia-agent on startup and auto-update

macOS signing and notarization ensure that Gatekeeper allows the binary to run without warnings. Windows Authenticode signing ensures SmartScreen does not flag the binary as untrusted. The Ed25519 signatures provide a platform-independent integrity check that does not depend on any certificate authority.
