Security Overview
Understand Nefia's defense-in-depth security architecture and threat model.
Nefia is designed with a defense-in-depth security model. Every connection traverses multiple independent security layers, so a breach in any single layer does not compromise the system as a whole.
Security Layers
Nefia enforces five distinct security layers for every operation:
1. WireGuard VPN Tunnels
All traffic between the operator and target PCs is encrypted using WireGuard tunnels with Curve25519 key exchange and ChaCha20-Poly1305 authenticated encryption. No ports are exposed on target machines — connections are only reachable through the VPN.
2. SSH Inside VPN
SSH runs inside the WireGuard tunnel, providing a second layer of encryption. To reduce double-encryption overhead, Nefia negotiates lightweight ciphers for the inner SSH session while maintaining full authentication and integrity guarantees.
The SSH layer is further hardened with:
- Hardened key exchange: Only `curve25519-sha256`, `curve25519-sha256@libssh.org`, and `ecdh-sha2-nistp256` are offered; legacy Diffie-Hellman groups are excluded.
- Hardened MACs: Encrypt-then-MAC variants are preferred (`hmac-sha2-256-etm@openssh.com`), with `hmac-sha2-256` as a fallback.
- Trust On First Use (TOFU): The first time a host is reached over the VPN, its host key is accepted automatically and persisted to `known_hosts`. If a host key later changes, the connection is rejected to prevent man-in-the-middle attacks, and the key change is logged at WARN level for security monitoring. Concurrent access to `known_hosts` is guarded by a read-write mutex (`RWMutex`).
- File size enforcement: SFTP reads enforce `ssh.max_file_size_bytes` even when the caller does not specify a limit, preventing out-of-memory conditions from unbounded reads.
3. Policy Engine
A regex-based policy engine enforces command and file path guardrails. Operators define allow/deny rules that apply equally to human CLI users and AI agents connected via the MCP server.
4. Audit Logging
Every command execution, file operation, and session event is recorded in an append-only JSONL log. Entries are chained using SHA-256 hashes and protected with HMAC-SHA256 signatures. A single file handle is reused for each day's log file, and if a write or sync fails, the chain state is rolled back so the hash chain stays intact.
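The chaining scheme can be sketched in Go as follows. This assumes a simple hash input of `seq|event|prev_hash` and an HMAC computed over each entry's hash; Nefia's actual canonical form and field names are not documented here, so `entry`, `link`, and `verify` are hypothetical. As the threat model notes, the HMAC key must be stored outside the audit logs.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// entry is a hypothetical audit record; field names are illustrative.
type entry struct {
	Seq      int    `json:"seq"`
	Event    string `json:"event"`
	PrevHash string `json:"prev_hash"`
	Hash     string `json:"hash"`
	HMAC     string `json:"hmac"`
}

// link hashes the entry together with the previous hash, then signs
// the resulting hash with HMAC-SHA256.
func link(seq int, event, prevHash string, key []byte) entry {
	h := sha256.Sum256([]byte(fmt.Sprintf("%d|%s|%s", seq, event, prevHash)))
	hash := hex.EncodeToString(h[:])
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(hash))
	return entry{Seq: seq, Event: event, PrevHash: prevHash, Hash: hash, HMAC: hex.EncodeToString(mac.Sum(nil))}
}

// verify recomputes every hash and signature and checks each link.
func verify(entries []entry, key []byte) error {
	prev := ""
	for i, e := range entries {
		want := link(e.Seq, e.Event, prev, key)
		if e.PrevHash != prev || e.Hash != want.Hash {
			return fmt.Errorf("entry %d: hash chain broken", i)
		}
		if !hmac.Equal([]byte(e.HMAC), []byte(want.HMAC)) {
			return fmt.Errorf("entry %d: bad signature", i)
		}
		prev = e.Hash
	}
	return nil
}

func main() {
	key := []byte("example-key-kept-outside-the-log") // hypothetical key
	var records []entry
	prev := ""
	for i, ev := range []string{"exec: uptime", "fs.write: /tmp/report.txt"} {
		e := link(i, ev, prev, key)
		records = append(records, e)
		prev = e.Hash
		line, _ := json.Marshal(e)
		fmt.Println(string(line)) // one JSONL line per entry
	}
	fmt.Println(verify(records, key)) // <nil>
	records[0].Event = "exec: whoami"
	fmt.Println(verify(records, key)) // entry 0: hash chain broken
}
```

Editing any earlier entry changes its hash, which breaks every later `prev_hash` link; without the key, an attacker also cannot produce valid signatures for a rewritten chain.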
5. Authentication
Operators authenticate via OAuth with tokens stored in the OS keyring (macOS Keychain, Linux Secret Service, Windows Credential Manager). VPN enrollment uses HMAC-SHA256 signed tokens with single-use nonces to prevent replay attacks.
Threat Model
Nefia is designed to protect against the following threats:
| Threat | Mitigation |
|---|---|
| Network eavesdropping | WireGuard encryption + SSH inner layer |
| Unauthorized access to targets | VPN enrollment + SSH public key auth |
| Lateral movement between targets | Star topology — no target-to-target traffic |
| Destructive commands | Policy engine deny rules (anchor enforcement in enforce/warn modes) |
| AI agent misuse | MCP policy enforcement with same guardrails |
| Audit log tampering | SHA-256 hash chain + HMAC-SHA256 signatures (key stored outside audit logs) |
| Credential theft | OS keyring storage + AES-256-GCM fallback |
| Enrollment token replay | Single-use nonces with HMAC-SHA256 signatures |
| Agent token theft | OS keyring storage + AES-256-GCM fallback + 90-day TTL auto-rotation |
| Ping reflection attack | Nonce binding with constant-time comparison |
| Relay abuse / DDoS | DERP per-IP rate limiting (5 req/s, burst 10) + ACL key allowlist |
| Cloud API downgrade | HTTPS enforcement — http:// URLs rejected at config load |
| TURN protocol downgrade | TLS 1.3 minimum enforced on all TURN connections |
| Agent resource exhaustion | Semaphore-based 100 concurrent connection limit |
| Hairpin NAT IP leak | DiscoverLocalEndpoint validates against public endpoint |
| Per-IP enrollment abuse | Per-IP rate limiting on enrollment endpoints |
| Cloud relay data leakage | No tokens stored server-side; nonce as session secret; unified error responses |
| ReDoS via crafted input | Policy engine input length limit (8,192 bytes, unconditional deny) |
| Path traversal | Session sandboxing with symlink resolution |
| Unauthorized team access | Team role enforcement (Owner/Admin/Member RBAC) with membership verification |
| VPN address collision (concurrent enrollment) | Exclusive file locking with non-blocking retry + stale PID detection (Windows) |
Star Topology
Nefia uses a hub-and-spoke VPN topology where the operator PC is the hub and each target PC is a spoke:
Target A ──── Operator (Hub) ──── Target B
                     │
                  Target C

Targets cannot communicate with each other directly. All traffic must pass through the operator, so a compromised target cannot pivot to attack other targets on the VPN.
Userspace WireGuard
Nefia runs WireGuard entirely in userspace via gVisor netstack. This means:
- No root or admin privileges required — the VPN runs as a regular user process
- Reduced kernel attack surface — no kernel modules to load or maintain
- Portable across platforms — same implementation on macOS, Linux, and Windows
VPN Enrollment Security
Target PCs are enrolled using cryptographically signed invite tokens:
1. The operator generates an invite token with `nefia vpn invite`. The token is signed with HMAC-SHA256 and includes a single-use nonce.
2. The target agent receives the token and validates the HMAC signature against the operator's shared secret.
3. A WireGuard key exchange is performed over a temporary channel (direct TCP or cloud relay). The nonce is consumed and cannot be reused.
4. The VPN tunnel is established and the target's SSH host key is registered automatically.
Cloud Relay Security
When agents enroll via the cloud relay (nefia.ai), the following security measures apply:
- No invite tokens stored server-side: The cloud relay stores only the operator's WireGuard public key, VPN address, and endpoint metadata. The full HMAC-signed invite token is never sent to or stored on the server.
- Nonce as session secret: The 128-bit cryptographically random nonce serves as the cloud relay session identifier. Only the operator and the agent (via the invite token) know this value.
- Unified error responses: Unauthenticated endpoints return identical `404` responses for missing, expired, and consumed sessions, preventing nonce state enumeration.
- Atomic state transitions: The `completed → consumed` transition uses atomic database operations, so two concurrent polls cannot both read the enrollment data.
- Audit logging: Both session creation (operator side) and agent completion are recorded as audit events with IP addresses.
- Dead-end prevention: If no direct endpoint is available and cloud relay session creation fails, the invite is blocked entirely rather than producing an unusable token.
DERP Relay Security
When direct peer-to-peer connections are not possible (symmetric NAT, CGNAT), traffic is relayed through a DERP (Designated Encrypted Relay for Packets) server. The DERP server enforces multiple security layers:
- Per-IP rate limiting: Each client IP is limited to 5 requests per second with a burst of 10. Exceeding this limit returns HTTP 429 responses, preventing abuse and resource exhaustion.
- Access control list (ACL): The `--allowed-keys-file` flag restricts relay access to a file of authorized WireGuard public keys. Only listed keys can establish relay sessions.
- TLS transport: DERP connections use WebSocket, with TLS termination expected at the load balancer. The `wss://` scheme is recommended for production deployments; the `ws://` scheme is permitted for local development and testing. The server enforces a `ReadHeaderTimeout` of 10 seconds to prevent slowloris attacks.
- Authenticated health endpoint: The `/healthz` endpoint exposes only basic liveness information by default. Full metrics (connected clients, relay statistics) require a Bearer token configured via the `--metrics-token` flag.
- Graceful shutdown: On termination, the server sends `StatusGoingAway` to all connected clients, allowing them to fail over to an alternative relay without connection errors.
Agent Token Security
In addition to WireGuard keys, each enrolled agent receives a unique API token for cloud communication (endpoint reporting, token rotation):
- Generation: A 256-bit token is generated using `crypto.randomBytes(32)` during enrollment. Only its SHA-256 hash is stored server-side, as an `ApiToken` record; the plaintext token never persists on the server.
- Storage: The agent stores its token in the OS keyring (macOS Keychain, Linux Secret Service, Windows Credential Manager). When no keyring is available, the token is encrypted with AES-256-GCM and written to a fallback file.
- Automatic rotation: Tokens have a 90-day TTL. The agent checks its token every 24 hours and, once less than 7 days of TTL remain, rotates it automatically via `POST /api/agent-tokens/rotate`. The old token is revoked atomically when the new one is issued. Rotation failures trigger exponential backoff (5-minute base, 1-hour cap) to avoid overwhelming the server.
- Ping nonce binding: VPN ping responses are bound to a nonce that is verified with `subtle.ConstantTimeCompare`, preventing reflection attacks in which an attacker replays a legitimate ping response.
HTTPS Enforcement
All communication with the Nefia cloud API requires HTTPS. The validateCloudAPIBase() function rejects any http:// URL at configuration load time, ensuring credentials and tokens are never transmitted over unencrypted connections.
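A check of this kind takes only a few lines with Go's `net/url`. The function below is a hypothetical analogue of `validateCloudAPIBase()`, not its actual signature or source; it accepts only absolute `https://` URLs that name a host.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateBaseURL is a hypothetical analogue of validateCloudAPIBase().
func validateBaseURL(raw string) error {
	u, err := url.Parse(raw) // url.Parse lowercases the scheme
	if err != nil {
		return fmt.Errorf("invalid cloud API base %q: %w", raw, err)
	}
	if u.Scheme != "https" || u.Host == "" {
		return fmt.Errorf("cloud API base must be an https:// URL, got %q", raw)
	}
	return nil
}

func main() {
	for _, raw := range []string{"https://api.example.com", "http://api.example.com"} {
		fmt.Println(raw, "->", validateBaseURL(raw))
	}
}
```

Rejecting the URL at load time means a misconfiguration fails loudly at startup instead of leaking a token on the first request.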
HTTP Response and Payload Size Limits
All HTTP responses and payloads are size-limited to prevent memory exhaustion from oversized or malicious data:
| Constant | Limit | Scope |
|---|---|---|
| `MaxAPIResponseBytes` | 1 MB | Standard API responses |
| `MaxEnrollResponseBytes` | 1 MB | Enrollment API responses |
| `MaxWebhookPayloadBytes` | 1 MB | Outgoing SIEM/webhook payloads |
| `MaxInlineFileBytes` | 10 MB | Inline file content via MCP |
These limits are centralized in internal/limits/ and enforced at every HTTP boundary.
TURN TLS Hardening
When TURN relay servers are used for NAT traversal, Nefia enforces TLS 1.3 as the minimum version. This prevents downgrade attacks and ensures only modern cipher suites are negotiated.
Nonce Audit Logging
Enrollment nonce events are recorded in the audit log with structured fields for forensic analysis:
- Nonce prefix: The first 8 characters of the nonce (sufficient for correlation without exposing the full value)
- Source endpoint: The endpoint (IP:port) of the enrolling agent
- Host ID: The target host identifier
This enables operators to trace enrollment attempts and detect suspicious patterns without logging sensitive nonce material.
Agent Connection Limits
The nefia-agent enforces a semaphore-based limit of 100 concurrent forwarded connections. When this limit is reached, new incoming connections are rejected immediately to prevent resource exhaustion. This protects target PCs from being overwhelmed by misconfigured or malicious operators.
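A non-blocking semaphore is idiomatic in Go as a buffered channel with a `select`/`default` acquire. This sketch shows the reject-immediately behavior; `connSem` and `serve` are hypothetical names, not the agent's code.

```go
package main

import (
	"fmt"
	"net"
)

const maxForwarded = 100

// connSem is a counting semaphore built on a buffered channel.
type connSem chan struct{}

// tryAcquire never blocks: it fails immediately when all slots are taken.
func (s connSem) tryAcquire() bool {
	select {
	case s <- struct{}{}:
		return true
	default:
		return false
	}
}

func (s connSem) release() { <-s }

// serve is a hypothetical accept loop that rejects connections over the limit.
func serve(ln net.Listener, sem connSem, handle func(net.Conn)) error {
	for {
		c, err := ln.Accept()
		if err != nil {
			return err
		}
		if !sem.tryAcquire() {
			c.Close() // reject immediately instead of queueing
			continue
		}
		go func() {
			defer sem.release()
			defer c.Close()
			handle(c)
		}()
	}
}

func main() {
	sem := make(connSem, maxForwarded)
	for i := 0; i < maxForwarded; i++ {
		sem.tryAcquire()
	}
	fmt.Println("101st connection accepted?", sem.tryAcquire()) // false
}
```

Rejecting rather than queueing keeps memory use bounded even when a misbehaving operator opens connections faster than they are served.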
The agent's health watchdog triggers a rebuild after 2 consecutive health check failures. After 5 consecutive rebuild attempts, it enters a 30-minute cooldown to avoid excessive resource consumption, then resumes monitoring and rebuild attempts.
Connection Pool Reliability
SSH connection pool entries use sync.Once for close operations, preventing double-close issues from concurrent access. Health checks ensure dead connection pool entries are always closed, preventing resource leaks.
VPN peer dial mutex entries are cleaned up on tunnel rebuild only. Entries are intentionally retained during peer removal because concurrent DialTCPWithFallback calls may still hold a reference to the mutex.
Alerting
Nefia includes a webhook-based alerting system that notifies operators of security-relevant events in real time. Alerts are dispatched asynchronously and do not block the triggering operation.
The alerting system monitors the following event categories:
- `exec_failure`: Command execution failures across one or more hosts
- `vpn_peer_unhealthy`: VPN peer handshake staleness detected by the health monitor
- `circuit_breaker_open`: SSH circuit breaker tripped for a host after consecutive failures
- `policy_rebuild_failed`: Policy engine failed to rebuild after a config hot-reload
- `enrollment_complete`: A host finished enrollment successfully
- `host_online` / `host_offline`: A host changed reachability state
- `key_rotation`: WireGuard key rotation completed
- `config_change`: Configuration was saved or reloaded with changes
- `playbook_complete` / `playbook_failed`: A playbook run finished successfully or failed
- `queue_executed` / `queue_failed`: An offline queued command succeeded or failed
- `host_revoked` / `host_revoke_all`: One host, or all hosts, had VPN access revoked
- `vpn_path_switch`: VPN multipath failover occurred (active path switched between direct and relay)
Alerts support two delivery formats: Slack (Block Kit) and generic JSON. Per-event-type cooldown (default: 5 minutes) prevents alert flooding. Failed webhook deliveries are retried up to 3 times with exponential backoff (1s base, 10s per-attempt timeout).
See the alerts configuration for setup instructions.
Observability
Nefia provides two complementary observability layers:
- OpenTelemetry tracing: Distributed traces are exported via OTLP HTTP. A `TraceHandler` automatically injects `trace_id` and `span_id` into all structured log output, enabling log-to-trace correlation. When the OTLP endpoint is unreachable, a no-op tracer is installed silently, so startup is never blocked.
- Prometheus metrics: A dedicated HTTP server exposes `/metrics` (Prometheus scrape), `/healthz` (liveness), and `/readyz` (readiness) endpoints. Metrics cover command execution, SSH dial attempts, connection pool size, filesystem operations, session lifecycle, VPN peer health, and playbook runs, both as counters and as duration histograms.
Both subsystems are disabled by default and add zero overhead when disabled. See the telemetry configuration for setup instructions.
MCP Policy Enforcement
When AI agents connect via the MCP server (nefia mcp serve), they are subject to the same policy engine rules as human operators. There is no separate privilege model — the deny rules, path restrictions, and RBAC roles apply identically.
This ensures that an AI agent cannot bypass guardrails that a human operator would be subject to, regardless of how the agent formulates its requests.
The MCP approval workflow is controlled by mcp.approval. When enabled, nefia mcp serve exposes approval tools and pauses matching tool calls until they are approved, denied, or timed out.
Domain errors from MCP tools are returned as isError: true content per the MCP specification, allowing AI agents to see error details and adjust their behavior.
Session Sandboxing
File operation sessions are confined to a root directory. Path traversal attempts — including those using symlinks that resolve outside the sandbox — are rejected before any I/O occurs. See Sessions for details.
Code Signing
Release binaries are signed at multiple levels to ensure integrity and trust:
| Platform | Signing Method | Verification |
|---|---|---|
| macOS | Apple Developer ID + Notarization | Gatekeeper (automatic) |
| Windows | Authenticode (SHA-256) | SmartScreen / Windows Defender |
| All | Ed25519 detached signature (.sig) | Verified by nefia-agent on startup and auto-update |
macOS signing and notarization ensure that Gatekeeper allows the binary to run without warnings. Windows Authenticode signing ensures that SmartScreen does not flag the binary as untrusted. The Ed25519 signatures provide a platform-independent integrity check that does not depend on any certificate authority.
Related
Define allow/deny rules for commands and file paths.
Track every operation with tamper-evident logs.
OAuth login, token management, and VPN enrollment.