Troubleshooting
Diagnose and resolve connection errors, configuration issues, and agent problems.
Overview
This guide explains common Nefia issues and their solutions. Start by using the diagnostic tools to identify the root cause, then follow the resolution steps for each category.
Diagnostic Tools
Nefia includes three built-in diagnostic tools to help identify and resolve problems.
nefia doctor
Runs a comprehensive health check of the entire system:
nefia doctor
Checks: configuration validation, authentication status, audit logs, VPN diagnostics, and TCP connectivity tests to each host.
To check a specific host only:
nefia doctor --host my-server

nefia netcheck
Provides detailed diagnostics for VPN network connectivity:
nefia netcheck
Checks: NAT type classification, STUN/DERP/TURN reachability, connection path (Direct/DERP/TURN) and latency per host.
To diagnose a specific host only:
nefia netcheck --host my-server

nefia bugreport
Generates a sanitized diagnostic report for support:
nefia bugreport
nefia bugreport --output-file /tmp/nefia-report.json

VPN Connection Issues
Tunnel cannot be established
Symptoms: The connection stays pending in nefia vpn status, or the handshake does not complete.
Diagnostic steps:
# Diagnose the entire VPN environment
nefia vpn diagnose
# Network diagnostics
nefia netcheck
Common causes and solutions:
- Port is blocked
  - Verify that UDP 51820 (WireGuard) is allowed by the firewall
  - Check the port-available result in nefia vpn diagnose
    lsof -i :51820
- NAT type issue
  - With Symmetric NAT (EDM), direct connections are difficult
  - DERP relay is automatically used as a fallback
  - Check the NAT type and connection path with nefia netcheck
- Incorrect endpoint
  - After a network change, the endpoint may have changed. Reissue the token:
    nefia vpn reinvite --name <host-id> --stun
- Cannot connect to DERP relay
  - See the DERP Relay Issues section
Connection timeout
Symptoms: E2002 (SSH connection failed) or a connection timeout when running commands.
Solutions:
- Verify that the VPN tunnel is established:
    nefia vpn status --live --ping
- If a host shows [STALE], the tunnel may be disconnected:
    nefia vpn diagnose --host <host-id>
- If the cause is a network change on the agent side (e.g., Wi-Fi reconnection), the agent's network monitor will attempt automatic recovery within 5 seconds. If recovery does not occur, restart the agent:
    nefia exec --host <host-id> "sudo systemctl restart nefia-agent"
Port in use (E1004)
Symptoms: [E1004] VPN setup failed error when running nefia vpn invite.
Solutions:
# Check which process is using the port
lsof -i :51820 # WireGuard VPN port
lsof -i :19820 # Enrollment listener port
# Terminate the process
kill <PID>
# Retry
nefia vpn reinvite --name <host-id> --stun

SSH Connection Issues
Authentication failure (E2001)
Symptoms: [E2001] SSH authentication failed
Solutions:
- Verify that the SSH key is correctly configured:
    ssh -i ~/.ssh/id_ed25519 <user>@<host>
- Check key file permissions:
    # macOS / Linux
    chmod 600 ~/.ssh/id_ed25519
    # Windows (PowerShell)
    icacls $env:USERPROFILE\.ssh\id_ed25519 /inheritance:r /grant "$($env:USERNAME):(R)"
- Verify that the public key is included in ~/.ssh/authorized_keys on the remote host
- Confirm that the correct key path is set in ssh.identities in nefia.yaml
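The permission check above can also be scripted. The following is a minimal sketch, assuming a POSIX shell with either GNU or BSD stat available. The function names key_mode and key_perms_ok are illustrative and not part of Nefia. It relies on OpenSSH refusing private keys that are accessible to group or others.

```bash
# Print the octal permission bits of a file.
# Tries GNU stat (Linux) first, then BSD stat (macOS).
key_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null
}

# Succeed only if group and others have no access (for example 600 or 400).
key_perms_ok() {
  mode=$(key_mode "$1") || return 2   # 2: file missing or stat failed
  case "$mode" in
    ?00) return 0 ;;
    *)   return 1 ;;
  esac
}

# Usage: report on the default key path used in this guide.
if key_perms_ok "$HOME/.ssh/id_ed25519"; then
  echo "key permissions OK"
else
  echo "key missing or too permissive; run: chmod 600 ~/.ssh/id_ed25519"
fi
```

If a different key is listed under ssh.identities, pass that path instead.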
Host key verification error
Symptoms: A warning is displayed indicating that the host key has changed.
Solutions:
If the host key changed due to an OS reinstall or similar:
# Remove the old entry from known_hosts
ssh-keygen -R <vpn-address>
# Reconnect and accept the new host key
nefia exec --host <host-id> "hostname"

Command timeout (E2003)
Symptoms: [E2003] Command timed out
Solutions:
- Increase the timeout value:
    nefia exec --host <host-id> --timeout 300s "long-running-command"
- Check network quality:
    nefia netcheck --host <host-id>
- If latency exceeds 500ms, review the geographic placement of the DERP relay
Agent Issues
Enrollment failure (E1003)
Symptoms: nefia-agent enroll fails.
Common causes:
- Token expired -- The default TTL is 24 hours. Reissue the token:
    nefia vpn reinvite --name <host-id> --stun
- Nonce reuse -- Each token can only be used once. Attempting to enroll with an already-used token will result in an error. Issue a new token.
- Network unreachable -- Connectivity from agent to operator:
  - If the direct connection (TCP 19820) fails, enrollment automatically falls back to the cloud relay
  - If the cloud relay is also unavailable, try tethering or set up port forwarding
- Blocked by Gatekeeper / SmartScreen -- See the Code Signing Verification Issues section
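To judge whether an expired token is the likely cause, the 24-hour default TTL can be checked against the time the token was issued. This is a minimal sketch that assumes you recorded the issue time yourself as a Unix timestamp. The token_expired function is hypothetical, and treating a token as expired at exactly 24 hours is an assumption.

```bash
# Default invite-token lifetime stated in this guide: 24 hours.
TOKEN_TTL_SECONDS=$((24 * 60 * 60))

# token_expired ISSUED_EPOCH [NOW_EPOCH]
# Succeeds (exit 0) if the token age has reached the TTL.
token_expired() {
  issued=$1
  now=${2:-$(date +%s)}
  [ $((now - issued)) -ge "$TOKEN_TTL_SECONDS" ]
}

# Usage: a token issued 25 hours ago should be reissued.
issued_at=$(( $(date +%s) - 25 * 3600 ))
if token_expired "$issued_at"; then
  echo "token expired; reissue it with nefia vpn reinvite"
fi
```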
Agent service fails to start
Diagnostic steps:
# Linux (systemd)
sudo systemctl status nefia-agent
sudo journalctl -u nefia-agent -n 50
# macOS (launchd)
sudo launchctl list | grep nefia
sudo log show --predicate 'process == "nefia-agent"' --last 5m
# Windows (PowerShell)
Get-ScheduledTask -TaskName "nefia-agent" | Format-List
Get-WinEvent -FilterHashtable @{LogName='Application'; ProviderName='nefia-agent'} -MaxEvents 20
Common causes:
- Syntax error in agent.yaml -- validate with a YAML linter
- SSH port is in use by another process
- Insufficient permissions (for systemd, verify it is running as the correct user)
Policy Issues
Command denied by policy (E5001)
Symptoms: [E5001] Command denied by policy
Solutions:
- Check the current policy configuration:
    nefia policy show
- Verify the policy mode (enforce / warn / off)
- Check whether a deny pattern matches the command. Policy rules use regular expressions and may require start (^) or end ($) anchors to match only the intended commands.
- Preview the policy evaluation result with --dry-run:
    nefia exec --host <host-id> --dry-run "your-command"
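The effect of anchors can be tested locally before editing a policy. The sketch below uses grep -E (POSIX extended regular expressions) as a stand-in. Nefia's regex dialect and matching mode may differ, so treat this as an approximation. The pattern_matches helper and the sample patterns are illustrative only.

```bash
# pattern_matches PATTERN COMMAND
# Succeeds if the extended regular expression matches anywhere in COMMAND.
pattern_matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Unanchored: "rm" also matches inside the word "confirm".
pattern_matches 'rm' 'confirm-deploy --yes' && echo "unanchored: matches"

# Anchored to the start and followed by a space or the end of the line.
pattern_matches '^rm( |$)' 'confirm-deploy --yes' || echo "anchored: no match"
pattern_matches '^rm( |$)' 'rm -rf /tmp/cache' && echo "anchored: matches"
```

A deny pattern that blocks more commands than expected is often missing an anchor like this.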
Path denied by policy (E3005)
Symptoms: [E3005] Path denied by policy
Solutions:
Verify that the file operation path matches the allow_paths / deny_paths in the policy. Deny patterns take precedence over allow patterns.
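The precedence rule can be pictured as two checks run in order, deny first. This is a sketch with hypothetical rules written as shell glob patterns. The actual pattern syntax of allow_paths / deny_paths is not specified here, and rejecting paths that match no rule is an assumption.

```bash
# path_permitted PATH
# Deny patterns are evaluated first, so they win over any allow pattern.
path_permitted() {
  # Hypothetical deny_paths
  case "$1" in
    /var/log/secure*|/etc/nginx/ssl/*) return 1 ;;
  esac
  # Hypothetical allow_paths
  case "$1" in
    /var/log/*|/etc/nginx/*) return 0 ;;
  esac
  return 1   # assumption: paths matching no rule are rejected
}

path_permitted /etc/nginx/nginx.conf  && echo "allowed"
path_permitted /etc/nginx/ssl/key.pem || echo "denied: deny pattern wins"
```

Here /etc/nginx/ssl/key.pem matches both an allow and a deny pattern, and the deny pattern decides the result.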
DERP Relay Issues
DERP connection failure (E1005)
Symptoms: [E1005] DERP relay connection failed
Solutions:
- Verify that the DERP server URL is correct:
    # nefia.yaml
    vpn:
      derp_servers:
        - url: "wss://relay.nefia.ai/derp"
          region: "ap-northeast-1"
- Confirm the URL starts with wss:// or ws://
- Health check the DERP server:
    curl -v https://relay.nefia.ai/healthz
- Verify that the firewall allows HTTPS (port 443)
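The scheme check is easy to script when several relay URLs are configured. This is a minimal sketch. The derp_url_ok function is hypothetical and checks only the prefix, not whether the relay is reachable.

```bash
# derp_url_ok URL
# Succeeds if the URL uses a WebSocket scheme (wss:// or ws://).
derp_url_ok() {
  case "$1" in
    wss://?*|ws://?*) return 0 ;;
    *)                return 1 ;;
  esac
}

derp_url_ok "wss://relay.nefia.ai/derp"   && echo "scheme OK"
derp_url_ok "https://relay.nefia.ai/derp" || echo "expected wss:// or ws://"
```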
DERP authentication failure (E1006)
Symptoms: [E1006] DERP relay authentication failed
Solutions:
If using --allowed-keys-file with a self-hosted DERP, verify that the client's WireGuard public key is included in the file.
# Check the operator's public key
nefia vpn status

Rate limiting (HTTP 429)
The default rate limit for the DERP server is 5 requests/second (burst of 10).
- If running behind a reverse proxy, verify that the --trust-proxy flag is enabled
- For details, see Deploying a DERP Relay Server
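To get a feel for what 5 requests/second with a burst of 10 allows, the limit can be modeled as a token bucket. This is a sketch under that assumption. The DERP server's actual limiter algorithm is not described here, and simulate_burst is a hypothetical helper.

```bash
RATE_PER_SEC=5   # refill rate from this guide
BURST=10         # bucket capacity from this guide

# simulate_burst COUNT INTERVAL_MS
# Sends COUNT requests spaced INTERVAL_MS apart against a full bucket and
# prints how many would be accepted. Tokens are tracked in thousandths.
simulate_burst() {
  count=$1
  interval_ms=$2
  capacity=$((BURST * 1000))
  tokens=$capacity
  accepted=0
  i=0
  while [ "$i" -lt "$count" ]; do
    if [ "$i" -gt 0 ]; then
      # Refill for the elapsed interval, capped at the bucket size.
      tokens=$((tokens + interval_ms * RATE_PER_SEC))
      if [ "$tokens" -gt "$capacity" ]; then
        tokens=$capacity
      fi
    fi
    if [ "$tokens" -ge 1000 ]; then
      tokens=$((tokens - 1000))
      accepted=$((accepted + 1))
    fi
    i=$((i + 1))
  done
  echo "$accepted"
}

simulate_burst 15 0     # prints 10: the burst is used up, the rest get 429
simulate_burst 15 200   # prints 15: 200 ms spacing stays within 5 requests/second
```

When many clients share one IP address behind a reverse proxy, they would all draw from the same bucket in this model unless the server sees the real client address.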
MCP Server Issues
MCP operation timeout
Symptoms: MCP tool calls time out.
Solutions:
- Check the MCP server timeout setting:
    # nefia.yaml
    mcp:
      command_timeout: 120s
- Set a sufficient timeout for long-running commands
- Check VPN connection quality:
    nefia netcheck
MCP approval denied
Symptoms: MCP tool calls time out waiting for approval.
Solutions:
- Check the MCP server approval mode. In auto mode, calls are automatically approved. In prompt mode, manual approval from the operator is required.
- Commands matching a policy deny pattern are rejected regardless of approval mode.
MCP concurrency limit
Symptoms: A "concurrent execution limit reached" error is returned.
Solutions:
Check and adjust the MCP server concurrency limit:
# nefia.yaml
mcp:
  max_concurrent: 10

Code Signing Verification Issues
macOS: Blocked by Gatekeeper
Symptoms: Process is terminated with Killed: 9 (SIGKILL).
Solutions:
# Remove the quarantine attribute
sudo xattr -d com.apple.quarantine /usr/local/bin/nefia-agent
sudo xattr -cr /usr/local/bin/nefia-agent
# Manually re-run enrollment
nefia-agent enroll --token '<INVITE_TOKEN>' --install --yes
If this does not resolve the issue, go to System Settings > Privacy & Security and click "Allow Anyway".
Windows: Blocked by SmartScreen
Symptoms: A "Windows protected your PC" dialog appears, or a virus detection warning is shown.
Solutions:
# Remove the Zone.Identifier
Unblock-File -Path "$env:ProgramData\nefia\nefia-agent.exe"
# Manually re-run enrollment
nefia-agent.exe enroll --token-file C:\path\to\token.txt --install --yes
Alternatively, go to Windows Security > Virus & threat protection > Protection history and select "Allow on device".
Configuration Issues
Configuration validation error (E4001)
Symptoms: [E4001] Invalid configuration
Solutions:
- Check the YAML syntax of the configuration file:
    nefia doctor
- Verify the configuration file location:
  - macOS: ~/Library/Application Support/nefia/nefia.yaml
  - Linux: ~/.config/nefia/nefia.yaml
- Confirm that all required fields are set
- For VPN settings, confirm that vpn.enabled: true is set
Host not found (E4003)
Symptoms: [E4003] Host not found
Solutions:
- Check the host list:
    nefia hosts list
- Verify that the selector syntax is correct. For pattern selectors:
    nefia exec --host "web-*" "hostname"
- For tag selectors:
    nefia exec --host "tag:production" "hostname"
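Before running a command against a pattern selector, it can help to preview which host IDs the pattern would select. The sketch below assumes that pattern selectors behave like shell-style globs matched against the whole host ID, which this guide does not confirm. The select_hosts helper and the host names are hypothetical.

```bash
# select_hosts PATTERN HOST...
# Prints each host whose full name matches the shell-style glob PATTERN.
select_hosts() {
  pattern=$1
  shift
  for host in "$@"; do
    case "$host" in
      $pattern) echo "$host" ;;   # unquoted so it is treated as a glob
    esac
  done
}

select_hosts "web-*" web-01 web-02 db-01 myweb-03
# prints:
# web-01
# web-02
```

If the preview is empty, compare the pattern against the IDs shown by nefia hosts list.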
Error Code Reference
List of commonly encountered error codes:
| Code | Category | Description |
|---|---|---|
| E1001 | VPN | VPN connection failed |
| E1003 | VPN | Enrollment failed |
| E1004 | VPN | VPN setup failed (e.g., port in use) |
| E1005 | DERP | DERP relay connection failed |
| E1006 | DERP | DERP relay authentication failed |
| E2001 | SSH | SSH authentication failed |
| E2002 | SSH | SSH connection failed |
| E2003 | SSH | Command timed out |
| E3005 | Policy | Path denied by policy |
| E4001 | Config | Configuration validation error |
| E4003 | Config | Host not found |
| E5001 | Policy | Command denied by policy |
For all error codes and detailed descriptions, see the Error Code Reference.
To look up any error code from the terminal:
nefia explain E2001

Getting Support
If the steps above do not resolve your issue:
- Generate a diagnostic report with nefia bugreport
- Create an issue on GitHub Issues and include the report ID
Next Steps
Detailed WireGuard VPN configuration and NAT traversal explained.
A complete catalog of all error codes and their solutions.
Building and operating a self-hosted DERP relay.