Skip to content

Troubleshooting

Diagnose and resolve connection errors, configuration issues, and agent problems.

Overview

This guide explains common Nefia issues and their solutions. Start by using the diagnostic tools to identify the root cause, then follow the resolution steps for each category.

Diagnostic Tools

Nefia includes three built-in diagnostic tools to help identify and resolve problems.

nefia doctor

Runs a comprehensive health check of the entire system:

bash
nefia doctor

Checks: configuration validation, authentication status, audit logs, VPN diagnostics, and TCP connectivity tests to each host.

To check a specific host only:

bash
nefia doctor --host my-server

nefia netcheck

Provides detailed diagnostics for VPN network connectivity:

bash
nefia netcheck

Checks: NAT type classification, STUN/DERP/TURN reachability, connection path (Direct/DERP/TURN) and latency per host.

To diagnose a specific host only:

bash
nefia netcheck --host my-server

nefia bugreport

Generates a sanitized diagnostic report for support:

bash
nefia bugreport
nefia bugreport --output-file /tmp/nefia-report.json

VPN Connection Issues

Tunnel cannot be established

Symptoms: Connection stays pending in nefia vpn status, or the handshake does not complete.

Diagnostic steps:

bash
# Diagnose the entire VPN environment
nefia vpn diagnose
 
# Network diagnostics
nefia netcheck

Common causes and solutions:

  1. Port is blocked

    • Verify that UDP 51820 (WireGuard) is allowed by the firewall
    • Check the port-available result in nefia vpn diagnose
    bash
    lsof -i :51820
  2. NAT type issue

    • With Symmetric NAT (EDM), direct connections are difficult
    • DERP relay is automatically used as a fallback
    • Check the NAT type and connection path with nefia netcheck
  3. Incorrect endpoint

    • After a network change, the endpoint may have changed. Reissue the token:
    bash
    nefia vpn reinvite --name <host-id> --stun
  4. Cannot connect to DERP relay

Connection timeout

Symptoms: E2002 (SSH connection failed) or connection timeout occurs when running commands.

Solutions:

  1. Verify that the VPN tunnel is established:

    bash
    nefia vpn status --live --ping
  2. If a host shows [STALE], the tunnel may be disconnected:

    bash
    nefia vpn diagnose --host <host-id>
  3. If the cause is a network change on the agent side (e.g., Wi-Fi reconnection), the agent's network monitor will attempt automatic recovery within 5 seconds. If recovery does not occur, restart the agent:

    bash
    nefia exec --host <host-id> "sudo systemctl restart nefia-agent"
Port in use (E1004)

Symptoms: [E1004] VPN setup failed error when running nefia vpn invite.

Solutions:

bash
# Check which process is using the port
lsof -i :51820    # WireGuard VPN port
lsof -i :19820    # Enrollment listener port
 
# Terminate the process
kill <PID>
 
# Retry
nefia vpn reinvite --name <host-id> --stun

SSH Connection Issues

Authentication failure (E2001)

Symptoms: [E2001] SSH authentication failed

Solutions:

  1. Verify that the SSH key is correctly configured:

    bash
    ssh -i ~/.ssh/id_ed25519 <user>@<host>
  2. Check key file permissions:

    bash
    # macOS / Linux
    chmod 600 ~/.ssh/id_ed25519
     
    # Windows (PowerShell)
    icacls $env:USERPROFILE\.ssh\id_ed25519 /inheritance:r /grant "$($env:USERNAME):(R)"
  3. Verify that the public key is included in ~/.ssh/authorized_keys on the remote host

  4. Confirm that the correct key path is set in ssh.identities in nefia.yaml

Host key verification error

Symptoms: A warning is displayed indicating that the host key has changed.

Solutions:

If the host key changed due to an OS reinstall or similar:

bash
# Remove the old entry from known_hosts
ssh-keygen -R <vpn-address>
 
# Reconnect and accept the new host key
nefia exec --host <host-id> "hostname"
Command timeout (E2003)

Symptoms: [E2003] Command timed out

Solutions:

  1. Increase the timeout value:

    bash
    nefia exec --host <host-id> --timeout 300s "long-running-command"
  2. Check network quality:

    bash
    nefia netcheck --host <host-id>
  3. If latency exceeds 500ms, review the geographic placement of the DERP relay

Agent Issues

Enrollment failure (E1003)

Symptoms: nefia-agent enroll fails.

Common causes:

  1. Token expired -- The default TTL is 24 hours. Reissue the token:

    bash
    nefia vpn reinvite --name <host-id> --stun
  2. Nonce reuse -- Each token can only be used once. Attempting to enroll with an already-used token will result in an error. Issue a new token.

  3. Network unreachable -- Connectivity from agent to operator:

    • If direct connection (TCP 19820) fails, it automatically falls back via cloud relay
    • If the cloud relay is also unavailable, try tethering or set up port forwarding
  4. Blocked by Gatekeeper / SmartScreen -- See the Code Signing Verification Issues section

Agent service fails to start

Diagnostic steps:

bash
# Linux (systemd)
sudo systemctl status nefia-agent
sudo journalctl -u nefia-agent -n 50
 
# macOS (launchd)
sudo launchctl list | grep nefia
sudo log show --predicate 'process == "nefia-agent"' --last 5m
 
# Windows (PowerShell)
Get-ScheduledTask -TaskName "nefia-agent" | Format-List
Get-WinEvent -FilterHashtable @{LogName='Application'; ProviderName='nefia-agent'} -MaxEvents 20

Common causes:

  • Syntax error in agent.yaml -- validate with a YAML linter
  • SSH port is in use by another process
  • Insufficient permissions (for systemd, verify it is running as the correct user)

Policy Issues

Command denied by policy (E5001)

Symptoms: [E5001] Command denied by policy

Solutions:

  1. Check the current policy configuration:

    bash
    nefia policy show
  2. Verify the policy mode (enforce / warn / off)

  3. Check whether a deny pattern matches the command. Policy rules use regular expressions, and may require start (^) or end ($) anchors.

  4. Preview the policy evaluation result with --dry-run:

    bash
    nefia exec --host <host-id> --dry-run "your-command"
Path denied by policy (E3005)

Symptoms: [E3005] Path denied by policy

Solutions:

Verify that the file operation path matches the allow_paths / deny_paths in the policy. Deny patterns take precedence over allow patterns.

DERP Relay Issues

DERP connection failure (E1005)

Symptoms: [E1005] DERP relay connection failed

Solutions:

  1. Verify that the DERP server URL is correct:

    yaml
    # nefia.yaml
    vpn:
      derp_servers:
        - url: "wss://relay.nefia.ai/derp"
          region: "ap-northeast-1"
  2. Confirm the URL starts with wss:// or ws://

  3. Health check the DERP server:

    bash
    curl -v https://relay.nefia.ai/healthz
  4. Verify that the firewall allows HTTPS (port 443)

DERP authentication failure (E1006)

Symptoms: [E1006] DERP relay authentication failed

Solutions:

If using --allowed-keys-file with a self-hosted DERP, verify that the client's WireGuard public key is included in the file.

bash
# Check the operator's public key
nefia vpn status
Rate limiting (HTTP 429)

The default rate limit for the DERP server is 5 requests/second (burst of 10).

MCP Server Issues

MCP operation timeout

Symptoms: MCP tool calls time out.

Solutions:

  1. Check the MCP server timeout setting:

    yaml
    # nefia.yaml
    mcp:
      command_timeout: 120s
  2. Set a sufficient timeout for long-running commands

  3. Check VPN connection quality:

    bash
    nefia netcheck
MCP approval denied

Symptoms: MCP tool calls time out waiting for approval.

Solutions:

  • Check the MCP server approval mode. In auto mode, calls are automatically approved. In prompt mode, manual approval from the operator is required.
  • Commands matching a policy deny pattern are rejected regardless of approval mode.
MCP concurrency limit

Symptoms: concurrent execution limit reached error.

Solutions:

Check and adjust the MCP server concurrency limit:

yaml
# nefia.yaml
mcp:
  max_concurrent: 10

Code Signing Verification Issues

macOS: Blocked by Gatekeeper

Symptoms: Process is terminated with Killed: 9 (SIGKILL).

Solutions:

bash
# Remove the quarantine attribute
sudo xattr -d com.apple.quarantine /usr/local/bin/nefia-agent
sudo xattr -cr /usr/local/bin/nefia-agent
 
# Manually re-run enrollment
nefia-agent enroll --token '<INVITE_TOKEN>' --install --yes

If this does not resolve the issue, go to System Settings > Privacy & Security and click "Allow Anyway".

Windows: Blocked by SmartScreen

Symptoms: A "Windows protected your PC" dialog appears, or a virus detection warning is shown.

Solutions:

powershell
# Remove the Zone.Identifier
Unblock-File -Path "$env:ProgramData\nefia\nefia-agent.exe"
 
# Manually re-run enrollment
nefia-agent.exe enroll --token-file C:\path\to\token.txt --install --yes

Alternatively, go to Windows Security > Virus & threat protection > Protection history and select "Allow on device".

Configuration Issues

Configuration validation error (E4001)

Symptoms: [E4001] Invalid configuration

Solutions:

  1. Check the YAML syntax of the configuration file:

    bash
    nefia doctor
  2. Verify the configuration file location:

    • macOS: ~/Library/Application Support/nefia/nefia.yaml
    • Linux: ~/.config/nefia/nefia.yaml
  3. Confirm that all required fields are set

  4. For VPN settings, confirm that vpn.enabled: true is set

Host not found (E4003)

Symptoms: [E4003] Host not found

Solutions:

  1. Check the host list:

    bash
    nefia hosts list
  2. Verify that the selector syntax is correct. For pattern selectors:

    bash
    nefia exec --host "web-*" "hostname"
  3. For tag selectors:

    bash
    nefia exec --host "tag:production" "hostname"

Error Code Reference

List of commonly encountered error codes:

CodeCategoryDescription
E1001VPNVPN connection failed
E1003VPNEnrollment failed
E1004VPNVPN setup failed (e.g., port in use)
E1005DERPDERP relay connection failed
E2001SSHSSH authentication failed
E2002SSHSSH connection failed
E2003SSHCommand timed out
E3005PolicyPath denied by policy
E4001ConfigConfiguration validation error
E4003ConfigHost not found
E5001PolicyCommand denied by policy

For all error codes and detailed descriptions, see the Error Code Reference.

To look up any error code from the terminal:

bash
nefia explain E2001

Getting Support

If the steps above do not resolve your issue:

  1. Generate a diagnostic report with nefia bugreport
  2. Create an issue on GitHub Issues and include the report ID

Next Steps

VPN Setup Guide

Detailed WireGuard VPN configuration and NAT traversal explained.

Error Code Reference

A complete catalog of all error codes and their solutions.

Deploying a DERP Relay Server

Building and operating a self-hosted DERP relay.