DERP Relay Server Deployment

Deploy and operate a self-hosted DERP relay server with Docker, Fly.io, Prometheus monitoring, and ACL configuration.

Overview

A DERP (Designated Encrypted Relay for Packets) server relays encrypted WireGuard packets over WebSocket when peers cannot establish a direct UDP connection, for example behind symmetric NAT, CGNAT, or strict firewalls.

Nefia uses the public relay at relay.nefia.ai by default, but self-hosting a DERP server is recommended in the following cases:

  • Latency optimization — Place the relay geographically close to your users
  • Privacy requirements — Ensure traffic only passes through your own infrastructure
  • Availability requirements — Eliminate dependency on the public relay
  • ACL control — Allow connections only from authorized WireGuard public keys

plaintext
Agent ──── wss:// ───→ DERP Relay ←── wss:// ──── Operator
             (WebSocket relay, encrypted WireGuard packets)

Docker Deployment

Building the Image

Build the image using the deploy/derp/Dockerfile included in the repository:

bash
docker build -t nefia-derp:latest -f deploy/derp/Dockerfile .

This multi-stage build:

  1. Statically compiles nefia-derp with Go 1.25
  2. Deploys onto a distroless (nonroot) base image
  3. Exposes port 8443

Basic Startup

bash
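# Generate the metrics token once and keep it; authenticated /healthz calls need it later
METRICS_TOKEN="$(openssl rand -base64 32)"

docker run -d \
  --name nefia-derp \
  -p 8443:8443 \
  -e NEFIA_DERP_METRICS_TOKEN="$METRICS_TOKEN" \
  nefia-derp:latest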

docker-compose.yml

Example configuration for production environments:

yaml
services:
  nefia-derp:
    image: nefia-derp:latest
    build:
      context: .
      dockerfile: deploy/derp/Dockerfile
    ports:
      - "8443:8443"
      - "127.0.0.1:9090:9090"  # Prometheus metrics (local only)
    environment:
      NEFIA_DERP_METRICS_TOKEN: "${DERP_METRICS_TOKEN}"
    command:
      - "--addr=:8443"
      - "--max-clients=10000"
      - "--metrics-port=9090"
      - "--allowed-keys-file=/etc/nefia/allowed-keys.txt"
    volumes:
      - ./allowed-keys.txt:/etc/nefia/allowed-keys.txt:ro
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 256M
    healthcheck:
      # Requires wget inside the image; stock distroless images do not ship one
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8443/healthz"]
      interval: 15s
      timeout: 5s
      retries: 3

Fly.io Deployment

You can deploy to Fly.io using the deploy/derp/fly.toml included in the repository.

1. Create the App

bash
fly apps create nefia-derp --machines

2. Set Secrets

bash
# Keep a copy of the token; Fly.io does not display secret values after they are set
METRICS_TOKEN="$(openssl rand -base64 32)"
fly secrets set NEFIA_DERP_METRICS_TOKEN="$METRICS_TOKEN" --app nefia-derp

3. Deploy

bash
fly deploy --config deploy/derp/fly.toml

4. Scale to Regions

bash
# Tokyo is the default primary_region, so it needs no extra command
# US East
fly scale count 1 --region iad --app nefia-derp
# EU West
fly scale count 1 --region cdg --app nefia-derp
# Southeast Asia
fly scale count 1 --region sin --app nefia-derp

Key Fly.io configuration points:

| Setting | Value | Description |
|---|---|---|
| VM Memory | 256MB | Lightweight relay server |
| CPU | shared x1 | Low-cost operation |
| force_https | true | Enforce TLS |
| auto_stop_machines | false | Always running |
| min_machines_running | 1 | Ensure at least 1 instance |
| Concurrent connection hard limit | 10,000 | soft_limit: 8,000 |

Configuration Flags

| Flag | Default | Description |
|---|---|---|
| --addr | :8443 | Listen address for WebSocket connections |
| --max-clients | 10000 | Maximum number of concurrent client connections |
| --ping-interval | 30s | Keepalive ping interval |
| --allowed-keys-file | (none) | File path of allowed WireGuard public keys (requires --open-relay if omitted) |
| --open-relay | false | Explicitly allow connections from any WireGuard key when no --allowed-keys-file is specified. Not recommended for production |
| --bindings-file | <data-dir>/tofu-bindings.json | Path to persist TOFU (Trust on First Use) key bindings as JSON |
| --metrics-token | (none) | Bearer token for /healthz metrics. Can also be set via the NEFIA_DERP_METRICS_TOKEN environment variable |
| --metrics-port | 0 (disabled) | Port for Prometheus metrics (binds to 127.0.0.1) |
| --trust-proxy | false | Use Fly-Client-IP / X-Forwarded-For headers for rate limiting. Enable when behind a reverse proxy |
| --version | | Display version information and exit |
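
The rule that either --allowed-keys-file or --open-relay must be given can be checked before the server is launched. The following Python sketch assembles a command line from the flags in the table above. The helper name build_derp_args and its parameters are hypothetical and exist only for illustration; the check mirrors the rule stated in the table and is not taken from the nefia-derp source.

```python
def build_derp_args(addr=":8443", max_clients=10000, allowed_keys_file=None,
                    open_relay=False, metrics_port=0, trust_proxy=False):
    """Build an argv list for nefia-derp (hypothetical helper).

    Raises ValueError when neither an ACL file nor an explicit
    open-relay opt-in is supplied, mirroring the flag table.
    """
    if allowed_keys_file is None and not open_relay:
        raise ValueError("set allowed_keys_file or pass open_relay=True")

    args = ["nefia-derp", f"--addr={addr}", f"--max-clients={max_clients}"]
    if allowed_keys_file is not None:
        # An ACL file takes effect; --open-relay only applies without one.
        args.append(f"--allowed-keys-file={allowed_keys_file}")
    else:
        args.append("--open-relay")
    if metrics_port:
        # 0 means the Prometheus endpoint stays disabled.
        args.append(f"--metrics-port={metrics_port}")
    if trust_proxy:
        args.append("--trust-proxy")
    return args
```

The resulting list can be passed to subprocess.run or joined into a container command, which keeps the ACL requirement from being discovered only at startup.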

Rate Limiting

The DERP server has IP-based rate limiting enabled by default:

| Parameter | Value |
|---|---|
| Request limit | 5 requests/second |
| Burst allowance | 10 requests |
| Response on exceeded | HTTP 429 (Too Many Requests) |
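
To get a feel for what these numbers mean in practice, here is a minimal Python sketch of a per-IP token bucket with a rate of 5 and a burst of 10. The token-bucket algorithm is an assumption (the source only states the limits, not how they are enforced), and the class names TokenBucket and IPRateLimiter are hypothetical.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `burst`."""

    def __init__(self, rate=5.0, burst=10, now=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full, so a burst is allowed at once
        self._now = now
        self._last = now()

    def allow(self):
        current = self._now()
        elapsed = current - self._last
        self._last = current
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class IPRateLimiter:
    """Keeps one bucket per client IP and maps the decision to an HTTP status."""

    def __init__(self, rate=5.0, burst=10, now=time.monotonic):
        self._rate, self._burst, self._now = rate, burst, now
        self._buckets = {}

    def status_for(self, ip):
        bucket = self._buckets.get(ip)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._burst, self._now)
            self._buckets[ip] = bucket
        return 200 if bucket.allow() else 429
```

Under this model a single IP can open 10 connections back to back, after which it is held to 5 per second. This is why many agents behind one NAT address can see HTTP 429 during a mass reconnect.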

Health Check

The DERP server provides a health check via the /healthz endpoint.

Without Authentication

bash
curl http://localhost:8443/healthz

Returns only a minimal status (ok / error).

With Authentication (Including Metrics)

bash
curl -H "Authorization: Bearer $METRICS_TOKEN" http://localhost:8443/healthz

Returns a response with detailed metrics including active client count, packet statistics, and memory usage.
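
The two curl calls above can also be issued from a script, for example in a deployment check. The Python sketch below uses only the standard library. The function names are hypothetical, and because the response body format is not documented here, the body is returned as raw text rather than parsed.

```python
import urllib.request


def build_healthz_request(base_url, token=None):
    """Build a GET request for /healthz, adding a bearer token if given."""
    req = urllib.request.Request(base_url.rstrip("/") + "/healthz")
    if token:
        # With a valid token the server includes detailed metrics.
        req.add_header("Authorization", f"Bearer {token}")
    return req


def fetch_healthz(base_url, token=None, timeout=5.0):
    """Return (HTTP status, body text). Raises urllib.error.URLError on failure."""
    req = build_healthz_request(base_url, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

A call such as fetch_healthz("http://localhost:8443") corresponds to the unauthenticated check; passing token=... corresponds to the authenticated one.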

Prometheus Monitoring

When the --metrics-port flag is specified, a Prometheus metrics endpoint is enabled at 127.0.0.1:<port>/metrics.

bash
nefia-derp --addr :8443 --metrics-port 9090

Example Prometheus scrape_configs configuration:

yaml
scrape_configs:
  - job_name: "nefia-derp"
    static_configs:
      - targets: ["localhost:9090"]

Key metrics:

| Metric | Description |
|---|---|
| active_clients | Number of currently connected clients |
| packets_routed | Total number of routed packets |
| packets_dropped | Number of dropped packets |
| total_bytes | Total bytes transferred |
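
For a quick look at these values without a full Prometheus setup, the text returned by the metrics endpoint can be parsed directly. The Python sketch below handles the basic Prometheus text exposition format. It assumes the metric names are exported exactly as listed above (the real names may carry a prefix), and the drop ratio is defined here as dropped / (routed + dropped), which assumes packets_routed does not already include dropped packets.

```python
import re

# name, optional {labels}, then the sample value
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)")


def parse_metrics(text):
    """Parse Prometheus text exposition into {metric_name: float}.

    Labels are ignored, so the last sample of a labelled metric wins.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        try:
            values[match.group(1)] = float(match.group(2))
        except ValueError:
            continue
    return values


def drop_ratio(values):
    """Fraction of packets dropped, or 0.0 when no packets were seen."""
    routed = values.get("packets_routed", 0.0)
    dropped = values.get("packets_dropped", 0.0)
    total = routed + dropped
    return dropped / total if total else 0.0
```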

ACL (Access Control List)

You can restrict connections to only the WireGuard public keys listed in the file specified by --allowed-keys-file.

allowed-keys.txt Format

text
# Operator's public key
dGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIGtleQ==
 
# Agent: web-01
YW5vdGhlciBiYXNlNjQgZW5jb2RlZCBrZXk=
 
# Agent: db-01
eWV0IGFub3RoZXIgYmFzZTY0IGVuY29kZWQ=
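
  • One base64-encoded WireGuard public key per line
  • Lines starting with # are comments
  • Blank lines are ignored
  • Each key is 32 bytes before encoding, which is 44 base64 characters including padding. The values shown above are placeholders and are shorter than a real key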
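
A malformed entry in this file is easy to miss, so it can help to validate the file before mounting it. The Python sketch below applies the format rules listed above. The function name parse_allowed_keys is hypothetical, and stripping surrounding whitespace from each line is an assumption about how lenient the server's own parser is.

```python
import base64
import binascii


def parse_allowed_keys(text):
    """Return (valid_keys, errors) for the contents of an allowed-keys file.

    errors is a list of (line_number, message) tuples.
    """
    keys, errors = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        try:
            decoded = base64.b64decode(line, validate=True)
        except (binascii.Error, ValueError):
            errors.append((lineno, "not valid base64"))
            continue
        if len(decoded) != 32:
            errors.append((lineno, f"decodes to {len(decoded)} bytes, expected 32"))
            continue
        keys.append(line)
    return keys, errors
```

Running it over the placeholder file shown above reports every key line as an error, because those sample values decode to fewer than 32 bytes.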

To check the operator's public key:

bash
nefia vpn status

The agent's public key can be found in the vpn.public_key field within the host definition in nefia.yaml.

TLS Termination

nefia-derp itself does not perform TLS termination. Use one of the following methods to provide TLS:

  • Fly.io — Automatic TLS termination (no additional configuration needed)
  • Load Balancer — AWS ALB, GCP Load Balancer, Cloudflare, etc.
  • Reverse Proxy — nginx, Caddy, etc.

Example nginx reverse proxy configuration:

nginx
server {
    listen 443 ssl http2;
    server_name derp.example.com;
 
    ssl_certificate     /etc/letsencrypt/live/derp.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/derp.example.com/privkey.pem;
 
    location /derp {
        proxy_pass http://127.0.0.1:8443;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 86400s;
    }
 
    location /healthz {
        proxy_pass http://127.0.0.1:8443;
    }
}

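Enable the --trust-proxy flag when running behind a reverse proxy. Without it, every client appears to come from the proxy's IP address and shares a single rate limit.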
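
The effect of --trust-proxy on rate limiting can be pictured as a choice of which address identifies the client. The Python sketch below is an illustration only: the function name client_ip is hypothetical, the precedence of Fly-Client-IP over X-Forwarded-For is assumed, and taking the last X-Forwarded-For entry (the one appended by the nearest proxy, as nginx's $proxy_add_x_forwarded_for does) is also an assumption about the server's behavior.

```python
def client_ip(headers, remote_addr, trust_proxy=False):
    """Pick the address used as the rate-limit key (illustrative sketch).

    `headers` is a plain dict with canonical header names as keys.
    """
    if not trust_proxy:
        # Direct exposure: the TCP peer address is the client.
        return remote_addr

    fly_ip = headers.get("Fly-Client-IP", "").strip()
    if fly_ip:
        return fly_ip

    forwarded = [part.strip()
                 for part in headers.get("X-Forwarded-For", "").split(",")
                 if part.strip()]
    if forwarded:
        # Assumed: use the entry added by the closest (trusted) proxy.
        return forwarded[-1]
    return remote_addr
```

Because these headers are client-controlled when the server is reachable directly, trusting them only makes sense when all traffic arrives through the proxy.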

Operator-Side Configuration

To use a deployed DERP server, add the DERP server URL to the operator's nefia.yaml:

yaml
vpn:
  derp_servers:
    - url: "wss://derp.example.com/derp"
      region: "ap-northeast-1"
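
A mistyped scheme or a missing host in this URL tends to surface only as a connection failure, so a small pre-flight check can help. The Python sketch below inspects one derp_servers entry after it has been loaded into a dict. The function name check_derp_server is hypothetical, and requiring wss:// is an assumption that follows from the TLS termination setup described above rather than a documented constraint.

```python
from urllib.parse import urlsplit


def check_derp_server(entry):
    """Return a list of problems found in one derp_servers entry."""
    problems = []
    url = urlsplit(entry.get("url", ""))
    if url.scheme != "wss":
        problems.append(f"expected a wss:// URL, got scheme {url.scheme!r}")
    if not url.hostname:
        problems.append("URL has no host")
    return problems
```

An empty list means the entry passed both checks; it does not prove that the relay is reachable.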

Security Considerations

| Item | Countermeasure |
|---|---|
| Communication encryption | TLS termination (load balancer or reverse proxy) |
| Unauthorized access | Public key-based ACL via --allowed-keys-file |
| DoS prevention | IP-based rate limiting (5 req/s, burst 10) |
| Slowloris attack | ReadHeaderTimeout: 10 seconds |
| Resource exhaustion | Concurrent connection limit via --max-clients |
| Metrics protection | Bearer token authentication via --metrics-token |
| Execution privileges | Runs in a distroless nonroot container |

Troubleshooting

Clients cannot connect

  1. Check the DERP server health check:

    bash
    curl -v https://derp.example.com/healthz

  2. Verify that the firewall allows port 443 (when using TLS termination) or port 8443

  3. Verify that WebSocket upgrades are not being blocked by a proxy

  4. If using --allowed-keys-file, verify that the client's public key is included in the file

Hitting rate limits (HTTP 429)

  • If running behind a reverse proxy, verify that the --trust-proxy flag is enabled
  • If many agents are connecting from the same IP, this is expected behavior. If rate limit adjustments are needed, modify the constants in the source code

Connections are frequently dropped

  • Verify that the reverse proxy's proxy_read_timeout is set to a sufficiently long value
  • Verify that --ping-interval (default 30 seconds) is shorter than the proxy's idle timeout
  • If there are network quality issues, agents will automatically attempt to reconnect

Graceful shutdown behavior

When the DERP server receives SIGINT or SIGTERM:

  1. Stops accepting new connections
  2. Sends a StatusGoingAway WebSocket frame to active clients
  3. Clients automatically reconnect to the next available DERP server
  4. Exits after a graceful shutdown period of at most 30 seconds

To minimize downtime during rolling updates, deploy relays across multiple regions so that clients always have another relay to reconnect to.
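
The reconnect step can be pictured as a walk through the configured relay list. The Python sketch below is an illustration of that idea and not the client's actual selection logic: the function name next_server, the round-robin order, and the is_available callback are all hypothetical.

```python
def next_server(servers, current, is_available):
    """Pick the relay to try after `current` announces it is going away.

    Walks the list in order starting after `current`, wrapping around,
    and returns the first relay for which is_available(url) is true.
    Returns None when no other relay is available.
    """
    if current in servers:
        i = servers.index(current)
        candidates = servers[i + 1:] + servers[:i]
    else:
        candidates = list(servers)
    for url in candidates:
        if is_available(url):
            return url
    return None
```

With a single configured relay the function always returns None, which is the situation a multi-region deployment is meant to avoid.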

Next Steps

VPN Setup Guide

Configure WireGuard VPN tunnels, NAT traversal, and key rotation.

Troubleshooting

Diagnose and resolve connection errors and configuration issues.