DERP Relay Server Deployment
Deploy and operate a self-hosted DERP relay server with Docker, Fly.io, Prometheus monitoring, and ACL configuration.
Overview
DERP (Designated Encrypted Relay for Packets) is a relay server that forwards WireGuard packets over WebSocket in environments where direct UDP connections cannot be established (Symmetric NAT, CGNAT, strict firewalls).
Nefia uses the public relay at relay.nefia.ai by default, but self-hosting a DERP server is recommended in the following cases:
- Latency optimization — Place the relay geographically close to your users
- Privacy requirements — Ensure traffic only passes through your own infrastructure
- Availability requirements — Eliminate dependency on the public relay
- ACL control — Allow connections only from authorized WireGuard public keys
Agent ──── wss:// ───→ DERP Relay ←── wss:// ──── Operator
(WebSocket relay, encrypted WireGuard packets)
Docker Deployment
Building the Image
Build the image using the deploy/derp/Dockerfile included in the repository:
docker build -t nefia-derp:latest -f deploy/derp/Dockerfile .
This multi-stage build:
- Statically compiles nefia-derp with Go 1.25
- Deploys onto a distroless (nonroot) base image
- Exposes port 8443
Basic Startup
docker run -d \
  --name nefia-derp \
  -p 8443:8443 \
  -e NEFIA_DERP_METRICS_TOKEN="$(openssl rand -base64 32)" \
  nefia-derp:latest
docker-compose.yml
Example configuration for production environments:
services:
  nefia-derp:
    image: nefia-derp:latest
    build:
      context: .
      dockerfile: deploy/derp/Dockerfile
    ports:
      - "8443:8443"
      - "127.0.0.1:9090:9090" # Prometheus metrics (local only)
    environment:
      NEFIA_DERP_METRICS_TOKEN: "${DERP_METRICS_TOKEN}"
    command:
      - "--addr=:8443"
      - "--max-clients=10000"
      - "--metrics-port=9090"
      - "--allowed-keys-file=/etc/nefia/allowed-keys.txt"
    volumes:
      - ./allowed-keys.txt:/etc/nefia/allowed-keys.txt:ro
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8443/healthz"]
      interval: 15s
      timeout: 5s
      retries: 3
Fly.io Deployment
You can deploy to Fly.io using the deploy/derp/fly.toml included in the repository.
Create the App
fly apps create nefia-derp --machines
Set Secrets
fly secrets set NEFIA_DERP_METRICS_TOKEN="$(openssl rand -base64 32)" --app nefia-derp
Deploy
fly deploy --config deploy/derp/fly.toml
Scale to Regions
# Tokyo (default primary_region)
# US East
fly scale count 1 --region iad --app nefia-derp
# EU West
fly scale count 1 --region cdg --app nefia-derp
# Southeast Asia
fly scale count 1 --region sin --app nefia-derp
Key Fly.io configuration points:
| Setting | Value | Description |
|---|---|---|
| VM Memory | 256MB | Lightweight relay server |
| CPU | shared x1 | Low-cost operation |
| force_https | true | Enforce TLS |
| auto_stop_machines | false | Always running |
| min_machines_running | 1 | Ensure at least 1 instance |
| Concurrent connection hard limit | 10,000 | soft_limit: 8,000 |
Configuration Flags
| Flag | Default | Description |
|---|---|---|
| --addr | :8443 | Listen address for WebSocket connections |
| --max-clients | 10000 | Maximum number of concurrent client connections |
| --ping-interval | 30s | Keepalive ping interval |
| --allowed-keys-file | (none) | File path of allowed WireGuard public keys (requires --open-relay if omitted) |
| --open-relay | false | Explicitly allow connections from any WireGuard key when no --allowed-keys-file is specified. Not recommended for production |
| --bindings-file | <data-dir>/tofu-bindings.json | Path to persist TOFU (Trust on First Use) key bindings as JSON |
| --metrics-token | (none) | Bearer token for /healthz metrics. Can also be set via the NEFIA_DERP_METRICS_TOKEN environment variable |
| --metrics-port | 0 (disabled) | Port for Prometheus metrics (binds to 127.0.0.1) |
| --trust-proxy | false | Use Fly-Client-IP / X-Forwarded-For headers for rate limiting. Enable when behind a reverse proxy |
| --version | — | Display version information and exit |
Rate Limiting
The DERP server has IP-based rate limiting enabled by default:
| Parameter | Value |
|---|---|
| Request limit | 5 requests/second |
| Burst allowance | 10 requests |
| Response on exceeded | HTTP 429 (Too Many Requests) |
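To get a feel for these numbers, the sketch below assumes the limiter behaves like a token bucket. That is an assumption: the documentation only states the rate and burst values, not the algorithm. Under that assumption, the most requests a single IP can have accepted within a window is the burst plus the rate multiplied by the window length. The function name max_allowed is hypothetical.

```shell
#!/bin/sh
# Assumption: token-bucket semantics with rate 5 req/s and burst 10.
# max_allowed SECONDS prints the upper bound on requests accepted from
# one IP within SECONDS, starting from a full bucket.
max_allowed() {
  rate=5
  burst=10
  echo $(( burst + rate * $1 ))
}

max_allowed 0    # prints 10 (the burst alone)
max_allowed 60   # prints 310 (10 + 5 * 60)
```

If many agents share one egress IP, comparing their combined reconnect rate against this bound gives a rough idea of whether HTTP 429 responses are to be expected.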
Health Check
The DERP server provides a health check via the /healthz endpoint.
Without Authentication
curl http://localhost:8443/healthz
Returns only a minimal status (ok / error).
With Authentication (Including Metrics)
curl -H "Authorization: Bearer $METRICS_TOKEN" http://localhost:8443/healthz
Returns a response with detailed metrics including active client count, packet statistics, and memory usage.
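In deployment scripts it can be useful to wait until the relay answers on /healthz before continuing. The following is a minimal sketch of such a wait loop; the helper name retry is hypothetical, and it assumes an unhealthy relay either refuses the connection or returns a non-2xx status, which this page does not confirm.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS
# times, sleeping DELAY seconds between tries.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt follows.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (requires a running relay on localhost:8443):
# retry 10 2 curl -fsS -o /dev/null http://localhost:8443/healthz
```

The curl flag -f makes HTTP error statuses count as failures, so the loop keeps retrying until the endpoint responds successfully or the attempts run out.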
Prometheus Monitoring
When the --metrics-port flag is specified, a Prometheus metrics endpoint is enabled at 127.0.0.1:<port>/metrics.
nefia-derp --addr :8443 --metrics-port 9090
Example Prometheus scrape_configs configuration:
scrape_configs:
  - job_name: "nefia-derp"
    static_configs:
      - targets: ["localhost:9090"]
Key metrics:
| Metric | Description |
|---|---|
| active_clients | Number of currently connected clients |
| packets_routed | Total number of routed packets |
| packets_dropped | Number of dropped packets |
| total_bytes | Total bytes transferred |
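For a quick check without a full Prometheus setup, a single metric can be pulled out of the endpoint's output with standard tools. The sketch below assumes the endpoint serves the usual Prometheus text exposition format and that the metric names appear exactly as listed in the table; the real names may carry a prefix. The helper name metric_value is hypothetical, and label values containing spaces are not handled.

```shell
#!/bin/sh
# metric_value NAME: read Prometheus text exposition format on stdin and
# print the value of the first sample whose name is exactly NAME.
# Labels, if present, are ignored.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                 # skip HELP/TYPE comment lines
    {
      n = $1
      sub(/\{.*/, "", n)          # strip any {label="..."} suffix
      if (n == name) { print $2; exit }
    }
  '
}

# Example (requires a relay started with --metrics-port 9090):
# curl -s http://127.0.0.1:9090/metrics | metric_value active_clients
```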
ACL (Access Control List)
You can restrict connections to only the WireGuard public keys listed in the file specified by --allowed-keys-file.
allowed-keys.txt Format
# Operator's public key
dGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIGtleQ==
# Agent: web-01
YW5vdGhlciBiYXNlNjQgZW5jb2RlZCBrZXk=
# Agent: db-01
eWV0IGFub3RoZXIgYmFzZTY0IGVuY29kZWQ=
- One base64-encoded WireGuard public key per line
- Lines starting with # are comments
- Blank lines are ignored
- Each key is 32 bytes, base64-encoded
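The format rules above can be checked before deploying a new key file. The following is a minimal sketch, not part of nefia: the function name check_keys is hypothetical, and it relies on base64 -d as spelled in GNU coreutils.

```shell
#!/bin/sh
# check_keys FILE: verify that every non-blank, non-comment line of FILE
# decodes from base64 to exactly 32 bytes. Prints offending line numbers
# to stderr and returns 1 if any line fails, 0 otherwise.
check_keys() {
  status=0
  lineno=0
  while IFS= read -r line || [ -n "$line" ]; do
    lineno=$((lineno + 1))
    case "$line" in
      ''|'#'*) continue ;;   # blank lines and comments are ignored
    esac
    # base64 -d is the GNU coreutils spelling; some systems use -D instead.
    len=$(printf '%s' "$line" | base64 -d 2>/dev/null | wc -c | tr -d ' ')
    if [ "$len" -ne 32 ]; then
      echo "line $lineno: not a 32-byte base64 key" >&2
      status=1
    fi
  done < "$1"
  return "$status"
}

# Example:
# check_keys ./allowed-keys.txt && echo "allowed-keys.txt looks well-formed"
```

The sample keys shown above are placeholders that decode to fewer than 32 bytes, so this check would flag them; real WireGuard public keys pass.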
To check the operator's public key:
nefia vpn status
The agent's public key can be found in the vpn.public_key field within the host definition in nefia.yaml.
TLS Termination
nefia-derp itself does not perform TLS termination. Use one of the following methods to provide TLS:
- Fly.io — Automatic TLS termination (no additional configuration needed)
- Load Balancer — AWS ALB, GCP Load Balancer, Cloudflare, etc.
- Reverse Proxy — nginx, Caddy, etc.
Example nginx reverse proxy configuration:
server {
    listen 443 ssl http2;
    server_name derp.example.com;
    ssl_certificate /etc/letsencrypt/live/derp.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/derp.example.com/privkey.pem;
    location /derp {
        proxy_pass http://127.0.0.1:8443;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 86400s;
    }
    location /healthz {
        proxy_pass http://127.0.0.1:8443;
    }
}
Enable the --trust-proxy flag when using a reverse proxy.
Operator-Side Configuration
To use a deployed DERP server, add the DERP server URL to the operator's nefia.yaml:
vpn:
  derp_servers:
    - url: "wss://derp.example.com/derp"
      region: "ap-northeast-1"
Security Considerations
| Item | Countermeasure |
|---|---|
| Communication encryption | TLS termination (load balancer or reverse proxy) |
| Unauthorized access | Public key-based ACL via --allowed-keys-file |
| DoS prevention | IP-based rate limiting (5 req/s, burst 10) |
| Slowloris attack | ReadHeaderTimeout: 10 seconds |
| Resource exhaustion | Concurrent connection limit via --max-clients |
| Metrics protection | Bearer token authentication via --metrics-token |
| Execution privileges | Runs in a distroless nonroot container |
Troubleshooting
Clients cannot connect
- Check the DERP server health check:
  curl -v https://derp.example.com/healthz
- Verify that the firewall allows port 443 (when using TLS termination) or port 8443
- Verify that WebSocket upgrades are not being blocked by a proxy
- If using --allowed-keys-file, verify that the client's public key is included in the file
Hitting rate limits (HTTP 429)
- If running behind a reverse proxy, verify that the --trust-proxy flag is enabled
- If many agents are connecting from the same IP, this is expected behavior. If rate limit adjustments are needed, modify the constants in the source code
Connections are frequently dropped
- Verify that the reverse proxy's proxy_read_timeout is set to a sufficiently long value
- Verify that --ping-interval (default 30 seconds) is shorter than the proxy's idle timeout
- If there are network quality issues, agents will automatically attempt to reconnect
Graceful shutdown behavior
When the DERP server receives SIGINT or SIGTERM:
- Stops accepting new connections
- Sends a StatusGoingAway WebSocket frame to active clients
- Clients automatically reconnect to the next available DERP server
- Maximum 30-second graceful shutdown period
To minimize downtime during rolling updates, it is recommended to deploy relays across multiple regions.
Next Steps
Configure WireGuard VPN tunnels, NAT traversal, and key rotation.
Diagnose and resolve connection errors and configuration issues.