Observability & Telemetry

Configure OpenTelemetry tracing, Prometheus metrics, and SIEM integration for production monitoring.

Overview

Nefia provides built-in observability through three complementary pillars:

  • OpenTelemetry Tracing — Distributed trace spans for every command execution, SSH session, and file operation
  • Prometheus Metrics — Real-time counters and gauges exposed via an HTTP /metrics endpoint
  • SIEM Log Forwarding — Append-only audit logs forwarded to Splunk, Datadog, or generic webhooks

Together these give operators full visibility into what Nefia is doing across their fleet.

OpenTelemetry Tracing

Enable tracing in nefia.yaml:

```yaml
telemetry:
  enabled: true
  endpoint: "localhost:4318"
  service_name: "nefia"
```

| Field | Default | Description |
| --- | --- | --- |
| `enabled` | `false` | Enable or disable trace export |
| `endpoint` | `localhost:4318` | OTLP HTTP receiver address |
| `service_name` | `nefia` | Service name attached to all spans |

The endpoint uses the OTLP HTTP protocol (port 4318 by default). Nefia exports spans asynchronously in batches, so tracing adds negligible overhead to normal operations.
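If you route traces through an OpenTelemetry Collector, a minimal receiver configuration for the Collector side might look like the following sketch (the `debug` exporter just prints spans; swap in the exporter for your real backend):

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"

processors:
  batch: {}

exporters:
  debug: {}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```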

Compatible Backends

Nefia connects to the OTLP endpoint over plain HTTP (no TLS, no custom headers). This works with any local or self-hosted collector that accepts unencrypted OTLP HTTP:

  • Jaeger (all-in-one or collector)
  • Grafana Tempo
  • Datadog Agent (local OTLP receiver)
  • OpenTelemetry Collector

Connecting to Backends

Jaeger

Run Jaeger locally with OTLP ingestion enabled:

```bash
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
```

(The `COLLECTOR_OTLP_ENABLED` variable is required on older Jaeger releases; recent versions enable OTLP ingestion by default, and setting it is harmless.)

Then set the endpoint in nefia.yaml:

```yaml
telemetry:
  enabled: true
  endpoint: "localhost:4318"
```

Open http://localhost:16686 to view traces in the Jaeger UI.

Grafana Tempo

Point Nefia at your Tempo instance:

```yaml
telemetry:
  enabled: true
  endpoint: "tempo:4318"
```

Tempo stores traces and integrates with Grafana for visualization. Refer to the Grafana Tempo documentation for deployment instructions.

Datadog

The Datadog Agent can receive OTLP traces on port 4318. Enable the OTLP receiver in your Agent configuration:

```bash
DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT=0.0.0.0:4318
```
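Equivalently, the receiver can be enabled in the Agent's `datadog.yaml` rather than through the environment (same setting, file form):

```yaml
otlp_config:
  receiver:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"
```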

Then configure Nefia to send traces to the Datadog Agent:

```yaml
telemetry:
  enabled: true
  endpoint: "localhost:4318"
```

Traces appear in the Datadog APM section under the service name you configured.

Prometheus Metrics

Enable the built-in metrics endpoint in nefia.yaml:

```yaml
metrics:
  enabled: true
  listen: ":9090"
```

| Field | Default | Description |
| --- | --- | --- |
| `enabled` | `false` | Enable or disable the metrics HTTP server |
| `listen` | `:9090` | Address and port for the `/metrics` endpoint |

Once enabled, scrape http://<host>:9090/metrics from Prometheus or any compatible collector.
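A matching Prometheus scrape job might look like this (the target host name is a placeholder for your deployment):

```yaml
scrape_configs:
  - job_name: "nefia"
    static_configs:
      - targets: ["nefia-host:9090"]
```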

Key Metrics

| Metric | Type | Description |
| --- | --- | --- |
| `nefia_commands_total` | Counter | Total commands executed (labels: `host`, `status`) |
| `nefia_ssh_connections_active` | Gauge | Currently active SSH connections |
| `nefia_vpn_tunnel_status` | Gauge | VPN tunnel state per host (1 = up, 0 = down) |
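The exposition format served at `/metrics` is plain text, so it is easy to post-process outside Prometheus. As an illustration, this sketch parses counter and gauge lines shaped like the metrics above; the sample values and label values are made up:

```python
import re

# Hypothetical /metrics excerpt; values and label values are illustrative only.
SAMPLE = """\
nefia_commands_total{host="web-1",status="success"} 42
nefia_commands_total{host="web-1",status="error"} 3
nefia_ssh_connections_active 5
"""

LINE_RE = re.compile(r'^(?P<name>\w+)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')

def parse_metrics(text):
    """Parse Prometheus text-format lines into (name, labels, value) tuples."""
    out = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip comments, blank lines, or anything malformed
        labels = {}
        if m.group("labels"):
            for kv in m.group("labels").split(","):
                key, val = kv.split("=", 1)
                labels[key] = val.strip('"')
        out.append((m.group("name"), labels, float(m.group("value"))))
    return out

metrics = parse_metrics(SAMPLE)
total = sum(v for name, _, v in metrics if name == "nefia_commands_total")
print(total)  # 45.0
```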

SIEM Integration

Nefia's audit logger writes append-only JSONL files that can be forwarded in real time to external SIEM platforms. Three forwarder types are supported:

  • Splunk — HTTP Event Collector (HEC)
  • Datadog — Log Intake API v2
  • Webhook — Generic HTTP endpoint with HMAC-SHA256 signing

For full configuration details, see the SIEM Integration guide.
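For the generic webhook, HMAC-SHA256 signing typically means each request body is accompanied by a hex digest computed with a shared secret, which the receiver recomputes to verify authenticity. A minimal sketch of that verification, assuming the secret and the event fields shown here (neither is a documented Nefia value):

```python
import hashlib
import hmac
import json

SHARED_SECRET = b"example-secret"  # assumption: configured on both sender and receiver

def sign(body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()

def verify(body: bytes, signature: str) -> bool:
    """Constant-time comparison to avoid timing attacks."""
    return hmac.compare_digest(sign(body), signature)

event = json.dumps({"user": "alice", "command": "uptime"}).encode()
sig = sign(event)
print(verify(event, sig))        # True
print(verify(b"tampered", sig))  # False
```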

Audit Log Format

Each audit event is a single JSON line containing the timestamp, user, host, command, and outcome. This format is compatible with most log aggregation pipelines without additional parsing.
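Because each event is one JSON object per line, any JSON parser can consume the stream directly. This sketch decodes a hypothetical event built from the fields named above (real field names and values may differ):

```python
import json

# Hypothetical audit line; fields follow the description above.
line = ('{"timestamp": "2024-01-01T12:00:00Z", "user": "alice", '
        '"host": "web-1", "command": "systemctl restart nginx", '
        '"outcome": "success"}')

event = json.loads(line)
print(event["user"], event["outcome"])  # alice success
```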

Troubleshooting

No spans appearing in the backend
  1. Verify that `telemetry.enabled` is set to `true` in `nefia.yaml`.
  2. Confirm that `telemetry.endpoint` matches your collector's OTLP HTTP address.
  3. Check that the collector is actually running and listening on the expected port:

    ```bash
    curl -s http://localhost:4318/v1/traces
    ```

    A running OTLP collector will respond (even if with an error for the empty request) rather than refusing the connection.

  4. Review Nefia's log output for the `telemetry initialized` message at startup. If it is missing, the configuration may not be loaded correctly.

Connection refused on the OTLP endpoint

This typically means the collector process is not running or is bound to a different address.

  • If using Docker, ensure the container port is published (-p 4318:4318).
  • If using the Datadog Agent, verify DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT is set.
  • If behind a firewall, confirm port 4318 is open for TCP traffic.
Metrics endpoint returns 404

Ensure `metrics.enabled` is `true` and the listen address is correct. Metrics are served at the `/metrics` path, not at the root:

```bash
curl http://localhost:9090/metrics
```