HA Cluster
Build an active-passive high-availability cluster using Raft consensus.
Overview
Nefia's HA cluster provides an active-passive configuration based on the Raft consensus algorithm. The leader node runs the VPN tunnels and handles all state mutations; if the leader fails, a standby node is automatically promoted to take its place.
```text
Node 1 (Leader)       Node 2 (Follower)      Node 3 (Follower)
│ VPN tunnel          │ standby              │ standby
│ state mutations     │ Raft replication     │ Raft replication
│                     │                      │
└──── Raft TCP ───────┴──────────────────────┘
```

Replicated State
| State | Description |
|---|---|
| Sessions | SSH session state |
| Queue | Task queue entries |
Only the leader node can Apply() state changes. Followers receive updates through the Raft log.
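To make the leader-only write path concrete, here is a minimal Go sketch of a replicated state machine over the two state types above. It is not Nefia's code: `Command`, `FSM`, `Node`, `Replicate`, `ErrNotLeader`, and the op names `session_put`/`queue_push` are all hypothetical, and an in-memory slice stands in for the Raft log (no terms, acknowledgements, or commit index).

```go
package main

import (
	"errors"
	"fmt"
)

// Command is a hypothetical log entry that mutates replicated state.
type Command struct {
	Op    string // assumed ops: "session_put" or "queue_push"
	Key   string
	Value string
}

// FSM holds the two replicated state types: sessions and queue entries.
type FSM struct {
	Sessions map[string]string
	Queue    []string
}

func NewFSM() *FSM { return &FSM{Sessions: map[string]string{}} }

// ApplyEntry applies one committed entry. It must be deterministic,
// because every node runs it for the same entries in the same order.
func (f *FSM) ApplyEntry(c Command) error {
	switch c.Op {
	case "session_put":
		f.Sessions[c.Key] = c.Value
	case "queue_push":
		f.Queue = append(f.Queue, c.Value)
	default:
		return fmt.Errorf("unknown op %q", c.Op)
	}
	return nil
}

// ErrNotLeader is returned when a follower is asked to mutate state.
var ErrNotLeader = errors.New("not leader: state changes must go through the leader")

// Node pairs an FSM with its role and an in-memory stand-in for the Raft log.
type Node struct {
	ID       string
	IsLeader bool
	FSM      *FSM
	log      []Command
}

// Apply is the only write path; followers reject it.
func (n *Node) Apply(c Command) error {
	if !n.IsLeader {
		return ErrNotLeader
	}
	n.log = append(n.log, c)
	return n.FSM.ApplyEntry(c)
}

// Replicate ships the leader entries the follower has not seen yet and
// applies them in log order (a simplification of Raft log replication).
func Replicate(leader, follower *Node) error {
	for _, c := range leader.log[len(follower.log):] {
		follower.log = append(follower.log, c)
		if err := follower.FSM.ApplyEntry(c); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	leader := &Node{ID: "node-1", IsLeader: true, FSM: NewFSM()}
	follower := &Node{ID: "node-2", FSM: NewFSM()}

	_ = leader.Apply(Command{Op: "session_put", Key: "s1", Value: "open"})
	_ = leader.Apply(Command{Op: "queue_push", Value: "task-1"})
	fmt.Println(follower.Apply(Command{Op: "queue_push", Value: "x"})) // rejected

	if err := Replicate(leader, follower); err != nil {
		fmt.Println("replication failed:", err)
	} else {
		fmt.Println(follower.FSM.Sessions["s1"], follower.FSM.Queue)
	}
}
```

The key property is that followers never mutate state on their own; they only replay entries that the leader appended, so every node converges on the same sessions and queue.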
Setup
Prepare Configuration
Add cluster configuration to each node's nefia.yaml:
Node 1 (Initial Leader):
```yaml
cluster:
  enabled: true
  node_id: "node-1"
  bind_addr: "0.0.0.0:9700"
  advertise_addr: "10.99.0.1:9700"
```

Node 2:
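```yaml
cluster:
  enabled: true
  node_id: "node-2"
  bind_addr: "0.0.0.0:9700"
  advertise_addr: "10.99.0.2:9700"
  peers:
    - id: "node-1"
      address: "10.99.0.1:9700"
```

Configure node 3 the same way as node 2, using `node_id: "node-3"` and `advertise_addr: "10.99.0.3:9700"`.

Initialize the Cluster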
Bootstrap the cluster on the first node:
```bash
nefia cluster init
```

This creates a single-node cluster and makes this node the leader.
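Raft elects leaders and commits entries by strict majority, which is why a freshly initialized single-node cluster can lead on its own and why three nodes is the smallest layout that survives a node failure. The arithmetic as a short Go sketch; `quorum` and `faultTolerance` are illustrative helper names, not part of Nefia.

```go
package main

import "fmt"

// quorum returns the votes a Raft cluster of n voters needs to elect a
// leader or commit an entry: a strict majority.
func quorum(n int) int { return n/2 + 1 }

// faultTolerance returns how many voters can fail while a majority remains.
func faultTolerance(n int) int { return n - quorum(n) }

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("voters=%d quorum=%d tolerates=%d\n", n, quorum(n), faultTolerance(n))
	}
}
```

A single node is its own majority (quorum 1), so it becomes leader immediately. A two-node cluster tolerates no failures, while the three-node layout used in this guide keeps a leader through the loss of any one node.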
Start the Daemon
```bash
nefia daemon
```

Add Peers
Add other nodes from the leader node:
```bash
nefia cluster add-peer --id node-2 --addr 10.99.0.2:9700
nefia cluster add-peer --id node-3 --addr 10.99.0.3:9700
```

Management Commands
Check Status
```bash
nefia cluster status
```

Example output:

```text
Node ID: node-1
State: Leader
Is Leader: true
Leader ID: node-1
```

List Members
```bash
nefia cluster members
```

Example output:

```text
ID      ADDRESS          STATE     LOCAL
node-1  10.99.0.1:9700   Leader    *
node-2  10.99.0.2:9700   Follower
node-3  10.99.0.3:9700   Follower
```

Remove a Peer
Like `add-peer`, this must be run on the leader node:

```bash
nefia cluster remove-peer --id node-3
```

Configuration Reference
```yaml
cluster:
  enabled: true
  node_id: "node-1"                  # Unique node ID
  bind_addr: "0.0.0.0:9700"          # Local TCP bind address
  advertise_addr: "10.99.0.1:9700"   # Address advertised to other nodes
  data_dir: ""                       # Data directory (defaults to <state-dir>/cluster if empty)
  peers:
    - id: "node-2"
      address: "10.99.0.2:9700"
```

Leadership Transitions
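When the leader fails, the followers stop receiving its heartbeats. Once the Raft election timeout (1 second) expires, they hold an election, and the follower that wins votes from a majority of nodes becomes the new leader.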
Callbacks on leadership change:
- Promoted to leader: Start VPN tunnels, start Reactor listeners
- Demoted to follower: Stop VPN tunnels, sync state from other nodes
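The callbacks above amount to a small edge-triggered loop. The Go sketch below assumes leadership changes arrive on a `<-chan bool` (true = became leader), which is how libraries such as hashicorp/raft expose them via `LeaderCh`; whether Nefia uses that mechanism is an assumption, and `Callbacks` and `watchLeadership` are hypothetical names.

```go
package main

import "fmt"

// Callbacks holds the actions to run on leadership changes (hypothetical).
type Callbacks struct {
	OnPromote func() // e.g. start VPN tunnels and Reactor listeners
	OnDemote  func() // e.g. stop VPN tunnels and sync state from other nodes
}

// watchLeadership consumes leadership notifications until the channel is
// closed. It fires a callback only on an actual transition, so a repeated
// "true" does not start the tunnels twice.
func watchLeadership(ch <-chan bool, cb Callbacks) {
	isLeader := false
	for leader := range ch {
		if leader == isLeader {
			continue // no transition, nothing to do
		}
		isLeader = leader
		if leader {
			cb.OnPromote()
		} else {
			cb.OnDemote()
		}
	}
}

func main() {
	var events []string
	cb := Callbacks{
		OnPromote: func() { events = append(events, "start VPN tunnels, start Reactor listeners") },
		OnDemote:  func() { events = append(events, "stop VPN tunnels, sync state") },
	}
	ch := make(chan bool, 3)
	ch <- true
	ch <- true // duplicate notification: ignored
	ch <- false
	close(ch)
	watchLeadership(ch, cb)
	fmt.Println(events)
}
```

Tracking the previous role keeps the callbacks idempotent with respect to duplicate notifications. In a real daemon the callbacks should return quickly, since a slow callback delays handling of the next transition.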
Raft Parameters
| Parameter | Value | Description |
|---|---|---|
| Apply Timeout | 10s | Timeout for log application |
| Leader Lease | 500ms | Leader lease duration |
| Heartbeat Timeout | 1s | Heartbeat interval |
| Election Timeout | 1s | Election timeout |
| Snapshot Retain | 2 | Number of snapshots to retain |
| Connection Pool | 3 | Raft TCP connection pool size |
| Transport Timeout | 10s | Communication timeout |
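The timing values above have to be consistent with each other. This Go sketch captures the table as a hypothetical `RaftParams` struct and checks two ordering rules: the leader lease must not exceed the heartbeat timeout, and the election timeout must be at least the heartbeat timeout. hashicorp/raft's config validation rejects configurations that break these rules; whether Nefia performs the same check is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// RaftParams mirrors the table above. The struct and field names are
// hypothetical; the values are the documented ones.
type RaftParams struct {
	ApplyTimeout       time.Duration
	LeaderLeaseTimeout time.Duration
	HeartbeatTimeout   time.Duration
	ElectionTimeout    time.Duration
	SnapshotRetain     int
	MaxPool            int
	TransportTimeout   time.Duration
}

func DefaultRaftParams() RaftParams {
	return RaftParams{
		ApplyTimeout:       10 * time.Second,
		LeaderLeaseTimeout: 500 * time.Millisecond,
		HeartbeatTimeout:   1 * time.Second,
		ElectionTimeout:    1 * time.Second,
		SnapshotRetain:     2,
		MaxPool:            3,
		TransportTimeout:   10 * time.Second,
	}
}

// Validate checks the assumed ordering rules and basic positivity.
func (p RaftParams) Validate() error {
	if p.LeaderLeaseTimeout > p.HeartbeatTimeout {
		return errors.New("leader lease must not exceed heartbeat timeout")
	}
	if p.ElectionTimeout < p.HeartbeatTimeout {
		return errors.New("election timeout must be >= heartbeat timeout")
	}
	if p.SnapshotRetain < 1 || p.MaxPool < 1 {
		return errors.New("snapshot retain and connection pool size must be positive")
	}
	return nil
}

func main() {
	p := DefaultRaftParams()
	fmt.Println(p.Validate()) // <nil>
	p.LeaderLeaseTimeout = 2 * time.Second
	fmt.Println(p.Validate())
}
```

The documented defaults pass: the 500ms lease is below the 1s heartbeat timeout, and the election timeout equals it.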
Data Storage
| Data | Store | Location |
|---|---|---|
| Raft log | BoltDB | <data-dir>/raft.db |
| Snapshots | File-based | <data-dir>/ |
| Admin socket | Unix socket | <state-dir>/admin.sock |
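The locations above follow from two directories. A minimal Go sketch that resolves them, applying the documented default that an empty `data_dir` falls back to `<state-dir>/cluster`; `resolvePaths`, `ClusterPaths`, and the example state directory `/var/lib/nefia` are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// ClusterPaths lists the on-disk locations from the table above.
type ClusterPaths struct {
	DataDir     string
	RaftDB      string
	SnapshotDir string
	AdminSocket string
}

// resolvePaths applies the documented default: an empty data_dir falls
// back to <state-dir>/cluster. The admin socket always lives in state-dir.
func resolvePaths(stateDir, dataDir string) ClusterPaths {
	if dataDir == "" {
		dataDir = filepath.Join(stateDir, "cluster")
	}
	return ClusterPaths{
		DataDir:     dataDir,
		RaftDB:      filepath.Join(dataDir, "raft.db"),
		SnapshotDir: dataDir,
		AdminSocket: filepath.Join(stateDir, "admin.sock"),
	}
}

func main() {
	p := resolvePaths("/var/lib/nefia", "") // hypothetical state dir
	fmt.Println(p.RaftDB)
	fmt.Println(p.AdminSocket)
}
```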
Admin API
Cluster management is performed via the daemon process's Unix socket. CLI commands communicate with the daemon through this API.
Supported actions:
- `status`: Get node status
- `members`: Get the member list
- `add_peer`: Add a peer (leader only)
- `remove_peer`: Remove a peer (leader only)
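Because `add_peer` and `remove_peer` are leader-only, a daemon has to gate them on its current role before touching membership. A minimal Go sketch of that check; the `Request` struct and its fields are hypothetical, since the wire format used on the admin socket is not documented here.

```go
package main

import (
	"errors"
	"fmt"
)

// Request is a hypothetical admin API request.
type Request struct {
	Action string
	ID     string
	Addr   string
}

var known = map[string]bool{"status": true, "members": true, "add_peer": true, "remove_peer": true}

var leaderOnly = map[string]bool{"add_peer": true, "remove_peer": true}

// ErrNotLeader is returned for membership changes sent to a follower.
var ErrNotLeader = errors.New("action requires the leader")

// checkRequest validates a request before dispatch: the action must be one
// of the four supported ones, membership changes are refused on followers,
// and peer changes must carry the fields they need.
func checkRequest(r Request, isLeader bool) error {
	if !known[r.Action] {
		return fmt.Errorf("unsupported action %q", r.Action)
	}
	if leaderOnly[r.Action] && !isLeader {
		return ErrNotLeader
	}
	switch r.Action {
	case "add_peer":
		if r.ID == "" || r.Addr == "" {
			return errors.New("add_peer needs an id and an address")
		}
	case "remove_peer":
		if r.ID == "" {
			return errors.New("remove_peer needs an id")
		}
	}
	return nil
}

func main() {
	fmt.Println(checkRequest(Request{Action: "status"}, false))
	fmt.Println(checkRequest(Request{Action: "add_peer", ID: "node-2", Addr: "10.99.0.2:9700"}, false))
}
```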
mTLS Transport Security
Raft cluster transport supports optional mutual TLS (mTLS) for authenticating peer-to-peer communication. When enabled, all Raft traffic (log replication, heartbeats, snapshots) is encrypted and both peers must present valid certificates signed by a shared CA.
Configuration
```yaml
cluster:
  enabled: true
  node_id: "node-1"
  bind_addr: "0.0.0.0:9700"
  advertise_addr: "10.99.0.1:9700"
  tls_enabled: true
  tls_cert_file: "/etc/nefia/cluster-cert.pem"
  tls_key_file: "/etc/nefia/cluster-key.pem"
  tls_ca_file: "/etc/nefia/cluster-ca.pem"
```

| Field | Type | Description |
|---|---|---|
| `tls_enabled` | bool | Enable mTLS for Raft inter-node transport |
| `tls_cert_file` | string | Path to TLS certificate (PEM) |
| `tls_key_file` | string | Path to TLS private key (PEM) |
| `tls_ca_file` | string | Path to CA certificate for mutual TLS verification (PEM) |
All three file paths (tls_cert_file, tls_key_file, tls_ca_file) are required when tls_enabled is true.
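A config loader can enforce this rule before the daemon starts. The Go sketch below uses a hypothetical `ClusterConfig` struct mirroring the keys above; besides the documented TLS rule, it assumes two basic checks (a non-empty `node_id` and a `host:port` `bind_addr`) that are not stated in this page.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// ClusterConfig mirrors the cluster keys above; the struct is hypothetical.
type ClusterConfig struct {
	Enabled       bool
	NodeID        string
	BindAddr      string
	AdvertiseAddr string
	TLSEnabled    bool
	TLSCertFile   string
	TLSKeyFile    string
	TLSCAFile     string
}

// Validate enforces the documented TLS rule plus the assumed basic checks.
func (c ClusterConfig) Validate() error {
	if !c.Enabled {
		return nil
	}
	if c.NodeID == "" {
		return errors.New("node_id is required")
	}
	if _, _, err := net.SplitHostPort(c.BindAddr); err != nil {
		return fmt.Errorf("bind_addr: %w", err)
	}
	if !c.TLSEnabled {
		return nil
	}
	var missing []string
	if c.TLSCertFile == "" {
		missing = append(missing, "tls_cert_file")
	}
	if c.TLSKeyFile == "" {
		missing = append(missing, "tls_key_file")
	}
	if c.TLSCAFile == "" {
		missing = append(missing, "tls_ca_file")
	}
	if len(missing) > 0 {
		return fmt.Errorf("tls_enabled is true; missing %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	cfg := ClusterConfig{
		Enabled: true, NodeID: "node-1",
		BindAddr: "0.0.0.0:9700", AdvertiseAddr: "10.99.0.1:9700",
		TLSEnabled: true, TLSCertFile: "/etc/nefia/cluster-cert.pem",
	}
	fmt.Println(cfg.Validate())
}
```

Reporting every missing path at once saves a restart cycle per field when an operator enables TLS for the first time.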
Security Properties
- TLS 1.3 minimum: Only TLS 1.3 is accepted, ensuring modern cipher suites and key exchange.
- Mutual authentication: Both sides of the connection must present a certificate (`RequireAndVerifyClientCert`). A node cannot join the cluster without a valid certificate signed by the shared CA.
- EKU validation: Peer certificates must include both the `serverAuth` and `clientAuth` Extended Key Usage (EKU) values. Certificates with `ExtKeyUsageAny` are also accepted.
- Forward secrecy: TLS session tickets are disabled so that every connection performs a full handshake, preventing session resumption attacks.
MCP Tools
AI agents can manage the Raft cluster programmatically via MCP:
| Tool | Description |
|---|---|
| `nefia.cluster.init` | Initialize a new Raft cluster on this node. |
| `nefia.cluster.status` | Show cluster status including leader, term, and commit index. |
| `nefia.cluster.members` | List all cluster members and their state (leader, follower, candidate). |
| `nefia.cluster.add_peer` | Add a new peer node to the cluster by ID and address. |
| `nefia.cluster.remove_peer` | Remove a peer node from the cluster. |
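MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. A minimal Go sketch that builds such a request for the tools above; the argument keys `id` and `address` for `nefia.cluster.add_peer` are assumptions, since the tools' input schemas are not listed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall is the JSON-RPC 2.0 envelope for the MCP tools/call method.
type toolCall struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments,omitempty"`
}

// newToolCall serializes a tools/call request for the named tool.
func newToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	return json.Marshal(toolCall{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: tool, Arguments: args},
	})
}

func main() {
	// Hypothetical argument names for add_peer.
	msg, err := newToolCall(1, "nefia.cluster.add_peer", map[string]any{"id": "node-2", "address": "10.99.0.2:9700"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg))
}
```

Tools that take no input, such as `nefia.cluster.status`, can omit `arguments` entirely; `omitempty` drops the field when the map is nil.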