Agent Patterns

Best practices for AI agents using Nefia MCP tools.

This guide covers patterns and best practices for AI agents integrating with Nefia through the MCP server. It assumes you have already connected your AI agent to the MCP server (see MCP Integration).

Session Lifecycle Management

Sessions provide a sandboxed, persistent context for file operations on a specific host. The standard pattern is: open a session, execute operations, then close it.

Single-Host Session

json

// Step 1: Open a session
{ "method": "tools/call", "params": { "name": "nefia.session.open", "arguments": { "host_id": "web-prod-1", "root": "/var/www/myapp" } } }
 
// Response
{ "content": [{ "type": "text", "text": "{\"session_id\":\"s-abc123\",\"resolved_root\":\"/var/www/myapp\",\"resolved_cwd\":\"/var/www/myapp\"}" }] }
 
// Step 2: Use the session for file/exec operations
{ "method": "tools/call", "params": { "name": "nefia.fs.read", "arguments": { "session_id": "s-abc123", "path": "config.yaml" } } }
 
// Step 3: Close the session when done
{ "method": "tools/call", "params": { "name": "nefia.session.close", "arguments": { "session_id": "s-abc123" } } }
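
The open, use, close sequence above maps naturally onto a context manager, so the session is closed even when an operation in the middle raises. The following is a minimal Python sketch, not part of Nefia: `call_tool` is a hypothetical function that sends a `tools/call` request and returns the decoded JSON payload from the response text, and `nefia_session` is an illustrative helper name.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: sends a tools/call request and returns the decoded
# JSON payload of the response. It stands in for your MCP client.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


@contextmanager
def nefia_session(call_tool: CallTool, host_id: str, root: str) -> Iterator[str]:
    """Open a session, yield its ID, and always close it afterwards."""
    opened = call_tool("nefia.session.open", {"host_id": host_id, "root": root})
    session_id = opened["session_id"]
    try:
        yield session_id
    finally:
        # Runs on normal exit and on exceptions, so sessions are not leaked.
        call_tool("nefia.session.close", {"session_id": session_id})
```

Inside `with nefia_session(call_tool, "web-prod-1", "/var/www/myapp") as sid:` the agent would pass `sid` as `session_id` to tools such as `nefia.fs.read`.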

Multi-Host Sessions

AI agents can maintain multiple sessions simultaneously. Use nefia.session.list to track active sessions:

json

// Open sessions on two hosts
{ "method": "tools/call", "params": { "name": "nefia.session.open", "arguments": { "host_id": "web-1", "root": "/etc/nginx" } } }
{ "method": "tools/call", "params": { "name": "nefia.session.open", "arguments": { "host_id": "web-2", "root": "/etc/nginx" } } }
 
// Check active sessions
{ "method": "tools/call", "params": { "name": "nefia.session.list" } }
 
// Close every session when done, using the session_id returned by each open call
{ "method": "tools/call", "params": { "name": "nefia.session.close", "arguments": { "session_id": "s-abc123" } } }
{ "method": "tools/call", "params": { "name": "nefia.session.close", "arguments": { "session_id": "s-def456" } } }

Session vs. Target

Choose the right approach based on your use case:

Session (session_id): Use when performing multiple operations on the same host (file reads, edits, exec). Operations are scoped to the session root directory.
Target (target): Use for one-off operations across multiple hosts (deploy, status checks, bulk commands). No persistent context is created.

Error Recovery

Nefia MCP tools return structured errors with remediation guidance. AI agents should work through the following steps in order:

Check the error code. Every domain error includes an error_code string (e.g., HOST_NOT_FOUND, POLICY_DENIED, SSH_AUTH_FAILED).

json

{
  "content": [{
    "type": "text",
    "text": "{\"error\":\"host not found: staging-3\",\"code\":-32008,\"details\":{\"error_code\":\"HOST_NOT_FOUND\",\"hint\":\"host not found: staging-3\",\"suggested_actions\":[\"Call nefia.hosts.list to see available hosts\",\"Check host ID spelling\"]}}"
  }],
  "isError": true
}

Call nefia.explain for detailed remediation. Pass the error code to get diagnostics commands and example tool calls:

json

{ "method": "tools/call", "params": { "name": "nefia.explain", "arguments": { "error_code": "HOST_NOT_FOUND" } } }

Response:

json

{
  "error_code": "HOST_NOT_FOUND",
  "known": true,
  "suggested_actions": ["Call nefia.hosts.list to see available hosts", "Check host ID spelling"],
  "diagnostics": ["nefia.hosts.list"],
  "example": { "tool": "nefia.hosts.list", "arguments": {} }
}

Follow the suggested actions. Execute the diagnostics tools and adjust your approach based on the results.
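
Because the structured error arrives as a JSON string inside the `text` content item, an agent has to decode it before it can read `error_code`. The sketch below shows one way to do that in Python and to build the follow-up `nefia.explain` call. The function names are illustrative, and the payload layout is assumed to match the example response shown earlier.

```python
import json
from typing import Any, Dict, Optional


def extract_error_code(result: Dict[str, Any]) -> Optional[str]:
    """Return details.error_code from a tool result, or None if there is none."""
    if not result.get("isError"):
        return None
    for item in result.get("content", []):
        if item.get("type") != "text":
            continue
        try:
            payload = json.loads(item["text"])
        except (KeyError, TypeError, ValueError):
            continue  # Missing or non-JSON text; try the next content item.
        details = payload.get("details") if isinstance(payload, dict) else None
        if isinstance(details, dict) and details.get("error_code"):
            return details["error_code"]
    return None


def explain_request(error_code: str) -> Dict[str, Any]:
    """Build the nefia.explain call for a given error code."""
    return {
        "method": "tools/call",
        "params": {"name": "nefia.explain", "arguments": {"error_code": error_code}},
    }
```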

Error Codes and Retry Behavior

Error Code	Retryable	Recommended Action
`POLICY_DENIED`	No	Explain to the user why it was blocked. Do not retry.
`HOST_NOT_FOUND`	No	Call `nefia.hosts.list` to find the correct host ID.
`SSH_AUTH_FAILED`	No	Check SSH key configuration. Report to the user.
`SSH_CONN_FAILED`	Yes	Check VPN status with `nefia.vpn.diagnose`. Retry after connectivity is restored.
`CMD_TIMEOUT`	Yes	Increase `timeout_ms` or break the command into smaller steps.
`SFTP_FAILED`	Depends	Check `hint` for specific cause (file not found vs. permission denied).
`RATE_LIMITED`	Yes	Wait for `retry_after_ms` then retry.
`SESSION_NOT_FOUND`	No	The session expired or was closed. Open a new one.
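
The table above can be encoded as a small retry policy. This Python sketch is one possible reading of it, under two assumptions that the table does not confirm: `retry_after_ms` is carried in the same `details` object as `error_code`, and the 1000 ms fallback delay is an arbitrary placeholder. `SFTP_FAILED` is deliberately treated as not automatically retryable, since the agent has to inspect `hint` first.

```python
from typing import Any, Dict, Optional

# Codes the table marks as retryable. SFTP_FAILED ("Depends") is left out on
# purpose: the agent should read `hint` before deciding.
RETRYABLE = {"SSH_CONN_FAILED", "CMD_TIMEOUT", "RATE_LIMITED"}


def retry_delay_ms(details: Dict[str, Any], default_ms: int = 1000) -> Optional[int]:
    """Return the wait before retrying, or None if no automatic retry applies."""
    code = details.get("error_code")
    if code not in RETRYABLE:
        return None
    if code == "RATE_LIMITED":
        # Assumption: retry_after_ms sits next to error_code in details.
        return int(details.get("retry_after_ms", default_ms))
    return default_ms
```

A delay alone is not always enough: for `CMD_TIMEOUT` the retried call should also raise `timeout_ms` or split the command, and for `SSH_CONN_FAILED` connectivity should be checked with `nefia.vpn.diagnose` first.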

Multi-Host Orchestration

Using Target Selectors

Target selectors let you address hosts by ID, group, tag, or all at once. Always discover hosts first:

json

// Discover available hosts
{ "method": "tools/call", "params": { "name": "nefia.hosts.list" } }
 
// Target by host ID
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": { "target": "host:web-1", "command": "uptime" } } }
 
// Target by group
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": { "target": "group:production", "command": "df -h" } } }
 
// Target by tag
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": { "target": "tag:env=staging", "command": "systemctl status nginx" } } }
 
// Target all hosts
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": { "target": "all", "command": "hostname" } } }
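
Building selector strings by hand is easy to get wrong, so an agent may prefer small helpers. The Python sketch below mirrors the four selector forms shown above. The helper names are illustrative and not part of Nefia.

```python
from typing import Any, Dict

ALL_HOSTS = "all"


def host(host_id: str) -> str:
    return f"host:{host_id}"


def group(name: str) -> str:
    return f"group:{name}"


def tag(key: str, value: str) -> str:
    return f"tag:{key}={value}"


def exec_request(target: str, command: str, **options: Any) -> Dict[str, Any]:
    """Build a nefia.exec call; extra options (e.g. dry_run) pass through as-is."""
    arguments: Dict[str, Any] = {"target": target, "command": command}
    arguments.update(options)
    return {"method": "tools/call", "params": {"name": "nefia.exec", "arguments": arguments}}
```

For example, `exec_request(tag("env", "staging"), "systemctl status nginx")` produces the same request as the tag example above.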

Batch Execution

For large fleets, use batch execution to roll out changes gradually:

json

{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": {
  "target": "group:production",
  "command": "systemctl restart myapp",
  "batch_size": 5,
  "batch_wait": "30s"
} } }

This restarts the service on 5 hosts at a time, waiting 30 seconds between batches.

Subset Testing

Test on a random subset before deploying to all hosts:

json

{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": {
  "target": "group:production",
  "command": "myapp --version",
  "subset": 3
} } }

Rerunning Failed Hosts

After a batch execution, rerun only the failed hosts:

json

// The initial exec returns a job_id in the summary
// { "summary": { "ok": false, "success_count": 8, "fail_count": 2, "job_id": "j-xyz789" } }
 
// Rerun on failed hosts only
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": {
  "target": "group:production",
  "command": "systemctl restart myapp",
  "rerun": "failure",
  "last_job_id": "j-xyz789"
} } }
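
The rerun step can be derived mechanically from the summary of the first run. This Python sketch assumes the summary has the shape shown in the comment above (`fail_count` and `job_id`). The function name is illustrative.

```python
from typing import Any, Dict, Optional


def rerun_failed_request(
    summary: Dict[str, Any], target: str, command: str
) -> Optional[Dict[str, Any]]:
    """Build a rerun call for failed hosts, or return None if nothing failed."""
    if summary.get("fail_count", 0) == 0:
        return None
    return {
        "method": "tools/call",
        "params": {
            "name": "nefia.exec",
            "arguments": {
                "target": target,
                "command": command,
                "rerun": "failure",
                "last_job_id": summary["job_id"],
            },
        },
    }
```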

Dry-Run and Execute Pattern

Always preview destructive operations before executing them. The two-step pattern shows which hosts a command would reach, and which of them policy would block, before anything runs.

Preview with dry_run. The dry-run resolves targets and checks policy without executing:

json

{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": {
  "target": "tag:env=prod",
  "command": "systemctl restart nginx",
  "dry_run": true
} } }

Response shows which hosts would be affected and whether the operation is allowed:

json

{
  "dry_run": true,
  "hosts": [
    { "host_id": "web-1", "allowed": true },
    { "host_id": "web-2", "allowed": true },
    { "host_id": "db-1", "allowed": false, "reason": "policy denied: deny_commands matched" }
  ]
}

Execute after review. Remove dry_run to execute:

json

{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": {
  "target": "tag:env=prod",
  "command": "systemctl restart nginx"
} } }

The dry-run pattern also works with nefia.push, nefia.sync, and nefia.playbook.run.
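
Between the preview and the real call, an agent usually wants to separate allowed hosts from denied ones so it can report the denials to the user. The Python sketch below does this for a dry-run response shaped like the example above. The function name is illustrative.

```python
from typing import Any, Dict, List, Tuple


def split_dry_run(response: Dict[str, Any]) -> Tuple[List[str], Dict[str, str]]:
    """Return (allowed host IDs, {denied host ID: reason}) from a dry-run result."""
    allowed: List[str] = []
    denied: Dict[str, str] = {}
    for entry in response.get("hosts", []):
        if entry.get("allowed"):
            allowed.append(entry["host_id"])
        else:
            denied[entry["host_id"]] = entry.get("reason", "")
    return allowed, denied
```

If `denied` is not empty, it is reasonable to show the reasons to the user and ask whether to proceed, rather than executing immediately.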

Approval Workflow

When the approval workflow is enabled, destructive operations may block while waiting for human approval. AI agents should treat a delayed response as a pending approval rather than as a failure.

How Approval Works

1. The AI agent calls a destructive tool (e.g., nefia.exec with rm or nefia.sys.service.control with restart).
2. If an approval rule matches, the server sends a notifications/message with type: "approval_required".
3. The tool call blocks until a human approves or denies the request via the TUI dashboard.
4. If the timeout expires (default: 30 seconds), the configured default_action applies (default: deny).

Polling for Approval Status

AI agents can check pending approvals using nefia.approval.list:

json

{ "method": "tools/call", "params": { "name": "nefia.approval.list" } }

Response:

json

{
  "pending": [
    { "id": "apr-001", "tool": "nefia.exec", "command": "systemctl restart nginx", "host_id": "web-1", "created_at": "2026-03-18T10:00:00Z" }
  ],
  "enabled": true
}

Recommended Approval Pattern

1. Call the destructive tool.
2. If the response is delayed (approval is pending), inform the user that human approval is required.
3. Periodically poll nefia.approval.list to check status.
4. Once approved, the original tool call completes and returns the result.
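
The polling step can be sketched as a bounded loop. This Python example rests on two assumptions: `call_tool` is a hypothetical function that sends a `tools/call` request and returns the decoded JSON payload, and an approval that no longer appears under `pending` has been resolved (approved, denied, or timed out). The outcome itself still comes from the original blocked tool call, which would run concurrently.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical transport; stands in for your MCP client.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_until_resolved(
    call_tool: CallTool,
    approval_id: str,
    interval_s: float = 2.0,
    max_polls: int = 15,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll nefia.approval.list until approval_id leaves the pending list.

    Returns True once it is gone, False if it is still pending after max_polls.
    """
    for _ in range(max_polls):
        listing = call_tool("nefia.approval.list", {})
        pending_ids = {a.get("id") for a in listing.get("pending", [])}
        if approval_id not in pending_ids:
            return True
        sleep(interval_s)
    return False
```

The `sleep` parameter exists so the loop can be tested without real delays. The interval and poll count are placeholders, not values recommended by Nefia.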

Playbook Discovery

Playbooks define reusable multi-step workflows. The discovery pattern lets agents find and validate playbooks before running them.

List available playbooks:

json

{ "method": "tools/call", "params": { "name": "nefia.playbook.list" } }

Response:

json

{
  "playbooks": [
    { "name": "deploy", "path": "./playbooks/deploy.yaml", "description": "Deploy application", "step_count": 5 },
    { "name": "health-check", "path": "./playbooks/health-check.yaml", "description": "Run health checks", "step_count": 3 }
  ],
  "warnings": []
}

Inspect a playbook's details:

json

{ "method": "tools/call", "params": { "name": "nefia.playbook.show", "arguments": { "name_or_path": "deploy" } } }

Validate before running:

json

{ "method": "tools/call", "params": { "name": "nefia.playbook.validate", "arguments": { "name_or_path": "deploy" } } }

Response:

json

{ "valid": true, "errors": [], "warnings": [] }

Run the playbook (with optional dry-run first):

json

{ "method": "tools/call", "params": { "name": "nefia.playbook.run", "arguments": {
  "target": "tag:env=staging",
  "playbook": { "name": "deploy", "steps": [] },
  "dry_run": true
} } }

System Diagnostics

Use diagnostic tools to understand the current state before taking action.

Quick Status Check

Get a system-wide overview in a single call:

json

{ "method": "tools/call", "params": { "name": "nefia.status" } }

Response:

json

{
  "vpn": { "ready": true, "peer_count": 12, "healthy_peers": 11 },
  "hosts": { "total": 12, "online": 11, "offline": 1 },
  "queue": { "pending": 2, "running": 0, "completed": 15, "failed": 1 },
  "sessions": { "active": 3 },
  "config": { "policy_mode": "enforce", "vpn_enabled": true }
}

Health Checks

Run comprehensive diagnostics when something seems wrong:

json

{ "method": "tools/call", "params": { "name": "nefia.doctor", "arguments": { "host_id": "web-1" } } }

Recommended Diagnostic Flow

1. Call nefia.status for a quick overview.
2. If hosts are offline, call nefia.vpn.diagnose to check connectivity.
3. For specific hosts, call nefia.vpn.ping to test reachability.
4. Use nefia.doctor with host_id for targeted diagnostics.
5. Check nefia.queue.list for commands queued for offline hosts.
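
Part of this flow can be driven directly from the `nefia.status` response. The Python sketch below suggests follow-up tools from the fleet-wide counters in the example response above. It is an illustration only: the function name is hypothetical, and the host-specific steps (`nefia.vpn.ping`, `nefia.doctor`) are left to the agent because the status response shown does not identify which host is offline.

```python
from typing import Any, Dict, List


def next_diagnostics(status: Dict[str, Any]) -> List[str]:
    """Suggest follow-up diagnostic tools based on a nefia.status response."""
    steps: List[str] = []
    if status.get("hosts", {}).get("offline", 0) > 0:
        steps.append("nefia.vpn.diagnose")  # Offline hosts: check connectivity.
    if status.get("queue", {}).get("pending", 0) > 0:
        steps.append("nefia.queue.list")  # Commands are waiting in the queue.
    return steps
```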

MCP Integration

Connect AI agents to your infrastructure using the Model Context Protocol.

MCP Protocol Reference

Technical reference for all MCP tools, error codes, and protocol details.

Sessions

Learn how sessions provide sandboxed, persistent file operation contexts.

Playbooks

Define and execute multi-step workflows across your infrastructure.