Agent Patterns
Best practices for AI agents using Nefia MCP tools.
This guide covers patterns and best practices for AI agents integrating with Nefia through the MCP server. It assumes you have already connected your AI agent to the MCP server (see MCP Integration).
Session Lifecycle Management
Sessions provide a sandboxed, persistent context for file operations on a specific host. The standard pattern is: open a session, execute operations, then close it.
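The open, operate, close pattern maps naturally onto a context manager, which guarantees the session is closed even when an operation raises. The following is a minimal sketch, not part of Nefia itself: `call_tool` is a hypothetical transport function (tool name and arguments in, MCP tool result out), and `fake_call_tool` is a stand-in used only to make the example runnable.

```python
import json
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: sends a tools/call request and returns the tool result.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def parse_text_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the JSON string carried in the first text content item."""
    return json.loads(result["content"][0]["text"])


@contextmanager
def nefia_session(call_tool: CallTool, host_id: str, root: str) -> Iterator[str]:
    """Open a session, yield its ID, and close it even if an operation fails."""
    opened = parse_text_result(
        call_tool("nefia.session.open", {"host_id": host_id, "root": root})
    )
    session_id = opened["session_id"]
    try:
        yield session_id
    finally:
        call_tool("nefia.session.close", {"session_id": session_id})


calls = []


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in transport (assumption): records calls and returns canned data."""
    calls.append(name)
    payload: Dict[str, Any] = {}
    if name == "nefia.session.open":
        payload = {"session_id": "s-abc123", "resolved_root": arguments["root"]}
    return {"content": [{"type": "text", "text": json.dumps(payload)}]}


with nefia_session(fake_call_tool, "web-prod-1", "/var/www/myapp") as sid:
    fake_call_tool("nefia.fs.read", {"session_id": sid, "path": "config.yaml"})
```

In a real agent, `fake_call_tool` would be replaced by whatever function your MCP client exposes for tool calls.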
Single-Host Session
// Step 1: Open a session
{ "method": "tools/call", "params": { "name": "nefia.session.open", "arguments": { "host_id": "web-prod-1", "root": "/var/www/myapp" } } }
// Response
{ "content": [{ "type": "text", "text": "{\"session_id\":\"s-abc123\",\"resolved_root\":\"/var/www/myapp\",\"resolved_cwd\":\"/var/www/myapp\"}" }] }
// Step 2: Use the session for file/exec operations
{ "method": "tools/call", "params": { "name": "nefia.fs.read", "arguments": { "session_id": "s-abc123", "path": "config.yaml" } } }
// Step 3: Close the session when done
{ "method": "tools/call", "params": { "name": "nefia.session.close", "arguments": { "session_id": "s-abc123" } } }
Multi-Host Sessions
AI agents can maintain multiple sessions simultaneously. Use nefia.session.list to track active sessions:
// Open sessions on two hosts
{ "method": "tools/call", "params": { "name": "nefia.session.open", "arguments": { "host_id": "web-1", "root": "/etc/nginx" } } }
{ "method": "tools/call", "params": { "name": "nefia.session.open", "arguments": { "host_id": "web-2", "root": "/etc/nginx" } } }
// Check active sessions
{ "method": "tools/call", "params": { "name": "nefia.session.list" } }
// Close all when done
{ "method": "tools/call", "params": { "name": "nefia.session.close", "arguments": { "session_id": "s-abc123" } } }
{ "method": "tools/call", "params": { "name": "nefia.session.close", "arguments": { "session_id": "s-def456" } } }
Session vs. Target
Choose the right approach based on your use case:
- Session (session_id): Use when performing multiple operations on the same host (file reads, edits, exec). Operations are scoped to the session root directory.
- Target (target): Use for one-off operations across multiple hosts (deploy, status checks, bulk commands). No persistent context is created.
Error Recovery
Nefia MCP tools return structured errors with remediation guidance. AI agents should follow this decision tree:
Check the error code. Every domain error includes an error_code string (e.g., HOST_NOT_FOUND, POLICY_DENIED, SSH_AUTH_FAILED).
{
"content": [{
"type": "text",
"text": "{\"error\":\"host not found: staging-3\",\"code\":-32008,\"details\":{\"error_code\":\"HOST_NOT_FOUND\",\"hint\":\"host not found: staging-3\",\"suggested_actions\":[\"Call nefia.hosts.list to see available hosts\",\"Check host ID spelling\"]}}"
}],
"isError": true
}
Call nefia.explain for detailed remediation. Pass the error code to get diagnostics commands and example tool calls:
{ "method": "tools/call", "params": { "name": "nefia.explain", "arguments": { "error_code": "HOST_NOT_FOUND" } } }
Response:
{
"error_code": "HOST_NOT_FOUND",
"known": true,
"suggested_actions": ["Call nefia.hosts.list to see available hosts", "Check host ID spelling"],
"diagnostics": ["nefia.hosts.list"],
"example": { "tool": "nefia.hosts.list", "arguments": {} }
}
Follow the suggested actions. Execute the diagnostics tools and adjust your approach based on the results.
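The first two steps can be sketched in Python as follows. This is an illustration based only on the response shape shown above; the helper names `extract_error` and `explain_request` are hypothetical, and fields missing from a response are returned as `None` or empty rather than guessed.

```python
import json
from typing import Any, Dict, Optional


def extract_error(result: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the error details of a tool result, or None if it is not an error."""
    if not result.get("isError"):
        return None
    body = json.loads(result["content"][0]["text"])
    details = body.get("details", {})
    return {
        "message": body.get("error", ""),
        "error_code": details.get("error_code"),
        "hint": details.get("hint", ""),
        "suggested_actions": details.get("suggested_actions", []),
    }


def explain_request(error_code: str) -> Dict[str, Any]:
    """Build the nefia.explain call for a given error code."""
    return {
        "method": "tools/call",
        "params": {"name": "nefia.explain", "arguments": {"error_code": error_code}},
    }


# Sample mirroring the HOST_NOT_FOUND response shown above.
sample = {
    "content": [{"type": "text", "text": json.dumps({
        "error": "host not found: staging-3",
        "code": -32008,
        "details": {
            "error_code": "HOST_NOT_FOUND",
            "hint": "host not found: staging-3",
            "suggested_actions": [
                "Call nefia.hosts.list to see available hosts",
                "Check host ID spelling",
            ],
        },
    })}],
    "isError": True,
}

info = extract_error(sample)
follow_up = explain_request(info["error_code"])
```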
Error Codes and Retry Behavior
| Error Code | Retryable | Recommended Action |
|---|---|---|
| POLICY_DENIED | No | Explain to the user why it was blocked. Do not retry. |
| HOST_NOT_FOUND | No | Call nefia.hosts.list to find the correct host ID. |
| SSH_AUTH_FAILED | No | Check SSH key configuration. Report to the user. |
| SSH_CONN_FAILED | Yes | Check VPN status with nefia.vpn.diagnose. Retry after connectivity is restored. |
| CMD_TIMEOUT | Yes | Increase timeout_ms or break the command into smaller steps. |
| SFTP_FAILED | Depends | Check hint for specific cause (file not found vs. permission denied). |
| RATE_LIMITED | Yes | Wait for retry_after_ms then retry. |
| SESSION_NOT_FOUND | No | The session expired or was closed. Open a new one. |
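One way an agent might encode this table is as a lookup plus a delay function. The sketch below rests on several assumptions that do not come from Nefia: the function name `retry_delay_ms`, the limit of three attempts, and the exponential backoff with a 1000 ms base are illustrative choices. SFTP_FAILED is treated as non-retryable by default because the table says the answer depends on the hint.

```python
from typing import Optional

# Retry guidance transcribed from the table above.
# SFTP_FAILED is omitted on purpose: inspect its hint before deciding.
RETRYABLE = {
    "POLICY_DENIED": False,
    "HOST_NOT_FOUND": False,
    "SSH_AUTH_FAILED": False,
    "SSH_CONN_FAILED": True,
    "CMD_TIMEOUT": True,
    "RATE_LIMITED": True,
    "SESSION_NOT_FOUND": False,
}


def retry_delay_ms(
    error_code: str,
    attempt: int,
    retry_after_ms: Optional[int] = None,
    max_attempts: int = 3,       # assumption, not a Nefia default
    base_delay_ms: int = 1000,   # assumption, not a Nefia default
) -> Optional[int]:
    """Return the wait before the next attempt, or None if the agent should stop.

    `attempt` is the number of attempts already made (1 after the first failure).
    """
    if attempt >= max_attempts:
        return None
    if not RETRYABLE.get(error_code, False):
        return None
    if error_code == "RATE_LIMITED" and retry_after_ms is not None:
        return retry_after_ms  # honor the server-provided delay
    return base_delay_ms * (2 ** (attempt - 1))  # simple exponential backoff
```

For CMD_TIMEOUT, waiting alone is unlikely to help; the retry should also raise timeout_ms or split the command, as the table recommends.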
Multi-Host Orchestration
Using Target Selectors
Target selectors let you address hosts by ID, group, tag, or all at once. Always discover hosts first:
// Discover available hosts
{ "method": "tools/call", "params": { "name": "nefia.hosts.list" } }
// Target by host ID
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": { "target": "host:web-1", "command": "uptime" } } }
// Target by group
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": { "target": "group:production", "command": "df -h" } } }
// Target by tag
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": { "target": "tag:env=staging", "command": "systemctl status nginx" } } }
// Target all hosts
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": { "target": "all", "command": "hostname" } } }
Batch Execution
For large fleets, use batch execution to roll out changes gradually:
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": {
"target": "group:production",
"command": "systemctl restart myapp",
"batch_size": 5,
"batch_wait": "30s"
} } }
This restarts the service on 5 hosts at a time, waiting 30 seconds between batches.
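To estimate how long a batched rollout will take before starting it, an agent can model the batching locally. This sketch only illustrates the arithmetic; it is not how the server schedules work. It assumes hosts are taken in list order and that no wait follows the final batch, and the helper names are hypothetical.

```python
from typing import List, Sequence


def plan_batches(hosts: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split hosts into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(hosts[i:i + batch_size]) for i in range(0, len(hosts), batch_size)]


def total_wait_seconds(host_count: int, batch_size: int, batch_wait_s: int) -> int:
    """Time spent waiting between batches (assumes no wait after the last one)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch_count = -(-host_count // batch_size)  # ceiling division
    return max(batch_count - 1, 0) * batch_wait_s


fleet = [f"web-{n}" for n in range(1, 13)]  # 12 hypothetical hosts
batches = plan_batches(fleet, 5)            # groups of 5, 5, and 2
waiting = total_wait_seconds(len(fleet), 5, 30)
```

Under these assumptions, 12 hosts with batch_size 5 and a 30-second wait form three batches and spend 60 seconds waiting, in addition to the time the command itself takes.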
Subset Testing
Test on a random subset before deploying to all hosts:
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": {
"target": "group:production",
"command": "myapp --version",
"subset": 3
} } }
Rerunning Failed Hosts
After a batch execution, rerun only the failed hosts:
// The initial exec returns a job_id in the summary
// { "summary": { "ok": false, "success_count": 8, "fail_count": 2, "job_id": "j-xyz789" } }
// Rerun on failed hosts only
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": {
"target": "group:production",
"command": "systemctl restart myapp",
"rerun": "failure",
"last_job_id": "j-xyz789"
} } }
Dry-Run and Execute Pattern
Always preview destructive operations before executing them. This two-step pattern provides safety for multi-host operations.
Preview with dry_run. The dry-run resolves targets and checks policy without executing:
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": {
"target": "tag:env=prod",
"command": "systemctl restart nginx",
"dry_run": true
} } }
Response shows which hosts would be affected and whether the operation is allowed:
{
"dry_run": true,
"hosts": [
{ "host_id": "web-1", "allowed": true },
{ "host_id": "web-2", "allowed": true },
{ "host_id": "db-1", "allowed": false, "reason": "policy denied: deny_commands matched" }
]
}
Execute after review. Remove dry_run to execute:
{ "method": "tools/call", "params": { "name": "nefia.exec", "arguments": {
"target": "tag:env=prod",
"command": "systemctl restart nginx"
} } }
The dry-run pattern also works with nefia.push, nefia.sync, and nefia.playbook.run.
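The review step between the two calls can be made explicit in agent code. The sketch below assumes the dry-run response has the shape shown above and uses hypothetical helper names; whether to proceed when some hosts are denied is a decision for the agent and its user, not something this code settles.

```python
from typing import Any, Dict, List, Tuple


def split_dry_run(response: Dict[str, Any]) -> Tuple[List[str], Dict[str, str]]:
    """Separate allowed host IDs from denied ones (with their reasons)."""
    allowed = [h["host_id"] for h in response["hosts"] if h["allowed"]]
    denied = {
        h["host_id"]: h.get("reason", "")
        for h in response["hosts"]
        if not h["allowed"]
    }
    return allowed, denied


def execute_request(dry_run_arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build the real nefia.exec call by dropping dry_run from the preview."""
    arguments = {k: v for k, v in dry_run_arguments.items() if k != "dry_run"}
    return {"method": "tools/call", "params": {"name": "nefia.exec", "arguments": arguments}}


preview_args = {"target": "tag:env=prod", "command": "systemctl restart nginx", "dry_run": True}
preview = {
    "dry_run": True,
    "hosts": [
        {"host_id": "web-1", "allowed": True},
        {"host_id": "web-2", "allowed": True},
        {"host_id": "db-1", "allowed": False, "reason": "policy denied: deny_commands matched"},
    ],
}

allowed, denied = split_dry_run(preview)
if denied:
    # Surface the denials to the user before deciding whether to continue.
    report = [f"{host}: {reason}" for host, reason in denied.items()]
request = execute_request(preview_args)
```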
Approval Workflow
When the approval workflow is enabled, destructive operations may block waiting for human approval. AI agents should handle this gracefully.
How Approval Works
- The AI agent calls a destructive tool (e.g., nefia.exec with rm or nefia.sys.service.control with restart).
- If an approval rule matches, the server sends a notifications/message with type: "approval_required".
- The tool call blocks until a human approves or denies the request via the TUI dashboard.
- If the timeout expires (default: 30 seconds), the configured default_action applies (default: deny).
Polling for Approval Status
AI agents can check pending approvals using nefia.approval.list:
{ "method": "tools/call", "params": { "name": "nefia.approval.list" } }
Response:
{
"pending": [
{ "id": "apr-001", "tool": "nefia.exec", "command": "systemctl restart nginx", "host_id": "web-1", "created_at": "2026-03-18T10:00:00Z" }
],
"enabled": true
}
Recommended Approval Pattern
- Call the destructive tool.
- If the response is delayed (approval is pending), inform the user that human approval is required.
- Periodically poll nefia.approval.list to check status.
- Once approved, the original tool call completes and returns the result.
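The polling step could look like the sketch below. It assumes `list_approvals` is a hypothetical callable returning the decoded nefia.approval.list payload, and the poll interval is an arbitrary choice. Note that an approval leaving the pending list does not say whether it was approved or denied; the result of the original tool call remains the authoritative answer.

```python
import time
from typing import Any, Callable, Dict


def wait_for_approval(
    list_approvals: Callable[[], Dict[str, Any]],
    approval_id: str,
    poll_interval_s: float = 2.0,   # assumption, tune for your deployment
    timeout_s: float = 30.0,        # mirrors the documented default timeout
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until approval_id is no longer pending.

    Returns True if the approval was resolved before the deadline,
    False if it was still pending when the deadline passed.
    """
    deadline = clock() + timeout_s
    while True:
        pending_ids = {item["id"] for item in list_approvals()["pending"]}
        if approval_id not in pending_ids:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The `sleep` and `clock` parameters exist so the loop can be exercised in tests without real delays.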
Playbook Discovery
Playbooks define reusable multi-step workflows. The discovery pattern lets agents find and validate playbooks before running them.
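As a rough sketch of that pattern, the check below confirms that a playbook is listed and passes validation before the agent attempts to run it. It relies on the nefia.playbook.list and nefia.playbook.validate response shapes shown in this section. `call_tool` is a hypothetical transport assumed to return the already-decoded JSON payload, and `is_runnable` is an illustrative name.

```python
from typing import Any, Callable, Dict

# Hypothetical transport returning the decoded JSON payload of a tool call.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def is_runnable(call_tool: CallTool, name: str) -> bool:
    """Return True if the playbook is listed and validates without errors."""
    listing = call_tool("nefia.playbook.list", {})
    known_names = {playbook["name"] for playbook in listing["playbooks"]}
    if name not in known_names:
        return False  # skip validation for playbooks the server does not list
    report = call_tool("nefia.playbook.validate", {"name_or_path": name})
    return bool(report["valid"]) and not report["errors"]
```

An agent would typically follow a successful check with a dry-run of nefia.playbook.run before the real run.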
List available playbooks:
{ "method": "tools/call", "params": { "name": "nefia.playbook.list" } }
Response:
{
"playbooks": [
{ "name": "deploy", "path": "./playbooks/deploy.yaml", "description": "Deploy application", "step_count": 5 },
{ "name": "health-check", "path": "./playbooks/health-check.yaml", "description": "Run health checks", "step_count": 3 }
],
"warnings": []
}
Inspect a playbook's details:
{ "method": "tools/call", "params": { "name": "nefia.playbook.show", "arguments": { "name_or_path": "deploy" } } }
Validate before running:
{ "method": "tools/call", "params": { "name": "nefia.playbook.validate", "arguments": { "name_or_path": "deploy" } } }
Response:
{ "valid": true, "errors": [], "warnings": [] }
Run the playbook (with optional dry-run first):
{ "method": "tools/call", "params": { "name": "nefia.playbook.run", "arguments": {
"target": "tag:env=staging",
"playbook": { "name": "deploy", "steps": [] },
"dry_run": true
} } }
System Diagnostics
Use diagnostic tools to understand the current state before taking action.
Quick Status Check
Get a system-wide overview in a single call:
{ "method": "tools/call", "params": { "name": "nefia.status" } }
Response:
{
"vpn": { "ready": true, "peer_count": 12, "healthy_peers": 11 },
"hosts": { "total": 12, "online": 11, "offline": 1 },
"queue": { "pending": 2, "running": 0, "completed": 15, "failed": 1 },
"sessions": { "active": 3 },
"config": { "policy_mode": "enforce", "vpn_enabled": true }
}
Health Checks
Run comprehensive diagnostics when something seems wrong:
{ "method": "tools/call", "params": { "name": "nefia.doctor", "arguments": { "host_id": "web-1" } } }
Recommended Diagnostic Flow
- Call nefia.status for a quick overview.
- If hosts are offline, call nefia.vpn.diagnose to check connectivity.
- For specific hosts, call nefia.vpn.ping to test reachability.
- Use nefia.doctor with host_id for targeted diagnostics.
- Check nefia.queue.list for commands queued for offline hosts.
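The flow above can be approximated as a function that turns a nefia.status payload into an ordered list of follow-up tools. The thresholds are assumptions made for illustration (for example, treating a VPN that is not ready the same as offline hosts), and `next_diagnostics` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def next_diagnostics(status: Dict[str, Any]) -> List[str]:
    """Suggest follow-up tools, in order, based on a nefia.status payload."""
    steps: List[str] = []
    hosts_offline = status["hosts"]["offline"] > 0
    if hosts_offline or not status["vpn"]["ready"]:
        steps.append("nefia.vpn.diagnose")  # check overall connectivity first
    if hosts_offline:
        steps.append("nefia.vpn.ping")      # then test specific hosts
        steps.append("nefia.doctor")        # then run targeted diagnostics
    if status["queue"]["pending"] > 0:
        steps.append("nefia.queue.list")    # commands may be waiting on offline hosts
    return steps


# Sample mirroring the nefia.status response shown earlier.
sample_status = {
    "vpn": {"ready": True, "peer_count": 12, "healthy_peers": 11},
    "hosts": {"total": 12, "online": 11, "offline": 1},
    "queue": {"pending": 2, "running": 0, "completed": 15, "failed": 1},
    "sessions": {"active": 3},
    "config": {"policy_mode": "enforce", "vpn_enabled": True},
}

plan = next_diagnostics(sample_status)
```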
Related
Connect AI agents to your infrastructure using the Model Context Protocol.
Technical reference for all MCP tools, error codes, and protocol details.
Learn how sessions provide sandboxed, persistent file operation contexts.
Define and execute multi-step workflows across your infrastructure.