Getting Started
Building
cargo build --release
Two binaries are produced:
- ./target/release/intendant-runtime — the command runtime
- ./target/release/intendant — the AI CLI/TUI/Web
Installing
cargo install --path .
Both binaries are installed to ~/.cargo/bin/. The intendant binary embeds default system prompts and web assets (HTML, WASM) at compile time, so it works immediately from any directory without needing the source tree.
Prerequisites
- Rust toolchain (stable)
- wasm-pack — cargo install wasm-pack (auto-rebuilds WASM on source changes)
- ffmpeg — required for display recording (brew install ffmpeg / apt install ffmpeg)
- macOS: ./scripts/setup-macos.sh installs all platform dependencies
- Linux: sudo apt install imagemagick xdotool xvfb x11vnc ffmpeg
WASM auto-rebuild
The build.rs script automatically rebuilds WASM when crates/presence-web/ or crates/presence-core/ source files change. This requires wasm-pack to be installed. If not installed, cargo build prints a warning and skips the WASM rebuild.
To rebuild manually:
cd crates/presence-web && wasm-pack build --target web --out-dir ../../static/wasm-web --out-name presence_web
cargo build --release -p intendant # Re-embed WASM
Setup
Create a .env file or export the variables. The caller searches for .env in this order:
- Current directory (and parent directories)
- Project root (git root)
- Global config (~/.config/intendant/.env)
For global use after cargo install, put your keys in ~/.config/intendant/.env:
# OpenAI
OPENAI_API_KEY=sk-...
# Or Anthropic
ANTHROPIC_API_KEY=sk-ant-...
# Or Gemini (Google AI)
GEMINI_API_KEY=AI...
# If multiple keys are set, choose one:
PROVIDER=openai # or "anthropic" or "gemini"
MODEL_NAME=gpt-5.2-codex # optional, provider-specific default used if omitted
# Disable native tool calling (fall back to text-based JSON extraction)
# USE_NATIVE_TOOLS=false
Running
# With a task as CLI argument (launches TUI)
./target/release/intendant "List the files in /tmp"
# Headless mode (no TUI, plain text output)
./target/release/intendant --no-tui "List the files in /tmp"
# With autonomy level
./target/release/intendant --autonomy low "rm -rf /tmp/test"
# Specify provider and model
./target/release/intendant --provider anthropic --model claude-sonnet-4-5-20250929 "List files"
# Use Gemini provider
./target/release/intendant --provider gemini --model gemini-2.5-pro "List files"
# Interactive mode (prompts for task on stdin)
./target/release/intendant
# Verbose output (show debug-level log entries)
./target/release/intendant --verbose "echo hello"
# JSONL structured output (implies --no-tui)
./target/release/intendant --json "echo hello"
# Resume most recent session for this project
./target/release/intendant --continue "fix that bug"
# Resume specific session by ID or prefix
./target/release/intendant --resume abc123 "continue"
# Force single-agent mode (skip orchestrator)
./target/release/intendant --direct "simple task"
# Web dashboard (Activity + Usage + Terminal + Displays, default port 8765)
./target/release/intendant --web
# Web dashboard on custom port
./target/release/intendant --web 9000
# Enable filesystem sandboxing (Landlock, Linux 5.13+)
./target/release/intendant --sandbox "run tests"
# Run as MCP server (stdio transport)
./target/release/intendant --mcp "Deploy the application"
# Enable Unix control socket
./target/release/intendant --control-socket "task"
# Disable the presence layer
./target/release/intendant --no-presence "task"
# Pipe input (auto-detects non-TTY, runs headless)
echo "task" | ./target/release/intendant
Testing
cargo test --bins # Unit tests (fast, no API keys needed)
cargo test -- --list # List all test names
The test suite covers both binaries with inline #[cfg(test)] modules. See Session Logging for the full test coverage summary.
Integration tests in tests/e2e/ spawn a real binary and make real API calls — see Architecture for details.
Architecture
Overview
Intendant is a two-binary system: a command runtime that executes work, and a caller that drives it via AI model APIs.
stdin (JSON) --> intendant-runtime --> executes commands sequentially (blocking)
|
+--> in-memory process state (HashMap<nonce, ProcessInfo>)
+--> $INTENDANT_LOG_DIR/ (stdout/stderr logs per nonce)
|
+--> stdout (result lines with exit code, stdout/stderr tail)
intendant (3 modes) --> detects project root (git) --> loads memory/knowledge/skills
|
+--> User Mode: spawns orchestrator subprocess, monitors progress (no API calls)
+--> Sub-Agent Mode: scoped task, writes results/progress, isolated context
+--> Direct Mode: single-loop execution for simple tasks
|
+--> Presence layer: conversational mediator between user and agent loop
+--> Native tool calling (OpenAI/Anthropic/Gemini) with text extraction fallback
+--> Streaming output: SSE-based token streaming for all 3 providers
+--> Ratatui TUI: status bar, scrollable log, approval panel, askHuman input
+--> Web dashboard: 4-tab app (Activity/Usage/Terminal/Displays) with WASM-driven state
+--> Live voice: Gemini Live / OpenAI Realtime via browser, active/passive multi-browser
+--> MCP Server: --mcp flag, stdio transport, full parity with TUI (tools + resources)
+--> MCP Client: connects to external MCP servers (configured in intendant.toml)
+--> Autonomy system: Low/Medium/High/Full + per-category rules from intendant.toml
+--> Skills system: SKILL.md-based instruction sets with YAML frontmatter
+--> Transcription: server-side Whisper API for browser audio transcription
+--> Landlock sandbox: filesystem restrictions on agent runtime (Linux)
+--> Prompt caching: Anthropic cache_control, OpenAI/Gemini implicit caching
+--> Auto-compaction: triggers at 90% context usage, preserves system+tail messages
+--> Control socket: /tmp/intendant-<pid>.sock (JSON-line protocol)
+--> VNC proxy: WebSocket-to-TCP bridge for noVNC display viewing
+--> Token budget tracking (context-window-aware loop termination)
+--> Session resume: --continue (most recent) or --resume <id> (specific session)
+--> Git worktree isolation for implementation agents
+--> Tagged knowledge store with pub/sub channels between agents
Process State
In-memory HashMap<u64, ProcessInfo> tracking nonce, PID, status, exit code, and timestamp. Ephemeral — does not survive binary restarts. Each runtime invocation starts with an empty process map.
Session Directory
Per-session directory at ~/.intendant/logs/<uuid>/ with UUID-based naming. Contains per-nonce stdout/stderr log files, structured session logs (session.jsonl), conversation history (conversation.jsonl), and askHuman IPC files. The log directory is passed to the runtime via INTENDANT_LOG_DIR.
Execution Model
Commands are processed sequentially. Each command blocks until completion and returns its result directly (exit code, stdout tail, stderr tail). The runtime exits after processing all commands. Daemons backgrounded in bash continue after the tool returns.
Execution Modes
intendant operates in one of three modes, selected automatically based on task complexity and environment:
Direct Mode
Activated for simple tasks, or forced with --direct:
- Single-loop execution with the selected model
- Budget-aware loop: stops at context exhaustion, done signal, or 500-turn safety cap
- Used for short tasks that don’t need multi-agent orchestration
User Mode
Activated for complex tasks without INTENDANT_ROLE:
- Pure subprocess monitor — makes zero model API calls at Layer 0
- Spawns an orchestrator sub-agent as a child process via tokio::process::Command
- Polls the orchestrator’s progress file every 500ms, relays status to the TUI or stdout
- Reads the orchestrator’s result file on exit; synthesizes a failure if the process crashes
- kill_on_drop(true) ensures the orchestrator is terminated if the user quits the TUI
Sub-Agent Mode
Activated when INTENDANT_ROLE env var is set:
- Runs as a child agent with a scoped task
- Writes periodic progress to INTENDANT_PROGRESS_FILE
- Writes final results (summary, findings, artifacts, token usage) to INTENDANT_RESULT_FILE
- Uses role-specific system prompts (SysPrompt_research.md, SysPrompt_implementation.md, etc.)
See Multi-Agent Orchestration for the full sub-agent architecture.
How It Works (Direct Mode)
- Loads .env and selects the API provider (OpenAI, Anthropic, or Gemini). OpenAI uses the Responses API (/v1/responses), Anthropic uses the Messages API, Gemini uses the generateContent endpoint. All providers support streaming via SSE
- Configures structured output (JSON mode), reasoning controls, native tool calling, prompt caching (Anthropic cache_control), and max output tokens based on model capabilities and env vars
- Detects the project root (via git rev-parse --show-toplevel, falls back to cwd)
- Resolves the role-appropriate system prompt via cascade: project root → ~/.config/intendant/ → compiled-in default. When native tools are enabled, uses the condensed SysPrompt_tools.md (tool docs live in API tool definitions instead of prose)
- Injects the project working directory into the conversation so the model knows which project to work in
- Loads knowledge from <project>/.intendant/memory.json, injects it into the conversation
- Loads INTENDANT.md project instructions (global then project-local), injects them into the conversation
- Logs the full messages array to turn_NNN_messages.json before each API call
- Sends the task to the chat API via streaming (chat_stream()), with max_tokens/max_output_tokens, optional reasoning, optional JSON format, and native tool definitions when enabled. API requests use exponential backoff retry (up to 5 retries) for rate-limit (429) and server errors (5xx). Text deltas are forwarded to the TUI in real time
- Logs reasoning content (both summary and full text) to turn_NNN_reasoning.txt when available
- Processes the model’s response via one of two paths:
  - Native tool call path (when the response contains tool calls): Collects individual tool calls, assembles them into an AgentInput batch, pipes it to the runtime, and maps results back to per-tool-call responses. Handles manage_context and signal_done tool calls caller-side. Raw API output items (reasoning + function_call) are preserved for verbatim echo-back in subsequent requests, which reasoning models (GPT-5, o3, o4) require
  - Legacy text extraction path (fallback): Extracts JSON from the response text (handles structured output, code fences, and bare JSON), checks for an explicit done signal ({"done": true})
- Applies context directives (drop_turns, summarize) to the conversation
- Injects project context (memory_file) into relevant commands
- Classifies commands by action category (file read/write/delete, exec, network, destructive) and checks autonomy rules
- If approval is required:
  - TUI mode: emits an approval request and waits for user response
  - Headless mode: denies execution (no implicit auto-approve fallback)
- Pipes the JSON to the intendant-runtime binary and waits for completion with a hard timeout (120s default, 600s for askHuman)
- Feeds the agent output back as the next user message (text path) or as individual tool results (tool call path), appending a token budget summary
- Repeats until the model signals done, responds with no JSON, or the context budget is exhausted
- In headless mode, if the model emits askHuman, the loop sends a recovery prompt back to the model (continue with explicit assumptions) instead of blocking on the human-input timeout
askHuman Behavior
- In TUI mode, askHuman opens the input panel and writes your answer to the session-scoped response file.
- Empty submit is rejected in the TUI; you must provide non-empty input or press Esc to cancel.
- In headless mode (--no-tui or non-interactive stdin), askHuman cannot be answered interactively. The loop tells the model to continue with explicit assumptions instead of waiting for the runtime timeout.
- Runtime-level timeout for unanswered askHuman remains 5 minutes.
Streaming
All three providers support streaming via chat_stream() on the ChatProvider trait:
- Anthropic: stream: true on the Messages API, parses content_block_delta, content_block_start/stop, message_delta
- OpenAI: stream: true on the Responses API, parses response.output_text.delta, response.function_call_arguments.delta, response.completed
- Gemini: streamGenerateContent?alt=sse endpoint, parses chunked JSON candidates
Text deltas are forwarded to the TUI via AppEvent::ModelResponseDelta and accumulated in App::streaming_buffer, which is cleared when the full ModelResponse arrives.
Rate-Limit Retry
API requests use send_with_retry() with exponential backoff (1s × 2^attempt + jitter, up to 5 retries) for HTTP 429 and 5xx responses. Non-retryable errors (400, 401, etc.) fail immediately. API keys in error messages are masked via mask_api_keys().
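The schedule can be written out directly (a sketch of the stated formula; the jitter range is an assumption since only "jitter" is specified):

```python
import random

def retry_delay(attempt: int) -> float:
    """Backoff as described: 1s * 2^attempt plus jitter.
    Jitter range [0, 1) is an assumption."""
    return 1.0 * (2 ** attempt) + random.uniform(0, 1)

def is_retryable(status: int) -> bool:
    """HTTP 429 and 5xx retry; everything else fails immediately."""
    return status == 429 or 500 <= status <= 599

# Attempts 0..4 give base delays of 1, 2, 4, 8, 16 seconds before jitter.
```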
Prompt Caching
- Anthropic: Uses the anthropic-beta: prompt-caching-2024-07-31 header with structured system content containing cache_control: {"type": "ephemeral"}
- OpenAI: Automatic server-side caching for prompts >1024 tokens (no API changes needed)
- Gemini: Implicit context caching (no API changes needed)
Auto-Compaction
When context usage reaches 90% (usage_fraction() >= 0.90), conversation.auto_compact() triggers:
- Keeps: system message, first 2 context messages, last 4 messages
- Summarizes: oldest half of remaining middle messages via summarize_turns()
- Emits a ContextManagement event to the TUI/MCP
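The keep/summarize split can be sketched as follows (an illustration of the rule above, not the actual auto_compact(); the placeholder summarizer stands in for the model-backed summarize_turns()):

```python
def auto_compact(messages: list[str], usage_fraction: float) -> list[str]:
    """Trigger at >= 0.90 usage. Keep the system message, the first 2
    context messages, and the last 4 messages; collapse the oldest half
    of the remaining middle into one summary."""
    if usage_fraction < 0.90 or len(messages) <= 7:
        return messages
    head, tail = messages[:3], messages[-4:]   # system + 2 context / last 4
    middle = messages[3:-4]
    cut = len(middle) // 2
    # Placeholder summarizer; the real one calls the model.
    summary = "summary(" + ", ".join(middle[:cut]) + ")"
    return head + [summary] + middle[cut:] + tail
```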
Vision / Display Management
Xvfb is auto-launched lazily on the first turn containing an execAsAgent or captureScreen command when no accessible X display exists (checked via xdpyinfo). The detection flow:
- Already launched? → skip
- Batch contains execAsAgent or captureScreen? No → skip
- Current DISPLAY accessible? Yes → skip (user has a working display)
- Auto-launch Xvfb, store the guard, set DISPLAY, emit a DisplayReady event
- On failure → log a warning, let captureScreen fail naturally
Display allocation prefers :99 for a predictable VNC port (5999). If :99 is locked by a live Xvfb from a previous session, it is automatically killed and reclaimed (detected via /proc/<pid>/cmdline). If :99 is held by a non-Xvfb process, allocation falls through to :100+.
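The allocation policy reads as a short loop (a sketch; the predicates are injected here so the policy stays testable, and are stand-ins for the real lock-file and /proc checks):

```python
def allocate_display(is_free, held_by_stale_xvfb, kill_stale, max_display=110):
    """Prefer :99 (predictable VNC port 5999); reclaim it if a stale
    Xvfb from a previous session holds it; otherwise fall through
    to :100 and up."""
    if is_free(99):
        return 99
    if held_by_stale_xvfb(99):   # real check inspects /proc/<pid>/cmdline
        kill_stale(99)
        return 99
    for d in range(100, max_display):
        if is_free(d):
            return d
    raise RuntimeError("no free X display")
```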
Per-provider display resolutions:
- OpenAI: 1024×768 (3 tiles of 512×512)
- Anthropic: 819×1456 (9:16 aspect)
- Gemini: 768×1024 (2 tiles of 768×768)
VNC Remote Observation
An x11vnc server is launched alongside Xvfb as a best-effort co-process (port = 5900 + display_id). If x11vnc is not installed, the display works normally. Both Xvfb and x11vnc are killed on drop via XvfbGuard.
On the guest VM — install x11vnc:
sudo apt-get install -y x11vnc
From your host machine — connect with any VNC client:
# Direct connection (VM on local network)
vncviewer <vm-ip>:5999
# Over SSH tunnel (recommended for remote VMs)
ssh -L 5999:localhost:5999 user@vm-host
vncviewer localhost:5999
If display :99 was already taken, intendant falls through to :100+ and the VNC port shifts accordingly (6000, 6001, …). Check the TUI log panel or stderr for the actual port:
22:29:12 VNC server available at vnc://localhost:5999
Environment
- OS: Linux (Debian 12+)
- Runtime: Tokio async (full features)
- Permissions: Runs as unprivileged user with passwordless sudo
- Display: Auto-managed Xvfb with x11vnc (see above)
- X11 auth: At startup the runtime discovers active X displays and merges their xauth cookies into a session-scoped session.Xauthority file, passed as XAUTHORITY to all spawned commands
Configuration
CLI Flags
| Flag | Description |
|---|---|
--provider <name> | Force provider (openai, anthropic, or gemini) |
--model <name> | Override model name |
--verbose / -v | Show debug-level log entries in TUI |
--no-tui | Disable TUI, use plain text output |
--autonomy <level> | Set autonomy level (low, medium, high, full) |
--log-file <dir> | Override session log directory |
--mcp | Run as MCP server on stdio (replaces TUI) |
--control-socket | Enable Unix control socket at /tmp/intendant-<pid>.sock |
--json | JSONL structured output to stdout (implies --no-tui) |
--sandbox | Enable Landlock filesystem sandboxing (Linux kernel 5.13+) |
--direct | Force single-agent direct mode (skip orchestrator even for complex tasks) |
--no-presence | Disable the presence layer (direct agent interaction) |
--continue / -c | Resume most recent session for this project |
--resume <id> / -r <id> | Resume specific session by ID or prefix |
--web [PORT] | Start web dashboard with Activity/Usage/Terminal/Displays tabs + optional voice (default port 8765) |
--transcription | Enable server-side audio transcription (Whisper API) |
The TUI launches only when both stdin and stdout are terminals. When piping input/output or in sub-agent mode, intendant falls back to headless mode.
Environment Variables
| Variable | Default | Description |
|---|---|---|
OPENAI_API_KEY / OPENAI | — | OpenAI API key |
ANTHROPIC_API_KEY / ANTHROPIC | — | Anthropic API key |
GEMINI_API_KEY | — | Google AI (Gemini) API key |
PROVIDER | auto-detect | "openai", "anthropic", or "gemini" (used when multiple keys are set) |
MODEL_NAME | per-provider default | Model to use (e.g. gpt-5.2-codex, claude-sonnet-4-5-20250929, gemini-2.5-pro) |
USE_NATIVE_TOOLS | true | Enable native API tool calling; false falls back to text-based JSON extraction |
MODEL_CONTEXT_WINDOW | per-model default | Context window size in tokens |
MAX_OUTPUT_TOKENS | per-model default | Max output tokens per API call (sent to API) |
STRUCTURED_OUTPUT | true for gpt-5+/o3/o4 | Enable JSON object mode for deterministic parsing |
REASONING_EFFORT | — | Reasoning effort for GPT-5/o3/o4 models (low, medium, high) |
REASONING_SUMMARY | — | Reasoning summary mode (auto, concise, detailed) |
PRESENCE_PROVIDER | — | Override provider for the presence layer (fallback: PROVIDER) |
PRESENCE_MODEL | — | Override model for the presence layer |
INTENDANT_LOG_DIR | auto | Session log directory (set automatically by caller for the runtime) |
Sub-Agent Environment Variables
These are set automatically when spawning sub-agents (see Multi-Agent Orchestration):
| Variable | Description |
|---|---|
INTENDANT_ROLE | Sub-agent role (orchestrator, research, implementation, testing) |
INTENDANT_ID | Unique sub-agent identifier |
INTENDANT_TASK | Task description for the sub-agent |
INTENDANT_RESULT_FILE | Path for sub-agent to write final results |
INTENDANT_PROGRESS_FILE | Path for sub-agent to write periodic progress |
INTENDANT_PARENT_KNOWLEDGE | Path to parent’s knowledge store for inheritance |
INTENDANT_INHERIT_MEMORY | 1 to inherit project memory |
INTENDANT_SANDBOX_WRITE_PATHS | Landlock write paths (set by caller when sandboxing) |
INTENDANT_MCP_RELOAD | 1 when process was exec’d for MCP hot-reload |
The agent runner hard timeout is 120s default, automatically extended to 600s when askHuman is present in the command batch.
Project Configuration
Create intendant.toml in the project root:
[memory]
enabled = true # default: true
[model]
context_window = 200000 # override per-model default
max_output_tokens = 8192 # override per-model default
[orchestrator]
max_parallel_agents = 4 # max concurrent sub-agents
sub_agent_dir = ".intendant/subagents" # where sub-agent workspaces are created
[approval]
file_read = "auto" # auto-approve file reads
file_write = "ask" # ask before file writes (default)
file_delete = "ask" # ask before file deletes (default)
command_exec = "auto" # auto-approve command execution
network = "auto" # auto-approve network requests
destructive = "ask" # ask before destructive commands (default)
[presence]
enabled = true # enable the conversational presence layer (default: true)
provider = "gemini" # provider for the presence model (optional, falls back to PROVIDER)
model = "gemini-2.5-flash" # model for the presence layer (optional)
live_provider = "gemini" # provider for browser-side live presence (optional)
live_model = "gemini-2.5-flash-native-audio-preview-12-2025" # model for browser-side live presence (optional)
context_window = 32768 # context window for the presence conversation (default: 32768)
[transcription]
enabled = false # enable server-side audio transcription (default: false)
provider = "openai" # transcription provider (default: "openai")
model = "whisper-1" # transcription model (default: "whisper-1")
language = "en" # ISO-639-1 language hint (optional, auto-detect if omitted)
# endpoint = "http://..." # custom endpoint for self-hosted whisper.cpp
[sandbox]
enabled = false # enable Landlock filesystem sandboxing (default: false)
extra_write_paths = ["/var/log"] # additional writable paths beyond project root, /tmp, log dir
# External MCP servers to connect to as a client
[[mcp_servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
[[mcp_servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
[mcp_servers.env]
GITHUB_TOKEN = "ghp_..."
Skills
Skills are named instruction sets stored as SKILL.md files with YAML frontmatter. They are discovered from two directories (project-scoped first):
- <project_root>/.intendant/skills/<name>/SKILL.md
- ~/.intendant/skills/<name>/SKILL.md
Example SKILL.md:
---
name: deploy
description: Deploy the application to production
autonomy: high
disable-auto-invocation: true
---
## Steps
1. Run tests
2. Build release binary
3. Deploy to server
Frontmatter fields:
- name — skill identifier (required)
- description — shown in skill catalog (required)
- autonomy — override session autonomy level when active (optional)
- disable-auto-invocation — if true, only the user can trigger this skill (optional, default false)
- sandbox — override session sandbox setting (optional)
Project skills take precedence over personal skills with the same name. Available skills are formatted into a catalog and injected into the agent’s conversation.
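Parsing a SKILL.md into frontmatter and body can be sketched as (an illustration only; the real loader presumably uses a YAML parser, while this handles flat key: value pairs):

```python
def parse_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md into (frontmatter dict, markdown body).
    Frontmatter is the '---'-delimited block at the top."""
    lines = text.splitlines()
    assert lines[0] == "---", "SKILL.md must start with frontmatter"
    end = lines.index("---", 1)          # closing delimiter
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body
```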
Sandboxing
When sandboxing is enabled (via --sandbox or [sandbox].enabled = true), runtime command execution is restricted to read-only filesystem access plus writes to the project root, /tmp, the session log directory, ~/.intendant, and extra_write_paths. On kernels without Landlock support, sandboxing is silently skipped.
INTENDANT.md Project Instructions
Place an INTENDANT.md file in your project root or at ~/.config/intendant/INTENDANT.md for global instructions. These are injected into the conversation at session start, before knowledge/memory. Both files are loaded if present (global first, project-local second).
System Prompts
System prompts are compiled into the binary at build time, so intendant works from any directory without needing the source tree. Two base prompt variants exist:
- SysPrompt.md — Full prompt with JSON schema and per-function documentation (used with text-based JSON extraction)
- SysPrompt_tools.md — Condensed prompt for native tool calling mode (function docs live in API tool definitions, reducing system prompt tokens)
The active variant is selected automatically based on whether the provider has native tool calling enabled.
Prompts are resolved using a 3-layer cascade (highest priority first):
- Project root — <git-root>/SysPrompt.md or SysPrompt_tools.md (per-project customization)
- Global config — ~/.config/intendant/SysPrompt.md or SysPrompt_tools.md (user-wide customization)
- Compiled-in default — always available, zero-config
Role-specific prompts (SysPrompt_orchestrator.md, SysPrompt_research.md, SysPrompt_implementation.md) follow the same cascade and are appended to the base prompt. The presence layer uses its own standalone prompt (SysPrompt_presence.md).
To customize prompts for a specific project, place your modified .md files in the project’s git root. For user-wide customization, place them in ~/.config/intendant/.
Runtime Protocol
The intendant-runtime binary reads a single JSON object from stdin, executes commands sequentially, and writes result lines to stdout.
Basic Usage
echo '{"commands":[{"function":"execAsAgent","nonce":1,"command":"echo hello"}]}' \
| ./target/release/intendant-runtime
Output is a JSON result line containing the nonce, exit code, stdout tail (last 10KB), and stderr tail.
Inspect a file path:
echo '{"commands":[{"function":"inspectPath","nonce":1,"path":"/etc/hosts"}]}' \
| ./target/release/intendant-runtime
Edit a file:
echo '{"commands":[{"function":"editFile","nonce":1,"file_path":"/tmp/test.txt","operation":"write","content":"hello"}]}' \
| ./target/release/intendant-runtime
Fetch a web page as text:
echo '{"commands":[{"function":"browse","nonce":1,"url":"https://example.com"}]}' \
| ./target/release/intendant-runtime
Run stateful commands in a persistent PTY:
echo '{"commands":[{"function":"execPty","nonce":1,"command":"cd /tmp"},{"function":"execPty","nonce":2,"command":"pwd"}]}' \
| ./target/release/intendant-runtime
Store and recall memory (supports tagged knowledge with channels):
# Basic store
echo '{"commands":[{"function":"storeMemory","nonce":1,"memory_key":"db-config","memory_summary":"PostgreSQL on port 5432","memory_file":"/path/to/.intendant/memory.json"}]}' \
| ./target/release/intendant-runtime
# Store with tags and channel
echo '{"commands":[{"function":"storeMemory","nonce":1,"memory_key":"db-config","memory_summary":"PostgreSQL on port 5432","memory_tags":"database,config","memory_channel":"findings","memory_source":"research-1","memory_file":"/path/to/.intendant/memory.json"}]}' \
| ./target/release/intendant-runtime
# Recall with filters
echo '{"commands":[{"function":"recallMemory","nonce":1,"memory_query":"database","memory_tags":"config","memory_channel":"findings","memory_file":"/path/to/.intendant/memory.json"}]}' \
| ./target/release/intendant-runtime
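Batches can also be assembled programmatically and result lines indexed by nonce. This sketch does not invoke the binary; the result-line field names (nonce, exit_code) are assumptions based on the output description above:

```python
import json

def make_batch(*commands: dict) -> str:
    """Serialize a command batch for intendant-runtime's stdin."""
    return json.dumps({"commands": list(commands)})

def parse_results(stdout: str) -> dict[int, dict]:
    """Index one-JSON-object-per-line results by nonce.
    Field names are assumptions, not a documented schema."""
    results = {}
    for line in stdout.splitlines():
        if line.strip():
            r = json.loads(line)
            results[r["nonce"]] = r
    return results

batch = make_batch({"function": "execAsAgent", "nonce": 1, "command": "echo hello"})
# echo "$batch" | ./target/release/intendant-runtime
```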
Functions
Runtime Functions
| Function | Description | Key Fields |
|---|---|---|
execAsAgent | Run a bash command (blocks until exit, returns exit code + stdout/stderr tail) | command, display, wait_for_port |
captureScreen | Screenshot a display via ImageMagick | display |
inspectPath | Inspect filesystem path metadata (type, size, perms, timestamps) | path |
editFile | Structured file editing without shell commands | file_path, operation, content, match_content, line_number, end_line |
writeFile | Alias for editFile with operation: "write" (backward compatibility) | file_path, content |
browse | Fetch URL and convert HTML to plain text (50KB max) | url |
askHuman | Ask the operator a question and wait for response (5-minute timeout) | question, timeout_ms |
execPty | Run command in a persistent PTY session (bash --norc --noprofile) | command, shell_id |
storeMemory | Store a knowledge entry with optional tags/channel | memory_key, memory_summary, memory_file, memory_tags, memory_channel, memory_source |
recallMemory | Search knowledge by keyword with optional filters | memory_query, memory_file, memory_tags, memory_channel, memory_source, memory_since |
Caller-Handled Functions
These are intercepted by the caller and never reach the runtime:
| Function | Description |
|---|---|
manage_context | Apply context directives (drop/summarize turns) to the conversation |
signal_done | Signal task completion to the caller loop |
Native Tool Names
When using native tool calling (the default), tool names use snake_case:
| Native Name | Runtime Function |
|---|---|
exec_command | execAsAgent |
capture_screen | captureScreen |
inspect_path | inspectPath |
edit_file | editFile |
browse_url | browse |
ask_human | askHuman |
exec_pty | execPty |
store_memory | storeMemory |
recall_memory | recallMemory |
manage_context | (caller-handled) |
signal_done | (caller-handled) |
editFile Operations
The editFile function supports 5 operations:
| Operation | Description | Required Fields |
|---|---|---|
write | Write content to file (creates or overwrites) | file_path, content |
append | Append content to end of file | file_path, content |
replace | Replace matching text with new content | file_path, match_content, content |
insert_at | Insert content at a specific line number | file_path, line_number, content |
replace_lines | Replace a range of lines | file_path, line_number, end_line, content |
Nonce Variables
Use $NONCE[id] in command strings to reference the PID of a previously launched nonce. For example, kill -9 $NONCE[10] kills the process started by nonce 10. Handled by regex-based substitution in replace_nonce_refs().
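The substitution amounts to one regex pass (a sketch mirroring the description of replace_nonce_refs(); the signature is an assumption):

```python
import re

def replace_nonce_refs(command: str, pids: dict[int, int]) -> str:
    """Replace each $NONCE[id] with the PID recorded for that nonce."""
    return re.sub(
        r"\$NONCE\[(\d+)\]",
        lambda m: str(pids[int(m.group(1))]),
        command,
    )
```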
Context Management
The model can include a context field alongside commands to manage conversation history:
{
"commands": [...],
"context": {
"drop_turns": [3, 4, 5],
"summarize": { "turns": [7, 8, 9, 10], "summary": "Set up nginx with reverse proxy" }
}
}
- drop_turns: Remove messages at the given indices (the system prompt and last 2 messages are protected).
- summarize: Replace a range of messages with a single summary.
- Context-only turns (empty commands) are supported for pruning without executing anything.
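Applying the directives can be sketched as (an illustration; the exact index semantics and protection rules are assumptions consistent with the description above):

```python
def apply_context(messages: list[str], directives: dict) -> list[str]:
    """Apply drop_turns and summarize to a message list. The system
    prompt (index 0) and the last 2 messages are protected from drops."""
    protected = {0, len(messages) - 1, len(messages) - 2}
    drop = set(directives.get("drop_turns", [])) - protected
    s = directives.get("summarize")
    s_turns = set(s["turns"]) if s else set()
    out, inserted = [], False
    for i, m in enumerate(messages):
        if i in drop:
            continue
        if i in s_turns:
            if not inserted:                 # whole range becomes one summary
                out.append("[summary] " + s["summary"])
                inserted = True
            continue
        out.append(m)
    return out
```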
Knowledge System
Project knowledge persists tagged entries across sessions in <project>/.intendant/memory.json. The system supports both the legacy key-value format and the new tagged knowledge format with automatic migration.
- storeMemory: Creates or updates an entry with key, summary, tags, channel, and source. Backward-compatible with the old format.
- recallMemory: Searches entries by keyword with optional filters (tags, channel, source, since timestamp). Results are ranked by relevance (key/summary match).
- Knowledge is loaded and injected into the conversation at session start.
- Supports pub/sub channels for inter-agent knowledge sharing:
  - Agents publish findings to named channels (e.g., "findings", "decisions")
  - The orchestrator routes knowledge between sibling agents via subscriptions
  - Cursor-based tracking ensures agents only see new entries
- Can be disabled in intendant.toml:
[memory]
enabled = false # default: true
JSON Output Mode
--json enables JSONL structured output to stdout (implies --no-tui). Each line is a JSON object with type and data fields. Event types include: turn_started, model_response, model_response_delta, agent_output, done, error, approval_required, human_question, budget_warning, round_complete, context_management.
In JSON mode, stdin accepts both plain text (follow-up messages) and JSON commands using the same ControlMsg format as the Unix control socket:
{"action":"approve","id":123}
{"action":"deny","id":123}
{"action":"skip","id":123}
{"action":"approve_all","id":123}
{"action":"input","text":"answer to askHuman"}
{"action":"follow_up","text":"continue with this"}
Lines not starting with { or not parseable as ControlMsg are treated as follow-up text. This makes --json mode fully interactive: approval flows, askHuman, and multi-round conversations all work without a TUI or control socket.
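That dispatch rule can be sketched as (an illustration of the line-classification behavior described above):

```python
import json

def classify_stdin_line(line: str):
    """JSON lines that parse as a ControlMsg (an object with an
    "action" field) are dispatched; everything else is follow-up text."""
    stripped = line.strip()
    if stripped.startswith("{"):
        try:
            msg = json.loads(stripped)
            if isinstance(msg, dict) and "action" in msg:
                return ("control", msg)
        except json.JSONDecodeError:
            pass                       # fall through to follow-up text
    return ("follow_up", stripped)
```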
TUI & Autonomy
TUI
intendant includes a ratatui-based terminal UI that launches automatically when both stdin and stdout are terminals. The TUI provides real-time monitoring and control of the agent loop.
Layout
┌─────────────────────────────────────────────┐
│ StatusBar: provider │ model │ turn │ budget │ 1 line
├─────────────────────────────────────────────┤
│ ActionPanel: phase + spinner + key hints │ 2 lines
├─────────────────────────────────────────────┤
│ │
│ LogPanel: scrollable, color-coded entries │ fills remaining
│ │
├─────────────────────────────────────────────┤
│ ApprovalPanel / InputPanel (conditional) │ 3-4 lines
└─────────────────────────────────────────────┘
Panels
- Status bar: Provider, model, turn count, budget percentage, autonomy level
- Action panel: Current phase with spinner — Thinking, RunningAgent, Orchestrating, WaitingApproval, WaitingHuman, WaitingFollowUp, Idle, Done
- Log panel: Scrollable chronological log with color-coded levels (Info, Warning, Error, Debug)
- Approval panel: Shown when an action needs user approval — command preview + category, y/s/a/n keys
- Input panel: Shown when askHuman is triggered — tui-textarea for response
- Follow-up panel: Shown when the agent completes a round and awaits follow-up input
- Help overlay: Key bindings reference (? key)
- Inspect overlay: Detailed view of the selected log entry
Key Bindings
| Key | Action |
|---|---|
q / Ctrl-C | Quit |
v | Toggle verbose mode (cycle through quiet/normal/verbose/debug) |
? | Help overlay |
+ / - | Cycle autonomy level |
Up/Down/PgUp/PgDn | Scroll log |
Home / End | Jump to top/bottom of log |
1-3 | Toggle panels (status, action, log) |
y / Enter | Approve pending action |
s | Skip pending action |
a | Auto-approve all remaining |
n | Deny and stop |
Markdown Rendering
Model responses containing markdown are rendered with syntax highlighting in the log panel:
- Headers (`#` through `####`) in blue
- Bold (`**text**`) with bright styling
- Italic (`*text*`) in lavender
- Inline code (`` `code` ``) in green
- Fenced code blocks (`` ``` ``) in green
- List items (`-` and `*`) with yellow bullets
- Horizontal rules (`---`) as dim lines
Streaming Display
When a model is generating a response, text deltas are forwarded to the TUI in real-time via AppEvent::ModelResponseDelta and accumulated in a streaming buffer. The buffer is cleared when the full response arrives. This gives immediate feedback during long model responses.
Theme
The TUI uses a Catppuccin Mocha-inspired color scheme with budget-aware color thresholds (green → yellow → red as context fills up).
Autonomy System
The autonomy system controls which actions require human approval. It operates on three layers:
Layer 1 — Global Level
Set via CLI --autonomy flag, toggleable in TUI with +/-:
| Level | Behavior |
|---|---|
| Low | Ask before every command execution |
| Medium | Ask before writes, network, destructive (default) |
| High | Only ask for unavoidable human input |
| Full | Never ask (fully autonomous) |
Layer 2 — Per-Category Rules
From intendant.toml [approval] section. Overrides the global level for specific action categories. Rules: auto (always approve), ask (require approval), deny (always deny).
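The per-category rules might look like the following in `intendant.toml`. This is a hypothetical sketch — the exact key names for each category are assumptions, not taken from the shipped schema; only the `auto`/`ask`/`deny` rule values and the category set are documented here:

```toml
# Hypothetical [approval] section — key names are illustrative
[approval]
file_read = "auto"        # always approve reads
file_write = "ask"        # require approval for writes
network_request = "ask"   # require approval for curl/wget/ssh/git
destructive = "deny"      # always deny rm -rf, dd, mkfs, sudo
```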
Layer 3 — Per-Action Approval
When approval is needed, the agent loop pauses and the TUI shows the command preview. The user can approve, skip, deny, or switch to auto-approve mode.
Action Classification
Commands are classified into categories by inspecting the command JSON:
| Category | Examples |
|---|---|
| FileRead | inspectPath, recallMemory |
| FileWrite | editFile, writeFile, storeMemory |
| FileDelete | Commands with rm, rmdir |
| CommandExec | execAsAgent, execPty |
| NetworkRequest | Commands with curl, wget, ssh, git |
| Destructive | Commands with rm -rf, kill, dd, mkfs, sudo |
| HumanInput | askHuman |
Shell commands are further classified by inspecting the command string for destructive patterns, network tools, and file writes (redirects, tee, mv, cp). The sudo prefix is detected as Destructive and the actual command after sudo is also classified.
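The string-inspection classification above can be sketched as follows. This is a hypothetical Python mirror of the documented categories; the real classifier lives in the Rust source and is certainly more thorough:

```python
# Hypothetical sketch of shell-command classification by substring inspection.
DESTRUCTIVE = ["rm -rf", "mkfs", " dd ", "kill "]
NETWORK = ["curl", "wget", "ssh", "git "]
WRITE = [">", "tee ", "mv ", "cp "]

def classify_shell(cmd):
    c = " " + cmd.strip() + " "
    # sudo prefix is detected as Destructive (the real classifier also
    # classifies the command after sudo).
    if c.strip().startswith("sudo "):
        return "Destructive"
    if any(p in c for p in DESTRUCTIVE):
        return "Destructive"
    if any(p in c for p in NETWORK):
        return "NetworkRequest"
    if any(p in c for p in WRITE):
        return "FileWrite"
    return "CommandExec"
```

Substring matching like this is deliberately conservative: a false `Destructive` hit only costs an extra approval prompt, while a miss would execute a dangerous command unattended.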
Web Dashboard
The --web flag starts a web server that serves a modern 4-tab dashboard at / with Activity, Usage, Terminal, and Displays tabs. The Terminal tab provides the same ratatui interface as the native TUI via xterm.js, while the other tabs add event logging, cost tracking, and remote display viewing.
# Default port 8765
./target/release/intendant --web
# Custom port
./target/release/intendant --web 9000
The --web flag implies --mcp, so no initial task is required — the agent starts idle and accepts tasks dynamically. Open http://<host>:8765/ in a browser.
The dashboard also supports optional live voice interaction via Gemini Live or OpenAI Realtime, with active/passive multi-browser support and session continuity across reconnects.
See Web Dashboard for full documentation and Integrations — Web Gateway for the WebSocket protocol.
Web Dashboard
The --web flag starts a web server that serves a modern dashboard for monitoring and interacting with Intendant remotely. The dashboard runs entirely in the browser with WASM-powered state management.
Running
# Default port 8765
./target/release/intendant --web
# Custom port
./target/release/intendant --web 9000
Open http://<host>:8765/ in a browser. The --web flag implies --mcp, so no initial task is required — the agent starts idle and accepts tasks dynamically.
Dashboard Tabs
Activity
A scrollable, color-coded event log showing everything happening in the system:
- system — session lifecycle, approvals, context management
- worker — model responses, reasoning summaries, task completion
- agent — command execution output (stdout/stderr, exit codes)
- live — voice transcripts, presence lifecycle, tool requests
- server — presence model internals (thinking, tool calls)
Events are grouped by turn with visual separators. New events while viewing other tabs trigger a notification badge. Late-connecting browsers receive a full replay of historical events from session.jsonl.
Usage
Token consumption for the main model and presence model:
- Prompt, completion, and cached token breakdowns
- Cost estimates using a built-in pricing table (OpenAI, Anthropic, Gemini models)
- Usage history over time
- Updated after each agent turn via `usage_update` events
Terminal
An embedded xterm.js terminal connected to the server-side ratatui TUI. Each browser connection gets its own independent terminal rendering with separate dimensions. This shows the same interface as the native terminal TUI — status bar, log panel, action panel, approval/input panels.
Key presses and terminal resizes in the browser are sent to the server and rendered independently per connection.
Displays
Remote viewing of Xvfb displays created by the agent. When the agent runs graphical applications (via execAsAgent with a DISPLAY), the display appears here as a noVNC viewer.
Displays are created lazily — the tab populates automatically when the agent’s first command triggers Xvfb auto-launch. Each display also shows its VNC port for direct connection.
Live Voice
The dashboard supports optional live voice interaction via Gemini Live or OpenAI Realtime. When activated:
- The browser connects directly to the model’s realtime API for low-latency voice I/O
- The live model receives agent events and narrates progress
- Tool calls from the live model (`submit_task`, `approve_action`, `check_status`, etc.) are routed through the WebSocket to the server
- Server-side presence is automatically paused (mutual exclusion)
Setup
- Enter your API key on first visit (Gemini or OpenAI)
- Keys are stored in browser localStorage — never sent to the Intendant server
- Click the microphone button to connect
Active/Passive Browsers
Only one browser can be “active” (controlling the voice model) at a time:
- First browser to connect voice becomes active
- Additional browsers are passive observers (receive events and TUI frames, but don’t pause server-side presence)
- A passive browser can request active status via the UI, which force-disconnects the previous active browser
- Active handover includes the last checkpoint summary and conversation context
Session Continuity
The presence session protocol maintains context across reconnects:
- On connect, the server sends a `presence_welcome` with current state, missed events, and conversation context
- The browser sends periodic `presence_checkpoint` messages with a summary of the conversation
- On reconnect, the server replays events since the last checkpoint
- This prevents the voice model from losing context when the browser refreshes or the connection drops
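The replay rule can be sketched in a few lines. A hypothetical Python rendering (the `replay_since` helper is an invention; the real session logic is Rust):

```python
# Hypothetical sketch: on reconnect, send only events the browser has not yet
# seen, based on the checkpointed last_event_seq.
def replay_since(event_window, last_event_seq):
    """event_window: list of (seq, payload) kept server-side."""
    return [e for e in event_window if e[0] > last_event_seq]

window = [(1, "task_submitted"), (2, "phase: Thinking"), (3, "approval_required")]
# A browser that checkpointed at seq 1 should receive events 2 and 3.
```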
Server-Side Transcription
When [transcription] is enabled in intendant.toml, the browser sends microphone audio to the server for transcription via the Whisper API:
[transcription]
enabled = true
provider = "openai"
model = "whisper-1"
language = "en"
Audio is buffered in ~3s chunks, filtered by RMS energy to skip silence, and sent to the transcription endpoint. Transcripts are broadcast as user_transcript events and logged to the session.
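The RMS energy gate can be sketched as follows. This is a hypothetical Python illustration of the documented filtering step (the threshold value and function names are assumptions):

```python
import math

# Hypothetical sketch of the silence filter: skip audio chunks whose RMS
# energy falls below a threshold before sending them for transcription.
def rms(samples):
    """Root-mean-square energy of a chunk of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

def should_transcribe(chunk, threshold=0.01):
    return rms(chunk) >= threshold

silence = [0.0] * 1000
speech = [0.1, -0.1] * 500   # constant-amplitude test signal, RMS = 0.1
```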
Configuration
The web gateway configuration is controlled by [presence] settings in intendant.toml:
[presence]
live_provider = "gemini" # voice model provider
live_model = "gemini-2.5-flash-native-audio-preview-12-2025" # voice model
Or via environment variables:
- `GEMINI_API_KEY` / `OPENAI_API_KEY` — for ephemeral token minting (`POST /session`)
The /config endpoint returns the configured provider, model, and sample rates as JSON.
HTTP Endpoints
| Endpoint | Description |
|---|---|
GET / | Web app dashboard (4-tab UI) |
GET /config | Live model configuration JSON |
GET /debug | Debug JSON (agent state, voice connection, active browser) |
POST /session | Mint ephemeral session tokens for Gemini Live / OpenAI Realtime |
GET /wasm-web/* | WASM and JS glue (content-hash cache-busted) |
GET /audio-processor.js | AudioWorklet processor for microphone capture |
WS / | Main WebSocket (events, terminal I/O, presence protocol) |
WS /vnc | WebSocket-to-TCP VNC proxy (for noVNC) |
Requirements
- Microphone access requires a secure context: use `localhost` (via SSH tunnel: `ssh -L 8765:localhost:8765 host`), or set browser flags for insecure origins
- API key for voice: Gemini or OpenAI (stored browser-side only). Voice is optional — the dashboard works without it
- WASM: The dashboard uses a compiled WASM module (`presence-web` crate). Rebuild with `wasm-pack build --target web` from `crates/presence-web/` if you modify the Rust code
Multi-Agent Orchestration
Intendant supports multi-agent orchestration where a parent orchestrator decomposes complex tasks into sub-tasks and delegates them to specialized child agents. Each agent runs as a separate intendant process with its own context window, system prompt, and session log.
How It Works
User (TUI / MCP / Web)
│
▼
[User Mode] — pure subprocess monitor, zero API calls
│
▼
[Orchestrator Sub-Agent] — decomposes task, coordinates
├──▶ [Research Agent] — investigation, file reading, browsing
├──▶ [Implementation Agent] — code writing, builds, tests (git worktree)
└──▶ [Testing Agent] — validation, test execution
│
▼
Results merged, knowledge consolidated
When a complex task is submitted (and --direct is not set), intendant enters User Mode: it spawns an orchestrator sub-agent and monitors its progress without making any model API calls itself. The orchestrator then spawns specialized sub-agents as needed.
Agent Roles
Each sub-agent role has a dedicated system prompt that is appended to the base prompt:
| Role | Prompt | Focus |
|---|---|---|
orchestrator | SysPrompt_orchestrator.md | Task decomposition, sub-agent management, coordination, checkpointing |
research | SysPrompt_research.md | Investigation, file reading, browsing, synthesizing findings |
implementation | SysPrompt_implementation.md | Code writing, builds, testing, git worktree isolation |
testing | SysPrompt_testing.md | Validation, test execution, coverage |
Sub-Agent Spawning
Sub-agents are spawned via tokio::process::Command with environment variables that configure their behavior:
| Variable | Purpose |
|---|---|
INTENDANT_ROLE | Agent role (triggers sub-agent mode) |
INTENDANT_ID | Unique identifier for this agent |
INTENDANT_TASK | Task description |
INTENDANT_RESULT_FILE | Path to write final results |
INTENDANT_PROGRESS_FILE | Path to write periodic progress |
INTENDANT_PARENT_KNOWLEDGE | Path to parent’s knowledge store |
INTENDANT_INHERIT_MEMORY | 1 to inherit project memory |
Progress and Results
Progress Polling
The parent agent polls each sub-agent’s progress file every 500ms. Progress is a JSON file with:
{
"turn": 5,
"status": "running",
"last_action": "Running cargo test",
"question": null
}
Progress updates are relayed to the TUI or stdout as OrchestratorProgress events.
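One poll step can be sketched in Python. A hypothetical helper (the real parent loop is Rust and runs every 500ms); it reads the sub-agent's progress JSON only when the file exists and has changed since the last poll:

```python
import json, os, tempfile

def read_progress(path, last_mtime):
    """Return (progress_dict_or_None, new_mtime). Hypothetical poll-step sketch."""
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return None, last_mtime          # sub-agent hasn't written yet
    if last_mtime is not None and mtime <= last_mtime:
        return None, last_mtime          # unchanged since last poll
    with open(path) as f:
        return json.load(f), mtime

# Simulate a sub-agent writing progress, then the parent polling it.
path = os.path.join(tempfile.mkdtemp(), "progress.json")
with open(path, "w") as f:
    json.dump({"turn": 5, "status": "running",
               "last_action": "Running cargo test", "question": None}, f)
progress, mtime = read_progress(path, None)
```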
Result Files
When a sub-agent completes, it writes a result JSON file:
{
"id": "research-1",
"status": "Completed",
"summary": "Found 3 relevant API endpoints...",
"findings": ["endpoint /api/users supports pagination", "..."],
"artifacts": ["docs/api-analysis.md"],
"usage": { "tokens_used": 15000, "context_window": 128000 }
}
The orchestrator reads result files to synthesize final outcomes and route knowledge between agents.
Git Worktree Isolation
Implementation agents can work in isolated git worktrees to avoid conflicts with the main working tree:
- Create: `worktree.rs` creates a new worktree branch for the agent
- Merge: On successful completion, the orchestrator merges the worktree branch back
- Conflict handling: If merge conflicts arise, the orchestrator is prompted to resolve them
- Cleanup: Worktrees are removed after merge or on failure
This allows multiple implementation agents to work on different parts of the codebase simultaneously without stepping on each other.
Knowledge Routing
The knowledge system supports inter-agent communication via pub/sub channels:
- Publishing: Agents store findings with tagged channels (e.g., `"findings"`, `"decisions"`, `"project_state"`)
- Subscribing: The orchestrator sets up subscriptions between agents so they receive relevant knowledge
- Cursor tracking: Each subscription tracks which entries have been consumed, ensuring agents only see new knowledge
- Inheritance: Sub-agents can inherit the parent’s knowledge store via `INTENDANT_INHERIT_MEMORY`
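The cursor-tracking behavior can be sketched with a minimal in-memory store. This hypothetical Python model (class and method names are inventions) shows why a per-subscription cursor guarantees agents only see new entries:

```python
# Hypothetical sketch of channel pub/sub with per-subscriber cursors.
class KnowledgeStore:
    def __init__(self):
        self.channels = {}   # channel name -> list of entries
        self.cursors = {}    # (subscriber, channel) -> count of consumed entries

    def publish(self, channel, entry):
        self.channels.setdefault(channel, []).append(entry)

    def consume(self, subscriber, channel):
        """Return only entries this subscriber has not seen, then advance its cursor."""
        entries = self.channels.get(channel, [])
        cursor = self.cursors.get((subscriber, channel), 0)
        fresh = entries[cursor:]
        self.cursors[(subscriber, channel)] = len(entries)
        return fresh

store = KnowledgeStore()
store.publish("findings", {"tag": "database", "text": "postgres on port 5432"})
```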
Example Flow
- Research agent discovers database configuration → publishes to `"findings"` channel with tag `"database"`
- Orchestrator routes `"findings"` to implementation agent
- Implementation agent receives the database config via `recallMemory` with channel filter
- Implementation agent writes code using discovered config
Orchestrator Checkpointing
The orchestrator writes project state checkpoints after each sub-agent completes, using storeMemory with a project_state channel. Checkpoints capture:
- Completed and active tasks
- Architectural decisions made so far
- Constraints and dependencies discovered
This preserves essential context across auto-compaction boundaries — when context is compacted at ~90% usage, the orchestrator can recover state via recallMemory.
Checkpoints are also written to disk as both project_state.json (machine-readable) and project_state.md (human-readable) in the sub-agent directory.
Configuration
Orchestration behavior can be tuned in intendant.toml:
[orchestrator]
max_parallel_agents = 4 # max concurrent sub-agents
sub_agent_dir = ".intendant/subagents" # workspace directory for sub-agents
To force single-agent mode and skip orchestration entirely, use the --direct flag.
Presence Layer
The presence layer is the conversational interface between the user and the agent system. It mediates all interaction: the user talks to presence, presence delegates work via submit_task, and narrates progress as events stream back from the agent loop.
Architecture
Only one presence model is active at a time — either server-side text presence OR browser-side live presence (Gemini Live / OpenAI Realtime). Never both simultaneously.
User input ──▶ [Presence Layer] ──▶ submit_task ──▶ Agent Loop
│ │
│◀── events (phase, approval, etc) ◀┘
│
▼
Narration to user (TUI / Web)
Server-Side Text Presence
The default mode. PresenceLayer wraps a small/fast text model (e.g., gemini-2.5-flash) and maintains its own Conversation separate from the agent’s.
Behavior
- Processes user input via `process_user_input()` — decides whether to handle directly or delegate to the agent loop
- Narrates agent events via `handle_event()` — translates phase changes, approvals, completions into conversational updates
- Handles status queries, memory recall, and autonomy changes directly without involving the agent loop
- Uses its own system prompt (`SysPrompt_presence.md`) — standalone, not appended to the base agent prompt
- Follow-up input in the TUI is routed through the presence layer when active
Configuration
[presence]
enabled = true # default: true
provider = "gemini" # provider for the presence model (optional)
model = "gemini-2.5-flash" # model for the presence layer (optional)
context_window = 32768 # context window for presence conversation (default: 32768)
Or via environment variables:
- `PRESENCE_PROVIDER` — override provider (fallback: `PROVIDER`)
- `PRESENCE_MODEL` — override model
Disable with --no-presence flag or [presence] enabled = false in intendant.toml.
Browser-Side Live Presence
When --web is used and a browser connects a live model (Gemini Live / OpenAI Realtime), the browser sends a presence_connect message over WebSocket. The server pauses PresenceLayer and sends a presence_welcome message with the current state, missed events, and conversation context. The browser’s live model takes over as the conversational front-end, using the same 9 tools via the WebSocket tool request/response protocol.
When the browser’s live model disconnects (page close, error), a presence_disconnect message is sent and server-side presence resumes automatically.
Configuration
[presence]
live_provider = "gemini" # provider for browser-side live presence
live_model = "gemini-2.5-flash-native-audio-preview-12-2025" # model for browser-side live presence
Voice requires an API key (Gemini or OpenAI), stored in browser localStorage. The key is used browser-side only — it is never sent to the Intendant server.
Active/Passive Multi-Browser
Only one browser connection can be “active” (controlling the voice model) at a time. Other connections are passive observers:
- Active browser: Pauses server-side presence, receives tool responses, controls the voice session
- Passive browsers: Receive TUI frames and events but don’t affect server-side presence
- Handover: A passive browser can request active status via `{"t":"make_active"}`, which force-disconnects the previous active browser and sends an `active_granted` message with handover context
Session Continuity
The presence session protocol maintains voice context across reconnects:
- The server maintains a `PresenceSession` with an event window and checkpoint state
- Browsers send periodic `presence_checkpoint` messages with a conversation summary and `last_event_seq`
- On reconnect, the `presence_welcome` includes events since `last_event_seq` and the last checkpoint summary
- Conversation context from recent voice transcripts is also included for smooth resumption
Presence Tools
The presence layer has 9 tools, defined in the presence-core workspace crate:
Action Tools
| Tool | Description |
|---|---|
submit_task | Submit a new task to the agent loop |
approve_action | Approve a pending action |
deny_action | Deny a pending action |
skip_action | Skip a pending action |
respond_to_question | Answer an askHuman question |
set_autonomy | Change autonomy level |
Action tools dispatch via the EventBus as ControlMsg — the same path as TUI key presses and control socket commands.
Query Tools
| Tool | Description |
|---|---|
check_status | Read current AgentStateSnapshot (phase, turn, budget, pending approval/question) |
query_detail | Get git diff, file contents, or log details from the project |
recall_memory | Search the knowledge store by keywords, with optional channel/tag filters; falls back to session log |
Query tools are handled synchronously server-side. They are shared between PresenceLayer and the web gateway via standalone functions in presence.rs.
Event Filtering
Not all agent events are worth narrating. The presence layer classifies events as:
Push-worthy (trigger narration):
- `TaskSubmitted`, `TaskComplete`
- `ApprovalRequired`, `HumanQuestion`
- `PhaseChanged` (debounced to avoid rapid phase flip noise)
- `ContextManagement`
Pull-only (available on request via check_status):
- Status snapshots, log entries, token usage updates
Mutual Exclusion
The presence layer enforces mutual exclusion between server-side and browser-side presence:
- Browser connects live model → sends `{"t":"presence_connect"}`
- Web gateway emits `AppEvent::PresenceConnected` → pauses server-side presence
- Server sends `{"t":"presence_welcome"}` with state, event replay, and conversation context
- Server-side `PresenceLayer::handle_event()` returns `Ok(None)` while paused
- Browser live model handles all presence duties (narration, tool calls, user interaction)
- Browser disconnects → sends `{"t":"presence_disconnect"}`
- Web gateway emits `AppEvent::PresenceDisconnected` → resumes server-side presence
Legacy `live_connected`/`live_disconnected` messages are still accepted for backward compatibility.
presence-core Crate
The crates/presence-core/ workspace crate contains the WASM-compatible core logic:
- Types: `PresenceConfig`, `TaskEnvelope`, `PresenceEvent`, `AgentStateSnapshot`, `PresenceSession`, `PresenceCheckpoint`, `PresenceConnect`, `PresenceWelcome`, constants
- Dispatch: `PresenceAction` enum, `dispatch_tool_call()` — pure logic dispatch
- Tools: 9 presence tool definitions (provider-agnostic `ToolDefinition` format)
- Format: `format_event()`, `truncate()` (unicode-safe)
- Prompt: `DEFAULT_PRESENCE_PROMPT` via `include_str!`
- WASM: `WasmPresence` object, `get_presence_tools()`, `get_presence_prompt()` — browser-side presence logic
Minimal dependencies (serde + serde_json + wasm-bindgen, no tokio/reqwest). Compiles to both native and wasm32-unknown-unknown. The main crate re-exports its types and converts ToolDefinition to the provider-specific format.
presence-web Crate
The crates/presence-web/ crate provides the browser-side WASM layer:
- app_state.rs — Pure-Rust app state for the web dashboard. All event routing, log filtering, usage tracking, and cost calculation. Methods return `Vec<UiCommand>` which the thin JS layer applies to the DOM. Includes a per-model pricing table covering OpenAI, Anthropic, and Gemini models.
- app_web.rs — Browser-side app dashboard entry point. WASM↔DOM bridge, tab management, WebSocket event dispatch.
- server.rs — WebSocket connection to the Intendant server, message routing.
- gemini.rs — Gemini Live API integration (BidiGenerateContent), dual-mode auth (API key + ephemeral token).
- openai.rs — OpenAI Realtime API integration.
- callbacks.rs — JS callback management for voice/tool events.
Build: wasm-pack build --target web --out-dir ../../static/wasm-web --out-name presence_web from crates/presence-web/.
Tool Dispatch Flow
Tool dispatch uses presence_core::dispatch_tool_call() which returns a PresenceAction enum:
Tool call arrives (from text model or browser live model)
│
▼
dispatch_tool_call() → PresenceAction
│
├── TextResult(text) → return immediately
├── SubmitTask(envelope) → send to EventBus
├── Approve/Deny/Skip → send ControlMsg to EventBus
├── SetAutonomy(level) → send ControlMsg to EventBus
└── NeedsIO(query) → platform layer handles:
├── check_status → read AgentStateSnapshot
├── query_detail → read files, git diff
└── recall_memory → search knowledge store + session log
Pure-logic tools return TextResult/SubmitTask/Approve/etc. I/O-dependent tools return NeedsIO for the platform layer to handle, keeping presence-core free of I/O dependencies.
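The split between pure-logic and I/O-dependent tools can be rendered as a dispatch table. This hypothetical Python sketch mirrors the `PresenceAction` variants shown in the flow above (the Python function itself is an invention; the real `dispatch_tool_call()` is Rust in `presence-core`):

```python
# Hypothetical mirror of dispatch_tool_call(): pure-logic tools resolve
# immediately; I/O-dependent tools are deferred to the platform layer.
def dispatch_tool_call(name, args):
    if name == "submit_task":
        return ("SubmitTask", {"task": args["task"]})
    if name in ("approve_action", "deny_action", "skip_action"):
        return ("ControlMsg", name)
    if name == "set_autonomy":
        return ("ControlMsg", ("set_autonomy", args["level"]))
    if name in ("check_status", "query_detail", "recall_memory"):
        return ("NeedsIO", name)          # platform layer performs the read
    return ("TextResult", f"unknown tool: {name}")
```

Because every I/O path funnels through the `NeedsIO` variant, the dispatch logic itself needs no filesystem, network, or async runtime, which is what lets it compile to WASM.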
MCP Server
The --mcp flag launches Intendant as a Model Context Protocol server on stdio. This lets external AI agents (Claude Code, Codex, etc.) observe and control Intendant with full parity to the TUI — every action a human can take in the TUI is available as an MCP tool. The server also supports connecting to external MCP servers as a client (see MCP Client below).
Running
# Launch as MCP server (stdio transport)
./target/release/intendant --mcp "Deploy the application"
# With provider/model overrides
./target/release/intendant --mcp --provider anthropic --model claude-sonnet-4-5-20250929 "Fix the tests"
# With autonomy preset
./target/release/intendant --mcp --autonomy high "Refactor the auth module"
Client Configuration
Add Intendant to your MCP client’s config. For Claude Code (~/.claude/claude_desktop_config.json):
{
"mcpServers": {
"intendant": {
"command": "intendant",
"args": ["--mcp", "Your task description here"]
}
}
}
Tools
All tools mirror TUI actions. The server enforces compile-time parity — adding a new user action to the TUI requires implementing it in the MCP server (and vice versa).
| Tool | Description | Parameters |
|---|---|---|
get_status | Current status: provider, model, turn, budget, phase, autonomy, verbosity, tokens | — |
get_logs | Log entries with cursor-based pagination and level filtering | since_id?, level_filter?, limit? |
get_pending_approval | Current pending approval request (or null) | — |
get_pending_input | Current pending human question (or null) | — |
approve | Approve a pending command (TUI: y) | id |
deny | Deny a pending command and stop (TUI: n) | id |
skip | Skip a pending command, continue (TUI: s) | id |
approve_all | Approve and set autonomy to Full (TUI: a) | id |
respond | Answer an askHuman question (TUI: type + Enter) | text |
set_autonomy | Set autonomy level (TUI: +/-) | level: "low", "medium", "high", "full" |
set_verbosity | Set log verbosity (TUI: v) | level: "quiet", "normal", "verbose", "debug" |
quit | Shut down the agent (TUI: q) | — |
start_task | Start a new agent task | task |
schedule_controller_restart | Schedule a controller restart/autonomous re-init workflow | controller_id, north_star_goal, reason?, restart_after?, restart_command?, auto_start_task?, max_attempts?, cooldown_sec? |
controller_turn_complete | Final handshake from controller; validates token and executes scheduled restart | restart_id, turn_complete_token, status?, handoff_summary? |
get_restart_status | Get current controller restart state (or null) | — |
cancel_controller_restart | Cancel scheduled restart | restart_id? |
request_controller_loop_halt | Request loop halt | persistent? |
clear_controller_loop_halt | Clear loop halt flags so restarts can proceed again | — |
intervene_controller_loop | Request intervention for active loop process | mode: "stop" or "abort" |
get_controller_loop_status | Unified loop health snapshot | — |
reload | Rebuild binary and hot-reload the MCP server via exec() | — |
`schedule_controller_restart`, `controller_turn_complete`, and `cancel_controller_restart` return JSON payloads with an `ok` boolean and status fields. Rejections are returned as JSON (`ok: false`) with an error message instead of plain text.
Hot Reload
The reload tool rebuilds the binary from source (cargo build --release) and replaces the running MCP server process in-place using exec(). The MCP connection survives seamlessly — no Claude Code restart needed.
How it works:
- `reload` runs `cargo build --release` in the project directory
- After sending the tool response, the process calls `exec()` to replace itself with the new binary
- The new process detects `INTENDANT_MCP_RELOAD=1` and uses a `ReloadTransport` that injects a synthetic MCP initialization handshake
- Claude Code continues using the same connection — the stdio file descriptors survive `exec()`
This is particularly useful during development: edit code, call reload, and the MCP server picks up all changes without losing the connection.
Resources
Resources provide push-based state observation via subscriptions. The server sends notifications/resources/updated when state changes, so clients know to re-fetch.
| URI | Description |
|---|---|
intendant://status | Provider, model, turn count, budget %, phase, autonomy, session ID, task |
intendant://usage | Per-model token usage: tokens used, context window, usage % (main + optional presence) |
intendant://logs | Last 100 chronological log entries (same as TUI log panel) |
intendant://pending-approval | Current pending approval request, if any |
intendant://pending-input | Current pending human question, if any |
intendant://controller-restart | Current controller restart workflow state, if any |
intendant://controller-loop | Loop health snapshot (intervention flags, singleton lock owner, active wrapper/codex PIDs, latest run pointers) |
Controller Restart Workflow
Use this when you want Intendant to trigger a controller re-init cycle safely.
1. Call `schedule_controller_restart` and capture `restart_id` + `turn_complete_token`.
2. Before ending the controlling agent turn, call `controller_turn_complete` with both values.
3. Intendant executes restart actions:
   - spawn `restart_command` (if provided), and/or
   - start a fresh Intendant task using `north_star_goal` (`auto_start_task=false` by default; opt in for E2E testing).
4. Inspect state via `get_restart_status` or `intendant://controller-restart`.
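The schedule/turn-complete handshake amounts to a small state machine. This hypothetical Python model (class and field names beyond the documented `restart_id`, `turn_complete_token`, `ok`, `status`, and phase values are assumptions) shows why the one-time token and the phase check together prevent duplicate restart execution:

```python
import secrets

# Hypothetical model of the controller-restart handshake.
class RestartWorkflow:
    def __init__(self):
        self.state = None

    def schedule(self, north_star_goal, restart_command=None, auto_start_task=False):
        # At least one restart action is required at schedule time.
        if not restart_command and not auto_start_task:
            return {"ok": False, "status": "rejected", "error": "no restart action configured"}
        self.state = {
            "restart_id": "restart-1",
            "turn_complete_token": secrets.token_hex(8),
            "phase": "awaiting_turn_complete",
        }
        return {"ok": True, **self.state}

    def turn_complete(self, restart_id, token):
        s = self.state
        # Only restarts in "awaiting_turn_complete" are accepted; duplicate or
        # late handshakes are rejected, so the restart executes at most once.
        if not s or s["phase"] != "awaiting_turn_complete":
            return {"ok": False, "status": "rejected", "error": "no restart awaiting handshake"}
        if restart_id != s["restart_id"] or token != s["turn_complete_token"]:
            return {"ok": False, "status": "rejected", "error": "bad id or token"}
        s["phase"] = "completed"
        return {"ok": True, "status": "completed", "phase": "completed"}
```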
Notes
- Restart state is persisted to the current session dir as `controller_restart.json`.
- `restart_after` defaults to `"turn_end"`.
- `restart_after` accepts only `"turn_end"` or `"now"`; other values are rejected.
- Restart workflow string inputs are normalized (trimmed) before validation/execution.
- `restart_command`, when provided, must not be empty/whitespace.
- At least one restart action is required at schedule time: set `restart_command` and/or `auto_start_task=true`.
- `max_attempts` must be `>= 1`; `0` is rejected.
- Optional `status`, `handoff_summary`, and cancel `restart_id` guard treat whitespace-only values as unset.
- If `restart_after="now"` and execution fails after passing validation, `schedule_controller_restart` reports `"ok": false` and includes `execution_error`.
- `schedule_controller_restart` always reports `"phase"` from persisted restart state; for `restart_after="now"` this reflects the post-execution phase (`"completed"` or `"failed"`).
- Any restart execution failure (including `auto_start_task` launch errors) updates persisted restart state to `"phase": "failed"` and populates `last_error`.
- `schedule_controller_restart` rejection payloads use `"status": "rejected"` and include `"error"` (plus `"restart_id"`/`"phase"` when a conflicting active restart exists).
- `controller_turn_complete` reports JSON results:
  - success: `"status": "completed"`, `"ok": true`, plus `"execution"` and `"phase"`.
  - rejection/pending: `"ok": false`, with `"status"` (`"rejected"` or `"restart_pending"`) and `"error"`.
- `controller_turn_complete` only accepts restarts in `"awaiting_turn_complete"`; duplicate or late handshakes (for example `"phase": "ready"`) are rejected to prevent duplicate restart execution.
- `cancel_controller_restart` reports JSON results:
  - success: `"status": "cancelled"`, `"ok": true`, plus `"restart_id"` and `"phase": "cancelled"`.
  - rejection: `"status": "rejected"`, `"ok": false`, with `"error"` (and optional `"restart_id"`/`"phase"` context).
- `request_controller_loop_halt`, `clear_controller_loop_halt`, `intervene_controller_loop`, and `get_controller_loop_status` return/emit normalized loop health data (flags, lock owner PID/aliveness, latest run pointers, and active PID counts).
- Control-socket `command_result.data` mirrors structured payloads for restart actions and loop-control actions.
- `get_restart_status` and `intendant://controller-restart` redact `turn_complete_token` as `"[redacted]"`; only `schedule_controller_restart` returns the raw token for the final handshake call.
Controller Recursion Profile
Recommended for Codex/Claude-style controllers:
- Set `auto_start_task=false` (or omit it, since `false` is the default).
- Use `restart_command` to relaunch the external controller process.
- Treat `start_task` as optional E2E testing only, not the default recursion path.
Controller Loop Monitoring
Controller loop monitoring files (for restart_command scripts):
- Write run artifacts under `.intendant/controller-loop/<run_id>/`.
- Maintain stable pointers:
  - `.intendant/controller-loop/latest` (symlink to current/latest run)
  - `.intendant/controller-loop/latest.pid` (wrapper script PID)
  - `.intendant/controller-loop/latest.status.json` (latest status snapshot)
  - `.intendant/controller-loop/latest.jsonl` (path to latest JSONL output file)
  - `.intendant/controller-loop/active.lock/` (singleton lock: `pid`, `run_id`, `acquired_at`)
- Recommended commands:
  - `tail -f .intendant/controller-loop/latest/codex.jsonl`
  - `watch -n 2 'cat .intendant/controller-loop/latest/heartbeat.txt'`
  - `cat .intendant/controller-loop/latest.status.json`
- Intervention controls:
  - Halt future loop cycles (persistent): `touch .intendant/controller-loop/request_halt`
  - Halt future loop cycles (legacy marker, consumed once): `touch .intendant/controller-loop/request_halt_after_cycle`
  - Graceful stop current run: `touch .intendant/controller-loop/request_stop`
  - Immediate abort current run: `touch .intendant/controller-loop/request_abort`
  - Intervention history: `cat .intendant/controller-loop/latest/intervention.log`
- Per-run PID files:
  - `.intendant/controller-loop/<run_id>/wrapper.pid`
  - `.intendant/controller-loop/<run_id>/codex.pid`
Typical Agent Workflow
- Call `get_status` to see the current phase and budget.
- Poll `get_logs` with `since_id` to stream new events.
- When an approval is needed, `get_pending_approval` returns the command preview — call `approve`, `deny`, or `skip`.
- When `askHuman` triggers, `get_pending_input` returns the question — call `respond` with your answer.
- Call `quit` when done.
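The workflow reduces to a cursor-based polling loop. In this sketch, `client` is a hypothetical MCP client exposing one method per tool; the return shapes (`phase`, `id`, `command`, `question` keys) are illustrative assumptions, not the exact schema.

```python
def run_agent_workflow(client, handle_approval, handle_question):
    """Drive a session via the tools listed above (illustrative shapes)."""
    since_id = 0
    while True:
        status = client.get_status()
        if status.get("phase") == "done":
            client.quit()
            return

        # Stream only events newer than the last one we saw
        for event in client.get_logs(since_id=since_id):
            since_id = max(since_id, event["id"])

        pending = client.get_pending_approval()
        if pending:
            # handle_approval returns "approve", "deny", or "skip"
            getattr(client, handle_approval(pending["command"]))(pending["id"])

        question = client.get_pending_input()
        if question:
            client.respond(handle_question(question["question"]))
```

A real driver would also sleep between iterations and enforce its own timeout rather than polling in a tight loop.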
MCP Client
Intendant can also act as an MCP client, connecting to external MCP servers configured in `intendant.toml`. This lets agents use tools from external servers (filesystem, GitHub, databases, etc.) alongside Intendant’s native tools.
Configuration
[[mcp_servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
[[mcp_servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
[mcp_servers.env]
GITHUB_TOKEN = "ghp_..."
How It Works
At startup, `McpClientManager` connects to all configured servers via child-process transport, discovers their tools, and registers them under the `mcp__<server>_<tool>` naming convention. For example, a filesystem server’s `read_file` tool becomes `mcp__filesystem_read_file`.
Tool calls with the `mcp__` prefix are routed through the MCP client manager to the appropriate server. If a server fails to connect at startup, it is skipped with a warning — other servers and native tools continue to work.
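The naming convention is mechanical enough to sketch. One subtlety: server names may themselves contain underscores, so reversing the mangling needs the list of registered servers rather than a naive split on `_`. This helper is illustrative, not Intendant's actual router:

```python
def mangle(server, tool):
    """Build the mcp__<server>_<tool> name used to register an external tool."""
    return f"mcp__{server}_{tool}"

def route(name, servers):
    """Split a mangled name into (server, tool), or None for native tools.

    Matches against the known server list because server names may
    contain underscores themselves.
    """
    if not name.startswith("mcp__"):
        return None
    rest = name[len("mcp__"):]
    for server in servers:
        if rest.startswith(server + "_"):
            return server, rest[len(server) + 1:]
    return None
```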
Integrations
This chapter covers the control socket (Unix domain socket) and web gateway (WebSocket) integration points. For the MCP server interface, see MCP Server. For the presence layer that mediates user interaction, see Presence Layer.
Control Socket
When `--control-socket` is enabled, a Unix domain socket is created at `/tmp/intendant-<pid>.sock`. This enables programmatic control of a running Intendant instance from external scripts and tools.
- Outbound event broadcast to all connected clients
- Inbound command handling for status, approval, denial, human input, autonomy change, quit, controller-restart workflow commands, and controller-loop intervention commands (in MCP mode)
- Socket server is opt-in via `--control-socket`
Inbound Commands (JSON-line)
{"action": "status"}
{"action": "approve", "id": 123}
{"action": "deny", "id": 123}
{"action": "input", "text": "answer to askHuman"}
{"action": "set_autonomy", "level": "high"}
{"action": "schedule_controller_restart", "controller_id":"codex", "north_star_goal":"audit and improve", "restart_after":"turn_end"}
{"action": "controller_turn_complete", "restart_id":"<id>", "turn_complete_token":"<token>", "status":"ok", "handoff_summary":"..."}
{"action": "get_restart_status"}
{"action": "cancel_controller_restart", "restart_id":"<id>"}
{"action": "request_controller_loop_halt", "persistent": true}
{"action": "clear_controller_loop_halt"}
{"action": "intervene_controller_loop", "mode":"stop"}
{"action": "get_controller_loop_status"}
{"action": "query_detail", "scope": "diff"}
{"action": "query_detail", "scope": "file", "target": "src/main.rs"}
{"action": "recall_memory", "keywords": ["auth", "login"], "channel": "project_state"}
{"action": "usage"}
{"action": "quit"}
Outbound Events (streamed to connected clients)
{"event": "turn_started", "turn": 5, "budget_pct": 12.3}
{"event": "agent_output", "stdout": "...", "stderr": "..."}
{"event": "approval_required", "id": 123, "command": "rm -rf /tmp/test"}
{"event": "ask_human", "question": "Which database?"}
{"event": "task_complete", "reason": "done signal"}
{"event": "status", "turn": 3, "phase": "thinking", "autonomy": "medium", "session_id": "abc-123", "task": "fix tests"}
{"event": "usage", "main": {"provider": "openai", "model": "gpt-5", "tokens_used": 12000, "context_window": 128000, "usage_pct": 9.4}}
{"event": "usage_update", "main": {"provider": "openai", "model": "gpt-5", "tokens_used": 15000, "context_window": 128000, "usage_pct": 11.7}}
{"event": "command_result", "action": "get_restart_status", "ok": true, "message": "ok", "data": {...}}
- The `status` event now includes `session_id` and `task` fields.
- The `usage` event is a response to `{"action": "usage"}`, returning per-model token usage.
- The `usage_update` event is broadcast automatically after each agent turn, providing streaming token-consumption updates. The `presence` field is included when the presence layer is active.
- `command_result.ok` is `false` when a control action fails (for example, `schedule_controller_restart` with `restart_after="now"` and no executable restart action configured).
Example Usage
echo '{"action":"status"}' | socat - UNIX:/tmp/intendant-$(pgrep intendant).sock
Web Gateway
The `--web` flag starts a web server that serves the app dashboard and bridges WebSocket connections to the EventBus. `--web` implies `--mcp`, so no initial task is required — the agent starts idle and accepts tasks dynamically.
See Web Dashboard for the full dashboard documentation and Presence Layer for details on the presence session protocol and mutual exclusion.
How It Works
Browser ──WebSocket──> Intendant web gateway (port 8765)
│ │
│ Terminal I/O (ANSI) │ Events (broadcast to all clients)
│ Key/resize input │ Tool responses (per-connection direct channel)
│ Tool requests │ State snapshot + log replay (on connect)
│ presence_connect/disconnect │ Presence welcome (on voice connect)
│ Voice logs/checkpoints │ Per-connection TUI frames
│ Audio for transcription │
v v
App dashboard (WASM) EventBus + AgentStateSnapshot
+ │
Optional: browser-side │ Dual outbound channels:
live model (Gemini/OpenAI) │ - broadcast::Receiver (events)
│ │ - mpsc::unbounded (direct responses)
│ (function calls → tool_request)
v
Intendant agent loop
The web gateway has three layers:
- App dashboard — the primary web interface at `/` with 4 tabs (Activity, Usage, Terminal, Displays). State management is handled by `presence-web` WASM. Events are broadcast, and late-connecting browsers get a full log replay.
- Per-connection TUI rendering — each WebSocket connection gets its own `WebTui` instance with independent terminal dimensions. ANSI output is sent per-connection via the direct channel, not broadcast.
- Presence bridge (optional) — when a browser connects a live model (Gemini Live / OpenAI Realtime), the model uses 9 presence tools that map to `tool_request` WebSocket messages. The gateway handles these server-side and returns `tool_response` messages on the per-connection direct channel.
WebSocket Protocol
Inbound Messages (browser → server)
| Message | Description |
|---|---|
{"t":"key","key":"..."} | Keyboard input (routed to per-connection WebTui) |
{"t":"resize","cols":N,"rows":N} | Terminal resize (per-connection) |
{"t":"presence_connect",...} | Presence session protocol — replaces server-side presence |
{"t":"presence_disconnect"} | Disconnect presence — resumes server-side presence |
{"t":"make_active"} | Request active voice ownership (handover) |
{"t":"voice_log","text":"...","seq":N} | Voice transcript from browser presence model |
{"t":"presence_checkpoint","summary":"...","last_event_seq":N} | Context checkpoint |
{"t":"voice_diagnostic","kind":"...","detail":"..."} | Browser voice diagnostics |
{"t":"user_audio","data":"<base64>"} | PCM16 audio for server-side transcription |
{"t":"tool_request","id":"...","tool":"...","args":{}} | Presence tool call |
{"t":"async_query","id":"...","tool":"...","args":{}} | Async query (result as text, not tool response) |
{"action":"..."} | ControlMsg (same as Unix control socket) |
{"t":"live_connected"} / {"t":"live_disconnected"} | Legacy (still accepted) |
Outbound Messages (server → browser)
| Message | Description |
|---|---|
{"t":"term","d":"<base64>"} | Per-connection TUI ANSI output |
{"t":"state_snapshot","state":{...},"connection_id":"...","config":{...},"session_id":"..."} | Bootstrap on connect |
{"t":"log_replay","entries":[...]} | Historical session events for late-connecting browsers |
{"t":"presence_welcome","session_id":"...","state":{...},"events":[...],"is_active":bool,"conversation_context":"..."} | Presence session welcome |
{"t":"active_granted","is_active":true,"handover_context":"...","conversation_context":"..."} | Active ownership granted |
{"t":"force_disconnect_voice","reason":"handover"} | Sent to old active on handover |
{"t":"presence_checkpoint_ack","seq":N} | Checkpoint acknowledgement |
{"t":"tool_response","id":"...","result":"..."} | Response to a tool_request |
{"t":"async_query_result","id":"...","tool":"...","result":"..."} | Response to async_query |
{"event":"..."} | OutboundEvent broadcast (status, agent_output, approval_required, etc.) |
Tool Request/Response Protocol
The browser live model calls presence tools via tagged request/response messages:
// Browser sends:
{"t":"tool_request","id":"req-42","tool":"check_status","args":{}}
// Server responds (on direct channel):
{"t":"tool_response","id":"req-42","result":"Phase: Running agent (turn 5). Budget: 23% used."}
Action tools (submit_task, approve_action, deny_action, skip_action, respond_to_question, set_autonomy) are dispatched via the EventBus — the same path as TUI key presses and control socket commands.
Query tools (check_status, query_detail, recall_memory) are handled asynchronously server-side via presence::handle_tool_query(), which reads from the shared AgentStateSnapshot, project files, and knowledge store.
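The split between action and query tools can be sketched as a dispatcher. The tool names come from this chapter; the injected handlers below are stubs standing in for the EventBus path and `presence::handle_tool_query()`, so the wiring is illustrative only.

```python
import json

ACTION_TOOLS = {"submit_task", "approve_action", "deny_action",
                "skip_action", "respond_to_question", "set_autonomy"}
QUERY_TOOLS = {"check_status", "query_detail", "recall_memory"}

def handle_tool_request(msg, dispatch_action, run_query):
    """Return the tool_response JSON line for one tool_request message.

    dispatch_action stands in for EventBus dispatch; run_query for the
    async server-side query handler. Both are injected stubs here.
    """
    tool, args = msg["tool"], msg.get("args", {})
    if tool in ACTION_TOOLS:
        result = dispatch_action(tool, args)
    elif tool in QUERY_TOOLS:
        result = run_query(tool, args)
    else:
        result = f"unknown tool: {tool}"
    return json.dumps({"t": "tool_response", "id": msg["id"], "result": result})
```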
State Bootstrap
On WebSocket connect, the server sends multiple bootstrap messages:
- `state_snapshot` — full `AgentStateSnapshot` with `connection_id`, config, and `session_id`
- Cached `usage_update` — latest token usage data
- Cached `status` — latest status (autonomy, `session_id`, task)
- Cached `display_ready` — latest display info for VNC slots
- `log_replay` — historical session events parsed from `session.jsonl`
This ensures late-connecting browsers see the complete state immediately.
HTTP Endpoints
| Endpoint | Description |
|---|---|
| `GET /` | App dashboard (4-tab UI: Activity, Usage, Terminal, Displays) |
| `GET /config` | Live model configuration JSON |
| `GET /debug` | Debug JSON (agent state, voice connection, active browser) |
| `POST /session` | Mint ephemeral session tokens for Gemini Live / OpenAI Realtime |
| `GET /wasm-web/*` | WASM and JS glue (content-hash cache-busted) |
| `GET /audio-processor.js` | AudioWorklet processor for microphone capture |
| `WS /` | Main WebSocket (events, terminal I/O, presence protocol) |
| `WS /vnc` | WebSocket-to-TCP VNC proxy for noVNC display viewing |
Requirements
- Microphone access requires a secure context: use `localhost` (via SSH tunnel: `ssh -L 8765:localhost:8765 host`), or set browser flags for insecure origins.
- API key for voice: Gemini or OpenAI. The key is used browser-side only. Voice is optional — the dashboard works without it.
Supported Tools (Browser Live Model)
| Tool | Type | Description |
|---|---|---|
| `submit_task` | Action | Submit a new task to the agent loop |
| `approve_action` | Action | Approve a pending action |
| `deny_action` | Action | Deny a pending action |
| `skip_action` | Action | Skip a pending action |
| `respond_to_question` | Action | Answer an `askHuman` question |
| `set_autonomy` | Action | Change the autonomy level |
| `check_status` | Query | Get the current agent phase, turn, and budget |
| `query_detail` | Query | Get git diff, file contents, or log details |
| `recall_memory` | Query | Search the knowledge store by keywords/channel |
Session Logging
Overview
Each `intendant` invocation creates a structured session log directory at `~/.intendant/logs/<uuid>/`. The log provides full observability for debugging and post-session analysis. No global state files are used — each session is fully isolated.
Directory Structure
~/.intendant/logs/<uuid>/
├── session_meta.json # Session metadata (id, created_at, project_root, task, status, last_turn)
├── session.jsonl # Structured event log (one JSON per line)
├── conversation.jsonl # Serialized conversation for session resume
├── summary.json # Post-session summary (task, outcome, turns)
├── human_question # askHuman IPC: question file (session-scoped)
├── human_response # askHuman IPC: response file (session-scoped)
├── 1_stdout.log # Runtime stdout for nonce 1
├── 1_stderr.log # Runtime stderr for nonce 1
└── turns/
├── turn_001_messages.json # Full messages array sent to API
├── turn_001_model.txt # Full model response
├── turn_001_reasoning.txt # Full reasoning content (if available)
├── turn_001_agent_in.json # Commands sent to runtime (pretty-printed)
├── turn_001_stdout.txt # Agent stdout for this turn
└── turn_001_stderr.txt # Agent stderr (only if non-empty)
Session Metadata
`session_meta.json` contains:
{
"session_id": "a1b2c3d4-...",
"created_at": "2025-01-15T10:30:00Z",
"project_root": "/home/user/myproject",
"task": "Fix the authentication bug",
"role": null,
"status": "running",
"last_turn": 5
}
This file is used by `--continue` (find the most recent session for the project) and `--resume <id>` (find a session by ID or prefix).
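The lookup that `--continue` and `--resume` are described as performing can be approximated in a few lines. This is an external-tooling sketch over the documented `session_meta.json` fields, not Intendant's own implementation:

```python
import json
from pathlib import Path

def find_session(logs_dir, project_root=None, prefix=None):
    """Locate a session directory: by ID prefix, or newest for a project.

    Reads the session_meta.json fields documented above (session_id,
    created_at, project_root); selection logic here is an assumption.
    """
    candidates = []
    for meta_path in Path(logs_dir).glob("*/session_meta.json"):
        meta = json.loads(meta_path.read_text())
        if prefix and meta["session_id"].startswith(prefix):
            return meta_path.parent
        if project_root and meta.get("project_root") == project_root:
            candidates.append((meta.get("created_at", ""), meta_path.parent))
    # --continue-style selection: most recently created matching session
    return max(candidates)[1] if candidates else None
```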
Event Types in session.jsonl
| Event | Description |
|---|---|
| `session_start` | Session initialization |
| `turn_start` | Turn boundary with budget % and remaining tokens |
| `messages_input` | Full API input logged (file reference to `messages.json`) |
| `model_response` | Model output with token counts (200-char preview, full text in file) |
| `reasoning` | Reasoning summary and full content (if available from the API) |
| `json_extracted` | Extracted command JSON with function names |
| `agent_input` | Commands sent to the runtime |
| `agent_output` | Runtime stdout/stderr |
| `approval` | Approval decisions (category, preview, decision) |
| `context_management` | Auto-compaction or manual context directive |
| `session_end` | Summary with outcome and turn count |
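Because each line of `session.jsonl` carries an `event` field, per-type summaries are straightforward to script. A hedged Python equivalent of a quick event overview, assuming only the `event` key documented above:

```python
import json
from collections import Counter

def event_histogram(jsonl_text):
    """Count occurrences of each event type in a session.jsonl dump."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if line.strip():
            counts[json.loads(line)["event"]] += 1
    return counts
```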
Querying Logs
# Overview of a session
cat ~/.intendant/logs/<session>/session.jsonl | jq -r '.event'
# See what the model received on turn 5
cat ~/.intendant/logs/<session>/turns/turn_005_messages.json | jq .
# See model reasoning on turn 3
cat ~/.intendant/logs/<session>/turns/turn_003_reasoning.txt
# Find all commands executed
grep '"event":"agent_input"' ~/.intendant/logs/<session>/session.jsonl | jq -r '.message'
# List all sessions
ls -lt ~/.intendant/logs/
# Find sessions for a specific project
grep -l '"project_root":"/home/user/myproject"' ~/.intendant/logs/*/session_meta.json
Session Resume
Conversation history is saved to `conversation.jsonl` after each turn, enabling session resume:
# Resume most recent session for this project
./target/release/intendant --continue "fix that bug"
# Resume specific session by ID or prefix
./target/release/intendant --resume abc123 "continue"
When resuming, the conversation is loaded from `conversation.jsonl` and the agent continues from where it left off. Session metadata is updated with the new task.
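For external tooling, the resume input can be approximated by replaying `conversation.jsonl` and appending the new task. The `role`/`content` shape below is an assumption for illustration; the real record layout is internal to the caller binary.

```python
import json

def resume_conversation(jsonl_text, new_task):
    """Rebuild conversation history and append the resume task.

    Assumes one serialized message object per line with role/content
    fields (an assumption, not the documented schema).
    """
    messages = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    messages.append({"role": "user", "content": new_task})
    return messages
```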
Test Coverage
The test suite covers both binaries with inline `#[cfg(test)]` modules:
- Agent binary: models serialization, error types, process state operations, nonce replacement, path inspection, blocking command execution, file editing, browsing, port waiting, human interaction, PTY sessions, memory storage/recall with tags and filters.
- Caller binary:
  - Providers: provider selection with token usage tracking, Responses API support, rate-limit retry with exponential backoff, API key masking, SSE streaming and event parsing, shared message builders, structured output and reasoning controls, role mapping, Gemini provider request/response format, JSON structured output mode.
  - Conversation: JSON extraction, done signal handling, conversation management with message layer protection, tool call tracking, auto-compaction, and context directives (drop/summarize).
  - Tools: native tool definitions (11+ tools including MCP client tools, provider conversion formats), tool call batch assembly and result routing (including MCP tool routing), MCP client tool name parsing and routing.
  - Configuration: error types, project detection, config parsing with approval rules, MCP server config, and sandbox config, prompt resolution cascade (project root, global config, compiled-in defaults, tools-mode variant) with INTENDANT.md loading, Landlock sandbox config construction.
  - Orchestration: sub-agent spawning and result parsing, git worktree lifecycle, user mode orchestration, knowledge pub/sub system.
  - UI and events: TUI rendering (status bar, log panel, action panel, approval panel, help overlay, layout calculations, orchestrator progress, streaming buffer), autonomy level resolution and command classification, event bus dispatch, theme color thresholds, control socket serialization, session log file creation, model summary formatting, Xvfb display configuration per provider, dynamic display allocation.
  - Web: web gateway (WebSocket lifecycle, tool request/response, broadcast, state bootstrap, live connect/disconnect) and presence event filtering.
Integration tests in `tests/e2e/` spawn a real binary and exercise the full stack (see Architecture):
- Tier 1 (JSON mode): full-stack exec, approval approve/deny via stdin, multi-round follow-up. No display required.
- Tier 2 (Control socket): status/usage queries, autonomy change, approve via the Unix control socket. Requires Xvfb.
- Tier 3 (Web/Voice): WebSocket `state_snapshot`, `tool_request`/response, ANSI term frames, `/debug` endpoint. Voice tests require Firefox, PulseAudio, and espeak-ng.