Getting Started

Building

cargo build --release

Two binaries are produced:

  • ./target/release/intendant-runtime — the command runtime
  • ./target/release/intendant — the AI CLI/TUI/Web

Installing

cargo install --path .

Both binaries are installed to ~/.cargo/bin/. The intendant binary embeds default system prompts and web assets (HTML, WASM) at compile time, so it works immediately from any directory without needing the source tree.

Prerequisites

  • Rust toolchain (stable)
  • wasm-pack — install with cargo install wasm-pack (auto-rebuilds WASM on source changes)
  • ffmpeg — required for display recording (brew install ffmpeg / apt install ffmpeg)
  • macOS: ./scripts/setup-macos.sh installs all platform dependencies
  • Linux: sudo apt install imagemagick xdotool xvfb x11vnc ffmpeg

WASM auto-rebuild

The build.rs script automatically rebuilds WASM when crates/presence-web/ or crates/presence-core/ source files change. This requires wasm-pack to be installed. If not installed, cargo build prints a warning and skips the WASM rebuild.
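Whether the rebuild will fire can be checked up front; this is just a convenience sketch, not part of build.rs:

```shell
# Check for wasm-pack the same way a PATH lookup would; the messages below
# paraphrase the behavior described above and are not build.rs output.
if command -v wasm-pack >/dev/null 2>&1; then
  msg="wasm-pack found: build.rs will rebuild WASM on source changes"
else
  msg="wasm-pack missing: cargo build will warn and skip the WASM rebuild"
fi
echo "$msg"
```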

To rebuild manually:

cd crates/presence-web && wasm-pack build --target web --out-dir ../../static/wasm-web --out-name presence_web
cargo build --release -p intendant   # Re-embed WASM

Setup

Create a .env file or export the variables. The caller searches for .env in this order:

  1. Current directory (and parent directories)
  2. Project root (git root)
  3. Global config (~/.config/intendant/.env)
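A sketch of that lookup order in shell (find_dotenv is a hypothetical helper; the real search also walks parent directories of the current one):

```shell
# Return the first .env found in the documented order:
# current directory, git root, then global config.
find_dotenv() {
  for candidate in \
      "$PWD/.env" \
      "$(git rev-parse --show-toplevel 2>/dev/null)/.env" \
      "$HOME/.config/intendant/.env"; do
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no .env found"
}
find_dotenv
```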

For global use after cargo install, put your keys in ~/.config/intendant/.env:

# OpenAI
OPENAI_API_KEY=sk-...

# Or Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# Or Gemini (Google AI)
GEMINI_API_KEY=AI...

# If multiple keys are set, choose one:
PROVIDER=openai          # or "anthropic" or "gemini"

MODEL_NAME=gpt-5.2-codex # optional, provider-specific default used if omitted

# Disable native tool calling (fall back to text-based JSON extraction)
# USE_NATIVE_TOOLS=false

Running

# With a task as CLI argument (launches TUI)
./target/release/intendant "List the files in /tmp"

# Headless mode (no TUI, plain text output)
./target/release/intendant --no-tui "List the files in /tmp"

# With autonomy level
./target/release/intendant --autonomy low "rm -rf /tmp/test"

# Specify provider and model
./target/release/intendant --provider anthropic --model claude-sonnet-4-5-20250929 "List files"

# Use Gemini provider
./target/release/intendant --provider gemini --model gemini-2.5-pro "List files"

# Interactive mode (prompts for task on stdin)
./target/release/intendant

# Verbose output (show debug-level log entries)
./target/release/intendant --verbose "echo hello"

# JSONL structured output (implies --no-tui)
./target/release/intendant --json "echo hello"

# Resume most recent session for this project
./target/release/intendant --continue "fix that bug"

# Resume specific session by ID or prefix
./target/release/intendant --resume abc123 "continue"

# Force single-agent mode (skip orchestrator)
./target/release/intendant --direct "simple task"

# Web dashboard (Activity + Usage + Terminal + Displays, default port 8765)
./target/release/intendant --web

# Web dashboard on custom port
./target/release/intendant --web 9000

# Enable filesystem sandboxing (Landlock, Linux 5.13+)
./target/release/intendant --sandbox "run tests"

# Run as MCP server (stdio transport)
./target/release/intendant --mcp "Deploy the application"

# Enable Unix control socket
./target/release/intendant --control-socket "task"

# Disable the presence layer
./target/release/intendant --no-presence "task"

# Pipe input (auto-detects non-TTY, runs headless)
echo "task" | ./target/release/intendant

Testing

cargo test --bins         # Unit tests (fast, no API keys needed)
cargo test -- --list      # List all test names

The test suite covers both binaries with inline #[cfg(test)] modules. See Session Logging for the full test coverage summary.

Integration tests in tests/e2e/ spawn a real binary and make real API calls — see Architecture for details.

Architecture

Overview

Intendant is a two-binary system: a command runtime that executes work, and a caller that drives it via AI model APIs.

stdin (JSON) --> intendant-runtime --> executes commands sequentially (blocking)
                  |
                  +--> in-memory process state (HashMap<nonce, ProcessInfo>)
                  +--> $INTENDANT_LOG_DIR/  (stdout/stderr logs per nonce)
                  |
                  +--> stdout (result lines with exit code, stdout/stderr tail)

intendant (3 modes) --> detects project root (git) --> loads memory/knowledge/skills
  |
  +--> User Mode:       spawns orchestrator subprocess, monitors progress (no API calls)
  +--> Sub-Agent Mode:  scoped task, writes results/progress, isolated context
  +--> Direct Mode:     single-loop execution for simple tasks
  |
  +--> Presence layer:  conversational mediator between user and agent loop
  +--> Native tool calling (OpenAI/Anthropic/Gemini) with text extraction fallback
  +--> Streaming output:  SSE-based token streaming for all 3 providers
  +--> Ratatui TUI:     status bar, scrollable log, approval panel, askHuman input
  +--> Web dashboard:   4-tab app (Activity/Usage/Terminal/Displays) with WASM-driven state
  +--> Live voice:      Gemini Live / OpenAI Realtime via browser, active/passive multi-browser
  +--> MCP Server:      --mcp flag, stdio transport, full parity with TUI (tools + resources)
  +--> MCP Client:      connects to external MCP servers (configured in intendant.toml)
  +--> Autonomy system: Low/Medium/High/Full + per-category rules from intendant.toml
  +--> Skills system:   SKILL.md-based instruction sets with YAML frontmatter
  +--> Transcription:   server-side Whisper API for browser audio transcription
  +--> Landlock sandbox: filesystem restrictions on agent runtime (Linux)
  +--> Prompt caching:  Anthropic cache_control, OpenAI/Gemini implicit caching
  +--> Auto-compaction: triggers at 90% context usage, preserves system+tail messages
  +--> Control socket:  /tmp/intendant-<pid>.sock (JSON-line protocol)
  +--> VNC proxy:       WebSocket-to-TCP bridge for noVNC display viewing
  +--> Token budget tracking (context-window-aware loop termination)
  +--> Session resume:  --continue (most recent) or --resume <id> (specific session)
  +--> Git worktree isolation for implementation agents
  +--> Tagged knowledge store with pub/sub channels between agents

Process State

In-memory HashMap<u64, ProcessInfo> tracking nonce, PID, status, exit code, and timestamp. Ephemeral — does not survive binary restarts. Each runtime invocation starts with an empty process map.

Session Directory

Per-session directory at ~/.intendant/logs/<uuid>/ with UUID-based naming. Contains per-nonce stdout/stderr log files, structured session logs (session.jsonl), conversation history (conversation.jsonl), and askHuman IPC files. The log directory is passed to the runtime via INTENDANT_LOG_DIR.
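As an illustration, a session directory might contain files like these (the uuid and the per-nonce log file names below are placeholders, not the runtime's actual naming scheme):

```shell
# Build a mock session directory with the file kinds named above.
session="$(mktemp -d)/logs/00000000-0000-0000-0000-000000000000"
mkdir -p "$session"
touch "$session/session.jsonl" \
      "$session/conversation.jsonl" \
      "$session/nonce_1.stdout.log"   # placeholder name for a per-nonce log
ls "$session"
```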

Execution Model

Commands are processed sequentially. Each command blocks until completion and returns its result directly (exit code, stdout tail, stderr tail). The runtime exits after processing all commands. Daemons backgrounded in bash continue after the tool returns.
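A sketch of that model: in this hypothetical batch, nonce 1 blocks until echo exits, and the daemon backgrounded by nonce 2 keeps running after the tool returns (my-daemon is a stand-in name):

```shell
# Payload shape follows the Runtime Protocol section; commands are illustrative.
payload='{"commands":[
  {"function":"execAsAgent","nonce":1,"command":"echo starting"},
  {"function":"execAsAgent","nonce":2,"command":"nohup my-daemon >/dev/null 2>&1 &"}
]}'
echo "$payload"   # pipe to ./target/release/intendant-runtime to execute
```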

Execution Modes

intendant operates in one of three modes, selected automatically based on task complexity and environment:

Direct Mode

Activated for simple tasks, or forced with --direct:

  • Single-loop execution with the selected model
  • Budget-aware loop: stops at context exhaustion, done signal, or 500-turn safety cap
  • Used for short tasks that don’t need multi-agent orchestration

User Mode

Activated for complex tasks without INTENDANT_ROLE:

  • Pure subprocess monitor — makes zero model API calls at Layer 0
  • Spawns an orchestrator sub-agent as a child process via tokio::process::Command
  • Polls the orchestrator’s progress file every 500ms, relays status to the TUI or stdout
  • Reads the orchestrator’s result file on exit; synthesizes a failure if the process crashes
  • kill_on_drop(true) ensures the orchestrator is terminated if the user quits the TUI

Sub-Agent Mode

Activated when INTENDANT_ROLE env var is set:

  • Runs as a child agent with a scoped task
  • Writes periodic progress to INTENDANT_PROGRESS_FILE
  • Writes final results (summary, findings, artifacts, token usage) to INTENDANT_RESULT_FILE
  • Uses role-specific system prompts (SysPrompt_research.md, SysPrompt_implementation.md, etc.)

See Multi-Agent Orchestration for the full sub-agent architecture.

How It Works (Direct Mode)

  1. Loads .env and selects the API provider (OpenAI, Anthropic, or Gemini). OpenAI uses the Responses API (/v1/responses), Anthropic uses the Messages API, Gemini uses the generateContent endpoint. All providers support streaming via SSE
  2. Configures structured output (JSON mode), reasoning controls, native tool calling, prompt caching (Anthropic cache_control), and max output tokens based on model capabilities and env vars
  3. Detects the project root (via git rev-parse --show-toplevel, falls back to cwd)
  4. Resolves role-appropriate system prompt via cascade: project root → ~/.config/intendant/ → compiled-in default. When native tools are enabled, uses the condensed SysPrompt_tools.md (tool docs live in API tool definitions instead of prose)
  5. Injects the project working directory into the conversation so the model knows which project to work in
  6. Loads knowledge from <project>/.intendant/memory.json, injects into conversation
  7. Loads INTENDANT.md project instructions (global then project-local), injects into conversation
  8. Logs the full messages array to turn_NNN_messages.json before each API call
  9. Sends the task to the chat API via streaming (chat_stream()), with max_tokens/max_output_tokens, optional reasoning, optional JSON format, and native tool definitions when enabled. API requests use exponential backoff retry (up to 5 retries) for rate-limit (429) and server errors (5xx). Text deltas are forwarded to the TUI in real-time
  10. Logs reasoning content (both summary and full text) to turn_NNN_reasoning.txt when available
  11. Processes the model’s response via one of two paths:
    • Native tool call path (when response contains tool calls): Collects individual tool calls, assembles them into an AgentInput batch, pipes to the runtime, maps results back to per-tool-call responses. Handles manage_context and signal_done tool calls caller-side. Raw API output items (reasoning + function_call) are preserved for verbatim echo-back in subsequent requests, which reasoning models (GPT-5, o3, o4) require
    • Legacy text extraction path (fallback): Extracts JSON from the response text (handles structured output, code fences, and bare JSON), checks for explicit done signal ({"done": true})
  12. Applies context directives (drop_turns, summarize) to the conversation
  13. Injects project context (memory_file) into relevant commands
  14. Classifies commands by action category (file read/write/delete, exec, network, destructive) and checks autonomy rules
  15. If approval is required:
    • TUI mode: emits an approval request and waits for user response
    • Headless mode: denies execution (no implicit auto-approve fallback)
  16. Pipes the JSON to the intendant-runtime binary and waits for completion with a hard timeout (120s default, 600s for askHuman)
  17. Feeds the agent output back as the next user message (text path) or as individual tool results (tool call path), appending a token budget summary
  18. Repeats until the model signals done, responds with no JSON, or the context budget is exhausted
  19. In headless mode, if the model emits askHuman, the loop sends a recovery prompt back to the model (continue with explicit assumptions) instead of blocking on human-input timeout

askHuman Behavior

  • In TUI mode, askHuman opens the input panel and writes your answer to the session-scoped response file.
  • Empty submit is rejected in the TUI; you must provide non-empty input or press Esc to cancel.
  • In headless mode (--no-tui or non-interactive stdin), askHuman cannot be answered interactively. The loop tells the model to continue with explicit assumptions instead of waiting for the runtime timeout.
  • Runtime-level timeout for unanswered askHuman remains 5 minutes.

Streaming

All three providers support streaming via chat_stream() on the ChatProvider trait:

  • Anthropic: stream: true on Messages API, parses content_block_delta, content_block_start/stop, message_delta
  • OpenAI: stream: true on Responses API, parses response.output_text.delta, response.function_call_arguments.delta, response.completed
  • Gemini: streamGenerateContent?alt=sse endpoint, parses chunked JSON candidates

Text deltas are forwarded to the TUI via AppEvent::ModelResponseDelta and accumulated in App::streaming_buffer, which is cleared when the full ModelResponse arrives.

Rate-Limit Retry

API requests use send_with_retry() with exponential backoff (1s × 2^attempt + jitter, up to 5 retries) for HTTP 429 and 5xx responses. Non-retryable errors (400, 401, etc.) fail immediately. API keys in error messages are masked via mask_api_keys().
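Ignoring jitter, the schedule works out to 1s, 2s, 4s, 8s, 16s; this loop just illustrates the arithmetic:

```shell
# Base delay per attempt: 1s * 2^attempt (jitter omitted).
for attempt in 0 1 2 3 4; do
  echo "retry $attempt: base delay $(( 1 << attempt ))s (+ jitter)"
done
```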

Prompt Caching

  • Anthropic: Uses anthropic-beta: prompt-caching-2024-07-31 header with structured system content containing cache_control: {"type": "ephemeral"}
  • OpenAI: Automatic server-side caching for prompts >1024 tokens (no API changes needed)
  • Gemini: Implicit context caching (no API changes needed)

Auto-Compaction

When context usage reaches 90% (usage_fraction() >= 0.90), conversation.auto_compact() triggers:

  • Keeps: system message, first 2 context messages, last 4 messages
  • Summarizes: oldest half of remaining middle messages via summarize_turns()
  • Emits ContextManagement event to TUI/MCP
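A worked example of the split for a 12-message conversation (indexing here is illustrative; summarize_turns() may count differently):

```shell
total=12
keep_head=3                                    # system + first 2 context messages
keep_tail=4                                    # last 4 messages are preserved
middle=$(( total - keep_head - keep_tail ))    # 5 messages in the middle
to_summarize=$(( middle / 2 ))                 # oldest half of the middle -> 2
echo "summarize messages ${keep_head}..$(( keep_head + to_summarize - 1 ))"
# prints "summarize messages 3..4"
```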

Vision / Display Management

Xvfb is auto-launched lazily on the first turn containing an execAsAgent or captureScreen command when no accessible X display exists (checked via xdpyinfo). The detection flow:

  1. Already launched? → skip
  2. Batch contains execAsAgent or captureScreen? No → skip
  3. Current DISPLAY accessible? Yes → skip (user has working display)
  4. Auto-launch Xvfb, store guard, set DISPLAY, emit DisplayReady event
  5. On failure → log warning, let captureScreen fail naturally

Display allocation prefers :99 for a predictable VNC port (5999). If :99 is locked by a live Xvfb from a previous session, it is automatically killed and reclaimed (detected via /proc/<pid>/cmdline). If :99 is held by a non-Xvfb process, allocation falls through to :100+.
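The liveness check can be sketched like this ($$ stands in for the PID read from the :99 lock file; our own cmdline is not Xvfb, so this falls through):

```shell
pid=$$
# A live Xvfb's /proc/<pid>/cmdline starts with "Xvfb" (NUL-separated args).
if tr '\0' ' ' < "/proc/$pid/cmdline" 2>/dev/null | grep -q '^Xvfb'; then
  decision="live Xvfb holds :99: kill and reclaim"
else
  decision="not a live Xvfb: fall through to :100+"
fi
echo "$decision"
```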

Per-provider display resolutions:

  • OpenAI: 1024×768 (3 tiles of 512×512)
  • Anthropic: 819×1456 (9:16 aspect)
  • Gemini: 768×1024 (2 tiles of 768×768)

VNC Remote Observation

An x11vnc server is launched alongside Xvfb as a best-effort co-process (port = 5900 + display_id). If x11vnc is not installed, the display works normally. Both Xvfb and x11vnc are killed on drop via XvfbGuard.
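The port mapping is plain arithmetic:

```shell
display_id=99                      # display :99
vnc_port=$(( 5900 + display_id ))  # 5900 + 99 = 5999
echo "vnc://localhost:${vnc_port}"
```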

On the guest VM — install x11vnc:

sudo apt-get install -y x11vnc

From your host machine — connect with any VNC client:

# Direct connection (VM on local network)
vncviewer <vm-ip>:5999

# Over SSH tunnel (recommended for remote VMs)
ssh -L 5999:localhost:5999 user@vm-host
vncviewer localhost:5999

If display :99 was already taken, intendant falls through to :100+ and the VNC port shifts accordingly (6000, 6001, …). Check the TUI log panel or stderr for the actual port:

22:29:12  VNC server available at vnc://localhost:5999

Environment

  • OS: Linux (Debian 12+)
  • Runtime: Tokio async (full features)
  • Permissions: Runs as unprivileged user with passwordless sudo
  • Display: Auto-managed Xvfb with x11vnc (see above)
  • X11 auth: At startup the runtime discovers active X displays and merges their xauth cookies into a session-scoped session.Xauthority file, passed as XAUTHORITY to all spawned commands

Configuration

CLI Flags

  • --provider <name> — Force provider (openai, anthropic, or gemini)
  • --model <name> — Override model name
  • --verbose / -v — Show debug-level log entries in TUI
  • --no-tui — Disable TUI, use plain text output
  • --autonomy <level> — Set autonomy level (low, medium, high, full)
  • --log-file <dir> — Override session log directory
  • --mcp — Run as MCP server on stdio (replaces TUI)
  • --control-socket — Enable Unix control socket at /tmp/intendant-<pid>.sock
  • --json — JSONL structured output to stdout (implies --no-tui)
  • --sandbox — Enable Landlock filesystem sandboxing (Linux kernel 5.13+)
  • --direct — Force single-agent direct mode (skip orchestrator even for complex tasks)
  • --no-presence — Disable the presence layer (direct agent interaction)
  • --continue / -c — Resume most recent session for this project
  • --resume <id> / -r <id> — Resume specific session by ID or prefix
  • --web [PORT] — Start web dashboard with Activity/Usage/Terminal/Displays tabs + optional voice (default port 8765)
  • --transcription — Enable server-side audio transcription (Whisper API)

The TUI launches only when both stdin and stdout are terminals. When piping input/output or in sub-agent mode, intendant falls back to headless mode.

Environment Variables

  • OPENAI_API_KEY / OPENAI — OpenAI API key
  • ANTHROPIC_API_KEY / ANTHROPIC — Anthropic API key
  • GEMINI_API_KEY — Google AI (Gemini) API key
  • PROVIDER — "openai", "anthropic", or "gemini"; used when multiple keys are set (default: auto-detect)
  • MODEL_NAME — Model to use, e.g. gpt-5.2-codex, claude-sonnet-4-5-20250929, gemini-2.5-pro (default: per-provider)
  • USE_NATIVE_TOOLS — Enable native API tool calling; false falls back to text-based JSON extraction (default: true)
  • MODEL_CONTEXT_WINDOW — Context window size in tokens (default: per-model)
  • MAX_OUTPUT_TOKENS — Max output tokens per API call, sent to the API (default: per-model)
  • STRUCTURED_OUTPUT — Enable JSON object mode for deterministic parsing (default: true for gpt-5+/o3/o4)
  • REASONING_EFFORT — Reasoning effort for GPT-5/o3/o4 models (low, medium, high)
  • REASONING_SUMMARY — Reasoning summary mode (auto, concise, detailed)
  • PRESENCE_PROVIDER — Override provider for the presence layer (fallback: PROVIDER)
  • PRESENCE_MODEL — Override model for the presence layer
  • INTENDANT_LOG_DIR — Session log directory (set automatically by the caller for the runtime)

Sub-Agent Environment Variables

These are set automatically when spawning sub-agents (see Multi-Agent Orchestration):

  • INTENDANT_ROLE — Sub-agent role (orchestrator, research, implementation, testing)
  • INTENDANT_ID — Unique sub-agent identifier
  • INTENDANT_TASK — Task description for the sub-agent
  • INTENDANT_RESULT_FILE — Path for sub-agent to write final results
  • INTENDANT_PROGRESS_FILE — Path for sub-agent to write periodic progress
  • INTENDANT_PARENT_KNOWLEDGE — Path to parent's knowledge store for inheritance
  • INTENDANT_INHERIT_MEMORY — 1 to inherit project memory
  • INTENDANT_SANDBOX_WRITE_PATHS — Landlock write paths (set by caller when sandboxing)
  • INTENDANT_MCP_RELOAD — 1 when the process was exec'd for MCP hot-reload

The agent runner hard timeout is 120s default, automatically extended to 600s when askHuman is present in the command batch.

Project Configuration

Create intendant.toml in the project root:

[memory]
enabled = true  # default: true

[model]
context_window = 200000       # override per-model default
max_output_tokens = 8192      # override per-model default

[orchestrator]
max_parallel_agents = 4       # max concurrent sub-agents
sub_agent_dir = ".intendant/subagents"  # where sub-agent workspaces are created

[approval]
file_read = "auto"            # auto-approve file reads
file_write = "ask"            # ask before file writes (default)
file_delete = "ask"           # ask before file deletes (default)
command_exec = "auto"         # auto-approve command execution
network = "auto"              # auto-approve network requests
destructive = "ask"           # ask before destructive commands (default)

[presence]
enabled = true                # enable the conversational presence layer (default: true)
provider = "gemini"           # provider for the presence model (optional, falls back to PROVIDER)
model = "gemini-2.5-flash"    # model for the presence layer (optional)
live_provider = "gemini"      # provider for browser-side live presence (optional)
live_model = "gemini-2.5-flash-native-audio-preview-12-2025"  # model for browser-side live presence (optional)
context_window = 32768        # context window for the presence conversation (default: 32768)

[transcription]
enabled = false               # enable server-side audio transcription (default: false)
provider = "openai"           # transcription provider (default: "openai")
model = "whisper-1"           # transcription model (default: "whisper-1")
language = "en"               # ISO-639-1 language hint (optional, auto-detect if omitted)
# endpoint = "http://..."     # custom endpoint for self-hosted whisper.cpp

[sandbox]
enabled = false               # enable Landlock filesystem sandboxing (default: false)
extra_write_paths = ["/var/log"]  # additional writable paths beyond project root, /tmp, log dir

# External MCP servers to connect to as a client
[[mcp_servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]

[[mcp_servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]

[mcp_servers.env]
GITHUB_TOKEN = "ghp_..."

Skills

Skills are named instruction sets stored as SKILL.md files with YAML frontmatter. They are discovered from two directories (project-scoped first):

  1. <project_root>/.intendant/skills/<name>/SKILL.md
  2. ~/.intendant/skills/<name>/SKILL.md

Example SKILL.md:

---
name: deploy
description: Deploy the application to production
autonomy: high
disable-auto-invocation: true
---

## Steps

1. Run tests
2. Build release binary
3. Deploy to server

Frontmatter fields:

  • name — skill identifier (required)
  • description — shown in skill catalog (required)
  • autonomy — override session autonomy level when active (optional)
  • disable-auto-invocation — if true, only user can trigger this skill (optional, default false)
  • sandbox — override session sandbox setting (optional)

Project skills take precedence over personal skills with the same name. Available skills are formatted into a catalog and injected into the agent’s conversation.

When sandboxing is enabled (via --sandbox or [sandbox].enabled = true), runtime command execution is restricted to read-only filesystem access plus writes to project root, /tmp, session log directory, ~/.intendant, and extra_write_paths. On kernels without Landlock support, sandboxing is silently skipped.
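On Linux you can check up front whether the kernel has Landlock active by reading the LSM list (a convenience check only; intendant detects support itself at runtime):

```shell
# /sys/kernel/security/lsm lists the active Linux Security Modules.
if [ -r /sys/kernel/security/lsm ] && grep -q landlock /sys/kernel/security/lsm; then
  status="landlock active: sandboxing can be enforced"
else
  status="landlock not active: sandboxing will be silently skipped"
fi
echo "$status"
```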

INTENDANT.md Project Instructions

Place an INTENDANT.md file in your project root or at ~/.config/intendant/INTENDANT.md for global instructions. These are injected into the conversation at session start, before knowledge/memory. Both files are loaded if present (global first, project-local second).

System Prompts

System prompts are compiled into the binary at build time, so intendant works from any directory without needing the source tree. Two base prompt variants exist:

  • SysPrompt.md — Full prompt with JSON schema and per-function documentation (used with text-based JSON extraction)
  • SysPrompt_tools.md — Condensed prompt for native tool calling mode (function docs live in API tool definitions, reducing system prompt tokens)

The active variant is selected automatically based on whether the provider has native tool calling enabled.

Prompts are resolved using a 3-layer cascade (highest priority first):

  1. Project root — <git-root>/SysPrompt.md or SysPrompt_tools.md (per-project customization)
  2. Global config — ~/.config/intendant/SysPrompt.md or SysPrompt_tools.md (user-wide customization)
  3. Compiled-in default — always available, zero-config

Role-specific prompts (SysPrompt_orchestrator.md, SysPrompt_research.md, SysPrompt_implementation.md) follow the same cascade and are appended to the base prompt. The presence layer uses its own standalone prompt (SysPrompt_presence.md).

To customize prompts for a specific project, place your modified .md files in the project’s git root. For user-wide customization, place them in ~/.config/intendant/.

Runtime Protocol

The intendant-runtime binary reads a single JSON object from stdin, executes commands sequentially, and writes result lines to stdout.

Basic Usage

echo '{"commands":[{"function":"execAsAgent","nonce":1,"command":"echo hello"}]}' \
  | ./target/release/intendant-runtime

Output is a JSON result line containing the nonce, exit code, stdout tail (last 10KB), and stderr tail.

Inspect a file path:

echo '{"commands":[{"function":"inspectPath","nonce":1,"path":"/etc/hosts"}]}' \
  | ./target/release/intendant-runtime

Edit a file:

echo '{"commands":[{"function":"editFile","nonce":1,"file_path":"/tmp/test.txt","operation":"write","content":"hello"}]}' \
  | ./target/release/intendant-runtime

Fetch a web page as text:

echo '{"commands":[{"function":"browse","nonce":1,"url":"https://example.com"}]}' \
  | ./target/release/intendant-runtime

Run stateful commands in a persistent PTY:

echo '{"commands":[{"function":"execPty","nonce":1,"command":"cd /tmp"},{"function":"execPty","nonce":2,"command":"pwd"}]}' \
  | ./target/release/intendant-runtime

Store and recall memory (supports tagged knowledge with channels):

# Basic store
echo '{"commands":[{"function":"storeMemory","nonce":1,"memory_key":"db-config","memory_summary":"PostgreSQL on port 5432","memory_file":"/path/to/.intendant/memory.json"}]}' \
  | ./target/release/intendant-runtime

# Store with tags and channel
echo '{"commands":[{"function":"storeMemory","nonce":1,"memory_key":"db-config","memory_summary":"PostgreSQL on port 5432","memory_tags":"database,config","memory_channel":"findings","memory_source":"research-1","memory_file":"/path/to/.intendant/memory.json"}]}' \
  | ./target/release/intendant-runtime

# Recall with filters
echo '{"commands":[{"function":"recallMemory","nonce":1,"memory_query":"database","memory_tags":"config","memory_channel":"findings","memory_file":"/path/to/.intendant/memory.json"}]}' \
  | ./target/release/intendant-runtime

Functions

Runtime Functions

  • execAsAgent — Run a bash command (blocks until exit, returns exit code + stdout/stderr tail). Fields: command, display, wait_for_port
  • captureScreen — Screenshot a display via ImageMagick. Fields: display
  • inspectPath — Inspect filesystem path metadata (type, size, perms, timestamps). Fields: path
  • editFile — Structured file editing without shell commands. Fields: file_path, operation, content, match_content, line_number, end_line
  • writeFile — Alias for editFile with operation: "write" (backward compatibility). Fields: file_path, content
  • browse — Fetch URL and convert HTML to plain text (50KB max). Fields: url
  • askHuman — Ask the operator a question and wait for response (5-minute timeout). Fields: question, timeout_ms
  • execPty — Run command in a persistent PTY session (bash --norc --noprofile). Fields: command, shell_id
  • storeMemory — Store a knowledge entry with optional tags/channel. Fields: memory_key, memory_summary, memory_file, memory_tags, memory_channel, memory_source
  • recallMemory — Search knowledge by keyword with optional filters. Fields: memory_query, memory_file, memory_tags, memory_channel, memory_source, memory_since

Caller-Handled Functions

These are intercepted by the caller and never reach the runtime:

These are intercepted by the caller and never reach the runtime:

  • manage_context — Apply context directives (drop/summarize turns) to the conversation
  • signal_done — Signal task completion to the caller loop

Native Tool Names

When using native tool calling (the default), tool names use snake_case:

  • exec_command → execAsAgent
  • capture_screen → captureScreen
  • inspect_path → inspectPath
  • edit_file → editFile
  • browse_url → browse
  • ask_human → askHuman
  • exec_pty → execPty
  • store_memory → storeMemory
  • recall_memory → recallMemory
  • manage_context → (caller-handled)
  • signal_done → (caller-handled)

editFile Operations

The editFile function supports 5 operations:

  • write — Write content to file (creates or overwrites). Fields: file_path, content
  • append — Append content to end of file. Fields: file_path, content
  • replace — Replace matching text with new content. Fields: file_path, match_content, content
  • insert_at — Insert content at a specific line number. Fields: file_path, line_number, content
  • replace_lines — Replace a range of lines. Fields: file_path, line_number, end_line, content
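A replace operation, following the same payload convention as the earlier editFile example (path and contents are illustrative):

```shell
# Swap "hello" for "goodbye" in /tmp/demo.txt via the replace operation.
payload='{"commands":[{"function":"editFile","nonce":1,"file_path":"/tmp/demo.txt","operation":"replace","match_content":"hello","content":"goodbye"}]}'
echo "$payload"   # pipe to ./target/release/intendant-runtime to execute
```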

Nonce Variables

Use $NONCE[id] in command strings to reference the PID of a previously launched nonce. For example, kill -9 $NONCE[10] kills the process started by nonce 10. Handled by regex-based substitution in replace_nonce_refs().
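For example, a hypothetical batch where nonce 11 kills whatever nonce 10 launched (the commands themselves are illustrative):

```shell
# $NONCE[10] is substituted by the runtime with the PID tracked for nonce 10.
payload='{"commands":[
  {"function":"execAsAgent","nonce":10,"command":"nohup sleep 300 >/dev/null 2>&1 &"},
  {"function":"execAsAgent","nonce":11,"command":"kill -9 $NONCE[10]"}
]}'
echo "$payload"   # pipe to ./target/release/intendant-runtime to execute
```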

Context Management

The model can include a context field alongside commands to manage conversation history:

{
  "commands": [...],
  "context": {
    "drop_turns": [3, 4, 5],
    "summarize": { "turns": [7, 8, 9, 10], "summary": "Set up nginx with reverse proxy" }
  }
}
  • drop_turns: Remove messages at given indices (system prompt and last 2 messages are protected).
  • summarize: Replace a range of messages with a single summary.
  • Context-only turns (empty commands) are supported for pruning without executing anything.

Knowledge System

Project knowledge persists tagged entries across sessions in <project>/.intendant/memory.json. The system supports both the legacy key-value format and the new tagged knowledge format with automatic migration.

  • storeMemory: Creates or updates an entry with key, summary, tags, channel, and source. Backward-compatible with old format.
  • recallMemory: Searches entries by keyword with optional filters (tags, channel, source, since timestamp). Results are ranked by relevance (key/summary match).
  • Knowledge is loaded and injected into the conversation at session start.
  • Supports pub/sub channels for inter-agent knowledge sharing:
    • Agents publish findings to named channels (e.g., "findings", "decisions")
    • The orchestrator routes knowledge between sibling agents via subscriptions
    • Cursor-based tracking ensures agents only see new entries
  • Can be disabled in intendant.toml:
[memory]
enabled = false  # default: true

JSON Output Mode

--json enables JSONL structured output to stdout (implies --no-tui). Each line is a JSON object with type and data fields. Event types include: turn_started, model_response, model_response_delta, agent_output, done, error, approval_required, human_question, budget_warning, round_complete, context_management.

In JSON mode, stdin accepts both plain text (follow-up messages) and JSON commands using the same ControlMsg format as the Unix control socket:

{"action":"approve","id":123}
{"action":"deny","id":123}
{"action":"skip","id":123}
{"action":"approve_all","id":123}
{"action":"input","text":"answer to askHuman"}
{"action":"follow_up","text":"continue with this"}

Lines not starting with { or not parseable as ControlMsg are treated as follow-up text. This makes --json mode fully interactive: approval flows, askHuman, and multi-round conversations all work without a TUI or control socket.

TUI & Autonomy

TUI

intendant includes a ratatui-based terminal UI that launches automatically when both stdin and stdout are terminals. The TUI provides real-time monitoring and control of the agent loop.

Layout

┌─────────────────────────────────────────────┐
│ StatusBar: provider │ model │ turn │ budget  │  1 line
├─────────────────────────────────────────────┤
│ ActionPanel: phase + spinner + key hints    │  2 lines
├─────────────────────────────────────────────┤
│                                             │
│ LogPanel: scrollable, color-coded entries   │  fills remaining
│                                             │
├─────────────────────────────────────────────┤
│ ApprovalPanel / InputPanel (conditional)    │  3-4 lines
└─────────────────────────────────────────────┘

Panels

  • Status bar: Provider, model, turn count, budget percentage, autonomy level
  • Action panel: Current phase with spinner — Thinking, RunningAgent, Orchestrating, WaitingApproval, WaitingHuman, WaitingFollowUp, Idle, Done
  • Log panel: Scrollable chronological log with color-coded levels (Info, Warning, Error, Debug)
  • Approval panel: Shown when an action needs user approval — command preview + category, y/s/a/n keys
  • Input panel: Shown when askHuman is triggered — tui-textarea for response
  • Follow-up panel: Shown when agent completes a round and awaits follow-up input
  • Help overlay: Key bindings reference (? key)
  • Inspect overlay: Detailed view of selected log entry

Key Bindings

| Key | Action |
|-----|--------|
| q / Ctrl-C | Quit |
| v | Toggle verbose mode (cycle through quiet/normal/verbose/debug) |
| ? | Help overlay |
| + / - | Cycle autonomy level |
| Up/Down/PgUp/PgDn | Scroll log |
| Home / End | Jump to top/bottom of log |
| 1-3 | Toggle panels (status, action, log) |
| y / Enter | Approve pending action |
| s | Skip pending action |
| a | Auto-approve all remaining |
| n | Deny and stop |

Markdown Rendering

Model responses containing markdown are rendered with syntax highlighting in the log panel:

  • Headers (# through ####) in blue
  • Bold (**text**) with bright styling
  • Italic (*text*) in lavender
  • Inline code (`code`) in green
  • Fenced code blocks (```) in green
  • List items (- and * ) with yellow bullets
  • Horizontal rules (---) as dim lines
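As a rough sketch, the per-line styling decision reduces to a classifier like this (illustrative Python; line-local only, whereas the real renderer is stateful, e.g. it tracks open code fences):

```python
def classify_md_line(line: str) -> str:
    """Map a markdown line to the style bucket listed above."""
    s = line.strip()
    if s.startswith("```"):
        return "code-fence"       # rendered in green
    if s.startswith("#"):
        return "header"           # rendered in blue
    if s == "---":
        return "rule"             # rendered as a dim line
    if s.startswith(("- ", "* ")):
        return "list-item"        # yellow bullet
    return "text"
```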

Streaming Display

When a model is generating a response, text deltas are forwarded to the TUI in real-time via AppEvent::ModelResponseDelta and accumulated in a streaming buffer. The buffer is cleared when the full response arrives. This gives immediate feedback during long model responses.

Theme

The TUI uses a Catppuccin Mocha-inspired color scheme with budget-aware color thresholds (green → yellow → red as context fills up).

Autonomy System

The autonomy system controls which actions require human approval. It operates on three layers:

Layer 1 — Global Level

Set via CLI --autonomy flag, toggleable in TUI with +/-:

| Level | Behavior |
|-------|----------|
| Low | Ask before every command execution |
| Medium | Ask before writes, network, destructive (default) |
| High | Only ask for unavoidable human input |
| Full | Never ask (fully autonomous) |

Layer 2 — Per-Category Rules

From intendant.toml [approval] section. Overrides the global level for specific action categories. Rules: auto (always approve), ask (require approval), deny (always deny).
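The layering of levels and category rules can be sketched as a resolution function (hypothetical Python; whether Low also asks for reads is an assumption, as is the exact category set per level):

```python
def resolve_approval(level: str, category: str, rules: dict) -> str:
    """Return 'auto', 'ask', or 'deny'. Per-category rules (layer 2,
    from [approval] in intendant.toml) override the global level (layer 1)."""
    if category in rules:
        return rules[category]
    if level == "full":
        return "auto"
    if level == "high":
        return "ask" if category == "HumanInput" else "auto"
    if level == "medium":
        risky = {"FileWrite", "FileDelete", "NetworkRequest",
                 "Destructive", "HumanInput"}
        return "ask" if category in risky else "auto"
    return "ask"   # low: ask before everything (assumption: includes reads)
```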

Layer 3 — Per-Action Approval

When approval is needed, the agent loop pauses and the TUI shows the command preview. The user can approve, skip, deny, or switch to auto-approve mode.

Action Classification

Commands are classified into categories by inspecting the command JSON:

| Category | Examples |
|----------|----------|
| FileRead | inspectPath, recallMemory |
| FileWrite | editFile, writeFile, storeMemory |
| FileDelete | Commands with rm, rmdir |
| CommandExec | execAsAgent, execPty |
| NetworkRequest | Commands with curl, wget, ssh, git |
| Destructive | Commands with rm -rf, kill, dd, mkfs, sudo |
| HumanInput | askHuman |

Shell commands are further classified by inspecting the command string for destructive patterns, network tools, and file writes (redirects, tee, mv, cp). The sudo prefix is detected as Destructive and the actual command after sudo is also classified.
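The shell-string heuristics can be sketched roughly as follows (illustrative Python; the pattern lists are partial, and this simplified version stops at the sudo prefix instead of also classifying the inner command):

```python
import re

DESTRUCTIVE = ("rm -rf", "mkfs", " dd ", "kill ")
NETWORK = ("curl", "wget", "ssh", "git ")

def classify_shell(cmd: str) -> str:
    """Classify a shell command string into an action category."""
    cmd = cmd.strip()
    if cmd.startswith("sudo "):
        return "Destructive"   # sudo prefix is Destructive per the docs
    if any(p in f" {cmd} " for p in DESTRUCTIVE):
        return "Destructive"
    if any(cmd.startswith(t) for t in NETWORK):
        return "NetworkRequest"
    # file writes: redirects, tee, mv, cp
    if re.search(r">{1,2}|\btee\b|\bmv\b|\bcp\b", cmd):
        return "FileWrite"
    return "CommandExec"
```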

Web Dashboard

The --web flag starts a web server that serves a modern 4-tab dashboard at / with Activity, Usage, Terminal, and Displays tabs. The Terminal tab provides the same ratatui interface as the native TUI via xterm.js, while the other tabs add event logging, cost tracking, and remote display viewing.

# Default port 8765
./target/release/intendant --web

# Custom port
./target/release/intendant --web 9000

The --web flag implies --mcp, so no initial task is required — the agent starts idle and accepts tasks dynamically. Open http://<host>:8765/ in a browser.

The dashboard also supports optional live voice interaction via Gemini Live or OpenAI Realtime, with active/passive multi-browser support and session continuity across reconnects.

See Web Dashboard for full documentation and Integrations — Web Gateway for the WebSocket protocol.

Web Dashboard

The --web flag starts a web server that serves a modern dashboard for monitoring and interacting with Intendant remotely. The dashboard runs entirely in the browser with WASM-powered state management.

Running

# Default port 8765
./target/release/intendant --web

# Custom port
./target/release/intendant --web 9000

Open http://<host>:8765/ in a browser. The --web flag implies --mcp, so no initial task is required — the agent starts idle and accepts tasks dynamically.

Dashboard Tabs

Activity

A scrollable, color-coded event log showing everything happening in the system:

  • system — session lifecycle, approvals, context management
  • worker — model responses, reasoning summaries, task completion
  • agent — command execution output (stdout/stderr, exit codes)
  • live — voice transcripts, presence lifecycle, tool requests
  • server — presence model internals (thinking, tool calls)

Events are grouped by turn with visual separators. New events while viewing other tabs trigger a notification badge. Late-connecting browsers receive a full replay of historical events from session.jsonl.

Usage

Token consumption for the main model and presence model:

  • Prompt, completion, and cached token breakdowns
  • Cost estimates using a built-in pricing table (OpenAI, Anthropic, Gemini models)
  • Usage history over time
  • Updated after each agent turn via usage_update events

Terminal

An embedded xterm.js terminal connected to the server-side ratatui TUI. Each browser connection gets its own independent terminal rendering with separate dimensions. This shows the same interface as the native terminal TUI — status bar, log panel, action panel, approval/input panels.

Key presses and terminal resizes in the browser are sent to the server and rendered independently per connection.

Displays

Remote viewing of Xvfb displays created by the agent. When the agent runs graphical applications (via execAsAgent with a DISPLAY), the display appears here as a noVNC viewer.

Displays are created lazily — the tab populates automatically when the agent’s first command triggers Xvfb auto-launch. Each display also shows its VNC port for direct connection.

Live Voice

The dashboard supports optional live voice interaction via Gemini Live or OpenAI Realtime. When activated:

  • The browser connects directly to the model’s realtime API for low-latency voice I/O
  • The live model receives agent events and narrates progress
  • Tool calls from the live model (submit_task, approve_action, check_status, etc.) are routed through the WebSocket to the server
  • Server-side presence is automatically paused (mutual exclusion)

Setup

  1. Enter your API key on first visit (Gemini or OpenAI)
  2. Keys are stored in browser localStorage — never sent to the Intendant server
  3. Click the microphone button to connect

Active/Passive Browsers

Only one browser can be “active” (controlling the voice model) at a time:

  • First browser to connect voice becomes active
  • Additional browsers are passive observers (receive events and TUI frames, but don’t pause server-side presence)
  • A passive browser can request active status via the UI, which force-disconnects the previous active browser
  • Active handover includes the last checkpoint summary and conversation context

Session Continuity

The presence session protocol maintains context across reconnects:

  1. On connect, the server sends a presence_welcome with current state, missed events, and conversation context
  2. The browser sends periodic presence_checkpoint messages with a summary of the conversation
  3. On reconnect, the server replays events since the last checkpoint
  4. This prevents the voice model from losing context when the browser refreshes or the connection drops

Server-Side Transcription

When [transcription] is enabled in intendant.toml, the browser sends microphone audio to the server for transcription via the Whisper API:

[transcription]
enabled = true
provider = "openai"
model = "whisper-1"
language = "en"

Audio is buffered in ~3s chunks, filtered by RMS energy to skip silence, and sent to the transcription endpoint. Transcripts are broadcast as user_transcript events and logged to the session.
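The chunking and silence filtering can be sketched like this (illustrative Python; the chunk size and RMS threshold are assumptions, not the server's actual values):

```python
import math

def rms(samples: list[float]) -> float:
    """Root-mean-square energy of an audio buffer."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

def chunk_and_filter(samples, sample_rate=16000, chunk_sec=3.0, threshold=0.01):
    """Buffer audio into ~3 s chunks and drop chunks whose RMS energy
    falls below a silence threshold."""
    size = int(sample_rate * chunk_sec)
    for i in range(0, len(samples), size):
        chunk = samples[i:i + size]
        if rms(chunk) >= threshold:
            yield chunk   # would be sent to the transcription endpoint
```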

Configuration

The web gateway configuration is controlled by [presence] settings in intendant.toml:

[presence]
live_provider = "gemini"                                    # voice model provider
live_model = "gemini-2.5-flash-native-audio-preview-12-2025"  # voice model

Or via environment variables:

  • GEMINI_API_KEY / OPENAI_API_KEY — for ephemeral token minting (POST /session)

The /config endpoint returns the configured provider, model, and sample rates as JSON.

HTTP Endpoints

| Endpoint | Description |
|----------|-------------|
| GET / | Web app dashboard (4-tab UI) |
| GET /config | Live model configuration JSON |
| GET /debug | Debug JSON (agent state, voice connection, active browser) |
| POST /session | Mint ephemeral session tokens for Gemini Live / OpenAI Realtime |
| GET /wasm-web/* | WASM and JS glue (content-hash cache-busted) |
| GET /audio-processor.js | AudioWorklet processor for microphone capture |
| WS / | Main WebSocket (events, terminal I/O, presence protocol) |
| WS /vnc | WebSocket-to-TCP VNC proxy (for noVNC) |

Requirements

  • Microphone access requires a secure context: Use localhost (via SSH tunnel: ssh -L 8765:localhost:8765 host), or set browser flags for insecure origins
  • API key for voice: Gemini or OpenAI (stored browser-side only). Voice is optional — the dashboard works without it
  • WASM: The dashboard uses a compiled WASM module (presence-web crate). Rebuild with wasm-pack build --target web from crates/presence-web/ if you modify the Rust code

Multi-Agent Orchestration

Intendant supports multi-agent orchestration where a parent orchestrator decomposes complex tasks into sub-tasks and delegates them to specialized child agents. Each agent runs as a separate intendant process with its own context window, system prompt, and session log.

How It Works

User (TUI / MCP / Web)
    │
    ▼
[User Mode] — pure subprocess monitor, zero API calls
    │
    ▼
[Orchestrator Sub-Agent] — decomposes task, coordinates
    ├──▶ [Research Agent]       — investigation, file reading, browsing
    ├──▶ [Implementation Agent] — code writing, builds, tests (git worktree)
    └──▶ [Testing Agent]        — validation, test execution
    │
    ▼
Results merged, knowledge consolidated

When a complex task is submitted (and --direct is not set), intendant enters User Mode: it spawns an orchestrator sub-agent and monitors its progress without making any model API calls itself. The orchestrator then spawns specialized sub-agents as needed.

Agent Roles

Each sub-agent role has a dedicated system prompt that is appended to the base prompt:

| Role | Prompt | Focus |
|------|--------|-------|
| orchestrator | SysPrompt_orchestrator.md | Task decomposition, sub-agent management, coordination, checkpointing |
| research | SysPrompt_research.md | Investigation, file reading, browsing, synthesizing findings |
| implementation | SysPrompt_implementation.md | Code writing, builds, testing, git worktree isolation |
| testing | SysPrompt_testing.md | Validation, test execution, coverage |

Sub-Agent Spawning

Sub-agents are spawned via tokio::process::Command with environment variables that configure their behavior:

| Variable | Purpose |
|----------|---------|
| INTENDANT_ROLE | Agent role (triggers sub-agent mode) |
| INTENDANT_ID | Unique identifier for this agent |
| INTENDANT_TASK | Task description |
| INTENDANT_RESULT_FILE | Path to write final results |
| INTENDANT_PROGRESS_FILE | Path to write periodic progress |
| INTENDANT_PARENT_KNOWLEDGE | Path to parent’s knowledge store |
| INTENDANT_INHERIT_MEMORY | 1 to inherit project memory |

Progress and Results

Progress Polling

The parent agent polls each sub-agent’s progress file every 500ms. Progress is a JSON file with:

{
  "turn": 5,
  "status": "running",
  "last_action": "Running cargo test",
  "question": null
}

Progress updates are relayed to the TUI or stdout as OrchestratorProgress events.

Result Files

When a sub-agent completes, it writes a result JSON file:

{
  "id": "research-1",
  "status": "Completed",
  "summary": "Found 3 relevant API endpoints...",
  "findings": ["endpoint /api/users supports pagination", "..."],
  "artifacts": ["docs/api-analysis.md"],
  "usage": { "tokens_used": 15000, "context_window": 128000 }
}

The orchestrator reads result files to synthesize final outcomes and route knowledge between agents.

Git Worktree Isolation

Implementation agents can work in isolated git worktrees to avoid conflicts with the main working tree:

  • Create: worktree.rs creates a new worktree branch for the agent
  • Merge: On successful completion, the orchestrator merges the worktree branch back
  • Conflict handling: If merge conflicts arise, the orchestrator is prompted to resolve them
  • Cleanup: Worktrees are removed after merge or on failure

This allows multiple implementation agents to work on different parts of the codebase simultaneously without stepping on each other.

Knowledge Routing

The knowledge system supports inter-agent communication via pub/sub channels:

  • Publishing: Agents store findings with tagged channels (e.g., "findings", "decisions", "project_state")
  • Subscribing: The orchestrator sets up subscriptions between agents so they receive relevant knowledge
  • Cursor tracking: Each subscription tracks which entries have been consumed, ensuring agents only see new knowledge
  • Inheritance: Sub-agents can inherit the parent’s knowledge store via INTENDANT_INHERIT_MEMORY

Example Flow

  1. Research agent discovers database configuration → publishes to "findings" channel with tag "database"
  2. Orchestrator routes "findings" to implementation agent
  3. Implementation agent receives the database config via recallMemory with channel filter
  4. Implementation agent writes code using discovered config
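The channel/cursor mechanics above can be sketched as a tiny in-memory store (illustrative Python; the real store is persistent and the names are hypothetical):

```python
class KnowledgeStore:
    """Channel pub/sub with per-subscriber cursors, so each subscriber
    only sees entries published since its last read."""
    def __init__(self):
        self.channels: dict[str, list[dict]] = {}
        self.cursors: dict[tuple[str, str], int] = {}  # (subscriber, channel) -> consumed

    def publish(self, channel: str, entry: dict) -> None:
        self.channels.setdefault(channel, []).append(entry)

    def consume(self, subscriber: str, channel: str) -> list[dict]:
        entries = self.channels.get(channel, [])
        pos = self.cursors.get((subscriber, channel), 0)
        new = entries[pos:]
        self.cursors[(subscriber, channel)] = len(entries)
        return new
```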

Orchestrator Checkpointing

The orchestrator writes project state checkpoints after each sub-agent completes, using storeMemory with a project_state channel. Checkpoints capture:

  • Completed and active tasks
  • Architectural decisions made so far
  • Constraints and dependencies discovered

This preserves essential context across auto-compaction boundaries — when context is compacted at ~90% usage, the orchestrator can recover state via recallMemory.

Checkpoints are also written to disk as both project_state.json (machine-readable) and project_state.md (human-readable) in the sub-agent directory.

Configuration

Orchestration behavior can be tuned in intendant.toml:

[orchestrator]
max_parallel_agents = 4                    # max concurrent sub-agents
sub_agent_dir = ".intendant/subagents"     # workspace directory for sub-agents

To force single-agent mode and skip orchestration entirely, use the --direct flag.

Presence Layer

The presence layer is the conversational interface between the user and the agent system. It mediates all interaction: the user talks to presence, presence delegates work via submit_task, and narrates progress as events stream back from the agent loop.

Architecture

Only one presence model is active at a time — either server-side text presence OR browser-side live presence (Gemini Live / OpenAI Realtime). Never both simultaneously.

User input ──▶ [Presence Layer] ──▶ submit_task ──▶ Agent Loop
                     │                                  │
                     │◀── events (phase, approval, etc) ◀┘
                     │
                     ▼
              Narration to user (TUI / Web)

Server-Side Text Presence

The default mode. PresenceLayer wraps a small/fast text model (e.g., gemini-2.5-flash) and maintains its own Conversation separate from the agent’s.

Behavior

  • Processes user input via process_user_input() — decides whether to handle directly or delegate to the agent loop
  • Narrates agent events via handle_event() — translates phase changes, approvals, completions into conversational updates
  • Handles status queries, memory recall, and autonomy changes directly without involving the agent loop
  • Uses its own system prompt (SysPrompt_presence.md) — standalone, not appended to the base agent prompt
  • Follow-up input in the TUI is routed through the presence layer when active

Configuration

[presence]
enabled = true                # default: true
provider = "gemini"           # provider for the presence model (optional)
model = "gemini-2.5-flash"    # model for the presence layer (optional)
context_window = 32768        # context window for presence conversation (default: 32768)

Or via environment variables:

  • PRESENCE_PROVIDER — override provider (fallback: PROVIDER)
  • PRESENCE_MODEL — override model

Disable with --no-presence flag or [presence] enabled = false in intendant.toml.

Browser-Side Live Presence

When --web is used and a browser connects a live model (Gemini Live / OpenAI Realtime), the browser sends a presence_connect message over WebSocket. The server pauses PresenceLayer and sends a presence_welcome message with the current state, missed events, and conversation context. The browser’s live model takes over as the conversational front-end, using the same 9 tools via the WebSocket tool request/response protocol.

When the browser’s live model disconnects (page close, error), a presence_disconnect message is sent and server-side presence resumes automatically.

Configuration

[presence]
live_provider = "gemini"                                    # provider for browser-side live presence
live_model = "gemini-2.5-flash-native-audio-preview-12-2025"  # model for browser-side live presence

Voice requires an API key (Gemini or OpenAI), stored in browser localStorage. The key is used browser-side only — it is never sent to the Intendant server.

Active/Passive Multi-Browser

Only one browser connection can be “active” (controlling the voice model) at a time. Other connections are passive observers:

  • Active browser: Pauses server-side presence, receives tool responses, controls the voice session
  • Passive browsers: Receive TUI frames and events but don’t affect server-side presence
  • Handover: A passive browser can request active status via {"t":"make_active"}, which force-disconnects the previous active browser and sends an active_granted message with handover context

Session Continuity

The presence session protocol maintains voice context across reconnects:

  1. The server maintains a PresenceSession with an event window and checkpoint state
  2. Browsers send periodic presence_checkpoint messages with a conversation summary and last_event_seq
  3. On reconnect, the presence_welcome includes events since last_event_seq and the last checkpoint summary
  4. Conversation context from recent voice transcripts is also included for smooth resumption
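The replay protocol above can be sketched as a minimal session object (illustrative Python; field names follow the docs, but the storage and message shapes are assumptions):

```python
class PresenceSession:
    """Keep an event window, accept checkpoints, and build a welcome
    payload with everything since the last acknowledged sequence number."""
    def __init__(self):
        self.events: list[tuple[int, dict]] = []   # (seq, event)
        self.next_seq = 0
        self.checkpoint_summary = ""
        self.checkpoint_seq = -1

    def record(self, event: dict) -> int:
        seq = self.next_seq
        self.events.append((seq, event))
        self.next_seq += 1
        return seq

    def checkpoint(self, summary: str, last_event_seq: int) -> None:
        self.checkpoint_summary = summary
        self.checkpoint_seq = last_event_seq

    def welcome(self) -> dict:
        missed = [e for seq, e in self.events if seq > self.checkpoint_seq]
        return {"t": "presence_welcome",
                "summary": self.checkpoint_summary,
                "missed_events": missed}
```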

Presence Tools

The presence layer has 9 tools, defined in the presence-core workspace crate:

Action Tools

| Tool | Description |
|------|-------------|
| submit_task | Submit a new task to the agent loop |
| approve_action | Approve a pending action |
| deny_action | Deny a pending action |
| skip_action | Skip a pending action |
| respond_to_question | Answer an askHuman question |
| set_autonomy | Change autonomy level |

Action tools dispatch via the EventBus as ControlMsg — the same path as TUI key presses and control socket commands.

Query Tools

| Tool | Description |
|------|-------------|
| check_status | Read current AgentStateSnapshot (phase, turn, budget, pending approval/question) |
| query_detail | Get git diff, file contents, or log details from the project |
| recall_memory | Search the knowledge store by keywords, with optional channel/tag filters; falls back to session log |

Query tools are handled synchronously server-side. They are shared between PresenceLayer and the web gateway via standalone functions in presence.rs.

Event Filtering

Not all agent events are worth narrating. The presence layer classifies events as:

Push-worthy (trigger narration):

  • TaskSubmitted, TaskComplete
  • ApprovalRequired, HumanQuestion
  • PhaseChanged (debounced to avoid rapid phase flip noise)
  • ContextManagement

Pull-only (available on request via check_status):

  • Status snapshots, log entries, token usage updates
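The push/pull split can be sketched as a predicate (illustrative Python; the 2-second debounce window is an assumption for illustration):

```python
PUSH_WORTHY = {"TaskSubmitted", "TaskComplete", "ApprovalRequired",
               "HumanQuestion", "PhaseChanged", "ContextManagement"}

def should_narrate(event_kind: str, last_phase_change: float, now: float,
                   debounce_sec: float = 2.0) -> bool:
    """Push-worthy events trigger narration; PhaseChanged is debounced
    to avoid rapid phase flip noise. Everything else is pull-only."""
    if event_kind not in PUSH_WORTHY:
        return False
    if event_kind == "PhaseChanged":
        return (now - last_phase_change) >= debounce_sec
    return True
```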

Mutual Exclusion

The presence layer enforces mutual exclusion between server-side and browser-side presence:

  1. Browser connects live model → sends {"t":"presence_connect"}
  2. Web gateway emits AppEvent::PresenceConnected → pauses server-side presence
  3. Server sends {"t":"presence_welcome"} with state, event replay, and conversation context
  4. Server-side PresenceLayer::handle_event() returns Ok(None) while paused
  5. Browser live model handles all presence duties (narration, tool calls, user interaction)
  6. Browser disconnects → sends {"t":"presence_disconnect"}
  7. Web gateway emits AppEvent::PresenceDisconnected → resumes server-side presence

Legacy live_connected/live_disconnected messages are still accepted for backward compatibility.

presence-core Crate

The crates/presence-core/ workspace crate contains the WASM-compatible core logic:

  • Types: PresenceConfig, TaskEnvelope, PresenceEvent, AgentStateSnapshot, PresenceSession, PresenceCheckpoint, PresenceConnect, PresenceWelcome, constants
  • Dispatch: PresenceAction enum, dispatch_tool_call() — pure logic dispatch
  • Tools: 9 presence tool definitions (provider-agnostic ToolDefinition format)
  • Format: format_event(), truncate() (unicode-safe)
  • Prompt: DEFAULT_PRESENCE_PROMPT via include_str!
  • WASM: WasmPresence object, get_presence_tools(), get_presence_prompt() — browser-side presence logic

Minimal dependencies (serde + serde_json + wasm-bindgen, no tokio/reqwest). Compiles to both native and wasm32-unknown-unknown. The main crate re-exports its types and converts ToolDefinition to the provider-specific format.

presence-web Crate

The crates/presence-web/ crate provides the browser-side WASM layer:

  • app_state.rs — Pure-Rust app state for the web dashboard. All event routing, log filtering, usage tracking, and cost calculation. Methods return Vec<UiCommand> which the thin JS layer applies to the DOM. Includes a per-model pricing table covering OpenAI, Anthropic, and Gemini models.
  • app_web.rs — Browser-side app dashboard entry point. WASM↔DOM bridge, tab management, WebSocket event dispatch.
  • server.rs — WebSocket connection to the Intendant server, message routing.
  • gemini.rs — Gemini Live API integration (BidiGenerateContent), dual-mode auth (API key + ephemeral token).
  • openai.rs — OpenAI Realtime API integration.
  • callbacks.rs — JS callback management for voice/tool events.

Build: wasm-pack build --target web --out-dir ../../static/wasm-web --out-name presence_web from crates/presence-web/.

Tool Dispatch Flow

Tool dispatch uses presence_core::dispatch_tool_call() which returns a PresenceAction enum:

Tool call arrives (from text model or browser live model)
    │
    ▼
dispatch_tool_call() → PresenceAction
    │
    ├── TextResult(text) → return immediately
    ├── SubmitTask(envelope) → send to EventBus
    ├── Approve/Deny/Skip → send ControlMsg to EventBus
    ├── SetAutonomy(level) → send ControlMsg to EventBus
    └── NeedsIO(query) → platform layer handles:
         ├── check_status → read AgentStateSnapshot
         ├── query_detail → read files, git diff
         └── recall_memory → search knowledge store + session log

Pure-logic tools return TextResult/SubmitTask/Approve/etc. I/O-dependent tools return NeedsIO for the platform layer to handle, keeping presence-core free of I/O dependencies.
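The dispatch split can be sketched like this (illustrative Python mirroring the diagram; the actual `dispatch_tool_call()` is Rust with a proper enum, and the return shape here is an assumption):

```python
def dispatch_tool_call(name: str, args: dict) -> tuple[str, object]:
    """Pure-logic tools resolve immediately; I/O-dependent tools return
    a NeedsIO marker for the platform layer to handle."""
    if name == "submit_task":
        return ("SubmitTask", {"task": args["task"]})
    if name in ("approve_action", "deny_action", "skip_action"):
        # becomes a ControlMsg on the EventBus, same path as TUI keys
        return ("ControlMsg", {"action": name.removesuffix("_action"),
                               "id": args["id"]})
    if name == "set_autonomy":
        return ("ControlMsg", {"action": "set_autonomy", "level": args["level"]})
    if name in ("check_status", "query_detail", "recall_memory"):
        return ("NeedsIO", {"query": name, "args": args})
    return ("TextResult", f"unknown tool: {name}")
```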

MCP Server

The --mcp flag launches Intendant as a Model Context Protocol server on stdio. This lets external AI agents (Claude Code, Codex, etc.) observe and control Intendant with full parity to the TUI — every action a human can take in the TUI is available as an MCP tool. The server also supports connecting to external MCP servers as a client (see MCP Client below).

Running

# Launch as MCP server (stdio transport)
./target/release/intendant --mcp "Deploy the application"

# With provider/model overrides
./target/release/intendant --mcp --provider anthropic --model claude-sonnet-4-5-20250929 "Fix the tests"

# With autonomy preset
./target/release/intendant --mcp --autonomy high "Refactor the auth module"

Client Configuration

Add Intendant to your MCP client’s config. For Claude Code (~/.claude/claude_desktop_config.json):

{
  "mcpServers": {
    "intendant": {
      "command": "intendant",
      "args": ["--mcp", "Your task description here"]
    }
  }
}

Tools

All tools mirror TUI actions. The server enforces compile-time parity — adding a new user action to the TUI requires implementing it in the MCP server (and vice versa).

| Tool | Description | Parameters |
|------|-------------|------------|
| get_status | Current status: provider, model, turn, budget, phase, autonomy, verbosity, tokens | — |
| get_logs | Log entries with cursor-based pagination and level filtering | since_id?, level_filter?, limit? |
| get_pending_approval | Current pending approval request (or null) | — |
| get_pending_input | Current pending human question (or null) | — |
| approve | Approve a pending command (TUI: y) | id |
| deny | Deny a pending command and stop (TUI: n) | id |
| skip | Skip a pending command, continue (TUI: s) | id |
| approve_all | Approve and set autonomy to Full (TUI: a) | id |
| respond | Answer an askHuman question (TUI: type + Enter) | text |
| set_autonomy | Set autonomy level (TUI: +/-) | level: "low", "medium", "high", "full" |
| set_verbosity | Set log verbosity (TUI: v) | level: "quiet", "normal", "verbose", "debug" |
| quit | Shut down the agent (TUI: q) | — |
| start_task | Start a new agent task | task |
| schedule_controller_restart | Schedule a controller restart/autonomous re-init workflow | controller_id, north_star_goal, reason?, restart_after?, restart_command?, auto_start_task?, max_attempts?, cooldown_sec? |
| controller_turn_complete | Final handshake from controller; validates token and executes scheduled restart | restart_id, turn_complete_token, status?, handoff_summary? |
| get_restart_status | Get current controller restart state (or null) | — |
| cancel_controller_restart | Cancel scheduled restart | restart_id? |
| request_controller_loop_halt | Request loop halt | persistent? |
| clear_controller_loop_halt | Clear loop halt flags so restarts can proceed again | — |
| intervene_controller_loop | Request intervention for active loop process | mode: "stop" or "abort" |
| get_controller_loop_status | Unified loop health snapshot | — |
| reload | Rebuild binary and hot-reload the MCP server via exec() | — |

schedule_controller_restart, controller_turn_complete, and cancel_controller_restart return JSON payloads with an ok boolean and status fields. Rejections are returned as JSON (ok: false) with an error message instead of plain text.

Hot Reload

The reload tool rebuilds the binary from source (cargo build --release) and replaces the running MCP server process in-place using exec(). The MCP connection survives seamlessly — no Claude Code restart needed.

How it works:

  1. reload runs cargo build --release in the project directory
  2. After sending the tool response, the process calls exec() to replace itself with the new binary
  3. The new process detects INTENDANT_MCP_RELOAD=1 and uses a ReloadTransport that injects a synthetic MCP initialization handshake
  4. Claude Code continues using the same connection — the stdio file descriptors survive exec()

This is particularly useful during development: edit code, call reload, and the MCP server picks up all changes without losing the connection.

Resources

Resources provide push-based state observation via subscriptions. The server sends notifications/resources/updated when state changes, so clients know to re-fetch.

| URI | Description |
|-----|-------------|
| intendant://status | Provider, model, turn count, budget %, phase, autonomy, session ID, task |
| intendant://usage | Per-model token usage: tokens used, context window, usage % (main + optional presence) |
| intendant://logs | Last 100 chronological log entries (same as TUI log panel) |
| intendant://pending-approval | Current pending approval request, if any |
| intendant://pending-input | Current pending human question, if any |
| intendant://controller-restart | Current controller restart workflow state, if any |
| intendant://controller-loop | Loop health snapshot (intervention flags, singleton lock owner, active wrapper/codex PIDs, latest run pointers) |

Controller Restart Workflow

Use this when you want Intendant to trigger a controller re-init cycle safely.

  1. Call schedule_controller_restart and capture restart_id + turn_complete_token.
  2. Before ending the controlling agent turn, call controller_turn_complete with both values.
  3. Intendant executes restart actions:
    • spawn restart_command (if provided), and/or
    • start a fresh Intendant task using north_star_goal (auto_start_task=false by default; opt in for E2E testing).
  4. Inspect state via get_restart_status or intendant://controller-restart.

Notes

  • Restart state is persisted to the current session dir as controller_restart.json.
  • restart_after defaults to "turn_end".
  • restart_after accepts only "turn_end" or "now"; other values are rejected.
  • Restart workflow string inputs are normalized (trimmed) before validation/execution.
  • restart_command, when provided, must not be empty/whitespace.
  • At least one restart action is required at schedule time: set restart_command and/or auto_start_task=true.
  • max_attempts must be >= 1; 0 is rejected.
  • The optional status and handoff_summary fields, and the restart_id guard on cancel, treat whitespace-only values as unset.
  • If restart_after="now" and execution fails after passing validation, schedule_controller_restart reports "ok": false and includes execution_error.
  • schedule_controller_restart always reports "phase" from persisted restart state; for restart_after="now" this reflects the post-execution phase ("completed" or "failed").
  • Any restart execution failure (including auto_start_task launch errors) updates persisted restart state to "phase": "failed" and populates last_error.
  • schedule_controller_restart rejection payloads use "status": "rejected" and include "error" (plus "restart_id"/"phase" when a conflicting active restart exists).
  • controller_turn_complete reports JSON results:
    • success: "status": "completed", "ok": true, plus "execution" and "phase".
    • rejection/pending: "ok": false, with "status" ("rejected" or "restart_pending") and "error".
  • controller_turn_complete only accepts restarts in "awaiting_turn_complete"; duplicate or late handshakes (for example "phase": "ready") are rejected to prevent duplicate restart execution.
  • cancel_controller_restart reports JSON results:
    • success: "status": "cancelled", "ok": true, plus "restart_id" and "phase": "cancelled".
    • rejection: "status": "rejected", "ok": false, with "error" (and optional "restart_id"/"phase" context).
  • request_controller_loop_halt, clear_controller_loop_halt, intervene_controller_loop, and get_controller_loop_status return/emit normalized loop health data (flags, lock owner PID/aliveness, latest run pointers, and active PID counts).
  • Control-socket command_result.data mirrors structured payloads for restart actions and loop-control actions.
  • get_restart_status and intendant://controller-restart redact turn_complete_token as "[redacted]"; only schedule_controller_restart returns the raw token for the final handshake call.
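The handshake rules above can be sketched as a toy state machine (illustrative Python; phase names and payload fields follow the docs, but storage and transitions are assumptions):

```python
class RestartWorkflow:
    """Validate controller_turn_complete: only a restart in
    'awaiting_turn_complete' with a matching id and token may execute."""
    def __init__(self, restart_id: str, token: str):
        self.restart_id = restart_id
        self.token = token
        self.phase = "awaiting_turn_complete"

    def turn_complete(self, restart_id: str, token: str) -> dict:
        if self.phase != "awaiting_turn_complete":
            # duplicate or late handshake: reject to prevent double execution
            return {"ok": False, "status": "rejected",
                    "error": f"restart not awaiting handshake (phase: {self.phase})"}
        if restart_id != self.restart_id or token != self.token:
            return {"ok": False, "status": "rejected", "error": "bad id or token"}
        self.phase = "completed"
        return {"ok": True, "status": "completed", "phase": self.phase}
```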

Controller Recursion Profile

Recommended for Codex/Claude-style controllers:

  • Set auto_start_task=false (or omit it, since false is the default).
  • Use restart_command to relaunch the external controller process.
  • Treat start_task as optional E2E testing only, not the default recursion path.

Controller Loop Monitoring

Controller loop monitoring files (for restart_command scripts):

  • Write run artifacts under .intendant/controller-loop/<run_id>/.
  • Maintain stable pointers:
    • .intendant/controller-loop/latest (symlink to current/latest run)
    • .intendant/controller-loop/latest.pid (wrapper script PID)
    • .intendant/controller-loop/latest.status.json (latest status snapshot)
    • .intendant/controller-loop/latest.jsonl (path to latest JSONL output file)
    • .intendant/controller-loop/active.lock/ (singleton lock: pid, run_id, acquired_at)
  • Recommended commands:
    • tail -f .intendant/controller-loop/latest/codex.jsonl
    • watch -n 2 'cat .intendant/controller-loop/latest/heartbeat.txt'
    • cat .intendant/controller-loop/latest.status.json
  • Intervention controls:
    • Halt future loop cycles (persistent): touch .intendant/controller-loop/request_halt
    • Halt future loop cycles (legacy marker, consumed once): touch .intendant/controller-loop/request_halt_after_cycle
    • Graceful stop current run: touch .intendant/controller-loop/request_stop
    • Immediate abort current run: touch .intendant/controller-loop/request_abort
    • Intervention history: cat .intendant/controller-loop/latest/intervention.log
  • Per-run PID files:
    • .intendant/controller-loop/<run_id>/wrapper.pid
    • .intendant/controller-loop/<run_id>/codex.pid
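A restart_command wrapper can maintain these files with a few lines of Python. This is a hedged sketch: the directory layout follows the conventions above, but the status fields and the mkdir-based lock acquisition are illustrative assumptions, not a prescribed implementation.

```python
import json
import os
import time
from pathlib import Path

def start_run(base: Path, run_id: str) -> Path:
    """Create a run directory and update the stable pointers described above."""
    run_dir = base / run_id
    run_dir.mkdir(parents=True, exist_ok=True)

    # Singleton lock: mkdir is atomic, so a second wrapper fails fast.
    lock = base / "active.lock"
    lock.mkdir(exist_ok=False)
    (lock / "pid").write_text(str(os.getpid()))
    (lock / "run_id").write_text(run_id)
    (lock / "acquired_at").write_text(time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()))

    # Per-run PID file and stable pointers.
    (run_dir / "wrapper.pid").write_text(str(os.getpid()))
    (base / "latest.pid").write_text(str(os.getpid()))
    latest = base / "latest"
    if latest.is_symlink():
        latest.unlink()
    latest.symlink_to(run_dir)

    # Status snapshot (field names here are illustrative, not a fixed schema).
    status = {"run_id": run_id, "phase": "starting", "pid": os.getpid()}
    (base / "latest.status.json").write_text(json.dumps(status))
    return run_dir
```

A real wrapper would also write `codex.pid` once the controller process is spawned, refresh the status snapshot on phase changes, and remove `active.lock` on exit.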

Typical Agent Workflow

  1. Call get_status to see the current phase and budget
  2. Poll get_logs with since_id to stream new events
  3. When an approval is needed, get_pending_approval returns the command preview — call approve, deny, or skip
  4. When askHuman triggers, get_pending_input returns the question — call respond with your answer
  5. Call quit when done
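The steps above can be sketched as a driver loop. The tool names come from this workflow, but the `call(tool, **args)` interface is a stand-in for whatever MCP client invocation you use, and the response shapes shown in the fake below are assumptions for illustration.

```python
def run_agent_workflow(call, decide, answer):
    """Drive a session through the workflow above.

    `call(tool, **args)` invokes an MCP tool (client interface assumed);
    `decide(preview)` returns "approve", "deny", or "skip";
    `answer(question)` supplies the askHuman response.
    """
    since_id = 0
    while True:
        status = call("get_status")
        if status.get("phase") == "done":
            break
        # Step 2: stream new events, advancing the high-water mark.
        for event in call("get_logs", since_id=since_id):
            since_id = max(since_id, event["id"])
        # Step 3: resolve a pending approval, if any.
        pending = call("get_pending_approval")
        if pending:
            call(decide(pending["command"]), id=pending["id"])
        # Step 4: answer a pending askHuman question, if any.
        question = call("get_pending_input")
        if question:
            call("respond", text=answer(question["question"]))
    call("quit")
```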

MCP Client

Intendant can also act as an MCP client, connecting to external MCP servers configured in intendant.toml. This lets agents use tools from external servers (filesystem, GitHub, databases, etc.) alongside Intendant’s native tools.

Configuration

[[mcp_servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]

[[mcp_servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]

[mcp_servers.env]
GITHUB_TOKEN = "ghp_..."

How It Works

At startup, McpClientManager connects to all configured servers via child process transport, discovers their tools, and registers them with the mcp__<server>_<tool> naming convention. For example, a filesystem server’s read_file tool becomes mcp__filesystem_read_file.

Tool calls with the mcp__ prefix are routed through the MCP client manager to the appropriate server. If a server fails to connect at startup, it is skipped with a warning — other servers and native tools continue to work.
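The naming convention can be inverted to route a call back to its server. A minimal sketch, assuming server names themselves contain no underscore (tool names may):

```python
def parse_mcp_tool_name(name: str):
    """Split 'mcp__<server>_<tool>' into (server, tool), or None for native tools.

    Assumes the server name contains no underscore; the tool name may.
    """
    prefix = "mcp__"
    if not name.startswith(prefix):
        return None  # native tool — handled locally, not routed
    server, sep, tool = name[len(prefix):].partition("_")
    if not sep or not server or not tool:
        return None
    return server, tool
```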

Integrations

This chapter covers the control socket (Unix domain socket) and web gateway (WebSocket) integration points. For the MCP server interface, see MCP Server. For the presence layer that mediates user interaction, see Presence Layer.

Control Socket

When --control-socket is enabled, a Unix domain socket is created at /tmp/intendant-<pid>.sock. This enables programmatic control of a running Intendant instance from external scripts and tools.

  • Outbound event broadcast to all connected clients
  • Inbound command handling for status, approval, denial, human input, autonomy change, quit, controller-restart workflow commands, and controller-loop intervention commands (in MCP mode)
  • Socket server is opt-in via --control-socket

Inbound Commands (JSON-line)

{"action": "status"}
{"action": "approve", "id": 123}
{"action": "deny", "id": 123}
{"action": "input", "text": "answer to askHuman"}
{"action": "set_autonomy", "level": "high"}
{"action": "schedule_controller_restart", "controller_id":"codex", "north_star_goal":"audit and improve", "restart_after":"turn_end"}
{"action": "controller_turn_complete", "restart_id":"<id>", "turn_complete_token":"<token>", "status":"ok", "handoff_summary":"..."}
{"action": "get_restart_status"}
{"action": "cancel_controller_restart", "restart_id":"<id>"}
{"action": "request_controller_loop_halt", "persistent": true}
{"action": "clear_controller_loop_halt"}
{"action": "intervene_controller_loop", "mode":"stop"}
{"action": "get_controller_loop_status"}
{"action": "query_detail", "scope": "diff"}
{"action": "query_detail", "scope": "file", "target": "src/main.rs"}
{"action": "recall_memory", "keywords": ["auth", "login"], "channel": "project_state"}
{"action": "usage"}
{"action": "quit"}

Outbound Events (streamed to connected clients)

{"event": "turn_started", "turn": 5, "budget_pct": 12.3}
{"event": "agent_output", "stdout": "...", "stderr": "..."}
{"event": "approval_required", "id": 123, "command": "rm -rf /tmp/test"}
{"event": "ask_human", "question": "Which database?"}
{"event": "task_complete", "reason": "done signal"}
{"event": "status", "turn": 3, "phase": "thinking", "autonomy": "medium", "session_id": "abc-123", "task": "fix tests"}
{"event": "usage", "main": {"provider": "openai", "model": "gpt-5", "tokens_used": 12000, "context_window": 128000, "usage_pct": 9.4}}
{"event": "usage_update", "main": {"provider": "openai", "model": "gpt-5", "tokens_used": 15000, "context_window": 128000, "usage_pct": 11.7}}
{"event": "command_result", "action": "get_restart_status", "ok": true, "message": "ok", "data": {...}}
  • The status event includes session_id and task fields.
  • The usage event is a response to {"action": "usage"}, returning per-model token usage.
  • The usage_update event is broadcast automatically after each agent turn, providing streaming token consumption updates. The presence field is included when the presence layer is active.

command_result.ok is false when a control action fails (for example, schedule_controller_restart with restart_after="now" and no executable restart action configured).

Example Usage

echo '{"action":"status"}' | socat - UNIX:/tmp/intendant-$(pgrep intendant).sock
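The same exchange works from a script without socat. A sketch of a one-shot JSON-line client; the request/response framing follows the protocol above, but error handling is minimal:

```python
import json
import socket

def control(sock_path: str, command: dict) -> dict:
    """Send one JSON-line command to the control socket and read one reply line."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall((json.dumps(command) + "\n").encode())
        buf = b""
        while b"\n" not in buf:
            chunk = s.recv(4096)
            if not chunk:
                break
            buf += chunk
        return json.loads(buf.split(b"\n", 1)[0])

# Example: control("/tmp/intendant-12345.sock", {"action": "status"})
```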

Web Gateway

The --web flag starts a web server that serves the app dashboard and bridges WebSocket connections to the EventBus. --web implies --mcp, so no initial task is required — the agent starts idle and accepts tasks dynamically.

See Web Dashboard for the full dashboard documentation and Presence Layer for details on the presence session protocol and mutual exclusion.

How It Works

Browser ──WebSocket──> Intendant web gateway (port 8765)
  │                              │
  │  Terminal I/O (ANSI)         │  Events (broadcast to all clients)
  │  Key/resize input            │  Tool responses (per-connection direct channel)
  │  Tool requests               │  State snapshot + log replay (on connect)
  │  presence_connect/disconnect │  Presence welcome (on voice connect)
  │  Voice logs/checkpoints      │  Per-connection TUI frames
  │  Audio for transcription     │
  v                              v
App dashboard (WASM)        EventBus + AgentStateSnapshot
  +                              │
Optional: browser-side           │  Dual outbound channels:
live model (Gemini/OpenAI)       │  - broadcast::Receiver (events)
  │                              │  - mpsc::unbounded (direct responses)
  │  (function calls → tool_request)
  v
Intendant agent loop

The web gateway has three layers:

  1. App dashboard — The primary web interface at / with 4 tabs (Activity, Usage, Terminal, Displays). State management is handled by presence-web WASM. Events are broadcast and late-connecting browsers get a full log replay.

  2. Per-connection TUI rendering — Each WebSocket connection gets its own WebTui instance with independent terminal dimensions. ANSI output is sent per-connection via the direct channel, not broadcast.

  3. Presence bridge (optional) — When a browser connects a live model (Gemini Live / OpenAI Realtime), the model uses 9 presence tools that map to tool_request WebSocket messages. The gateway handles these server-side and returns tool_response messages on the per-connection direct channel.

WebSocket Protocol

Inbound Messages (browser → server)

  • {"t":"key","key":"..."} — Keyboard input (routed to per-connection WebTui)
  • {"t":"resize","cols":N,"rows":N} — Terminal resize (per-connection)
  • {"t":"presence_connect",...} — Presence session protocol (replaces server-side presence)
  • {"t":"presence_disconnect"} — Disconnect presence (resumes server-side presence)
  • {"t":"make_active"} — Request active voice ownership (handover)
  • {"t":"voice_log","text":"...","seq":N} — Voice transcript from browser presence model
  • {"t":"presence_checkpoint","summary":"...","last_event_seq":N} — Context checkpoint
  • {"t":"voice_diagnostic","kind":"...","detail":"..."} — Browser voice diagnostics
  • {"t":"user_audio","data":"<base64>"} — PCM16 audio for server-side transcription
  • {"t":"tool_request","id":"...","tool":"...","args":{}} — Presence tool call
  • {"t":"async_query","id":"...","tool":"...","args":{}} — Async query (result as text, not tool response)
  • {"action":"..."} — ControlMsg (same as Unix control socket)
  • {"t":"live_connected"} / {"t":"live_disconnected"} — Legacy (still accepted)

Outbound Messages (server → browser)

  • {"t":"term","d":"<base64>"} — Per-connection TUI ANSI output
  • {"t":"state_snapshot","state":{...},"connection_id":"...","config":{...},"session_id":"..."} — Bootstrap on connect
  • {"t":"log_replay","entries":[...]} — Historical session events for late-connecting browsers
  • {"t":"presence_welcome","session_id":"...","state":{...},"events":[...],"is_active":bool,"conversation_context":"..."} — Presence session welcome
  • {"t":"active_granted","is_active":true,"handover_context":"...","conversation_context":"..."} — Active ownership granted
  • {"t":"force_disconnect_voice","reason":"handover"} — Sent to the old active connection on handover
  • {"t":"presence_checkpoint_ack","seq":N} — Checkpoint acknowledgement
  • {"t":"tool_response","id":"...","result":"..."} — Response to a tool_request
  • {"t":"async_query_result","id":"...","tool":"...","result":"..."} — Response to an async_query
  • {"event":"..."} — OutboundEvent broadcast (status, agent_output, approval_required, etc.)

Tool Request/Response Protocol

The browser live model calls presence tools via tagged request/response messages:

// Browser sends:
{"t":"tool_request","id":"req-42","tool":"check_status","args":{}}

// Server responds (on direct channel):
{"t":"tool_response","id":"req-42","result":"Phase: Running agent (turn 5). Budget: 23% used."}

Action tools (submit_task, approve_action, deny_action, skip_action, respond_to_question, set_autonomy) are dispatched via the EventBus — the same path as TUI key presses and control socket commands.

Query tools (check_status, query_detail, recall_memory) are handled asynchronously server-side via presence::handle_tool_query(), which reads from the shared AgentStateSnapshot, project files, and knowledge store.
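On the browser side, requests and responses are correlated by id. A minimal sketch of that bookkeeping (the real client is the presence-web WASM module; this broker class is an illustrative assumption):

```python
import itertools
import json

class ToolRequestBroker:
    """Correlate tool_request messages with their tool_response replies by id."""

    def __init__(self, send):
        self._send = send          # callable that transmits one JSON string
        self._ids = itertools.count(1)
        self._pending = {}         # id -> callback awaiting the result

    def request(self, tool, args, on_result):
        req_id = f"req-{next(self._ids)}"
        self._pending[req_id] = on_result
        self._send(json.dumps(
            {"t": "tool_request", "id": req_id, "tool": tool, "args": args}))
        return req_id

    def handle_message(self, raw):
        msg = json.loads(raw)
        if msg.get("t") == "tool_response":
            callback = self._pending.pop(msg["id"], None)
            if callback:
                callback(msg["result"])
```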

State Bootstrap

On WebSocket connect, the server sends multiple bootstrap messages:

  1. state_snapshot — Full AgentStateSnapshot with connection_id, config, and session_id
  2. Cached usage_update — Latest token usage data
  3. Cached status — Latest status (autonomy, session_id, task)
  4. Cached display_ready — Latest display info for VNC slots
  5. log_replay — Historical session events parsed from session.jsonl

This ensures late-connecting browsers see the complete state immediately.

HTTP Endpoints

  • GET / — App dashboard (4-tab UI: Activity, Usage, Terminal, Displays)
  • GET /config — Live model configuration JSON
  • GET /debug — Debug JSON (agent state, voice connection, active browser)
  • POST /session — Mint ephemeral session tokens for Gemini Live / OpenAI Realtime
  • GET /wasm-web/* — WASM and JS glue (content-hash cache-busted)
  • GET /audio-processor.js — AudioWorklet processor for microphone capture
  • WS / — Main WebSocket (events, terminal I/O, presence protocol)
  • WS /vnc — WebSocket-to-TCP VNC proxy for noVNC display viewing

Requirements

  • Microphone access requires a secure context: use localhost (via SSH tunnel: ssh -L 8765:localhost:8765 host) or set browser flags for insecure origins.
  • API key for voice: Gemini or OpenAI. The key is used browser-side only. Voice is optional — the dashboard works without it.

Supported Tools (Browser Live Model)

  • submit_task (Action) — Submit a new task to the agent loop
  • approve_action (Action) — Approve a pending action
  • deny_action (Action) — Deny a pending action
  • skip_action (Action) — Skip a pending action
  • respond_to_question (Action) — Answer an askHuman question
  • set_autonomy (Action) — Change the autonomy level
  • check_status (Query) — Get the current agent phase, turn, and budget
  • query_detail (Query) — Get git diff, file contents, or log details
  • recall_memory (Query) — Search the knowledge store by keywords/channel

Session Logging

Overview

Each intendant invocation creates a structured session log directory at ~/.intendant/logs/<uuid>/. The log provides full observability for debugging and post-session analysis. No global state files are used — each session is fully isolated.

Directory Structure

~/.intendant/logs/<uuid>/
├── session_meta.json               # Session metadata (id, created_at, project_root, task, status, last_turn)
├── session.jsonl                    # Structured event log (one JSON per line)
├── conversation.jsonl               # Serialized conversation for session resume
├── summary.json                     # Post-session summary (task, outcome, turns)
├── human_question                   # askHuman IPC: question file (session-scoped)
├── human_response                   # askHuman IPC: response file (session-scoped)
├── 1_stdout.log                     # Runtime stdout for nonce 1
├── 1_stderr.log                     # Runtime stderr for nonce 1
└── turns/
    ├── turn_001_messages.json       # Full messages array sent to API
    ├── turn_001_model.txt           # Full model response
    ├── turn_001_reasoning.txt       # Full reasoning content (if available)
    ├── turn_001_agent_in.json       # Commands sent to runtime (pretty-printed)
    ├── turn_001_stdout.txt          # Agent stdout for this turn
    └── turn_001_stderr.txt          # Agent stderr (only if non-empty)

Session Metadata

session_meta.json contains:

{
  "session_id": "a1b2c3d4-...",
  "created_at": "2025-01-15T10:30:00Z",
  "project_root": "/home/user/myproject",
  "task": "Fix the authentication bug",
  "role": null,
  "status": "running",
  "last_turn": 5
}

This file is used by --continue (finds the most recent session for the project) and --resume <id> (finds a session by ID or prefix).
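The lookup that --continue performs can be approximated by scanning the metadata files. A sketch under the layout above; the real binary may use different tie-breaking:

```python
import json
from pathlib import Path

def most_recent_session(logs_dir: Path, project_root: str):
    """Return the session_id of the newest session for a project, or None.

    Scans each session_meta.json and picks the latest created_at;
    RFC 3339 timestamps compare correctly as strings.
    """
    best = None
    for meta_path in logs_dir.glob("*/session_meta.json"):
        meta = json.loads(meta_path.read_text())
        if meta.get("project_root") != project_root:
            continue
        if best is None or meta["created_at"] > best["created_at"]:
            best = meta
    return best["session_id"] if best else None
```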

Event Types in session.jsonl

  • session_start — Session initialization
  • turn_start — Turn boundary with budget % and remaining tokens
  • messages_input — Full API input logged (file reference to messages.json)
  • model_response — Model output with token counts (200-char preview, full in file)
  • reasoning — Reasoning summary and full content (if available from the API)
  • json_extracted — Extracted command JSON with function names
  • agent_input — Commands sent to the runtime
  • agent_output — Runtime stdout/stderr
  • approval — Approval decisions (category, preview, decision)
  • context_management — Auto-compaction or manual context directive
  • session_end — Summary with outcome and turn count
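For quick analysis beyond jq, session.jsonl is easy to consume from Python as well. A sketch that tallies event-type counts for a session:

```python
import json
from collections import Counter
from pathlib import Path

def event_counts(session_jsonl: Path) -> Counter:
    """Count occurrences of each event type in a session.jsonl log."""
    counts = Counter()
    for line in session_jsonl.read_text().splitlines():
        if line.strip():
            counts[json.loads(line)["event"]] += 1
    return counts
```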

Querying Logs

# Overview of a session
cat ~/.intendant/logs/<session>/session.jsonl | jq -r '.event'

# See what the model received on turn 5
cat ~/.intendant/logs/<session>/turns/turn_005_messages.json | jq .

# See model reasoning on turn 3
cat ~/.intendant/logs/<session>/turns/turn_003_reasoning.txt

# Find all commands executed
grep '"event":"agent_input"' ~/.intendant/logs/<session>/session.jsonl | jq -r '.message'

# List all sessions
ls -lt ~/.intendant/logs/

# Find sessions for a specific project
grep -l '"project_root":"/home/user/myproject"' ~/.intendant/logs/*/session_meta.json

Session Resume

Conversation history is saved to conversation.jsonl after each turn, enabling session resume:

# Resume most recent session for this project
./target/release/intendant --continue "fix that bug"

# Resume specific session by ID or prefix
./target/release/intendant --resume abc123 "continue"

When resuming, the conversation is loaded from conversation.jsonl and the agent continues from where it left off. Session metadata is updated with the new task.

Test Coverage

The test suite covers both binaries with inline #[cfg(test)] modules:

  • Agent binary: models serialization, error types, process state operations, nonce replacement, path inspection, blocking command execution, file editing, browsing, port waiting, human interaction, PTY sessions, memory storage/recall with tags and filters.
  • Caller binary:
    • Conversation and providers: JSON extraction, done signal handling, conversation management (message layer protection, tool call tracking, auto-compaction), context directives (drop/summarize), error types, provider selection with token usage tracking, Responses API support, rate-limit retry with exponential backoff, API key masking, SSE streaming and event parsing, shared message builders, structured output and reasoning controls, role mapping, JSON structured output mode, and Gemini provider request/response format.
    • Tools and orchestration: native tool definitions (11+ tools including MCP client tools, provider conversion formats), tool call batch assembly and result routing (including MCP tool routing), MCP client tool name parsing and routing, sub-agent spawning and result parsing, git worktree lifecycle, user mode orchestration, and the knowledge pub/sub system.
    • Configuration and sandboxing: project detection, config parsing (approval rules, MCP server config, sandbox config), prompt resolution cascade (project root, global config, compiled-in defaults, tools-mode variant) with INTENDANT.md loading, autonomy level resolution and command classification, and Landlock sandbox config construction.
    • UI and integration: TUI rendering (status bar, log panel, action panel, approval panel, help overlay, layout calculations, orchestrator progress, streaming buffer), event bus dispatch, theme color thresholds, control socket serialization, session log file creation, model summary formatting, Xvfb display configuration per provider, dynamic display allocation, web gateway (WebSocket lifecycle, tool request/response, broadcast, state bootstrap, live connect/disconnect), and presence event filtering.

Integration tests in tests/e2e/ spawn a real binary and exercise the full stack (see Architecture):

  • Tier 1 (JSON mode): Full-stack exec, approval approve/deny via stdin, multi-round follow-up. No display required.
  • Tier 2 (Control socket): Status/usage queries, autonomy change, approve via Unix control socket. Requires Xvfb.
  • Tier 3 (Web/Voice): WebSocket state_snapshot, tool_request/response, ANSI term frames, /debug endpoint. Voice tests require Firefox, PulseAudio, and espeak-ng.