Nyx · computer-use

terminal-use

The TUI analog of browser-use. An MCP server that lets an AI agent drive any interactive terminal program — snapshot the screen, find what's clickable, answer a blocking prompt — the same way browser-use drives a web page.

A terminal is a deterministic text grid, not a screenshot to OCR. So grounding is exact: the agent knows the character and element at every (row, col). These clips show the real engine operating on captured terminal output — parse, derive, encode — no pixels, no guessing.

01 Snapshot & element detection

A TUI (think k9s / lazygit) is on screen. terminal-use parses the VT byte grid, then derives the indexed interactive elements — buttons, list rows, the focused row — each grounded by (row, col). Then it encodes an SGR-1006 mouse click on element [0].

terminal_snapshot · deriveElements · SGR-1006 click
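The detect-then-click step can be sketched roughly as follows. This is a minimal illustration, not the real terminal-use-mcp code: the `GridElement` shape, `findButtons`, and `encodeSgrClick` are hypothetical names, and the only element type handled is a bracketed button such as `[ OK ]`. The SGR-1006 encoding itself (`ESC [ < button ; column ; row M` for press, with a trailing `m` for release, both 1-based) is the standard xterm form.

```typescript
// Hypothetical element shape for illustration; not the real terminal-use-mcp type.
interface GridElement {
  index: number;
  label: string;
  row: number;   // 0-based row in the grid
  col: number;   // 0-based column of the element's first character
  width: number; // number of cells the element occupies
}

// Find bracketed buttons such as "[ OK ]" in a plain-text grid (assumed heuristic).
function findButtons(grid: string[]): GridElement[] {
  const elements: GridElement[] = [];
  grid.forEach((line, row) => {
    const pattern = /\[ ([^\[\]]+?) \]/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(line)) !== null) {
      elements.push({
        index: elements.length,
        label: match[1] ?? "",
        row,
        col: match.index,
        width: match[0].length,
      });
    }
  });
  return elements;
}

// Encode a left-button press and release at the element's center cell.
// SGR-1006 coordinates are 1-based, column first, then row.
function encodeSgrClick(el: GridElement): string {
  const x = el.col + Math.floor(el.width / 2) + 1;
  const y = el.row + 1;
  return `\x1b[<0;${x};${y}M\x1b[<0;${x};${y}m`;
}

const grid = [
  "Delete pod nginx-1?",
  "",
  "  [ OK ]   [ Cancel ]",
];
const buttons = findButtons(grid);
const click = encodeSgrClick(buttons[0]!);
```

A program only acts on these bytes if it has enabled mouse reporting in SGR mode, so a real driver would presumably check for that before sending a click.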

02 Blocked-on-prompt detection

The #1 thing that stalls agents: a program stops and waits for a human. terminal-use detects the block, classifies it (confirm / password / scaffolder), assigns a confidence score, and answers it, while leaving normal, still-streaming output alone.

[y/N] · Password: · npm init · confidence score
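One way to picture the classification is a small rule table applied to the last screen line once output has gone quiet. The sketch below is an assumption-laden illustration, not the real `deriveBlockedOnPrompt`: the function name `detectBlockedPrompt`, the regular expressions, the 500 ms idle threshold, and the confidence values are all invented for the example.

```typescript
type PromptKind = "confirm" | "password" | "scaffolder";

interface BlockedPrompt {
  kind: PromptKind;
  confidence: number; // heuristic score in the range 0..1
}

// Hypothetical rules: patterns and scores are illustrative assumptions.
const RULES: { kind: PromptKind; pattern: RegExp; score: number }[] = [
  // "Proceed? [y/N] "
  { kind: "confirm", pattern: /\[y\/n\]\s*:?\s*$/i, score: 0.9 },
  // "Password: " or "[sudo] password for alice: "
  { kind: "password", pattern: /(password|passphrase)[^:]*:\s*$/i, score: 0.9 },
  // npm init style: "package name: (my-app) "
  { kind: "scaffolder", pattern: /^[a-z ]+:\s*\([^)]*\)\s*$/i, score: 0.7 },
];

// Classify the last line of output, given how long the program has been silent.
function detectBlockedPrompt(lastLine: string, idleMs: number): BlockedPrompt | null {
  // Output that is still streaming is never treated as a block.
  if (idleMs < 500) return null;
  for (const rule of RULES) {
    if (rule.pattern.test(lastLine)) {
      return { kind: rule.kind, confidence: rule.score };
    }
  }
  return null;
}

const confirmPrompt = detectBlockedPrompt("Proceed? [y/N] ", 2000);
const passwordPrompt = detectBlockedPrompt("[sudo] password for alice: ", 2000);
const scaffoldPrompt = detectBlockedPrompt("package name: (my-app) ", 2000);
const streaming = detectBlockedPrompt("Downloading packages... 42%", 2000);
const tooSoon = detectBlockedPrompt("Proceed? [y/N] ", 50);
```

Combining an idle check with a pattern match is what keeps a progress line from being mistaken for a prompt: neither signal alone is treated as enough.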

03 Action space

The same action vocabulary as browser-use — type, press-key, scroll, click — encoded to the exact terminal byte sequences. Enter, Tab, arrows, Ctrl-C, Escape, and mouse clicks, deterministically.

press_key · terminal_click · terminal_type
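The keyboard half of this mapping is small enough to show in full. The sketch below assumes a hypothetical `Action` type and key names (`"Ctrl-C"`, `"Up"`, and so on); the real tool schemas may differ. The byte values themselves are the conventional ones: carriage return for Enter, `0x03` for Ctrl-C, and `ESC [ A` through `ESC [ D` for the arrows in normal cursor mode.

```typescript
// Hypothetical action shapes; the real MCP tool arguments may differ.
type Action =
  | { type: "type"; text: string }
  | { type: "press_key"; key: string };

// Conventional byte sequences for common keys (normal cursor-key mode).
const KEY_BYTES: Record<string, string> = {
  Enter: "\r",
  Tab: "\t",
  Escape: "\x1b",
  "Ctrl-C": "\x03",
  Up: "\x1b[A",
  Down: "\x1b[B",
  Right: "\x1b[C",
  Left: "\x1b[D",
};

// Look up one key, failing loudly on names the table does not know.
function encodeKey(key: string): string {
  const bytes = KEY_BYTES[key];
  if (bytes === undefined) throw new Error(`Unknown key: ${key}`);
  return bytes;
}

// Concatenate a sequence of actions into the bytes written to the terminal.
function encodeActions(actions: Action[]): string {
  return actions
    .map((a) => (a.type === "type" ? a.text : encodeKey(a.key)))
    .join("");
}

const bytes = encodeActions([
  { type: "type", text: "ls -la" },
  { type: "press_key", key: "Enter" },
]);
```

Programs that switch the terminal into application cursor-key mode expect `ESC O A` style arrows instead, so a complete encoder would need to track that mode; this sketch does not.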

These clips exercise the real terminal-use-mcp functions (grid → deriveElements → deriveBlockedOnPrompt → encode) on captured terminal output. In the live Nyx runtime the same pipeline is driven by a real pseudo-terminal (node-pty), so workers can operate any TUI, answer any blocking prompt, and drive remote programs over SSH.