Claude computer use, and the selector-based path the articles skip
Every explainer for this keyword says the same thing. Claude sees your screen, Claude controls your mouse, Claude is now an autonomous digital worker. None of them open the tool definition. The computer tool Anthropic ships is a pixel-coordinate loop: every turn, your harness sends a screenshot, Claude returns { action: "left_click", coordinate: [x, y] }, you execute, screenshot again. That is the product. This page is about the alternative that already exists: instead of giving Claude pixels, give it selectors. Terminator's MCP agent exposes 32 tools that talk to the OS accessibility tree, so Claude calls click_element("role:Button && name:Save") and nothing round-trips a screenshot through Anthropic.
The short version
Claude computer use is a tool Anthropic exposes via the API. The tool's contract is simple and that is the whole problem: the model takes a screenshot of your desktop as input and emits actions in pixel coordinates as output. The client loop is screenshot, send, receive coordinate, execute, screenshot, send, receive, execute. Every cycle is one image upload and one model call.
On Windows and macOS, the OS already publishes a live accessibility tree that knows where every button, edit field, menu item, and checkbox is, what role it has, what its name is, and whether it is enabled. Terminator's MCP agent wraps that tree into 32 MCP tools. Claude calls them by selector. The click resolves locally through Windows UIA or macOS AX. No screenshot is required for the vast majority of actions, and the model is not in the critical path for element lookup.
You can run both at once. The interesting question is which one Claude reaches for first. When Terminator is attached, it should be the tree, not the screenshot.
What Claude actually emits, in both worlds
This is the most concrete way to see the difference. Same user intent ("click the Save button"), two tool calls, two completely different sets of downstream work.
Pixel coordinate. Your harness ships a fresh screenshot, Claude reads it, Claude returns an (x, y) in pixel space. You execute the click with xdotool, PyAutoGUI, or your own driver. The model never saw the button's name or role, only its pixels.
- Screenshot required every turn
- Model in the inner loop of element lookup
- Coordinates break when the window moves 12 pixels
- Replaying the run means replaying the screenshots
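The third bullet is the one worth seeing concretely. A toy sketch of that brittleness, with invented button bounds and an invented 12-pixel shift (none of these numbers come from a real measurement):

```python
# Toy model of the pixel path's failure mode: a click coordinate that was
# correct against the last screenshot misses after the window shifts.
# All numbers here are invented for illustration.

def contains(rect, x, y):
    """rect is (left, top, width, height)."""
    rx, ry, rw, rh = rect
    return rx <= x < rx + rw and ry <= y < ry + rh

save_button = (460, 330, 60, 24)   # bounds the model saw in the screenshot
coordinate = (487, 341)            # the pixel pair Claude returned

assert contains(save_button, *coordinate)   # hit, against the stale layout

moved = (460, 342, 60, 24)         # window scrolled down 12 px before the click
assert not contains(moved, *coordinate)     # same coordinate now misses
```

A selector like role:Button && name:Save is immune to this particular failure because resolution happens against the live tree, not against a stale image.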
The tool calls, side by side
Left: the JSON Claude emits under Anthropic's computer_20251022 tool schema. Right: the JSON Claude emits when Terminator's MCP agent is attached. Both target the same button. Only one requires a new screenshot to find it.
tool_use payloads for 'click Save'
// What Claude emits when native computer use is enabled.
// Source: Anthropic computer_20251022 tool schema.
// The model sees a screenshot, picks pixels, returns this JSON.
{
"type": "tool_use",
"name": "computer",
"input": {
"action": "left_click",
"coordinate": [487, 341]
}
}
// Your harness must:
// 1. Screenshot the desktop
// 2. Ship it to Anthropic with the tool definition
// 3. Receive an action with [x, y] in pixel space
// 4. Execute the click (xdotool, PyAutoGUI, your own driver)
// 5. Screenshot again, send again, wait again
// Every click is one image upload and one model call.

// What Claude emits when Terminator's MCP agent is attached.
// No screenshot attached, no coordinates in the payload.
{
"type": "tool_use",
"name": "click_element",
"input": {
"selector": "role:Button && name:Save"
}
}
// Terminator resolves the selector against the live accessibility
// tree, in-process. The model is not re-invoked to find the element.
The native pixel loop, drawn once
Eight actors, ten messages. The model is in the middle of every action. This is why long tasks cost real money and real time on Claude computer use: every arrow on the right-hand side is a paid inference plus an image token charge.
native computer use: one click
The selector path, drawn as a beam
The MCP agent sits between Claude and the OS. Every tool call flows through a single dispatch_tool function. Selector in, accessibility-tree match out, action performed through UIA or AX. The model is not re-invoked to resolve the element.
Claude -> Terminator MCP -> OS accessibility tree
What the numbers look like
Three of these figures come straight from the repo. The fourth is Terminator's README claim, which is worth verifying yourself (it is the pitch of the project). All of them are checkable: the match arms can be counted by reading server.rs, the line number is literally where the dispatch lives, and the concurrency default sits at README line 45.
“Claude can drive the OS without a screenshot in the inner loop. The selector resolves against the live UIA tree, in-process, at CPU speed.”
Terminator MCP agent, crates/terminator-mcp-agent/src/prompt.rs
dispatch_tool: the 32 tools Claude sees
This is the anchor fact for the page. Open crates/terminator-mcp-agent/src/server.rs at line 9953. There is one match tool_name block, each arm wires a tool name to an async Rust handler, and there are 32 named arms before the wildcard. Below is the shape of it, abbreviated so the names line up. Every one of these is what Claude can call when Terminator is attached.
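The original is Rust, one `match tool_name` block with 32 arms. Rendered here as a Python sketch so it runs standalone: the tool names are the repo's 32, but the handler bodies are stubs and the wiring is illustrative, not the real source.

```python
# The shape of dispatch_tool, rendered in Python. In the Rust source this is
# one `match tool_name` block with 32 named arms plus a wildcard; here each
# arm is a dict entry. Handler bodies are stubs -- the wiring is the point.

def make_stub(name):
    def handler(args):
        return {"tool": name, "args": args}  # stand-in for the async Rust handler
    return handler

TOOL_NAMES = [
    "get_window_tree", "get_applications_and_windows_list", "click_element",
    "type_into_element", "press_key", "press_key_global", "validate_element",
    "wait_for_element", "activate_element", "navigate_browser",
    "execute_browser_script", "open_application", "scroll_element", "mouse_drag",
    "highlight_element", "select_option", "set_selected", "capture_screenshot",
    "invoke_element", "set_value", "execute_sequence", "run_command", "delay",
    "stop_highlighting", "stop_execution", "gemini_computer_use", "read_file",
    "write_file", "edit_file", "copy_content", "glob_files", "grep_files",
]

DISPATCH = {name: make_stub(name) for name in TOOL_NAMES}

def dispatch_tool(tool_name, arguments):
    handler = DISPATCH.get(tool_name)
    if handler is None:                      # the wildcard arm: unknown tool
        raise ValueError(f"unknown tool: {tool_name}")
    return handler(arguments)

assert len(DISPATCH) == 32
```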
The rule the model is told, every session
Terminator's system prompt lives in src/prompt.rs. It starts by importing the compile-time tool list via env!("MCP_TOOLS") (populated by build.rs:31), then explicitly forbids the model from inventing selectors: always derive selectors strictly from the provided UI tree or DOM data; never guess or predict element attributes based on assumptions. This is the single most important line in the prompt, and the one that keeps Terminator from drifting back into a vision-style guess-and-check loop.
Four everyday tasks, both ways
The easiest way to understand the latency delta is to think about what happens step-by-step for tasks a normal agent flow actually does.
| Feature | Native computer use (pixel) | Terminator MCP (selector) |
|---|---|---|
| Opens an application | Claude must take a screenshot, find the taskbar icon pixel, click it, wait, screenshot again | open_application({ path: 'notepad' }) - single MCP call, returns the PID and the fresh UI tree |
| Fills a login form | Screenshot. Coord-click the email field. Type. Screenshot. Coord-click password. Type. Screenshot. | type_into_element({ selector: 'role:Edit && name:Email' }) twice - no vision loop |
| Reads a dialog | Screenshot, OCR inside the model, hope the text survived compression | get_window_tree returns the literal Name and Value strings from the accessibility API |
| Runs a multi-step workflow | Loop: screenshot -> LLM -> action -> screenshot -> LLM -> action ... Anthropic billed per turn. | execute_sequence ships a whole YAML of steps in one call. Engine-mode JS/Python share state via env. |
The architectural contrast
Same LLM behind both. Different assumptions about where element lookup happens and what the model is expected to do with its tokens.
| Feature | Anthropic computer tool | Terminator MCP |
|---|---|---|
| What the model returns per action | { action: "left_click", coordinate: [x, y] } | { name: "click_element", selector: "role:Button && name:Save" } |
| Input the model needs to see | PNG screenshot of the desktop, every turn | Accessibility tree (YAML/JSON), fetched once per screen |
| Round-trip cost per click | One screenshot upload + one model call | One MCP stdio call. Model already knows the tree. |
| Where the resolution happens | Inside the model: it reads pixels, does OCR-style vision, returns coords | Inside Terminator: selector is matched against the Windows UIA / macOS AX tree locally |
| Failure mode on UI drift | Button moved 12 pixels, old coordinate misses, silent mis-click | Selector by role+name still resolves if the element is still there. If not, a typed McpError comes back. |
| Observability | Screenshots are the only artifact. Replay is imprecise. | Every call logged by tool_logging.rs. UI tree before/after captured by default into executions/. |
| Cursor and keyboard | Takes over your cursor. You cannot use the computer while it runs. | Runs through accessibility APIs. Your cursor is untouched. |
| Session state | Whatever your harness script keeps | Long-lived MCP process. Cancellation tokens, concurrency gate (MCP_MAX_CONCURRENT), focus restore. |
Install in Claude Code in one command
The MCP agent ships as a single npm package. Claude Code exposes an mcp add helper that wires it up with the right stdio plumbing.
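Per the project's README, the one-liner (registering the server at user scope, over stdio) is:

```shell
claude mcp add terminator "npx -y terminator-mcp-agent@latest" -s user
```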
What actually happens when Claude clicks Save
Trace one click through the whole stack. This is the selector path, step by step, with the files you can open yourself.
Claude emits a tool_use for click_element
The name field is "click_element". The input is { selector: "role:Button && name:Save" }. No coordinates. No screenshot attached.
MCP host forwards JSON-RPC tools/call over stdio
Claude Code, Cursor, or whatever MCP client you use frames the call as JSON-RPC 2.0 over the stdio pipes of the npx-spawned terminator-mcp-agent process.
dispatch_tool matches the name at server.rs:9953
The "click_element" arm deserialises the arguments into ClickElementArgs and awaits self.click_element(..) under a tokio::select against the request's cancellation token.
The selector resolves against the UIA or AX tree
On Windows, terminator-rs calls into IUIAutomation. On macOS, AXUIElement. It walks children by role and name until it finds the match, respecting tree_max_depth (default 30).
The action fires through the accessibility API
invoke() is preferred over click() because it does not require the element to be in the viewport or to have stable bounds. The OS performs the native click event.
Terminator captures the UI diff
By default, the before/after tree is diffed and a screenshot is saved to executions/. ui_diff_before_after returns what changed, has_ui_changes returns a boolean Claude can check.
A CallToolResult returns up the stdio pipe
Structured result, captured stderr, timing. Claude sees the diff next turn. No new screenshot was needed to find the button.
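Collapsed into wire format, the whole selector trace is one request and one response on stdio. A sketch of the tools/call frame (framing per JSON-RPC 2.0 and the MCP tools/call method; the id is arbitrary):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "click_element",
    "arguments": { "selector": "role:Button && name:Save" }
  }
}
```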
Give Claude the selectors, keep computer use for the rest
Terminator is MIT-licensed. Install the MCP agent in Claude Code in one command. Keep Anthropic's computer tool for the fully alien UIs (games, canvas apps) where the accessibility tree is empty. Let the tree do the work everywhere else.
Read the source on GitHub →
Questions readers actually ask
How does Claude's native computer use actually work?
Anthropic ships a tool type called computer (current revision computer_20251022). When enabled, your harness is responsible for taking a screenshot of the desktop and sending it to Claude alongside the tool definition. Claude returns a tool_use block whose input is an action like left_click with a coordinate pair in pixel space. Your code executes the click, takes another screenshot, sends both back, and the loop continues. Every action is one screenshot upload and one model round-trip. This is not a limitation of Claude, it is how the tool is defined: the model sees pixels, you execute pixels.
Why is that loop expensive?
Two reasons, both mechanical. First, token cost: every action pays for one screenshot's worth of image tokens plus the output tokens, and screenshots are not tiny even after downscaling. Anthropic's own computer use docs note that long tasks with frequent screenshots consume significant credits. Second, wall-clock latency: every action is one full inference pass, typically one to several seconds. A 40-action workflow becomes a coffee break. A deterministic selector-driven agent can run the same 40 steps in seconds because the model is not in the inner loop.
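Back-of-envelope on the latency half, with assumed per-step times (1500 ms per vision round-trip, 50 ms per local accessibility call; both are illustrative numbers, not benchmarks):

```python
# Rough wall-clock comparison for a 40-action workflow.
# 1500 ms per vision round-trip and 50 ms per local accessibility call
# are ASSUMED numbers for illustration, not measurements.

ACTIONS = 40
MODEL_ROUNDTRIP_MS = 1500   # screenshot upload + full inference pass
LOCAL_CALL_MS = 50          # selector resolved in-process via UIA/AX

pixel_path_ms = ACTIONS * MODEL_ROUNDTRIP_MS
selector_path_ms = ACTIONS * LOCAL_CALL_MS

assert pixel_path_ms == 60_000    # a full minute of waiting
assert selector_path_ms == 2_000  # a couple of seconds
assert pixel_path_ms // selector_path_ms == 30
```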
What exactly does Terminator's MCP server give Claude instead?
32 typed tools that speak accessibility-tree selectors, not coordinates. You can list them by opening crates/terminator-mcp-agent/src/server.rs at line 9953, where the dispatch_tool match block has one arm per tool. Examples: click_element takes a selector like role:Button && name:Save, type_into_element takes a selector plus text, navigate_browser drives the address bar, execute_browser_script runs JS inside the page, execute_sequence accepts a YAML of steps. The complete list at the time of writing: get_window_tree, get_applications_and_windows_list, click_element, type_into_element, press_key, press_key_global, validate_element, wait_for_element, activate_element, navigate_browser, execute_browser_script, open_application, scroll_element, mouse_drag, highlight_element, select_option, set_selected, capture_screenshot, invoke_element, set_value, execute_sequence, run_command, delay, stop_highlighting, stop_execution, gemini_computer_use, read_file, write_file, edit_file, copy_content, glob_files, grep_files.
Why selectors instead of coordinates?
Because the operating system already knows where every element is. Windows UI Automation and macOS Accessibility both expose a live tree where each element has a role (Button, Edit, Text, Window), a name, a value, bounds, and a parent chain. Terminator finds elements by matching that tree. role:Button && name:Save survives the button moving by 100 pixels, the window being resized, a theme change, or a DPI shift. A coordinate pair does not. The system prompt Terminator sends to Claude, in crates/terminator-mcp-agent/src/prompt.rs line 21, makes this explicit: always derive selectors strictly from the provided UI tree or DOM data; never guess or predict element attributes based on assumptions.
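A minimal sketch of what "resolve role:Button && name:Save against the tree" means. The parser and the tree below are toys invented for this page, not Terminator's real selector engine; real trees come from UIA or AX.

```python
# Toy selector resolution: parse "role:X && name:Y" into clauses, then walk a
# nested accessibility tree for the first element matching every clause.
# The tree below is invented; a real one comes from Windows UIA or macOS AX.

def parse_selector(selector):
    clauses = {}
    for part in selector.split("&&"):
        key, _, value = part.strip().partition(":")
        clauses[key] = value
    return clauses

def resolve(node, clauses):
    if all(node.get(k) == v for k, v in clauses.items()):
        return node
    for child in node.get("children", []):
        found = resolve(child, clauses)
        if found is not None:
            return found
    return None

window = {
    "role": "Window", "name": "Untitled - Notepad",
    "children": [
        {"role": "Edit", "name": "Text editor", "children": []},
        {"role": "Button", "name": "Save", "children": []},
    ],
}

hit = resolve(window, parse_selector("role:Button && name:Save"))
assert hit is not None and hit["name"] == "Save"
```

Note what the walk never consults: bounds. The button can move anywhere in the window and the same selector still resolves.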
Can I use Claude's native computer use and Terminator together?
Yes. Terminator's MCP agent exposes both a capture_screenshot tool and a vision-model fallback called gemini_computer_use (server.rs line 10267) for cases where the accessibility tree is empty or lies. The normal flow is: Claude calls get_window_tree first, reads the tree, picks the selector it needs, calls click_element or type_into_element. If an element is truly invisible to accessibility, you can escalate to a visual path. The point is that the selector path is the default, not the exception, and most Windows and macOS apps expose enough tree to skip vision entirely.
How do I install Terminator's MCP server in Claude Code?
One command: claude mcp add terminator "npx -y terminator-mcp-agent@latest" -s user. That registers the server at the user scope, runs it over stdio under Claude Code's supervision, and exposes the 32 tools inside the normal tool picker. For Cursor, VS Code, Windsurf, or any other MCP client, the same server binary works with a standard mcpServers JSON block pointing at the same npx command. Full instructions live in crates/terminator-mcp-agent/README.md on GitHub.
Does Terminator replace Claude computer use or complement it?
It replaces the default path for the actions where the accessibility tree is faster and more reliable, which is most of them on Windows and macOS. Terminator's README is explicit about the goal: run 100x faster than the pixel-loop agents and hit above 95% success rate by keeping the model out of the inner loop. Anthropic's native computer use still matters for fully alien UIs (games, some Electron apps that do not expose their tree, canvas-heavy apps), which is why Terminator keeps a vision-model tool available as a fallback rather than pretending it is never needed.
Is the system prompt really compiled into the binary?
Yes, and this is the detail worth looking at yourself. crates/terminator-mcp-agent/build.rs at line 31 defines extract_mcp_tools(), a build-time function that opens src/server.rs, scans for the let result = match tool_name line, and collects every subsequent "tool_name" =>. That list becomes the MCP_TOOLS environment variable via println!("cargo:rustc-env=MCP_TOOLS=..."). Then prompt.rs reads env!("MCP_TOOLS") at compile time and pastes it into the system instructions Claude receives. The practical consequence: the server cannot tell Claude about a tool that dispatch_tool does not handle, and Claude cannot see a stale tool list. They are the same list by construction.
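The extraction trick is easy to sketch. The real extract_mcp_tools is Rust inside build.rs; the Python below mimics its behavior on an invented scrap of "source", so the mechanism is visible without the repo:

```python
# Sketch of the build.rs trick: scan the dispatch source for match arms of
# the form `"tool_name" =>` and collect the quoted names. The snippet of
# "source" below is invented; the real scan runs over src/server.rs at
# build time, in Rust.
import re

FAKE_SERVER_RS = '''
let result = match tool_name {
    "get_window_tree" => self.get_window_tree(args).await,
    "click_element" => self.click_element(args).await,
    "type_into_element" => self.type_into_element(args).await,
    _ => return Err(unknown_tool(tool_name)),
};
'''

def extract_mcp_tools(source):
    # Only look after the dispatch line, then grab every `"name" =>` arm.
    # The wildcard `_ =>` has no quotes, so it is skipped automatically.
    _, _, body = source.partition("match tool_name")
    return re.findall(r'"([a-z_]+)"\s*=>', body)

tools = extract_mcp_tools(FAKE_SERVER_RS)
assert tools == ["get_window_tree", "click_element", "type_into_element"]
# build.rs then emits the list as: cargo:rustc-env=MCP_TOOLS=<joined names>
```

Because the list is scraped from the same file that dispatches the calls, prompt and dispatcher cannot drift apart, which is exactly the "same list by construction" guarantee described above.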
More from the Terminator guides
Keep reading
What is an MCP server? A real one, opened in the editor
The dispatch_tool match block in server.rs, the build.rs trick that keeps the system prompt in sync with the code. Terminator is the example.
Terminator on GitHub
Core Rust crates, MCP agent, Node and Python bindings, workflow recorder. MIT licensed.
Terminator MCP agent README
Install commands for Cursor, VS Code, Claude Code. HTTP transport, concurrency gate, virtual display support for headless VMs.