
Claude computer use, and the selector-based path the articles skip

Every explainer for this keyword says the same thing. Claude sees your screen, Claude controls your mouse, Claude is now an autonomous digital worker. None of them open the tool definition. The computer tool Anthropic ships is a pixel-coordinate loop: every turn, your harness sends a screenshot, Claude returns { action: "left_click", coordinate: [x, y] }, you execute, screenshot again. That is the product. This page is about the alternative that already exists: instead of giving Claude pixels, give it selectors. Terminator's MCP agent exposes 32 tools that talk to the OS accessibility tree, so Claude calls click_element("role:Button && name:Save") and nothing round-trips a screenshot through Anthropic.

Terminator
11 min read
Open-source, MIT
32 selector-based tools in one MCP server
dispatch_tool match block at server.rs line 9953
Accessibility tree driven: Windows UIA + macOS AX
One install command for Claude Code, Cursor, VS Code, Windsurf

The short version

Claude computer use is a tool Anthropic exposes via the API. The tool's contract is simple and that is the whole problem: the model takes a screenshot of your desktop as input and emits actions in pixel coordinates as output. The client loop is screenshot, send, receive coordinate, execute, screenshot, send, receive, execute. Every cycle is one image upload and one model call.

On Windows and macOS, the OS already publishes a live accessibility tree that knows where every button, edit field, menu item, and checkbox is, what role it has, what its name is, and whether it is enabled. Terminator's MCP agent wraps that tree into 32 MCP tools. Claude calls them by selector. The click resolves locally through Windows UIA or macOS AX. No screenshot is required for the vast majority of actions, and the model is not in the critical path for element lookup.

You can run both at once. The interesting question is which one Claude reaches for first. When Terminator is attached, it should be the tree, not the screenshot.

What Claude actually emits, in both worlds

This is the most concrete way to see the difference. Same user intent ("click the Save button"), two tool calls, two completely different sets of downstream work.

Pixel coordinate. Your harness ships a fresh screenshot, Claude reads it, Claude returns an (x, y) in pixel space. You execute the click with xdotool, PyAutoGUI, or your own driver. The model never saw the button's name or role, only its pixels.

  • Screenshot required every turn
  • Model in the inner loop of element lookup
  • Coordinates break when the window moves 12 pixels
  • Replaying the run means replaying the screenshots

The tool calls, side by side

Left: the JSON Claude emits under Anthropic's computer_20251022 tool schema. Right: the JSON Claude emits when Terminator's MCP agent is attached. Both target the same button. Only one requires a new screenshot to find it.

tool_use payloads for 'click Save'

// What Claude emits when native computer use is enabled.
// Source: Anthropic computer_20251022 tool schema.
// The model sees a screenshot, picks pixels, returns this JSON.

{
  "type": "tool_use",
  "name": "computer",
  "input": {
    "action": "left_click",
    "coordinate": [487, 341]
  }
}

// Your harness must:
//   1. Screenshot the desktop
//   2. Ship it to Anthropic with the tool definition
//   3. Receive an action with [x, y] in pixel space
//   4. Execute the click (xdotool, PyAutoGUI, your own driver)
//   5. Screenshot again, send again, wait again

// Every click is one image upload and one model call.
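For the right-hand side, the shape is spelled out in the step-by-step trace further down this page: the tool name is click_element and the input is a selector, nothing else. Reconstructed here so the two payloads sit together:

```json
{
  "type": "tool_use",
  "name": "click_element",
  "input": {
    "selector": "role:Button && name:Save"
  }
}
```

No coordinate field, and no fresh screenshot had to be uploaded for the model to produce it.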
10% of the lines, and one fewer screenshot per click

The native pixel loop, drawn once

Four actors, ten messages. The model is in the middle of every action. This is why long tasks cost real money and real time on Claude computer use: every arrow into the Anthropic API is a paid inference plus an image token charge.

native computer use: one click

User → Harness: task: click Save
Harness → Desktop: take screenshot
Desktop → Harness: desktop.png (base64)
Harness → Anthropic API: messages + screenshot + tool defs
Anthropic API: model reads pixels
Anthropic API → Harness: tool_use: left_click [487, 341]
Harness → Desktop: move_mouse + click at (487, 341)
Desktop → Harness: os reports ok
Harness → Desktop: take screenshot again
Desktop → Harness: desktop.png v2

The selector path, drawn as a beam

The MCP agent sits between Claude and the OS. Every tool call flows through a single dispatch_tool function. Selector in, accessibility-tree match out, action performed through UIA or AX. The model is not re-invoked to resolve the element.

Claude -> Terminator MCP -> OS accessibility tree

Clients: Claude Code, Cursor, VS Code, Windsurf
    ↓
dispatch_tool (single entry point)
    ↓
Backends: Windows UIA, macOS AX, Chrome DOM, Workflow engine

What the numbers look like

Three of these come straight from the repo. The fourth is Terminator's README claim, worth verifying yourself, since it is the project's pitch. All of them are checkable: count the match arms in server.rs, open the file at the dispatch line itself, and find the concurrency default at README line 45.

32 selector-based tools exposed to Claude
Line 9953: dispatch_tool in server.rs
10,000+ total lines in server.rs
100x target speedup over pixel-loop agents (README claim)
32 tools / 1 MCP call per click

Claude can drive the OS without a screenshot in the inner loop. The selector resolves against the live UIA tree, in-process, at CPU speed.

Terminator MCP agent, crates/terminator-mcp-agent/src/prompt.rs

dispatch_tool: the 32 tools Claude sees

This is the anchor fact for the page. Open crates/terminator-mcp-agent/src/server.rs at line 9953. There is one match tool_name block, each arm wires a tool name to an async Rust handler, and there are 32 named arms before the wildcard. Below is the shape of it, abbreviated so the names line up. Every one of these is what Claude can call when Terminator is attached.

crates/terminator-mcp-agent/src/server.rs
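A sketch of that shape, reconstructed from the tool names and dispatch details described on this page. Argument types and error constructors here are illustrative, not copied from the file:

```rust
// Abbreviated reconstruction; open server.rs:9953 for the real thing.
let result = match tool_name {
    "get_window_tree" => self.get_window_tree(args).await,
    "click_element" => {
        // Each arm deserialises its own typed args...
        let args: ClickElementArgs = serde_json::from_value(arguments)?;
        // ...and races the handler against the request's cancellation token.
        tokio::select! {
            res = self.click_element(args) => res,
            _ = cancel_token.cancelled() => Err(McpError::cancelled()),
        }
    }
    "type_into_element" => { /* ... */ }
    "press_key"         => { /* ... */ }
    "navigate_browser"  => { /* ... */ }
    "execute_sequence"  => { /* ... */ }
    // ...26 more named arms...
    _ => Err(McpError::invalid_params("unknown tool", None)),
};
```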

The rule the model is told, every session

Terminator's system prompt lives in src/prompt.rs. It starts by importing the compile-time tool list via env!("MCP_TOOLS") (populated by build.rs:31) and then it explicitly forbids the model from inventing selectors. This is the single most important line in the prompt, and the one that keeps Terminator from drifting back into a vision-style guess-and-check loop.

crates/terminator-mcp-agent/src/prompt.rs, line 21: "Always derive selectors strictly from the provided UI tree or DOM data; never guess or predict element attributes based on assumptions."

Four everyday tasks, both ways

The easiest way to understand the latency delta is to think about what happens step-by-step for tasks a normal agent flow actually does.

Opens an application
  Pixel: take a screenshot, find the taskbar icon, click it, wait, screenshot again.
  Selector: open_application({ path: 'notepad' }), a single MCP call that returns the PID and the fresh UI tree.

Fills a login form
  Pixel: screenshot, coord-click the email field, type, screenshot, coord-click password, type, screenshot.
  Selector: type_into_element({ selector: 'role:Edit && name:Email' }) twice, no vision loop.

Reads a dialog
  Pixel: screenshot, OCR inside the model, hope the text survived compression.
  Selector: get_window_tree returns the literal Name and Value strings from the accessibility API.

Runs a multi-step workflow
  Pixel: loop of screenshot, LLM, action, screenshot, LLM, action, with Anthropic billing per turn.
  Selector: execute_sequence ships a whole YAML of steps in one call; engine-mode JS/Python steps share state via env.

The architectural contrast

Same LLM behind both. Different assumptions about where element lookup happens and what the model is expected to do with its tokens.

What the model returns per action
  Anthropic: { action: "left_click", coordinate: [x, y] }
  Terminator: { name: "click_element", selector: "role:Button && name:Save" }

Input the model needs to see
  Anthropic: a PNG screenshot of the desktop, every turn.
  Terminator: the accessibility tree (YAML/JSON), fetched once per screen.

Round-trip cost per click
  Anthropic: one screenshot upload plus one model call.
  Terminator: one MCP stdio call; the model already knows the tree.

Where the resolution happens
  Anthropic: inside the model, which reads pixels, does OCR-style vision, and returns coordinates.
  Terminator: inside Terminator, where the selector is matched against the Windows UIA / macOS AX tree locally.

Failure mode on UI drift
  Anthropic: the button moved 12 pixels, the old coordinate misses, silent misclick.
  Terminator: a role+name selector still resolves if the element is still there; if not, a typed McpError comes back.

Observability
  Anthropic: screenshots are the only artifact; replay is imprecise.
  Terminator: every call is logged by tool_logging.rs, and the UI tree before/after is captured by default into executions/.

Cursor and keyboard
  Anthropic: takes over your cursor; you cannot use the computer while it runs.
  Terminator: runs through accessibility APIs; your cursor is untouched.

Session state
  Anthropic: whatever your harness script keeps.
  Terminator: a long-lived MCP process with cancellation tokens, a concurrency gate (MCP_MAX_CONCURRENT), and focus restore.

Install in Claude Code in one command

The MCP agent ships as a single npm package. Claude Code exposes an mcp add helper that wires it up with the right stdio plumbing.

terminal
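The exact command, repeated from the FAQ below so it is copy-pasteable here:

```shell
claude mcp add terminator "npx -y terminator-mcp-agent@latest" -s user
```

The -s user flag registers the server at the user scope, so it is available in every project, and Claude Code supervises the npx-spawned process over stdio.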

What actually happens when Claude clicks Save

Trace one click through the whole stack. This is the selector path, step by step, with the files you can open yourself.

1. Claude emits a tool_use for click_element

The name field is "click_element". The input is { selector: "role:Button && name:Save" }. No coordinates. No screenshot attached.

2. MCP host forwards JSON-RPC tools/call over stdio

Claude Code, Cursor, or whatever MCP client you use, frames the call as JSON-RPC 2.0 on the stdout pipe of the npx-spawned terminator-mcp-agent process.
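On the wire, that frame is standard MCP JSON-RPC. A representative tools/call request (the id is arbitrary) looks like:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "click_element",
    "arguments": { "selector": "role:Button && name:Save" }
  }
}
```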

3. dispatch_tool matches the name at server.rs:9953

The "click_element" arm deserialises the arguments into ClickElementArgs and awaits self.click_element(..) under a tokio::select against the request's cancellation token.

4. The selector resolves against the UIA or AX tree

On Windows, terminator-rs calls into IUIAutomation. On macOS, AXUIElement. It walks children by role and name until it finds the match, respecting tree_max_depth (default 30).

5. The action fires through the accessibility API

invoke() is preferred over click() because it does not require the element to be in the viewport or to have stable bounds. The OS performs the native click event.

6. Terminator captures the UI diff

By default, the before/after tree is diffed and a screenshot is saved to executions/. ui_diff_before_after returns what changed, has_ui_changes returns a boolean Claude can check.

7. A CallToolResult returns up the stdio pipe

Structured result, captured stderr, timing. Claude sees the diff next turn. No new screenshot was needed to find the button.

Give Claude the selectors, keep computer use for the rest

Terminator is MIT-licensed. Install the MCP agent in Claude Code in one command. Keep Anthropic's computer tool for the fully alien UIs (games, canvas apps) where the accessibility tree is empty. Let the tree do the work everywhere else.

Read the source on GitHub

Questions readers actually ask

How does Claude's native computer use actually work?

Anthropic ships a tool type called computer (current revision computer_20251022). When enabled, your harness is responsible for taking a screenshot of the desktop and sending it to Claude alongside the tool definition. Claude returns a tool_use block whose input is an action like left_click with a coordinate pair in pixel space. Your code executes the click, takes another screenshot, sends both back, and the loop continues. Every action is one screenshot upload and one model round-trip. This is not a limitation of Claude, it is how the tool is defined: the model sees pixels, you execute pixels.

Why is that loop expensive?

Two reasons, both mechanical. First, every action pays for one image token budget plus output tokens, and screenshots are not tiny even after downscaling. Anthropic's own computer use docs note that long tasks with frequent screenshots consume significant credits. Second, wall-clock latency: every action is one full inference pass, typically one to several seconds. A 40-action workflow becomes a coffee break. A deterministic selector-driven agent can do the same 40 steps in seconds because the model is not in the inner loop.
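The arithmetic is worth doing once. A back-of-envelope sketch with assumed round numbers, roughly two seconds per screenshot-plus-inference turn and 50 ms per local tree lookup; neither figure is a measured benchmark:

```rust
// Illustrative latency math, not a benchmark.
fn workflow_seconds(actions: u32, secs_per_action: f64) -> f64 {
    f64::from(actions) * secs_per_action
}

fn main() {
    let pixel = workflow_seconds(40, 2.0);     // model in the inner loop
    let selector = workflow_seconds(40, 0.05); // selector resolved locally
    println!("pixel loop:    {pixel} s");      // 80 s: the coffee break
    println!("selector path: {selector} s");   // 2 s
}
```

The per-action constants are the whole story: the pixel loop pays them 40 times in series, the selector path pays the model once up front for planning and then runs at local-call speed.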

What exactly does Terminator's MCP server give Claude instead?

32 typed tools that speak accessibility-tree selectors, not coordinates. You can list them by opening crates/terminator-mcp-agent/src/server.rs at line 9953, where the dispatch_tool match block has one arm per tool. Examples: click_element takes a selector like role:Button && name:Save, type_into_element takes a selector plus text, navigate_browser drives the address bar, execute_browser_script runs JS inside the page, execute_sequence accepts a YAML of steps. The complete list at the time of writing: get_window_tree, get_applications_and_windows_list, click_element, type_into_element, press_key, press_key_global, validate_element, wait_for_element, activate_element, navigate_browser, execute_browser_script, open_application, scroll_element, mouse_drag, highlight_element, select_option, set_selected, capture_screenshot, invoke_element, set_value, execute_sequence, run_command, delay, stop_highlighting, stop_execution, gemini_computer_use, read_file, write_file, edit_file, copy_content, glob_files, grep_files.

Why selectors instead of coordinates?

Because the operating system already knows where every element is. Windows UI Automation and macOS Accessibility both expose a live tree where each element has a role (Button, Edit, Text, Window), a name, a value, bounds, and a parent chain. Terminator finds elements by matching that tree. role:Button && name:Save survives the button moving by 100 pixels, the window being resized, a theme change, or a DPI shift. A coordinate pair does not. The system prompt Terminator sends to Claude, in crates/terminator-mcp-agent/src/prompt.rs line 21, makes this explicit: always derive selectors strictly from the provided UI tree or DOM data; never guess or predict element attributes based on assumptions.
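A minimal sketch of what "matching that tree" means, with toy structs standing in for the real UIA/AX wrappers; these types are illustrative, not Terminator's:

```rust
// Toy accessibility tree: each element has a role, a name, and children.
struct Node {
    role: &'static str,
    name: &'static str,
    children: Vec<Node>,
}

// Depth-first search by role and name, honouring a max depth
// (Terminator's tree_max_depth defaults to 30, per this page).
fn find<'a>(n: &'a Node, role: &str, name: &str, depth: u32, max: u32) -> Option<&'a Node> {
    if n.role == role && n.name == name {
        return Some(n);
    }
    if depth >= max {
        return None;
    }
    n.children.iter().find_map(|c| find(c, role, name, depth + 1, max))
}

fn main() {
    let window = Node {
        role: "Window",
        name: "Untitled - Notepad",
        children: vec![
            Node { role: "Edit", name: "Text editor", children: vec![] },
            Node { role: "Button", name: "Save", children: vec![] },
        ],
    };
    // "role:Button && name:Save" resolves no matter where the bounds are.
    assert!(find(&window, "Button", "Save", 0, 30).is_some());
}
```

Nothing in the search touches coordinates, which is exactly why a resize, theme change, or DPI shift cannot break it.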

Can I use Claude's native computer use and Terminator together?

Yes. Terminator's MCP agent exposes both a capture_screenshot tool and a vision-model fallback called gemini_computer_use (server.rs line 10267) for cases where the accessibility tree is empty or lies. The normal flow is: Claude calls get_window_tree first, reads the tree, picks the selector it needs, calls click_element or type_into_element. If an element is truly invisible to accessibility, you can escalate to a visual path. The point is that the selector path is the default, not the exception, and most Windows and macOS apps expose enough tree to skip vision entirely.

How do I install Terminator's MCP server in Claude Code?

One command: claude mcp add terminator "npx -y terminator-mcp-agent@latest" -s user. That registers the server at the user scope, runs it over stdio under Claude Code's supervision, and exposes the 32 tools inside the normal tool picker. For Cursor, VS Code, Windsurf, or any other MCP client, the same server binary works with a standard mcpServers JSON block pointing at the same npx command. Full instructions live in crates/terminator-mcp-agent/README.md on GitHub.
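For the non-Claude-Code clients, the standard block the answer mentions looks like this (the "terminator" key is your choice of server name):

```json
{
  "mcpServers": {
    "terminator": {
      "command": "npx",
      "args": ["-y", "terminator-mcp-agent@latest"]
    }
  }
}
```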

Does Terminator replace Claude computer use or complement it?

It replaces the default path for the actions where accessibility tree is faster and more reliable, which is most of them on Windows and macOS. Terminator's README is explicit about the goal: run 100x faster than the pixel-loop agents and hit above 95% success rate by keeping the model out of the inner loop. Anthropic's native computer use still matters for fully alien UIs (games, some Electron apps that do not expose their tree, canvas-heavy apps), which is why Terminator keeps a vision-model tool available as a fallback rather than pretending it is never needed.

Is the system prompt really compiled into the binary?

Yes, and this is the detail worth looking at yourself. crates/terminator-mcp-agent/build.rs at line 31 defines extract_mcp_tools(), a build-time function that opens src/server.rs, scans for the let result = match tool_name line, and collects every subsequent "tool_name" =>. That list becomes the MCP_TOOLS environment variable via println!("cargo:rustc-env=MCP_TOOLS=..."). Then prompt.rs reads env!("MCP_TOOLS") at compile time and pastes it into the system instructions Claude receives. The practical consequence: the server cannot tell Claude about a tool that dispatch_tool does not handle, and Claude cannot see a stale tool list. They are the same list by construction.
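The scan itself is simple enough to sketch. This is an illustrative reimplementation of the behaviour described above, not the code in build.rs:

```rust
// Collect every `"tool_name" =>` arm that follows the dispatch match line.
fn extract_tool_names(source: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut in_match = false;
    for line in source.lines() {
        if line.contains("let result = match tool_name") {
            in_match = true;
            continue;
        }
        if !in_match {
            continue;
        }
        let t = line.trim_start();
        // A named arm starts with a quoted tool name followed by `=>`.
        if let Some(rest) = t.strip_prefix('"') {
            if let Some(end) = rest.find('"') {
                if rest[end + 1..].trim_start().starts_with("=>") {
                    names.push(rest[..end].to_string());
                }
            }
        }
    }
    names
}

fn main() {
    let src = r#"
        let result = match tool_name {
            "get_window_tree" => self.get_window_tree(args).await,
            "click_element" => self.click_element(args).await,
            _ => Err(unknown_tool()),
        };
    "#;
    let names = extract_tool_names(src);
    assert_eq!(names, vec!["get_window_tree", "click_element"]);
    // build.rs then emits the list for prompt.rs to read at compile time:
    // println!("cargo:rustc-env=MCP_TOOLS={}", names.join(","));
}
```

The wildcard arm never matches the quoted-name pattern, which is how the extracted list ends up containing exactly the named tools and nothing else.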

terminator, Desktop automation SDK
© 2026 terminator. All rights reserved.