Claude computer use, and the selector-based path the articles skip
Every explainer for this keyword says the same thing. Claude sees your screen, Claude controls your mouse, Claude is now an autonomous digital worker. None of them open the tool definition. The computer tool Anthropic ships is a pixel-coordinate loop: every turn, your harness sends a screenshot, Claude returns { action: "left_click", coordinate: [x, y] }, you execute, screenshot again. That is the product. This page is about the alternative that already exists: instead of giving Claude pixels, give it selectors. Terminator's MCP agent exposes 32 tools that talk to the OS accessibility tree, so Claude calls click_element("role:Button && name:Save") and nothing round-trips a screenshot through Anthropic.
The short version
Claude computer use is a tool Anthropic exposes via the API. The tool's contract is simple and that is the whole problem: the model takes a screenshot of your desktop as input and emits actions in pixel coordinates as output. The client loop is screenshot, send, receive coordinate, execute, screenshot, send, receive, execute. Every cycle is one image upload and one model call.
On Windows and macOS, the OS already publishes a live accessibility tree that knows where every button, edit field, menu item, and checkbox is, what role it has, what its name is, and whether it is enabled. Terminator's MCP agent wraps that tree into 32 MCP tools. Claude calls them by selector. The click resolves locally through Windows UIA or macOS AX. No screenshot is required for the vast majority of actions, and the model is not in the critical path for element lookup.
You can run both at once. The interesting question is which one Claude reaches for first. When Terminator is attached, it should be the tree, not the screenshot.
What Claude actually emits, in both worlds
This is the most concrete way to see the difference. Same user intent ("click the Save button"), two tool calls, two completely different sets of downstream work.
Pixel coordinate. Your harness ships a fresh screenshot, Claude reads it, Claude returns an (x, y) in pixel space. You execute the click with xdotool, PyAutoGUI, or your own driver. The model never saw the button's name or role, only its pixels.
- Screenshot required every turn
- Model in the inner loop of element lookup
- Coordinates break when the window moves 12 pixels
- Replaying the run means replaying the screenshots
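The third bullet is the one worth seeing concretely. A toy sketch of that brittleness, with invented button bounds and an invented 12-pixel shift (none of these numbers come from a real measurement):

```python
# Toy model of the pixel path's failure mode: a click coordinate that was
# correct against the last screenshot misses after the window shifts.
# All numbers here are invented for illustration.

def contains(rect, x, y):
    """rect is (left, top, width, height)."""
    rx, ry, rw, rh = rect
    return rx <= x < rx + rw and ry <= y < ry + rh

save_button = (460, 330, 60, 24)   # bounds the model saw in the screenshot
coordinate = (487, 341)            # the pixel pair Claude returned

assert contains(save_button, *coordinate)   # hit, against the stale layout

moved = (460, 342, 60, 24)         # window scrolled down 12 px before the click
assert not contains(moved, *coordinate)     # same coordinate now misses
```

A selector like role:Button && name:Save is immune to this particular failure because resolution happens against the live tree, not against a stale image.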
The tool calls, side by side
Left: the JSON Claude emits under Anthropic's computer_20251022 tool schema. Right: the JSON Claude emits when Terminator's MCP agent is attached. Both target the same button. Only one requires a new screenshot to find it.
tool_use payloads for 'click Save'
// What Claude emits when native computer use is enabled.
// Source: Anthropic computer_20251022 tool schema.
// The model sees a screenshot, picks pixels, returns this JSON.
{
"type": "tool_use",
"name": "computer",
"input": {
"action": "left_click",
"coordinate": [487, 341]
}
}
// Your harness must:
// 1. Screenshot the desktop
// 2. Ship it to Anthropic with the tool definition
// 3. Receive an action with [x, y] in pixel space
// 4. Execute the click (xdotool, PyAutoGUI, your own driver)
// 5. Screenshot again, send again, wait again
// Every click is one image upload and one model call.

// What Claude emits when Terminator's MCP agent is attached.
// No screenshot attached, no coordinates in the payload.
{
"type": "tool_use",
"name": "click_element",
"input": {
"selector": "role:Button && name:Save"
}
}
// Terminator resolves the selector against the live accessibility
// tree, in-process. The model is not re-invoked to find the element.
The native pixel loop, drawn once
Eight actors, ten messages. The model is in the middle of every action. This is why long tasks cost real money and real time on Claude computer use: every arrow on the right-hand side is a paid inference plus an image token charge.
native computer use: one click
The selector path, drawn as a beam
The MCP agent sits between Claude and the OS. Every tool call flows through a single dispatch_tool function. Selector in, accessibility-tree match out, action performed through UIA or AX. The model is not re-invoked to resolve the element.
Claude -> Terminator MCP -> OS accessibility tree
What the numbers look like
Three of these figures come straight from the repo. The fourth is Terminator's README claim, which is worth verifying yourself (it is the pitch of the project). All of them are checkable: the match arms can be counted by reading server.rs, the line number is literally where the dispatch lives, and the concurrency default sits at README line 45.
“Claude can drive the OS without a screenshot in the inner loop. The selector resolves against the live UIA tree, in-process, at CPU speed.”
Terminator MCP agent, crates/terminator-mcp-agent/src/prompt.rs
dispatch_tool: the 32 tools Claude sees
This is the anchor fact for the page. Open crates/terminator-mcp-agent/src/server.rs at line 9953. There is one match tool_name block, each arm wires a tool name to an async Rust handler, and there are 32 named arms before the wildcard. Below is the shape of it, abbreviated so the names line up. Every one of these is what Claude can call when Terminator is attached.
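The original is Rust, one `match tool_name` block with 32 arms. Rendered here as a Python sketch so it runs standalone: the tool names are the repo's 32, but the handler bodies are stubs and the wiring is illustrative, not the real source.

```python
# The shape of dispatch_tool, rendered in Python. In the Rust source this is
# one `match tool_name` block with 32 named arms plus a wildcard; here each
# arm is a dict entry. Handler bodies are stubs -- the wiring is the point.

def make_stub(name):
    def handler(args):
        return {"tool": name, "args": args}  # stand-in for the async Rust handler
    return handler

TOOL_NAMES = [
    "get_window_tree", "get_applications_and_windows_list", "click_element",
    "type_into_element", "press_key", "press_key_global", "validate_element",
    "wait_for_element", "activate_element", "navigate_browser",
    "execute_browser_script", "open_application", "scroll_element", "mouse_drag",
    "highlight_element", "select_option", "set_selected", "capture_screenshot",
    "invoke_element", "set_value", "execute_sequence", "run_command", "delay",
    "stop_highlighting", "stop_execution", "gemini_computer_use", "read_file",
    "write_file", "edit_file", "copy_content", "glob_files", "grep_files",
]

DISPATCH = {name: make_stub(name) for name in TOOL_NAMES}

def dispatch_tool(tool_name, arguments):
    handler = DISPATCH.get(tool_name)
    if handler is None:                      # the wildcard arm: unknown tool
        raise ValueError(f"unknown tool: {tool_name}")
    return handler(arguments)

assert len(DISPATCH) == 32
```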
The rule the model is told, every session
Terminator's system prompt lives in src/prompt.rs. It starts by importing the compile-time tool list via env!("MCP_TOOLS") (populated by build.rs:31), then explicitly forbids the model from inventing selectors: always derive selectors strictly from the provided UI tree or DOM data; never guess or predict element attributes based on assumptions. This is the single most important line in the prompt, and the one that keeps Terminator from drifting back into a vision-style guess-and-check loop.
Four everyday tasks, both ways
The easiest way to understand the latency delta is to think about what happens step-by-step for tasks a normal agent flow actually does.
| Feature | Native computer use (pixel) | Terminator MCP (selector) |
|---|---|---|
| Opens an application | Claude must take a screenshot, find the taskbar icon pixel, click it, wait, screenshot again | open_application({ path: 'notepad' }) - single MCP call, returns the PID and the fresh UI tree |
| Fills a login form | Screenshot. Coord-click the email field. Type. Screenshot. Coord-click password. Type. Screenshot. | type_into_element({ selector: 'role:Edit && name:Email' }) twice - no vision loop |
| Reads a dialog | Screenshot, OCR inside the model, hope the text survived compression | get_window_tree returns the literal Name and Value strings from the accessibility API |
| Runs a multi-step workflow | Loop: screenshot -> LLM -> action -> screenshot -> LLM -> action ... Anthropic billed per turn. | execute_sequence ships a whole YAML of steps in one call. Engine-mode JS/Python share state via env. |
The architectural contrast
Same LLM behind both. Different assumptions about where element lookup happens and what the model is expected to do with its tokens.
| Feature | Anthropic computer tool | Terminator MCP |
|---|---|---|
| What the model returns per action | { action: "left_click", coordinate: [x, y] } | { name: "click_element", selector: "role:Button && name:Save" } |
| Input the model needs to see | PNG screenshot of the desktop, every turn | Accessibility tree (YAML/JSON), fetched once per screen |
| Round-trip cost per click | One screenshot upload + one model call | One MCP stdio call. Model already knows the tree. |
| Where the resolution happens | Inside the model: it reads pixels, does OCR-style vision, returns coords | Inside Terminator: selector is matched against the Windows UIA / macOS AX tree locally |
| Failure mode on UI drift | Button moved 12 pixels, old coordinate misses, silent mis-click | Selector by role+name still resolves if the element is still there. If not, a typed McpError comes back. |
| Observability | Screenshots are the only artifact. Replay is imprecise. | Every call logged by tool_logging.rs. UI tree before/after captured by default into executions/. |
| Cursor and keyboard | Takes over your cursor. You cannot use the computer while it runs. | Runs through accessibility APIs. Your cursor is untouched. |
| Session state | Whatever your harness script keeps | Long-lived MCP process. Cancellation tokens, concurrency gate (MCP_MAX_CONCURRENT), focus restore. |
Install in Claude Code in one command
The MCP agent ships as a single npm package. Claude Code exposes an mcp add helper that wires it up with the right stdio plumbing.
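Per the project's README, the one-liner (registering the server at user scope, over stdio) is:

```shell
claude mcp add terminator "npx -y terminator-mcp-agent@latest" -s user
```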
What actually happens when Claude clicks Save
Trace one click through the whole stack. This is the selector path, step by step, with the files you can open yourself.
Claude emits a tool_use for click_element
The name field is "click_element". The input is { selector: "role:Button && name:Save" }. No coordinates. No screenshot attached.
MCP host forwards JSON-RPC tools/call over stdio
Claude Code, Cursor, or whatever MCP client you use frames the call as JSON-RPC 2.0 over the stdio pipes of the npx-spawned terminator-mcp-agent process.
dispatch_tool matches the name at server.rs:9953
The "click_element" arm deserialises the arguments into ClickElementArgs and awaits self.click_element(..) under a tokio::select against the request's cancellation token.
The selector resolves against the UIA or AX tree
On Windows, terminator-rs calls into IUIAutomation. On macOS, AXUIElement. It walks children by role and name until it finds the match, respecting tree_max_depth (default 30).
The action fires through the accessibility API
invoke() is preferred over click() because it does not require the element to be in the viewport or to have stable bounds. The OS performs the native click event.
Terminator captures the UI diff
By default, the before/after tree is diffed and a screenshot is saved to executions/. ui_diff_before_after returns what changed, has_ui_changes returns a boolean Claude can check.
A CallToolResult returns up the stdio pipe
Structured result, captured stderr, timing. Claude sees the diff next turn. No new screenshot was needed to find the button.
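Collapsed into wire format, the whole selector trace is one request and one response on stdio. A sketch of the tools/call frame (framing per JSON-RPC 2.0 and the MCP tools/call method; the id is arbitrary):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "click_element",
    "arguments": { "selector": "role:Button && name:Save" }
  }
}
```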
Give Claude the selectors, keep computer use for the rest
Terminator is MIT-licensed. Install the MCP agent in Claude Code in one command. Keep Anthropic's computer tool for the fully alien UIs (games, canvas apps) where the accessibility tree is empty. Let the tree do the work everywhere else.
Read the source on GitHub →
Questions readers actually ask
How does Claude's native computer use actually work?
Anthropic ships a tool type called computer (current revision computer_20251022). When enabled, your harness is responsible for taking a screenshot of the desktop and sending it to Claude alongside the tool definition. Claude returns a tool_use block whose input is an action like left_click with a coordinate pair in pixel space. Your code executes the click, takes another screenshot, sends both back, and the loop continues. Every action is one screenshot upload and one model round-trip. This is not a limitation of Claude, it is how the tool is defined: the model sees pixels, you execute pixels.
Why is that loop expensive?
Two reasons, both mechanical. First, token cost: every action pays for one screenshot's worth of image tokens plus the output tokens, and screenshots are not tiny even after downscaling. Anthropic's own computer use docs note that long tasks with frequent screenshots consume significant credits. Second, wall-clock latency: every action is one full inference pass, typically one to several seconds. A 40-action workflow becomes a coffee break. A deterministic selector-driven agent can run the same 40 steps in seconds because the model is not in the inner loop.
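Back-of-envelope on the latency half, with assumed per-step times (1500 ms per vision round-trip, 50 ms per local accessibility call; both are illustrative numbers, not benchmarks):

```python
# Rough wall-clock comparison for a 40-action workflow.
# 1500 ms per vision round-trip and 50 ms per local accessibility call
# are ASSUMED numbers for illustration, not measurements.

ACTIONS = 40
MODEL_ROUNDTRIP_MS = 1500   # screenshot upload + full inference pass
LOCAL_CALL_MS = 50          # selector resolved in-process via UIA/AX

pixel_path_ms = ACTIONS * MODEL_ROUNDTRIP_MS
selector_path_ms = ACTIONS * LOCAL_CALL_MS

assert pixel_path_ms == 60_000    # a full minute of waiting
assert selector_path_ms == 2_000  # a couple of seconds
assert pixel_path_ms // selector_path_ms == 30
```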
What exactly does Terminator's MCP server give Claude instead?
32 typed tools that speak accessibility-tree selectors, not coordinates. You can list them by opening crates/terminator-mcp-agent/src/server.rs at line 9953, where the dispatch_tool match block has one arm per tool. Examples: click_element takes a selector like role:Button && name:Save, type_into_element takes a selector plus text, navigate_browser drives the address bar, execute_browser_script runs JS inside the page, execute_sequence accepts a YAML of steps. The complete list at the time of writing: get_window_tree, get_applications_and_windows_list, click_element, type_into_element, press_key, press_key_global, validate_element, wait_for_element, activate_element, navigate_browser, execute_browser_script, open_application, scroll_element, mouse_drag, highlight_element, select_option, set_selected, capture_screenshot, invoke_element, set_value, execute_sequence, run_command, delay, stop_highlighting, stop_execution, gemini_computer_use, read_file, write_file, edit_file, copy_content, glob_files, grep_files.
Why selectors instead of coordinates?
Because the operating system already knows where every element is. Windows UI Automation and macOS Accessibility both expose a live tree where each element has a role (Button, Edit, Text, Window), a name, a value, bounds, and a parent chain. Terminator finds elements by matching that tree. role:Button && name:Save survives the button moving by 100 pixels, the window being resized, a theme change, or a DPI shift. A coordinate pair does not. The system prompt Terminator sends to Claude, in crates/terminator-mcp-agent/src/prompt.rs line 21, makes this explicit: always derive selectors strictly from the provided UI tree or DOM data; never guess or predict element attributes based on assumptions.
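A minimal sketch of what "resolve role:Button && name:Save against the tree" means. The parser and the tree below are toys invented for this page, not Terminator's real selector engine; real trees come from UIA or AX.

```python
# Toy selector resolution: parse "role:X && name:Y" into clauses, then walk a
# nested accessibility tree for the first element matching every clause.
# The tree below is invented; a real one comes from Windows UIA or macOS AX.

def parse_selector(selector):
    clauses = {}
    for part in selector.split("&&"):
        key, _, value = part.strip().partition(":")
        clauses[key] = value
    return clauses

def resolve(node, clauses):
    if all(node.get(k) == v for k, v in clauses.items()):
        return node
    for child in node.get("children", []):
        found = resolve(child, clauses)
        if found is not None:
            return found
    return None

window = {
    "role": "Window", "name": "Untitled - Notepad",
    "children": [
        {"role": "Edit", "name": "Text editor", "children": []},
        {"role": "Button", "name": "Save", "children": []},
    ],
}

hit = resolve(window, parse_selector("role:Button && name:Save"))
assert hit is not None and hit["name"] == "Save"
```

Note what the walk never consults: bounds. The button can move anywhere in the window and the same selector still resolves.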
Can I use Claude's native computer use and Terminator together?
Yes. Terminator's MCP agent exposes both a capture_screenshot tool and a vision-model fallback called gemini_computer_use (server.rs line 10267) for cases where the accessibility tree is empty or lies. The normal flow is: Claude calls get_window_tree first, reads the tree, picks the selector it needs, calls click_element or type_into_element. If an element is truly invisible to accessibility, you can escalate to a visual path. The point is that the selector path is the default, not the exception, and most Windows and macOS apps expose enough tree to skip vision entirely.
How do I install Terminator's MCP server in Claude Code?
One command: claude mcp add terminator "npx -y terminator-mcp-agent@latest" -s user. That registers the server at the user scope, runs it over stdio under Claude Code's supervision, and exposes the 32 tools inside the normal tool picker. For Cursor, VS Code, Windsurf, or any other MCP client, the same server binary works with a standard mcpServers JSON block pointing at the same npx command. Full instructions live in crates/terminator-mcp-agent/README.md on GitHub.
Does Terminator replace Claude computer use or complement it?
It replaces the default path for the actions where the accessibility tree is faster and more reliable, which is most of them on Windows and macOS. Terminator's README is explicit about the goal: run 100x faster than the pixel-loop agents and hit above 95% success rate by keeping the model out of the inner loop. Anthropic's native computer use still matters for fully alien UIs (games, some Electron apps that do not expose their tree, canvas-heavy apps), which is why Terminator keeps a vision-model tool available as a fallback rather than pretending it is never needed.
Is the system prompt really compiled into the binary?
Yes, and this is the detail worth looking at yourself. crates/terminator-mcp-agent/build.rs at line 31 defines extract_mcp_tools(), a build-time function that opens src/server.rs, scans for the let result = match tool_name line, and collects every subsequent "tool_name" =>. That list becomes the MCP_TOOLS environment variable via println!("cargo:rustc-env=MCP_TOOLS=..."). Then prompt.rs reads env!("MCP_TOOLS") at compile time and pastes it into the system instructions Claude receives. The practical consequence: the server cannot tell Claude about a tool that dispatch_tool does not handle, and Claude cannot see a stale tool list. They are the same list by construction.
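The extraction trick is easy to sketch. The real extract_mcp_tools is Rust inside build.rs; the Python below mimics its behavior on an invented scrap of "source", so the mechanism is visible without the repo:

```python
# Sketch of the build.rs trick: scan the dispatch source for match arms of
# the form `"tool_name" =>` and collect the quoted names. The snippet of
# "source" below is invented; the real scan runs over src/server.rs at
# build time, in Rust.
import re

FAKE_SERVER_RS = '''
let result = match tool_name {
    "get_window_tree" => self.get_window_tree(args).await,
    "click_element" => self.click_element(args).await,
    "type_into_element" => self.type_into_element(args).await,
    _ => return Err(unknown_tool(tool_name)),
};
'''

def extract_mcp_tools(source):
    # Only look after the dispatch line, then grab every `"name" =>` arm.
    # The wildcard `_ =>` has no quotes, so it is skipped automatically.
    _, _, body = source.partition("match tool_name")
    return re.findall(r'"([a-z_]+)"\s*=>', body)

tools = extract_mcp_tools(FAKE_SERVER_RS)
assert tools == ["get_window_tree", "click_element", "type_into_element"]
# build.rs then emits the list as: cargo:rustc-env=MCP_TOOLS=<joined names>
```

Because the list is scraped from the same file that dispatches the calls, prompt and dispatcher cannot drift apart, which is exactly the "same list by construction" guarantee described above.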
More from the Terminator guides
Keep reading
What is an MCP server? A real one, opened in the editor
The dispatch_tool match block in server.rs, the build.rs trick that keeps the system prompt in sync with the code. Terminator is the example.
Terminator on GitHub
Core Rust crates, MCP agent, Node and Python bindings, workflow recorder. MIT licensed.
Terminator MCP agent README
Install commands for Cursor, VS Code, Claude Code. HTTP transport, concurrency gate, virtual display support for headless VMs.