Automation for Windows your AI coding assistant can actually drive

Every other guide to automation for Windows walks you through a drag-and-drop RPA canvas or an AutoHotkey snippet meant for a human keyboardist. This page is about the other use case: giving Claude Code, Cursor, or Windsurf a typed MCP surface to drive the real Windows accessibility tree, and having every click tell the agent what changed.

Matthew Diakonov, Written with AI

Published April 20, 2026 · 10 min read

4.9 from dozens of design partners

35 MCP tools registered in crates/terminator-mcp-agent/src/server.rs

Every action tool returns a UI tree diff via ui_diff_before_after:true

Selectors built on IUIAutomation, not pixels

Automation for Windows, agent-shaped

A typed MCP surface over the UI Automation tree

35 tools registered via #[tool(...)] in server.rs

click_element, type_into_element, press_key, invoke_element, and 31 more

Every action accepts ui_diff_before_after:true

Response returns the changed nodes in compact YAML

The agent sees what happened without a new screenshot


Two different readers typed the same search

A developer searches for "automation for Windows" because they want to script a daily task, deploy a workflow across a fleet, or let an AI coding assistant take care of the parts of work that happen outside the text editor. A business analyst searches the same phrase because a Power Automate license is up for renewal.

The business analyst has endless options. UiPath, Automation Anywhere, Power Automate Desktop, TinyTask, Thunderbit. All canvases. All designed to record human clicks and play them back. The developer reads those pages and bounces, because they do not want a canvas. They want an API.

Terminator is the API. It is a Rust core with Node, Python, and MCP bindings, shaped like Playwright, targeting every Windows application instead of a browser. And the newest binding, the one most pages on this keyword ignore, is the MCP agent that exposes 35 typed tools to whichever AI assistant you already use.

The anchor fact the other pages miss

Terminator ships a single MCP binary, terminator-mcp-agent, that any MCP-aware editor can attach to with one line. Inside that binary, crates/terminator-mcp-agent/src/server.rs registers exactly 35 tools. Every tool that performs an action accepts an optional ui_diff_before_after parameter. When set, the tool captures the target window's UIA tree before the action, runs the action, captures the tree again, and returns only the semantic delta.
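The capture, act, capture, diff cycle described above can be sketched in a few lines of Python. This is an illustration of the idea, not Terminator's implementation: `run_with_ui_diff`, `capture_tree`, and the fake window are hypothetical names, and the standard-library `difflib` stands in for whatever diffing the real binary does.

```python
import difflib
from typing import Callable, List

def run_with_ui_diff(capture_tree: Callable[[], str],
                     action: Callable[[], None]) -> List[str]:
    """Snapshot the tree, run the action, snapshot again, return only changed lines."""
    before = capture_tree().splitlines()
    action()
    after = capture_tree().splitlines()
    # ndiff prefixes added lines with "+ " and removed lines with "- ";
    # unchanged lines ("  ") and hint lines ("? ") are dropped.
    return [line for line in difflib.ndiff(before, after)
            if line.startswith(("+ ", "- "))]

# Hypothetical stand-in for a window's accessibility tree.
window = ["[Window] Notepad", "  [Button] Save"]

def fake_click() -> None:
    # Pretend the click opened a dialog.
    window.append("  [Window] Save As")

delta = run_with_ui_diff(lambda: "\n".join(window), fake_click)
print(delta)
```

The caller only ever sees the lines that changed, which is the property that lets an agent skip a second full tree fetch.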

35 MCP tools registered in server.rs

>95% deterministic task success rate

~100x faster than vision agents

1 line to add to Claude Code

35 tools

“Use ui_diff_before_after:true to see changes (no need to call get_window_tree after).”

Tool description string for type_into_element in crates/terminator-mcp-agent/src/server.rs at line 2152

Wire it into Claude Code

The agent talks to the MCP binary over stdio. You paste one entry into the MCP config and restart. There is no separate daemon, no service to install, no Azure portal to log into.

.cursor/mcp.json

The one-liner for Claude Code is claude mcp add terminator "npx -y terminator-mcp-agent@latest". The same binary answers HTTP on request, with /health, /status, and /mcp endpoints and a tunable MCP_MAX_CONCURRENT limit if you want to run multiple agents on one host.
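For editors that read a JSON config instead of a CLI command, the entry is likely to look like the output of the sketch below. It assumes the common `mcpServers` layout used by Cursor-style MCP configs. The server key `terminator` is an illustrative name, and only the `npx -y terminator-mcp-agent@latest` invocation comes from this page.

```python
import json

# Assumed layout: the "mcpServers" map used by Cursor-style MCP configs.
# The key "terminator" is an illustrative name; the command and args mirror
# the npx invocation quoted in the text.
config = {
    "mcpServers": {
        "terminator": {
            "command": "npx",
            "args": ["-y", "terminator-mcp-agent@latest"],
        }
    }
}

mcp_json = json.dumps(config, indent=2)
print(mcp_json)  # paste the printed JSON into the editor's MCP config file
```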

What the agent actually sends

A user prompt like "save this Word document" turns into a single MCP call. The agent does not loop back to read the window tree. The tool call itself returns the diff.

agent-call.json
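On the wire, a call like this is a JSON-RPC 2.0 request using the MCP `tools/call` method. The sketch below builds one in Python. The envelope follows the MCP specification, while the argument name `selector` and the selector value are assumptions made for illustration; only `click_element` and `ui_diff_before_after` are taken from this page.

```python
import json
from typing import Any, Dict

def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a tool invocation in the JSON-RPC envelope MCP uses for tools/call."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

request = build_tool_call(1, "click_element", {
    # "selector" is an assumed parameter name, not confirmed by the source.
    "selector": "role:Button && name:Save",
    # Ask for the before/after tree diff in the same response.
    "ui_diff_before_after": True,
})

wire = json.dumps(request)
print(wire)
```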

The diff, stripped of noise

UIA trees contain volatile runtime IDs and bounding rectangles that drift on every repaint. A naive diff would flood the agent with cosmetic changes. Terminator preprocesses both snapshots in crates/terminator/src/ui_tree_diff.rs, stripping id, element_id, and bounds fields. What the agent sees is only what a human would notice.
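A minimal Python sketch of that preprocessing idea, assuming each tree node is a nested dict with an optional `children` list. The real logic lives in Rust in ui_tree_diff.rs; `strip_volatile` and the sample nodes here are hypothetical.

```python
from typing import Any

# Field names taken from the paragraph above.
VOLATILE_FIELDS = {"id", "element_id", "bounds"}

def strip_volatile(node: Any) -> Any:
    """Recursively drop volatile fields so two snapshots compare semantically."""
    if isinstance(node, dict):
        return {key: strip_volatile(value)
                for key, value in node.items()
                if key not in VOLATILE_FIELDS}
    if isinstance(node, list):
        return [strip_volatile(item) for item in node]
    return node

# Same button, captured before and after a repaint (hypothetical data).
before = {"role": "Button", "name": "Save", "id": "a1", "bounds": [0, 0, 80, 24]}
after = {"role": "Button", "name": "Save", "id": "b7", "bounds": [0, 2, 80, 24]}

print(strip_volatile(before) == strip_volatile(after))
```

Once the volatile fields are gone, the two snapshots of an unchanged control compare equal, so nothing about it reaches the diff.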

click_element response

The 35 tools, by role

The MCP surface is small on purpose. One tool per verb. Below is a short selection of the tools every agent ends up calling.

click_element

Left, right, double. Accepts a UIA selector string or an index into the last returned tree. Returns the before/after diff when you pass ui_diff_before_after:true.

type_into_element

Clipboard-optimised typing with verification. Trailing {Enter}, {Tab}, or {Escape} are auto-detected so a search box prompt reads 'search query{Enter}'.

press_key

Keys normalize to curly-brace form. 'Ctrl+A' becomes '{Ctrl}a'. Fires against the focused element; press_key_global fires against the OS.

invoke_element

UIA InvokePattern invocation. Works on offscreen and minimized elements where a literal click would fail. The fallback path for buttons that reject mouse events.

get_window_tree

Structured snapshot of a process's accessibility tree. Supports include_browser_dom for Chrome DOM, include_ocr for vision text, include_omniparser for icons.

wait_for_element

Waits for a selector condition (visible, enabled, exists) with an explicit timeout. The Playwright-shaped primitive for race conditions in desktop UIs.

execute_sequence

Runs a YAML or JSON workflow file with conditional jumps, retries, and engine-mode JavaScript or Python steps. The MCP tool that turns a prompt into a replayable script.

capture_screenshot

Real screenshot for when the accessibility tree truly is not enough. Rare fallback. The other 34 tools handle the rest through structured data.
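The key handling described for press_key and type_into_element can be pictured with a small Python sketch. It covers only the two examples quoted above ('Ctrl+A' and a trailing {Enter}). The function names and the exact rules are assumptions, not Terminator's actual normalizer.

```python
import re
from typing import Optional, Tuple

def normalize_key(combo: str) -> str:
    """Turn 'Ctrl+A' into '{Ctrl}a': named keys get braces, single characters are lowercased."""
    if "{" in combo:
        return combo  # already in curly-brace form
    parts = [part.strip() for part in combo.split("+")]
    return "".join(part.lower() if len(part) == 1 else "{" + part + "}"
                   for part in parts)

# Matches text that ends in one of the three trailing keys named above.
TRAILING_KEY = re.compile(r"^(.*?)(\{(?:Enter|Tab|Escape)\})$", re.DOTALL)

def split_trailing_key(text: str) -> Tuple[str, Optional[str]]:
    """Separate the typed text from a trailing {Enter}, {Tab}, or {Escape}."""
    match = TRAILING_KEY.match(text)
    if match:
        return match.group(1), match.group(2)
    return text, None

print(normalize_key("Ctrl+A"))
print(split_trailing_key("search query{Enter}"))
```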

Works across the editor ecosystem

The MCP binary is a transport-level adapter. Any editor that speaks the Model Context Protocol can load it. These are the editors the Terminator README documents out of the box.

terminator-mcp-agent

Claude Code

Cursor

VS Code

Windsurf

Continue

Zed

The tools this page is not about

Every name on this strip is a fine product. None of them ship as an MCP server. None of them return a before/after UI tree diff. If you are a business analyst picking a canvas, one of these is likely right for you. If you are a developer shipping an AI workflow, keep reading.

Power Automate Desktop, UiPath, Automation Anywhere, AutoHotkey v2, AutoIt, SikuliX, pywinauto, Task Scheduler, WinAutomation, TinyTask, Thunderbit

Feature by feature

Feature	Traditional automation for Windows	Terminator
Exposes the desktop to AI agents as MCP tools	No	35 tools via terminator-mcp-agent
Selector language for Windows controls	Drag-and-drop or hotkey only	role:/name:/id:/process: with && || !
Action returns a UI tree diff	No, re-screenshot	ui_diff_before_after on every action tool
Uses accessibility API, not pixels	Mixed or coordinate-based	IUIAutomation COM API end to end
Code-first SDKs	Canvas or script DSL	Rust, TypeScript, Python, MCP
Open source license	Proprietary or restricted	MIT on GitHub at mediar-ai/terminator
Works inside a scheduled CI job	Requires attended desktop	Headless mode via TERMINATOR_HEADLESS env var

Five steps to your first agent-driven desktop action

Three happen inside Terminator. Two you type yourself. No canvas, no recorder to rewatch.

Install the MCP agent into your editor

Run claude mcp add terminator "npx -y terminator-mcp-agent@latest" for Claude Code. For Cursor, paste the equivalent JSON entry into ~/.cursor/mcp.json; Windsurf and VS Code have their own equivalents.

Prompt your assistant with a desktop task

"Open Excel, paste this formula into cell B2, save as report.xlsx." The assistant routes the verbs to terminator-mcp-agent's 35 tools instead of trying to imagine a screenshot.

The tool call fires against IUIAutomation

click_element resolves the selector "process:EXCEL >> role:Edit && name:Formula Bar" through the UIA COM API, calls ElementFromPoint or InvokePattern as appropriate, and drives the real control.

The diff comes back inline

Because the tool call was made with ui_diff_before_after:true, the MCP response includes the added, removed, and modified nodes after preprocess_tree strips volatile IDs and bounds.

The agent decides the next call

Reading only the diff, the assistant knows whether a dialog opened, whether a progress spinner appeared, or whether the Save button greyed out. It fires the next tool. No extra get_window_tree round trip.

Verify the anchor claims against source

The whole of this page is grep-verifiable. Clone the repository and run the same commands the AI coding assistant would.

zsh

35 MCP tools registered in server.rs. One binary, one one-line install, the whole Windows accessibility surface.

2 preprocessing passes in ui_tree_diff.rs: strip IDs from JSON, strip IDs and bounds from compact YAML.

0 extra get_window_tree round trips the agent has to make after an action with ui_diff_before_after:true.

Install

The MCP binary is published as terminator-mcp-agent on npm and ships prebuilt Windows binaries. No Rust toolchain required on the host.

install

Want your AI coding assistant driving Windows apps by Friday?

Book 20 minutes and we will wire Terminator's 35 MCP tools into your editor against a real Windows workflow.

Frequently asked questions

What makes Terminator different from Power Automate Desktop or AutoHotkey as automation for Windows?

Power Automate Desktop is a low-code canvas for human operators to click together. AutoHotkey is a hotkey and scripting language. Terminator is a developer framework, closer to Playwright for the whole OS. You write code (Rust, TypeScript, or Python) or you let an AI coding assistant call its MCP tools directly. Selectors are UIA-based strings like role:Button && name:Save, scoped by process. No drag-and-drop canvas, no vendor lock-in, MIT licensed.
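To make the selector strings concrete, here is a small Python sketch that splits one into its parts. It handles only the `>>` scoping and `&&` conjunction seen in the examples on this page (not `||` or `!`), and `parse_selector` is a hypothetical helper, not part of any Terminator SDK.

```python
from typing import Dict, List

def parse_selector(selector: str) -> List[Dict[str, str]]:
    """Split a selector into stages (on '>>'), each a dict of key:value conditions (on '&&')."""
    stages: List[Dict[str, str]] = []
    for stage in selector.split(">>"):
        conditions: Dict[str, str] = {}
        for term in stage.split("&&"):
            # Split on the first colon only, so values may contain colons.
            key, _, value = term.strip().partition(":")
            conditions[key] = value.strip()
        stages.append(conditions)
    return stages

stages = parse_selector("process:EXCEL >> role:Edit && name:Formula Bar")
print(stages)
# [{'process': 'EXCEL'}, {'role': 'Edit', 'name': 'Formula Bar'}]
```

Each stage narrows the search: first to the process, then to a control matching every condition in that stage.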

How many MCP tools does Terminator ship?

Exactly 35, registered via #[tool(...)] attributes inside crates/terminator-mcp-agent/src/server.rs. They include click_element, type_into_element, press_key, press_key_global, invoke_element, activate_element, mouse_drag, scroll_element, select_option, set_selected, set_value, capture_screenshot, get_window_tree, get_applications_and_windows_list, navigate_browser, open_application, validate_element, wait_for_element, highlight_element, stop_highlighting, hide_inspect_overlay, delay, ask_user, gemini_computer_use, execute_sequence, read_file, write_file, edit_file, copy_content, glob_files, grep_files, typecheck_workflow, stop_execution. Grep the source: grep -c '#\[tool(' crates/terminator-mcp-agent/src/server.rs returns 35.

Why does every action tool accept a ui_diff_before_after parameter?

When an AI agent clicks a button, it needs to know what changed. The naive approach is to re-fetch the full window tree after every action, which is slow and floods the agent context with unchanged nodes. Terminator snapshots the tree before the action, runs the action, snapshots the tree after, and returns only the diff. The tool description strings literally tell the model: 'Use ui_diff_before_after:true to see changes (no need to call get_window_tree after).' See server.rs line 2152 for the type_into_element tool's description.

How is the UI tree diff stabilized so it does not flood with noise?

UIA trees contain volatile identifiers like runtime ids and bounding rectangles that shift on every repaint, so a naive diff would surface dozens of cosmetic changes for every click. Terminator strips those fields in crates/terminator/src/ui_tree_diff.rs. preprocess_tree removes id and element_id from the JSON form. remove_ids_and_bounds_from_compact_yaml strips '#abc123' identifiers and 'bounds: [x,y,w,h]' tuples from the compact YAML form. After preprocessing, a text diff via the similar crate shows only semantic changes.
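The compact YAML pass can be approximated with two regular expressions and a text diff. The sketch below is in Python, with `difflib` in place of the Rust similar crate. The line layout of the sample trees is an assumption built from the '#abc123' and 'bounds: [x,y,w,h]' shapes quoted above.

```python
import difflib
import re
from typing import List

# Assumed shapes: "#abc123"-style tags and "bounds: [x,y,w,h]" tuples.
# A control name that itself contains "#" would be mangled by this simple pattern.
ID_TAG = re.compile(r"[ \t]*#[0-9A-Za-z]+")
BOUNDS = re.compile(r"[ \t]*bounds: \[[^\]]*\]")

def strip_ids_and_bounds(yaml_text: str) -> str:
    """Remove volatile identifiers and bounds from a compact YAML tree dump."""
    return BOUNDS.sub("", ID_TAG.sub("", yaml_text))

def semantic_diff(before: str, after: str) -> List[str]:
    """Diff two dumps after stripping, keeping only added and removed lines."""
    a = strip_ids_and_bounds(before).splitlines()
    b = strip_ids_and_bounds(after).splitlines()
    return [line for line in difflib.ndiff(a, b) if line.startswith(("+ ", "- "))]

# Hypothetical dumps: IDs and bounds drift, and one dialog appears.
before = ("- [Window] Notepad #a1b2c3 bounds: [0,0,800,600]\n"
          "  - [Button] Save #d4e5f6 bounds: [10,10,80,24]")
after = ("- [Window] Notepad #0f9e8d bounds: [0,0,800,600]\n"
         "  - [Button] Save #77aa11 bounds: [10,12,80,24]\n"
         "  - [Window] Save As #c0ffee bounds: [100,100,400,300]")

changes = semantic_diff(before, after)
print(changes)
```

Every ID and one bounds tuple differ between the two dumps, yet only the new dialog line survives the diff.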

Why use structured accessibility APIs instead of screenshots plus vision AI for desktop automation?

Speed and determinism. Per Terminator's llms.txt, UIA-based automation runs at CPU speed rather than LLM inference speed, which is roughly 100x faster, and it succeeds over 95 percent of the time on deterministic tasks. Screenshot-plus-vision agents have to re-infer the same layout on every step, drift when icons change, and cost money per action. Terminator reserves AI for error recovery, not every click.

How do I give Claude Code access to this automation for Windows?

One command. Run claude mcp add terminator 'npx -y terminator-mcp-agent@latest' in a terminal and Claude Code registers all 35 tools. Equivalent one-line entries for the MCP config in Cursor, VS Code, and Windsurf are documented in the mediar-ai/terminator README. After that, any prompt like 'open Excel, paste this formula in cell A1, save the file' resolves through the MCP tools instead of screenshots.

Does Terminator work on Windows 10 or only Windows 11?

Both. The Windows backend talks to the UI Automation COM API (IUIAutomation, introduced in Windows 7 and stable across every shipping version since). The MSI installer and prebuilt npm binary target Windows 10 and 11, both x64 and ARM64. The Rust core also has macOS and Linux backends (AX API and AT-SPI2), though the npm, pip, and MCP packages currently ship Windows binaries only.

Can the agent verify its click landed without a screenshot?

Yes. The ui_diff_before_after response includes the set of added, removed, and modified nodes between the two tree snapshots, rendered in the compact YAML form after ID and bounds stripping. If the click opened a dialog, the diff contains the new [Window] and its children. If the click was eaten by a disabled control, the diff is empty. The agent decides what to do next from that text alone.
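The decision rule in that answer fits in one short Python function. This is a sketch under an assumption about the diff format (added lines prefixed with "+"). `interpret_diff` and its return labels are hypothetical and only mirror the two cases described above.

```python
def interpret_diff(diff_text: str) -> str:
    """Classify a before/after diff the way the answer above describes."""
    if not diff_text.strip():
        # Empty diff: the click was likely eaten, e.g. by a disabled control.
        return "no_change"
    added = [line for line in diff_text.splitlines() if line.startswith("+")]
    if any("[Window]" in line for line in added):
        # A new [Window] node among the additions means a dialog opened.
        return "dialog_opened"
    return "changed"

print(interpret_diff(""))
print(interpret_diff("+ [Window] Save As\n+   [Button] Cancel"))
print(interpret_diff("- [Button] Save (enabled)\n+ [Button] Save (disabled)"))
```

An agent loop would branch on the label: retry or pick a different tool on `no_change`, and target the new dialog on `dialog_opened`.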