Automation scripts for Windows, when one API covers every app

Most advice on this topic sends you to PowerShell (a sysadmin language) or AutoHotkey (a key-macro engine). Both are fine for what they are. Neither of them addresses the actual job: writing one script that drives a UWP Calculator, a WPF line-of-business tool, a Chromium-based Teams window, and a legacy Win32 dialog with the same selectors. Terminator does that by targeting the Windows UI Automation accessibility tree, with 24 locator primitives defined in one Rust enum and a 35-tool MCP agent so Claude Code can both author and run your scripts.

Matthew Diakonov, Written with AI

Published April 23, 2026 · 9 min read


24 locator primitives in crates/terminator/src/selector.rs

35 MCP tools in terminator-mcp-agent, one npx install

Drives Win32 + UWP + WPF + WinForms + Electron apps

Automation scripts for Windows, rethought

One selector language. Every desktop app. Your AI assistant runs it.

24 selector primitives, one Rust enum

role: / name: / nativeid: / chain with |

Same script: UWP Calculator to legacy LOB

35 MCP tools expose it to Claude Code

Record with workflow-recorder, replay deterministically


Where the existing playbooks leak

Walk through the common advice online. PowerShell gives you cmdlets, pipelines, and WMI queries: perfect for services, registry edits, and system hygiene. You do not use it to click the Send button in Outlook. AutoHotkey runs small scripts that bind global hotkeys, simulate keystrokes, and send clicks to Win32 controls: perfect for swapping Caps Lock to Escape or remapping a macro pad. You do not use it to drive a UWP Calculator or to work inside a Chromium-based app, where the controls live in a web DOM rather than behind Win32 window handles.

AutoIt adds window and control manipulation on top of key-simulated input. Task Scheduler runs your scripts at fixed times. All useful. None of it solves the actual problem of 2026: an AI assistant should be able to read the state of any window on your desktop, decide which control to touch, and touch it. The input surface has to be the accessibility tree, not a keystroke and not a pixel.

That is the shape Terminator takes. The selector engine (the core primitive) is a single Rust enum with 24 variants. The execution surface is a Playwright-style locator API in Rust, TypeScript, and Python. The AI surface is an MCP agent with 35 tools, served over stdio or HTTP, installable in one line.

The anchor fact: 24 primitives, one file

Open crates/terminator/src/selector.rs in the Terminator repo. The pub enum Selector at the top of the file declares every way a script can identify an element in a Windows app. Count the variants, skip the Invalid error variant, and you get 24. They are the entire vocabulary of a Terminator selector string, and every SDK parses into the same enum.

role:button
name:Save
nativeid:CalculatorResults
role:edit|name:Address
role:menuitem|name:File
classname:Edit
process:notepad.exe
rightof(name:Bold)
has(role:button|name:Send)
role:button && visible:true

The grammar: prefix with role: / name: / nativeid: / classname:, chain sub-selectors with |, combine with && / ||, anchor by spatial relation with rightof(), near(), and the other spatial primitives, and filter by descendants with has() (Playwright-style :has). That covers most scripts; the FAQ below lists all 24 primitives.
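To make the grammar concrete, here is a minimal Python sketch that splits a selector string into steps. It is a hypothetical toy, not Terminator's parser: it only understands prefix:value atoms, top-level | chaining, and wrappers such as rightof(...) and has(...), and it ignores && / || entirely. The set of wrapper names and the lower-casing of prefixes (so Name: and name: parse the same) are assumptions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy parser for a subset of the selector grammar.
# Not Terminator's implementation (that is the Selector enum in
# crates/terminator/src/selector.rs); && and || are not handled here.

# Wrappers whose argument is itself a selector chain (assumed set).
WRAPPERS = {"rightof", "leftof", "above", "below", "near", "has"}


@dataclass
class Atom:
    kind: str   # prefix such as "role", "name", "nativeid", "classname"
    value: str  # everything after the first ":"


@dataclass
class Wrapped:
    kind: str                                   # e.g. "rightof" or "has"
    inner: list = field(default_factory=list)   # parsed chain inside (...)


def parse_step(step: str):
    """Parse one step: either wrapper(chain) or prefix:value."""
    step = step.strip()
    m = re.fullmatch(r"(\w+)\((.*)\)", step)
    if m and m.group(1).lower() in WRAPPERS:
        return Wrapped(m.group(1).lower(), parse_chain(m.group(2)))
    prefix, sep, value = step.partition(":")
    if not sep or not prefix:
        raise ValueError(f"expected prefix:value, got {step!r}")
    return Atom(prefix.lower(), value)


def parse_chain(text: str) -> list:
    """Split on '|' characters that are not nested inside parentheses."""
    steps, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            steps.append(parse_step(text[start:i]))
            start = i + 1
    steps.append(parse_step(text[start:]))
    return steps


print(parse_chain("role:menuitem|name:File"))
print(parse_chain("has(role:button|name:Send)"))
```

Tracking parenthesis depth is what keeps the | inside has(...) from splitting the outer chain.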

The shape, in numbers

Four numbers to keep in mind when you compare this to the older tooling. They come from the Terminator source, not a benchmark post.

24 locator primitives in selector.rs

35 MCP tools in terminator-mcp-agent

7 UIA properties pre-fetched per node

~200 ms cached window-tree walk, 245 elements

A hotkey macro versus a selector script

The AutoHotkey script below opens Calculator and computes 1 + 2 in the shape that many pages on this topic still teach in 2026. It hardcodes pixel offsets and the window title, so a Windows 11 layout change or a high-DPI monitor breaks it. The selector-based counterpart is the first example in examples/win_calculator.py in the Terminator repo. It targets semantic names through the accessibility tree, so the same script works on Windows 10, Windows 11, arm64, and a Citrix remote desktop.

Same calculator, two very different scripts

; autohotkey_v2.ahk
#Requires AutoHotkey v2.0
; Open Calculator and compute 1 + 2.
; The script hardcodes pixel coordinates and the
; window title, so it breaks the moment the
; Calculator layout, DPI scaling, or UI language changes.
Run "calc.exe"
WinWait "Calculator"
WinActivate "Calculator"

; Click (1) by relative coordinates.
ControlClick "x50 y180", "Calculator", , "Left"
Sleep 200
; Click (+)
ControlClick "x150 y180", "Calculator", , "Left"
Sleep 200
; Click (2)
ControlClick "x100 y180", "Calculator", , "Left"
Sleep 200
; Click (=)
ControlClick "x200 y220", "Calculator", , "Left"
Sleep 500

; Read the result via OCR on a screenshot.
; That part is not included here.
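The selector-based version in examples/win_calculator.py needs Windows to run. As a platform-independent illustration of why it survives layout changes, here is a Python toy: a fake in-memory tree (not the Terminator SDK) in which buttons are found by role and name, and the result by the AutomationId CalculatorResults. The "Display is N" text format is an assumption modeled on how Windows Calculator names its display; everything else is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A toy, in-memory stand-in for an accessibility tree. Nothing here talks to
# Windows or to the Terminator SDK; it only shows that lookups by role, name,
# and AutomationId never depend on where a control is drawn.


@dataclass
class Node:
    role: str
    name: str = ""
    automation_id: str = ""
    children: list = field(default_factory=list)
    on_click: Optional[Callable[[], None]] = None


def find(root: Node, role=None, name=None, automation_id=None) -> Node:
    """Depth-first search for the first node matching every given property."""
    stack = [root]
    while stack:
        node = stack.pop()
        if ((role is None or node.role == role)
                and (name is None or node.name == name)
                and (automation_id is None or node.automation_id == automation_id)):
            return node
        stack.extend(reversed(node.children))
    raise LookupError(f"no match for role={role!r} name={name!r} id={automation_id!r}")


def build_fake_calculator() -> Node:
    """A pretend Calculator whose buttons build an expression like '1+2'."""
    display = Node("text", name="Display is 0", automation_id="CalculatorResults")
    expr = []

    def press(token: str) -> Callable[[], None]:
        def handler() -> None:
            if token == "=":
                total = sum(int(part) for part in "".join(expr).split("+"))
                display.name = f"Display is {total}"
                expr.clear()
            else:
                expr.append(token)
        return handler

    labels = [("Two", "2"), ("Equals", "="), ("One", "1"), ("Plus", "+")]
    buttons = [Node("button", name=label, on_click=press(tok)) for label, tok in labels]
    # The button order is deliberately shuffled: layout does not matter to find().
    return Node("window", name="Calculator", children=[display, *buttons])


calc = build_fake_calculator()
for label in ["One", "Plus", "Two", "Equals"]:
    find(calc, role="button", name=label).on_click()
print(find(calc, automation_id="CalculatorResults").name)  # prints "Display is 3"
```

Reordering the buttons list, the equivalent of a redesigned layout, changes nothing about the script, which is the property the coordinate-based version lacks.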


“Runs 100x faster than ChatGPT Agents, Claude, Perplexity Comet, BrowserBase, BrowserUse (deterministic, CPU speed, with AI recovery). >95% success rate unlike most computer use overhyped products.”

Terminator README, Why Terminator section

How an AI coding assistant actually runs this

Installing the MCP agent takes one line per client. Once it is attached to Claude Code, Cursor, VS Code, Windsurf, or anything else that speaks MCP, the assistant that already writes your code also gets 35 tools for controlling every app on your desktop.

Your script plus an MCP agent plus every Windows app

The MCP agent is a single npx-installable server. The tools match the locator API so the assistant does not need special prompting to use them. When a selector goes stale, the assistant calls get_window_tree, reads the JSON, and picks a new selector. The loop recovers without giving up the script.
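The recovery loop described above can be sketched in Python. Every callable here is a hypothetical stand-in: click would wrap a locator click, get_window_tree the SDK or MCP call of that name, and ask_llm_for_selector whatever model client you use. The point is the ordering: the deterministic path runs first, and the model is consulted only after a failure, with a bounded number of repairs.

```python
import json
from typing import Callable


class ElementNotFound(Exception):
    """Raised by the (hypothetical) click callable when a selector matches nothing."""


def click_with_recovery(
    click: Callable[[str], None],
    get_window_tree: Callable[[], dict],
    ask_llm_for_selector: Callable[[str, str], str],
    selector: str,
    max_repairs: int = 2,
) -> str:
    """Click deterministically; on failure, ask an LLM for a new selector and retry.

    Returns the selector that finally worked so the caller can save it back
    into the script.
    """
    for attempt in range(max_repairs + 1):
        try:
            click(selector)
            return selector
        except ElementNotFound:
            if attempt == max_repairs:
                raise  # out of repair budget: surface the failure
            tree_json = json.dumps(get_window_tree())
            selector = ask_llm_for_selector(selector, tree_json)
    raise ValueError("max_repairs must be >= 0")


# --- Demo with stubs: the "Send" button was relabelled "Send message". ---
LIVE_SELECTORS = {"role:button|name:Send message"}


def fake_click(selector: str) -> None:
    if selector not in LIVE_SELECTORS:
        raise ElementNotFound(selector)


def fake_tree() -> dict:
    return {"role": "Window", "children": [{"role": "Button", "name": "Send message"}]}


def fake_llm(old_selector: str, tree_json: str) -> str:
    # A real model would reason over tree_json; the stub picks the button it lists.
    button = json.loads(tree_json)["children"][0]
    return f"role:{button['role'].lower()}|name:{button['name']}"


print(click_with_recovery(fake_click, fake_tree, fake_llm, "role:button|name:Send"))
```

Returning the repaired selector matters: persisting it means the next run takes the fast, model-free path again.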

Install, in a single terminal session

Pick the SDK that matches the rest of your stack: Rust, TypeScript, or Python (package names are listed in the FAQ below). Each one wraps the same underlying Rust core, so a script written in one ports cleanly to the others.

From zero to a running script

Pick the app you want to drive

calc.exe, notepad.exe, Excel, Chrome, or your internal line-of-business tool. Any Windows app that exposes an accessibility tree counts, and that is almost every modern Windows app.

Dump the window tree once, read the JSON

desktop.get_window_tree(pid) returns the entire UIA subtree for the window. The cache request pre-fetches 7 properties per node (ControlType, Name, BoundingRectangle, IsEnabled, IsKeyboardFocusable, HasKeyboardFocus, AutomationId). About 200 ms for a 245-element window.

Write a selector-first script

Use role:, name:, and nativeid: where possible. Chain with the pipe. Avoid coordinates. Example: calc.locator("Name:Equals").first().click(). The same strings work from Python, TypeScript, and the MCP surface.

Run it through the MCP agent

claude mcp add terminator "npx -y terminator-mcp-agent@latest". Now Claude Code can call execute_sequence on your workflow file, patch failing steps from fresh window trees, and repeat until the assertion passes.

Record instead of writing, if you prefer

terminator-workflow-recorder captures your real desktop actions as JSON (mouse, keys, clipboard, focus changes). Double-click detection uses the standard 500 ms / 5 pixel thresholds. Convert the recording to a script, hand it to the same MCP loop, and you have an RPA bot without the legacy RPA cost.
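To picture what the cached snapshot from get_window_tree holds, here is a Python sketch of a node carrying the seven pre-fetched properties, serialized to JSON, plus a helper that proposes a selector following the role: / name: / nativeid: advice in the steps. The field names, the JSON layout, and the selector_hint heuristic (prefer nativeid:, else role: plus name:) are assumptions for illustration; the real get_window_tree output format may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CachedNode:
    # The seven UIA properties the article lists as pre-fetched per node.
    control_type: str
    name: str
    bounding_rectangle: tuple  # (left, top, width, height)
    is_enabled: bool
    is_keyboard_focusable: bool
    has_keyboard_focus: bool
    automation_id: str
    children: list = field(default_factory=list)


def count_nodes(node: CachedNode) -> int:
    return 1 + sum(count_nodes(child) for child in node.children)


def to_json(node: CachedNode) -> str:
    # asdict() recurses into nested dataclasses inside the children list.
    return json.dumps(asdict(node))


def selector_hint(node: CachedNode) -> str:
    """Hypothetical heuristic: stable AutomationId first, then role plus name."""
    if node.automation_id:
        return f"nativeid:{node.automation_id}"
    return f"role:{node.control_type.lower()}|name:{node.name}"


window = CachedNode("Window", "Calculator", (0, 0, 320, 500), True, False, False, "", [
    CachedNode("Text", "Display is 0", (10, 60, 300, 80), True, False, False, "CalculatorResults"),
    CachedNode("Button", "Equals", (240, 420, 70, 60), True, True, False, ""),
])

print(count_nodes(window))
print([selector_hint(child) for child in window.children])
```

Preferring the AutomationId reflects the usual reasoning: it is set by the developer and tends not to change with the UI language, while the Name usually does.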

What the 35 MCP tools actually do

Every tool is defined as a #[tool(...)] attribute in crates/terminator-mcp-agent/src/server.rs; the ones below are a representative selection. They are the same primitives an SDK script calls; the MCP agent just puts them on the wire so a language model can use them directly.

get_window_tree

Snapshot the full UIA subtree for a process as structured JSON. One call, cached in Rust.

click_element

Unified click. Three modes: selector, position inside bounds, or an existing element handle.

type_into_element

Smart clipboard optimization, handles long strings, falls back to per-key simulation when needed.

press_key

Normalized chords (Ctrl+S, Alt+F4, F11) sent to a focused element instead of the OS globally.

execute_sequence

Run a whole workflow or a step range. Supports resume, rollback, and step-level variables.

navigate_browser

Open a URL inside a browser window the SDK already knows how to locate.

open_application

Launch calc.exe, notepad.exe, or a UWP target like uwp:Microsoft.WindowsCalculator.

validate_element

Assert an element exists before you act. Returns diagnostics when the selector is wrong.

glob_files / grep_files

Let the LLM read your YAML workflows and TypeScript steps inside the working directory.
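On the wire, each of these tools is invoked with an MCP tools/call request, which is plain JSON-RPC 2.0. The Python sketch below builds one such message. The method name and the params shape (name, arguments) come from the MCP specification; the argument key selector for click_element is an assumption, so check the input schema the agent returns from tools/list before relying on it.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tools_call(tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one line of JSON-RPC 2.0.

    Over the stdio transport, MCP messages are newline-delimited, so the
    returned string is kept on a single line.
    """
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message, separators=(",", ":"))


# Assumed argument names, for illustration only.
line = tools_call("click_element", {"selector": "role:button|name:Send"})
print(line)
```

Because the request is just JSON, the same message an assistant sends can be logged, replayed, or written by hand when debugging a workflow.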

A workflow, not a macro

The @mediar-ai/workflow SDK wraps steps, state, and error recovery around raw calls. Each step is a typed function with execute and onError, sharing context through a Zod-typed object between steps. This is how you write a Windows automation script that survives flaky production conditions without turning into a pile of try / except.
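The real SDK is TypeScript with a Zod-typed context; the Python sketch below only mirrors the shape described here so the control flow is easy to follow. Step, run_workflow, and the on_error convention (return True when the step recovered) are hypothetical names, not the @mediar-ai/workflow API, and there is no schema validation of the shared context. The start and end arguments mimic running a step range.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    execute: Callable[[dict], None]
    # Returns True if it recovered, False to let the failure propagate.
    on_error: Optional[Callable[[dict, Exception], bool]] = None


def run_workflow(steps: list, context: Optional[dict] = None,
                 start: int = 0, end: Optional[int] = None) -> dict:
    """Run steps[start:end] in order, sharing one mutable context dict."""
    context = {} if context is None else context
    for index, step in enumerate(steps[start:end], start=start):
        try:
            step.execute(context)
        except Exception as exc:
            if step.on_error is None or not step.on_error(context, exc):
                raise RuntimeError(f"step {index} ({step.name}) failed") from exc
        context.setdefault("completed", []).append(step.name)
    return context


def open_app(ctx: dict) -> None:
    ctx["app"] = "notepad.exe"


def type_text(ctx: dict) -> None:
    raise TimeoutError("editor not ready")  # simulate a flaky step


def fall_back_to_clipboard(ctx: dict, exc: Exception) -> bool:
    ctx["typed_via"] = "clipboard"
    return True


def save(ctx: dict) -> None:
    ctx["saved"] = True


steps = [
    Step("open", open_app),
    Step("type", type_text, on_error=fall_back_to_clipboard),
    Step("save", save),
]
print(run_workflow(steps))
```

Keeping recovery next to the step it belongs to is what avoids the "pile of try / except" the paragraph warns about: the runner owns the single try block.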


Record once, replay deterministically

If you prefer not to write the script by hand, there is a workflow recorder. It lives at crates/terminator-workflow-recorder and captures mouse, keyboard, clipboard, hotkeys, text input completion, and UI focus / property / structure changes. Double clicks are detected with the Windows-standard 500 ms time window and a 5-pixel distance tolerance (see the README for the tracker tests). The output is timestamped JSON with the full accessibility-tree metadata for each interaction, so the replay uses selectors, not coordinates.
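The double-click rule can be sketched as a pass over timestamped clicks. This Python version is an illustration under assumptions: it checks the 5-pixel tolerance per axis (a box around the first click, similar in spirit to the Windows double-click rectangle) with inclusive bounds, and it pairs clicks greedily; the recorder's exact comparisons may differ.

```python
from dataclasses import dataclass

DOUBLE_CLICK_MS = 500  # time window from the article
DOUBLE_CLICK_PX = 5    # distance tolerance from the article


@dataclass
class Click:
    t_ms: int
    x: int
    y: int


def is_double(first: Click, second: Click) -> bool:
    """True if the second click lands close enough, soon enough (assumed inclusive)."""
    return (0 <= second.t_ms - first.t_ms <= DOUBLE_CLICK_MS
            and abs(second.x - first.x) <= DOUBLE_CLICK_PX
            and abs(second.y - first.y) <= DOUBLE_CLICK_PX)


def classify(clicks: list) -> list:
    """Greedily pair consecutive clicks into ('double', click) or ('single', click)."""
    events, i = [], 0
    while i < len(clicks):
        if i + 1 < len(clicks) and is_double(clicks[i], clicks[i + 1]):
            events.append(("double", clicks[i]))
            i += 2
        else:
            events.append(("single", clicks[i]))
            i += 1
    return events


clicks = [Click(0, 100, 100), Click(320, 103, 98), Click(2000, 400, 300)]
print([kind for kind, _ in classify(clicks)])  # prints ['double', 'single']
```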


The config lets you filter noise from system UI: clock updates, taskbar, notifications, dwm.exe, and explorer.exe are all ignorable out of the box via the ignore_applications and ignore_focus_patterns fields. Once the recording is saved, you can feed the JSON back through the same MCP tools that would run a hand-written script.
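A filter over those two fields might look like the Python sketch below. The field names come from the recorder config described here, but the matching rules are assumptions: case-insensitive exact match on the process name for ignore_applications, and case-insensitive substring match on the focused window title for ignore_focus_patterns. The real recorder may use different semantics, such as globs or regular expressions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Event:
    kind: str            # e.g. "click", "key", "focus"
    process: str         # executable name of the owning process
    window_title: str = ""


def make_filter(ignore_applications: Iterable[str],
                ignore_focus_patterns: Iterable[str]) -> Callable[[Event], bool]:
    """Return a predicate that is True for events worth keeping."""
    apps = {app.lower() for app in ignore_applications}
    patterns = [p.lower() for p in ignore_focus_patterns]

    def keep(event: Event) -> bool:
        if event.process.lower() in apps:
            return False
        title = event.window_title.lower()
        if event.kind == "focus" and any(p in title for p in patterns):
            return False
        return True

    return keep


keep = make_filter(["dwm.exe", "explorer.exe"], ["notification", "clock"])
events = [
    Event("focus", "dwm.exe"),
    Event("focus", "ShellExperienceHost.exe", "Notification Center"),
    Event("click", "notepad.exe", "Untitled - Notepad"),
]
print([e.process for e in events if keep(e)])  # prints ['notepad.exe']
```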

Three numbers worth memorizing

24 selector variants in the Selector enum at crates/terminator/src/selector.rs. One vocabulary for every framework.

35 MCP tool handlers in crates/terminator-mcp-agent/src/server.rs. Every one is a function your AI assistant can call directly.

1 command

claude mcp add terminator "npx -y terminator-mcp-agent@latest". That is the entire install for Claude Code.

Watch one script execute, in three frames

open Calculator, compute 1 + 2, read the result


Frame 1: open_application

The script calls desktop.open_application("calc.exe"). Rust launches the process and returns a UIElement for the root window. No pixel hunting, no title regex. If the UWP identifier uwp:Microsoft.WindowsCalculator is available, the example prefers it and falls back to calc.exe.

Terminator versus the traditional tooling

Feature	AutoHotkey / AutoIt / PowerShell	Terminator
Targets accessibility tree (role, name, AutomationId)	No (keys and pixels)	Yes, 24 locator primitives in selector.rs
Works on Win32 + UWP + WPF + WinForms + Electron	Partial (AHK struggles with UWP and Chromium)	Yes, anything exposing UI Automation
One script shape for every app	No, separate idioms per app	Yes, locator("...").first().click()
AI coding assistant can author and run it	No, AHK/AutoIt have no MCP surface	Yes, 35-tool MCP agent (one npx install)
Deterministic replay of recorded workflows	AHK macros record keys; fragile to layout	workflow-recorder emits JSON, replayable
Recovery when a selector goes stale	Script fails hard	LLM gets a fresh window tree, picks a new selector
Runs from Rust, TypeScript, Python, or YAML	AHK: one language. PowerShell: one language.	4 SDKs + MCP + CLI
Open source license	Mixed	MIT, no lock-in


Frequently asked questions

How is Terminator different from PowerShell or AutoHotkey for Windows automation?

Different target surface. PowerShell drives the OS through cmdlets and .NET, which is great for registry edits, services, and file operations, but it has no first-class handle on a UWP button or a WPF combo box. AutoHotkey drives pixels, keystrokes, and Win32 handles, which works for macros over desktop apps but breaks as soon as layout shifts. Terminator scripts target the Windows UI Automation accessibility tree through a locator API modeled on Playwright. The same script shape finds a button in Notepad, Excel, Chrome, VS Code, and a line-of-business Electron app. The 24 locator primitives are defined as a single Rust enum at crates/terminator/src/selector.rs.

What exactly are the 24 selector primitives?

Role, Id, Name, Text, Path, NativeId (the Windows AutomationId), Attributes, Filter, Chain, ClassName, Visible, LocalizedRole, Process, RightOf, LeftOf, Above, Below, Near, Nth, Has (Playwright-style :has), Parent, And, Or, Not. You can chain any of them with the pipe character inside the locator string, and combine them with logical operators like && and ||. The Rust enum also has an Invalid variant used to carry parse errors, which is why you may see 25 variants when reading the source.

Can my AI coding assistant run these scripts without me writing extra glue?

Yes. Terminator ships an MCP server (terminator-mcp-agent) that exposes 35 tools (#[tool(...)] attributes in crates/terminator-mcp-agent/src/server.rs). Claude Code, Cursor, VS Code, Windsurf, and anything else that speaks MCP can call get_window_tree, click_element, type_into_element, press_key, open_application, navigate_browser, execute_sequence, and the rest. You add it with one line: claude mcp add terminator "npx -y terminator-mcp-agent@latest". Now the same assistant that writes your code can also run it against any app on the desktop.

Does this work on Win32, UWP, WPF, WinForms, and Electron?

Yes, because every one of those frameworks implements Microsoft UI Automation. The accessibility tree is how screen readers understand these apps, so it is also how Terminator understands them. The win_calculator.py example in the repo targets a UWP Calculator control (nativeid:CalculatorResults), and the notepad.py example handles both Windows 10 and Windows 11 Notepad by switching on platform.release(). You can use the same role:, name:, and nativeid: selectors against Chrome (Chromium), Visual Studio (WPF), Paint (Win32), and File Explorer (the Windows shell) without changing the API.

What about workflows that need to be replayed exactly, like RPA?

Terminator includes a workflow recorder (crates/terminator-workflow-recorder) that captures mouse, keyboard, clipboard, hotkeys, text-input completion, and UI focus/property/structure changes as a timestamped JSON event stream. Double clicks are detected with the Windows-standard 500 ms time threshold and a 5-pixel distance tolerance. The JSON saves to a file and can be converted back into a script for deterministic replay. That means you record once in your real desktop, then run the replay through the same MCP loop and let the LLM patch the step that broke, rather than re-recording the whole workflow.

Is the script deterministic or does it rely on the LLM at runtime?

Deterministic first, LLM on recovery. A Terminator script is normal code: pick selectors, call click(), type_text(), press_key(), assert what happened. It runs at CPU speed with no model inference on the hot path. The AI only enters when the script would otherwise fail, for example a selector went stale because a label changed. In that case the script can call get_window_tree to dump the fresh accessibility tree as JSON, hand it to an LLM, and ask for a replacement selector, then retry. This is the pattern the Terminator README describes as >95% success, 100x faster than pure computer-use agents.

Which SDKs can I write scripts in?

Rust (terminator-rs), TypeScript (@mediar-ai/terminator), Python (terminator-py, with support marked as partial), and a workflow SDK (@mediar-ai/workflow) for step-based, typed workflows with error recovery. There is also a CLI (@mediar-ai/cli) for running workflow YAML or TypeScript from the command line, and a KV package (@mediar-ai/kv) for sharing state between steps. The npm package that spins up the MCP server is terminator-mcp-agent, and everything lives under crates/ and packages/ in the Terminator repo.

Do I need to inspect the accessibility tree myself to find selectors?

Only if you want to. Two paths. Manually, use Accessibility Insights for Windows or inspect.exe (from the Windows SDK) to hover over a control and read its Name, ControlType, and AutomationId. Programmatically, call desktop.get_window_tree(pid), which returns the entire UIA subtree rooted at a window in one cached call. For a mid-size 245-element window this takes about 200 ms on the Rust path (build_tree_with_cache at crates/terminator/src/platforms/windows/tree_builder.rs line 386). Print the tree, hand it to Claude, and ask it to pick a selector. That is the agent-native way to find selectors at runtime.