Matthew Diakonov, Written with AI

Published April 21, 2026 · 11 min read

Automation tools for Windows: why the cache-batched accessibility tree beats screenshots, coordinates, and RPA studios

Every “top 10 automation tools for Windows” article recommends the same six products: Power Automate Desktop, UiPath, AutoHotkey, WinAutomation, Blue Prism, and a task scheduler. None of them explains why the Windows UI Automation tree, fetched in one CacheRequest round-trip, is the only approach fast and stable enough for an AI coding assistant to drive your desktop in a loop. This page does.

Built on Microsoft UI Automation, MIT licensed, audited by real agents

One CacheRequest, seven batched UIProperty values

Typed SDKs in Rust, Node.js, and Python

MCP server with fifteen tools for AI coding assistants


Why the accessibility tree, batched, wins

Windows apps expose an accessibility tree.

Most tools read it one property at a time.

Terminator asks for the whole subtree in one call.

Seven UIProperty values come back batched.

Your agent reads every element locally, with zero COM cost.


The shortlist everyone links to, and what it misses

If you search “automation tools for Windows”, the first page of results is a stack of vendor roundups that recommend effectively the same six products. They are real tools and they solve real problems, but every one of them is optimised for a human in a studio: drag an activity onto a canvas, press record, pick an element from a visual picker. The problem is that the job you actually need a Windows automation tool for in 2026 is different. You want an AI coding assistant to read the screen, pick the right element, and act on it, in a loop, without a human in the middle.

What the listicles cover

Fixed RPA studios: Power Automate, UiPath Studio, Blue Prism
Scripting tools: AutoHotkey, AutoIt
Python UIA wrappers: pywinauto, uiautomation
Screenshot or OCR agents: SikuliX, pixel-diff bots
Test drivers: WinAppDriver, Appium Windows, Coded UI (retired)
Task schedulers: Windows Task Scheduler, cron-style runners

Six categories, zero of them designed to be called from a typed SDK by Claude Code, Cursor, or Codex. That is the gap. Terminator fills it, and the reason it can is an implementation detail most automation coverage never touches: how the tool talks to Windows in the first place.

The anchor: one CacheRequest, seven properties, a whole subtree

Microsoft UI Automation (UIA) is the accessibility API that backs almost every modern Windows control: WPF, WinUI, UWP, WinForms, Electron apps, the Office suite, File Explorer, the Start menu. Screen readers use it. Accessibility inspectors use it. It exposes every UI element with a role (ControlType), a name, a bounding box, and dozens of other properties. If you read it well, you do not need pixels and you do not need coordinates.

The catch is that UIA is cross-process. Your automation process and the target application are different processes, and every property lookup is a COM call across that boundary. Naive wrappers do one call per property per element. A mid-sized window has a few hundred elements and you usually need six or seven properties each. That is thousands of round-trips before your agent has decided what to click.

Terminator takes Microsoft's documented fix for this, the UIA CacheRequest, and wires it into its tree builder by default. The file is crates/terminator/src/platforms/windows/tree_builder.rs, and the function is build_tree_with_cache. This is the anchor fact for this whole page.


Seven properties, one scope flag (TreeScope::Subtree is value 7: Element + Children + Descendants), one find_first_build_cache. After that, get_cached_name, get_cached_control_type, get_cached_bounding_rectangle, and the rest are all local memory reads. A counter in TreeBuildingContext named cache_hits tracks how many times the hot path took the local read.
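To make the scope arithmetic concrete, here is a small Rust sketch of how the TreeScope flags compose. The numeric values follow Microsoft's TreeScope enumeration; the constant names and the includes helper are illustrative and are not identifiers from Terminator or from any Windows binding crate.

```rust
// Numeric values follow Microsoft's UIA TreeScope enumeration.
// The constant names and `includes` helper are illustrative, not an API.
const TREE_SCOPE_ELEMENT: u32 = 0x1;
const TREE_SCOPE_CHILDREN: u32 = 0x2;
const TREE_SCOPE_DESCENDANTS: u32 = 0x4;

/// Subtree is the three scopes above OR-ed together: 1 | 2 | 4 = 7.
const TREE_SCOPE_SUBTREE: u32 = TREE_SCOPE_ELEMENT | TREE_SCOPE_CHILDREN | TREE_SCOPE_DESCENDANTS;

/// True when `scope` covers every bit set in `part`.
fn includes(scope: u32, part: u32) -> bool {
    (scope & part) == part
}
```

Because each scope is a distinct bit, a single cache request with scope 7 covers the root, its direct children, and every deeper descendant at once.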

1 IPC call

“[CACHED_TREE] is the log line you watch. Seven properties for an entire subtree in one round-trip. Every subsequent read is free.”

crates/terminator/src/platforms/windows/tree_builder.rs

What the typical library does instead

To see why the CacheRequest matters, compare it to the default pattern most Windows automation libraries use when you access a property. Each getter is its own round-trip.


That cost scales with the size of the UI, not the complexity of the task. Once a window has 300 elements, you are spending most of your clock budget on IPC, not on logic. If an AI agent is driving that library in a loop, every iteration pays the same tax.
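A toy model makes the difference countable. The sketch below fakes a remote provider (MockRemote, a hypothetical type) that bumps a counter on every call that would cross the process boundary, then compares a naive per-property walk with a single batched fetch that stands in for find_first_build_cache. It does not touch real UIA; it only illustrates the round-trip arithmetic.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a remote UIA provider: every method that would
/// cross the process boundary increments `round_trips`.
struct MockRemote {
    element_count: usize,
    round_trips: Cell<usize>,
}

/// The seven properties named on this page (values below are faked strings).
const PROPS: [&str; 7] = [
    "ControlType", "Name", "BoundingRectangle", "IsEnabled",
    "IsKeyboardFocusable", "HasKeyboardFocus", "AutomationId",
];

impl MockRemote {
    /// Naive getter: one simulated cross-process call per property per element.
    fn get_property(&self, element: usize, prop: &str) -> String {
        self.round_trips.set(self.round_trips.get() + 1);
        format!("{prop}#{element}")
    }

    /// Batched fetch: one simulated call returns every element with all
    /// requested properties, the way a Subtree-scoped cache request does.
    fn build_cache(&self, props: &[&str]) -> Vec<Vec<String>> {
        self.round_trips.set(self.round_trips.get() + 1);
        (0..self.element_count)
            .map(|e| props.iter().map(|p| format!("{p}#{e}")).collect::<Vec<String>>())
            .collect()
    }
}

/// Walks every element and reads every property through the naive getter.
fn naive_walk(remote: &MockRemote) -> Vec<Vec<String>> {
    (0..remote.element_count)
        .map(|e| PROPS.iter().map(|p| remote.get_property(e, p)).collect::<Vec<String>>())
        .collect()
}
```

With 300 elements and seven properties, the naive walk records 2,100 simulated round-trips and the batched fetch records one, while both return identical data.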

50 ms: Excellent build budget for a small window

150 ms: Good build budget for a mid-sized window

7 UIProperty values batched per CacheRequest

26 scenarios covered by the benchmark suite

How a tool call flows through the system

When your AI coding assistant calls an MCP tool, here is what happens underneath. Notice where the expensive boundary is and how many times it gets crossed.

Agent -> MCP -> Rust -> UIA

Two requests cross from the Rust core into UIA: building the cache request, and the single find_first_build_cache. Only the second one reaches into the target application's process; everything after it is local. For comparison, a naive wrapper would have one arrow per property per element going right, and one coming back.

Terminator versus the listicle shortlist

The usual suspects all solve real problems. Where they fall short is on the three things an AI coding assistant needs: a typed SDK, a structured view of the screen, and a cheap per-iteration cost.

Feature	Typical Windows RPA or scripting tool	Terminator
How it sees the screen	Pixels, OCR, or coordinates	Microsoft UI Automation accessibility tree
How many IPC calls per tree build	One per property per element (N times 6+)	One find_first_build_cache, then local reads
Selector stability across themes and DPI	Breaks on theme or font changes	Role, Name, AutomationId stay the same
Primary interface	Drag-and-drop studio or record-and-replay	Typed SDK (Rust, Node, Python) plus MCP server
Fit for AI coding assistants	No, needs a human in the studio	Yes, Claude Code and Cursor call MCP tools directly
Works with your current Windows session	Often boots an isolated robot session	Runs in your logged-in user session
License	Per-bot or per-studio seat	MIT, mediar-ai/terminator on GitHub
Browser scripting from the same tool call	No, separate browser product	execute_browser_script via chrome.debugger

How an AI coding assistant reaches every Windows app

The value of a fast tree builder is that it unlocks a simple calling pattern: the agent sends an intent, the tool returns a tree, the agent picks a locator, and the tool runs the action. The same pattern works for every Windows surface you might want to automate.

One framework, every Windows surface

What it looks like from your side

The SDK surface stays small. You open an application, you locate elements by role and name, you act. The CacheRequest detail is in the Rust core below you; you do not have to know about it to benefit from it.


The pieces that make it work

Six things come together. The cache batch is the headline, but the rest of the system matters too.

One CacheRequest, whole subtree

build_tree_with_cache creates a single IUIAutomationCacheRequest, adds seven UIProperty values, sets TreeScope::Subtree, and fetches the element plus every descendant in one COM round-trip. Source: crates/terminator/src/platforms/windows/tree_builder.rs, line 398 onward.

Seven batched properties

ControlType, Name, BoundingRectangle, IsEnabled, IsKeyboardFocusable, HasKeyboardFocus, AutomationId. Everything an agent needs to decide what to click is pre-loaded before the locator chain runs.

TreeScope::Subtree (value 7)

Element (1) plus Children (2) plus Descendants (4) combined. One scope flag is the difference between walking the tree live and reading it from local memory.

yield_every_n_elements = 50

Default setting in TreeBuilderConfig. Every 50 elements the walker sleeps 1 ms so the target app's UI thread can service its own message pump. No freezing Excel while you index it.

get_cached_* reads

After the cache build, every descendant property is read locally with get_cached_name, get_cached_control_type, and so on. A counter in TreeBuildingContext tracks cache_hits you can log at the end of a workflow.
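The cache-hit accounting can be sketched like this. BuildStats and Element are hypothetical types, and the live fallback on a cache miss is an assumption added only to show what such a counter distinguishes; the source states just that cache_hits is tracked.

```rust
/// Hypothetical counters in the spirit of TreeBuildingContext.
#[derive(Default, Debug)]
struct BuildStats {
    cache_hits: usize,
    live_calls: usize,
}

/// Hypothetical element whose Name may or may not have been pre-fetched.
struct Element {
    cached_name: Option<String>,
}

impl Element {
    /// Stand-in for a cross-process property fetch (assumed fallback path).
    fn fetch_live_name(&self) -> String {
        String::from("<fetched live>")
    }
}

/// Prefer the cached value; fall back to a live call only on a miss.
fn read_name(el: &Element, stats: &mut BuildStats) -> String {
    match &el.cached_name {
        Some(name) => {
            stats.cache_hits += 1;
            name.clone()
        }
        None => {
            stats.live_calls += 1;
            el.fetch_live_name()
        }
    }
}
```

Logging both numbers at the end of a workflow shows at a glance whether the hot path stayed on local reads.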

MCP agent, fifteen tools

terminator-mcp-agent exposes get_window_tree, click_element, activate_element, validate_element, execute_sequence, execute_browser_script, and nine more. Each tool is a typed function an AI coding assistant can call over MCP.

The step-by-step, in order

Your agent issues an MCP call

Claude Code or Cursor sends a JSON-RPC message like click_element { selector: { role: 'Button', name: 'Save' } } to the terminator-mcp-agent binary.
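For reference, the wire message for this step might look like the output of the sketch below. It uses the JSON-RPC 2.0 envelope and the MCP tools/call method; the selector argument shape is copied from the example above and is an assumption, not terminator-mcp-agent's published schema. The small escaper avoids a serde dependency so the block stays self-contained.

```rust
/// Minimal JSON string quoting for the characters this sketch can emit.
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build an MCP `tools/call` request for click_element.
/// The `selector` argument shape is assumed from the example in the text.
fn click_request(id: u64, role: &str, name: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"click_element","arguments":{{"selector":{{"role":{},"name":{}}}}}}}}}"#,
        id,
        json_str(role),
        json_str(name)
    )
}
```

Calling click_request(1, "Button", "Save") yields a single-line JSON object that an MCP client would write to the agent's stdin or HTTP transport.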

The Rust core builds a cached tree

build_tree_with_cache runs. One create_cache_request, seven add_property calls, TreeScope::Subtree, then find_first_build_cache. A log line starting with [CACHED_TREE] tells you how long it took.

The selector engine walks the local cache

Role, Name, and (if needed) AutomationId match against the cached subtree. No new COM calls. Name is tried first, AutomationId only when name is empty.
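The matching rule can be sketched over a flat slice of cached nodes. This is an illustration, not Terminator's selector engine: CachedNode, Selector, the case-insensitive role comparison, and the reading of "name is empty" as the element's own Name are all assumptions.

```rust
/// Hypothetical snapshot of the cached properties a selector needs.
#[derive(Debug)]
struct CachedNode {
    control_type: String,
    name: String,
    automation_id: String,
}

/// Hypothetical selector: a role plus the text to match.
struct Selector<'a> {
    role: &'a str,
    name: &'a str,
}

/// Name is compared first; AutomationId is consulted only when the
/// element's Name is empty.
fn matches(node: &CachedNode, sel: &Selector<'_>) -> bool {
    if !node.control_type.eq_ignore_ascii_case(sel.role) {
        return false;
    }
    if !node.name.is_empty() {
        node.name == sel.name
    } else {
        node.automation_id == sel.name
    }
}

/// Linear scan over cached data: no cross-process calls happen here.
fn find_first<'n>(nodes: &'n [CachedNode], sel: &Selector<'_>) -> Option<&'n CachedNode> {
    nodes.iter().find(|n| matches(n, sel))
}
```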

The action runs against the live element

Click, type, focus, or activate fire through the real COM handle. The cached tree is strictly for discovery; the live handle is used for the action itself.
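The split between discovery and action can be expressed as a trait boundary. In this hypothetical sketch, Snapshot stands for a cached entry and LiveElement for the real COM handle; only the final click goes through the live side, and a missing live element surfaces as an error rather than a silent no-op.

```rust
/// Hypothetical live handle: calls here would cross the process boundary.
trait LiveElement {
    fn click(&mut self) -> Result<(), String>;
}

/// Hypothetical cached entry that remembers which live handle it came from.
struct Snapshot {
    name: String,
    live_index: usize,
}

/// Discover with the snapshot, then act through the live handle.
fn click_by_name<L: LiveElement>(
    snapshots: &[Snapshot],
    live: &mut [L],
    name: &str,
) -> Result<(), String> {
    let hit = snapshots
        .iter()
        .find(|s| s.name == name)
        .ok_or_else(|| format!("no element named {name:?}"))?;
    live.get_mut(hit.live_index)
        .ok_or_else(|| "stale snapshot: live element is gone".to_string())?
        .click()
}

/// Test double that just counts clicks.
struct FakeButton {
    clicks: usize,
}

impl LiveElement for FakeButton {
    fn click(&mut self) -> Result<(), String> {
        self.clicks += 1;
        Ok(())
    }
}
```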

The MCP response carries the result

Success, failure, or structured return value flows back through JSON-RPC to the AI assistant, which writes the next step of the workflow.

Verify it yourself

This is all public and MIT licensed. If you want to confirm the CacheRequest pattern is real before you install anything, a shallow clone and three greps are enough: search for build_tree_with_cache, find_first_build_cache, and [CACHED_TREE].

Verify the cache-batched tree build

The [CACHED_TREE] log line is the timing you can trust. Run Terminator against your own Windows apps with RUST_LOG=info and you will see how long the one-shot subtree build took before a single locator ran.

The takeaway for choosing a Windows automation tool

If a human is driving the automation, an RPA studio is fine. If you want an AI coding assistant to drive it, you need four things the studios do not give you: a typed SDK, a structured tree instead of pixels, a cheap per-iteration cost, and an MCP server the assistant can already speak to. Terminator ships all four, and the CacheRequest-batched tree build is what keeps the cost cheap enough to loop.

Want to see it drive your own Windows workflow?

Hop on a call and we will run Terminator against the exact app your team is trying to automate.

Frequently asked questions

Why pick a developer framework over Power Automate Desktop or UiPath Studio?

Because AI coding assistants write code, not flowchart tiles. Power Automate Desktop and UiPath Studio expect a human to drag activities onto a canvas. An agent driving your desktop needs a typed SDK it can import, call, and test. Terminator is a library (Rust core with Node.js and Python bindings) plus a Model Context Protocol server, so Claude Code, Cursor, and Codex can call functions like locator('Edit', 'Address').click() the same way they already call Playwright.

What makes the accessibility tree faster than screenshots or coordinates?

Two things. First, the Windows UI Automation (UIA) API returns structured data. You get role, name, AutomationId, and bounding box as text, not pixels, so there is nothing to OCR and nothing to re-detect after a theme change. Second, Terminator fetches those properties in bulk. In crates/terminator/src/platforms/windows/tree_builder.rs the build_tree_with_cache path creates one IUIAutomationCacheRequest, calls add_property for seven properties (ControlType, Name, BoundingRectangle, IsEnabled, IsKeyboardFocusable, HasKeyboardFocus, AutomationId), sets the tree scope to Subtree, then issues a single find_first_build_cache. After that, every descendant is read locally via get_cached_* with zero further COM round-trips.

Is the UIA CacheRequest really a single round-trip?

Yes. IUIAutomationElement::FindFirstBuildCache is a documented Microsoft API that performs the cross-process walk once and returns the element with the requested properties, for the requested tree scope, already cached. Windows accessibility is cross-process IPC, so without a cache request every property access is its own COM call from your process to the target app. A mid-sized window has hundreds of elements and seven properties per element; that is potentially thousands of IPC calls. The CacheRequest collapses that to one call plus local reads. Terminator logs this phase with [CACHED_TREE] at info level so you can verify the timing yourself.

What selector strategies does Terminator support on Windows?

The three that map directly to UIA: ControlType (role), Name (accessible name), and AutomationId (the stable developer-assigned ID, when present). The engine tries name match first and only loads AutomationId when name is empty, which keeps the common path cheap. Selectors chain, so locator({ role: 'window', name: 'Notepad' }).locator({ role: 'edit' }) narrows scope at each step instead of walking the whole tree again.
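Scope narrowing is easiest to see on a small tree. The sketch below uses hypothetical UiNode and Step types, not Terminator's locator API. Each chain step searches only inside the element the previous step returned, so the edit inside Calculator is found even though Notepad's edit comes first in document order.

```rust
/// Hypothetical cached tree node for the chaining sketch.
struct UiNode {
    role: &'static str,
    name: &'static str,
    children: Vec<UiNode>,
}

/// One step of a hypothetical locator chain: role must match, name is optional.
struct Step {
    role: &'static str,
    name: Option<&'static str>,
}

/// Pre-order depth-first search for the first descendant of `scope` matching `step`.
fn find_in<'a>(scope: &'a UiNode, step: &Step) -> Option<&'a UiNode> {
    for child in &scope.children {
        let role_ok = child.role.eq_ignore_ascii_case(step.role);
        let name_ok = step.name.map_or(true, |n| child.name == n);
        if role_ok && name_ok {
            return Some(child);
        }
        if let Some(found) = find_in(child, step) {
            return Some(found);
        }
    }
    None
}

/// Each step searches only inside the element the previous step returned;
/// the chain stops with None as soon as one step finds nothing.
fn locate<'a>(root: &'a UiNode, chain: &[Step]) -> Option<&'a UiNode> {
    chain.iter().try_fold(root, |scope, step| find_in(scope, step))
}
```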

How does Terminator avoid freezing the UI thread while walking the tree?

The tree builder has a yield_every_n_elements field, defaulted to 50, that inserts a 1 ms sleep after every N elements processed. That lets the target app's UI thread service its own message pump while Terminator walks the subtree. Screenshot-based agents do not hit this problem because they do not touch the UI thread, but they also do not know what they are clicking.
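A minimal sketch of that yielding walk, assuming a simple in-memory tree. WalkConfig, Node, and walk are hypothetical names that only mirror the two numbers quoted here (every 50 elements, 1 ms); they are not Terminator's TreeBuilderConfig.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical config holding the two numbers quoted above.
struct WalkConfig {
    yield_every_n_elements: usize,
    yield_sleep: Duration,
}

impl Default for WalkConfig {
    fn default() -> Self {
        WalkConfig {
            yield_every_n_elements: 50,
            yield_sleep: Duration::from_millis(1),
        }
    }
}

/// Minimal tree node; a real walker would carry UIA element data.
struct Node {
    children: Vec<Node>,
}

/// Depth-first walk that sleeps after every N elements so the target app's
/// UI thread gets gaps to pump messages. Returns (elements visited, pauses).
fn walk(root: &Node, cfg: &WalkConfig) -> (usize, usize) {
    let (mut visited, mut pauses) = (0, 0);
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        visited += 1;
        // A value of 0 disables yielding instead of dividing by zero.
        if cfg.yield_every_n_elements > 0 && visited % cfg.yield_every_n_elements == 0 {
            thread::sleep(cfg.yield_sleep);
            pauses += 1;
        }
        // Reverse so children pop off the stack in their original order.
        stack.extend(node.children.iter().rev());
    }
    (visited, pauses)
}
```

For a root with 120 children, the walk visits 121 elements and pauses twice, after the 50th and the 100th.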

Does the MCP agent expose the same primitives?

Yes. The terminator-mcp-agent binary registers fifteen tools that an AI coding assistant can call directly, including get_window_tree, click_element, activate_element, validate_element, open_application, and execute_sequence. These are the same primitives the SDK exposes, so the agent and your own code share a model. You can also hand a TypeScript snippet to execute_browser_script and run it inside the active Chrome tab via chrome.debugger, which makes a single tool call span the desktop and the web.

What performance budget should I expect?

Terminator's own benchmark file (crates/terminator/src/platforms/windows_benchmarks.rs) defines thresholds for 26 scenarios covering Calculator, Notepad, Excel, Chrome, GitHub, Amazon, YouTube, and Slack. Excellent is 50 ms or less and up to 50 elements. Good is 51 to 150 ms and up to 150 elements. Fair is 151 to 300 ms and up to 300 elements. Anything over 300 ms is Poor. The throughput metric the benchmarks report is elements per millisecond.
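The quoted thresholds translate into a simple classifier. This sketch classifies on build time alone, which is a simplification (the benchmark file also pairs each tier with an element-count band), and Tier, classify, and throughput are hypothetical names rather than the benchmark file's own API.

```rust
/// Hypothetical tier names matching the labels quoted above.
#[derive(Debug, PartialEq)]
enum Tier {
    Excellent,
    Good,
    Fair,
    Poor,
}

/// Classify a tree-build time by the millisecond thresholds alone.
/// The element-count bands are ignored in this simplified sketch.
fn classify(build_ms: f64) -> Tier {
    if build_ms <= 50.0 {
        Tier::Excellent
    } else if build_ms <= 150.0 {
        Tier::Good
    } else if build_ms <= 300.0 {
        Tier::Fair
    } else {
        Tier::Poor
    }
}

/// Throughput as elements per millisecond.
fn throughput(elements: usize, build_ms: f64) -> f64 {
    if build_ms > 0.0 {
        elements as f64 / build_ms
    } else {
        f64::INFINITY
    }
}
```

For example, a 120-element build that takes 60 ms classifies as Good, at 2.0 elements per millisecond.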

How does this compare to AutoHotkey or pywinauto?

AutoHotkey is a scripting language that sends keyboard and mouse events. It is fast and ubiquitous, but it does not give an AI agent a structured view of the screen. Pywinauto is Python-only and also speaks UIA, but it does not use CacheRequest by default; each property access is a separate COM call. Terminator is cross-language (Rust core, Node and Python bindings), MCP-native, and wraps the cache pattern for you.

Can I verify the CacheRequest code myself?

Clone the repo and look at crates/terminator/src/platforms/windows/tree_builder.rs. The function is build_tree_with_cache. The property list sits in a [UIProperty; 7] array around line 404. The find_first_build_cache call is around line 436. Run an example workflow with RUST_LOG=info and you will see a log line starting with [CACHED_TREE] that tells you how long the build took.

Is Terminator open source and MIT licensed?

Yes. The repo is mediar-ai/terminator on GitHub. Crates, extension, MCP agent, and the Node and Python bindings are all MIT. You can read the exact COM initialization path, the selector engine, and the extension bridge without signing an enterprise contract.