Automation tools for Windows: why the cache-batched accessibility tree beats screenshots, coordinates, and RPA studios
Every “top 10 automation tools for Windows” article recommends the same six products: Power Automate Desktop, UiPath, AutoHotkey, WinAutomation, Blue Prism, and a task scheduler. None of them explains why the Windows UI Automation tree, fetched in one CacheRequest round-trip, is the only approach fast and stable enough for an AI coding assistant to drive your desktop in a loop. This page does.
The shortlist everyone links to, and what it misses
If you search “automation tools for Windows”, the first page of results is a stack of vendor roundups that recommend effectively the same six products. They are real tools and they solve real problems, but every one of them is optimised for a human in a studio: drag an activity onto a canvas, press record, pick an element from a visual picker. The job you actually need a Windows automation tool for in 2026 is different: you want an AI coding assistant to read the screen, pick the right element, and act on it, in a loop, without a human in the middle.
What the listicles cover
- Fixed RPA studios: Power Automate, UiPath Studio, Blue Prism
- Scripting tools: AutoHotkey, AutoIt
- Python UIA wrappers: pywinauto, uiautomation
- Screenshot or OCR agents: SikuliX, pixel-diff bots
- Test drivers: WinAppDriver, Appium Windows, Coded UI (retired)
- Task schedulers: Windows Task Scheduler, cron-style runners
Six categories, zero of them designed to be called from a typed SDK by Claude Code, Cursor, or Codex. That is the gap. Terminator fills it, and the reason it can is an implementation detail most automation coverage never touches: how the tool talks to Windows in the first place.
The anchor: one CacheRequest, seven properties, a whole subtree
Microsoft UI Automation (UIA) is the accessibility API that backs almost every modern Windows control: WPF, WinUI, UWP, WinForms, Electron apps, the Office suite, File Explorer, the Start menu. Screen readers use it. Accessibility inspectors use it. It exposes every UI element with a role (ControlType), a name, a bounding box, and dozens of other properties. If you read it well, you do not need pixels and you do not need coordinates.
The catch is that UIA is cross-process. Your automation process and the target application are different processes, and every property lookup is a COM call across that boundary. Naive wrappers do one call per property per element. A mid-sized window has a few hundred elements and you usually need six or seven properties each. That is thousands of round-trips before your agent has decided what to click.
Terminator takes the documented solution and wires it into its tree builder by default. The file is crates/terminator/src/platforms/windows/tree_builder.rs. The function is build_tree_with_cache. This is the anchor fact for this whole page.
Seven properties, one scope flag (TreeScope::Subtree is value 7: Element + Children + Descendants), one find_first_build_cache. After that, get_cached_name, get_cached_control_type, get_cached_bounding_rectangle, and the rest are all local memory reads. A counter in TreeBuildingContext named cache_hits tracks how many times the hot path took the local read.
“[CACHED_TREE] is the log line you watch. Seven properties for an entire subtree in one round-trip. Every subsequent read is free.”
crates/terminator/src/platforms/windows/tree_builder.rs
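The scope arithmetic above is easy to check by hand. The following is a small Python sketch, not Terminator code: the `TreeScope` class and the `CACHED_PROPERTIES` tuple are illustrative names that mirror the bit values and the seven property names quoted in this article.

```python
from enum import IntFlag


class TreeScope(IntFlag):
    """Illustrative mirror of the UIA TreeScope bit values quoted in the text."""
    ELEMENT = 1
    CHILDREN = 2
    DESCENDANTS = 4
    # Subtree is the bitwise OR of the three flags above.
    SUBTREE = ELEMENT | CHILDREN | DESCENDANTS


# The seven properties the article says are batched into one cache request.
CACHED_PROPERTIES = (
    "ControlType",
    "Name",
    "BoundingRectangle",
    "IsEnabled",
    "IsKeyboardFocusable",
    "HasKeyboardFocus",
    "AutomationId",
)

print(int(TreeScope.SUBTREE))                    # 7
print(TreeScope.DESCENDANTS in TreeScope.SUBTREE)  # True
print(len(CACHED_PROPERTIES))                    # 7
```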
What the typical library does instead
To see why the CacheRequest matters, compare it to the default pattern most Windows automation libraries use when you access a property. Each getter is its own round-trip.
That cost scales with the size of the UI, not the complexity of the task. Once a window has 300 elements, you are spending most of your time budget on IPC, not on logic. If an AI agent is driving that library in a loop, every iteration pays the same tax.
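To make the difference concrete, here is a toy Python model that only counts boundary crossings. It does not talk to Windows or to Terminator; `FakeRemoteApp`, `naive_read`, and `cached_read` are hypothetical names invented for this sketch, and the 300-element window is the figure used above.

```python
from dataclasses import dataclass

PROPERTIES = (
    "ControlType", "Name", "BoundingRectangle", "IsEnabled",
    "IsKeyboardFocusable", "HasKeyboardFocus", "AutomationId",
)


@dataclass
class FakeRemoteApp:
    """Stands in for the target process and counts cross-process calls."""
    elements: list
    round_trips: int = 0

    def get_property(self, index, prop):
        # One crossing per property per element (the naive pattern).
        self.round_trips += 1
        return self.elements[index][prop]

    def fetch_subtree(self, props):
        # One crossing returns every requested property for every element.
        self.round_trips += 1
        return [{p: e[p] for p in props} for e in self.elements]


def make_app(n):
    return FakeRemoteApp([{p: f"{p}-{i}" for p in PROPERTIES} for i in range(n)])


def naive_read(app):
    return [
        {p: app.get_property(i, p) for p in PROPERTIES}
        for i in range(len(app.elements))
    ]


def cached_read(app):
    return app.fetch_subtree(PROPERTIES)


naive_app, cached_app = make_app(300), make_app(300)
assert naive_read(naive_app) == cached_read(cached_app)  # same data either way
print(naive_app.round_trips, cached_app.round_trips)     # 2100 1
```

Both paths return identical data; only the number of crossings differs, which is the whole argument for batching.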
How a tool call flows through the system
When your AI coding assistant calls an MCP tool, here is what happens underneath. Notice where the expensive boundary is and how many times it gets crossed.
Agent -> MCP -> Rust -> UIA
Two requests cross into the Windows UIA boundary: the cache request build, and the single find_first_build_cache. Everything after that is local. For comparison, a naive wrapper would send one request per property per element across that boundary and wait for a reply to each.
Terminator versus the listicle shortlist
The usual suspects all do something real. Where they fall short, specifically, is what an AI coding assistant needs: a typed SDK, a structured view of the screen, and a cheap per-iteration cost.
| Feature | Typical Windows RPA or scripting tool | Terminator |
|---|---|---|
| How it sees the screen | Pixels, OCR, or coordinates | Microsoft UI Automation accessibility tree |
| How many IPC calls per tree build | One per property per element (N times 6+) | One find_first_build_cache, then local reads |
| Selector stability across themes and DPI | Breaks on theme or font changes | Role, Name, AutomationId stay the same |
| Primary interface | Drag-and-drop studio or record-and-replay | Typed SDK (Rust, Node, Python) plus MCP server |
| Fit for AI coding assistants | No, needs a human in the studio | Yes, Claude Code and Cursor call MCP tools directly |
| Works with your current Windows session | Often boots an isolated robot session | Runs in your logged-in user session |
| License | Per-bot or per-studio seat | MIT, mediar-ai/terminator on GitHub |
| Browser scripting from the same tool call | No, separate browser product | execute_browser_script via chrome.debugger |
How an AI coding assistant reaches every Windows app
The value of a fast tree builder is that it unlocks one calling pattern: agent sends intent, tool returns a tree, agent picks a locator, tool runs the action. The same pattern applies to every Windows surface you might want to automate.
One framework, every Windows surface
What it looks like from your side
The SDK surface stays small. You open an application, you locate elements by role and name, you act. The CacheRequest detail is in the Rust core below you; you do not have to know about it to benefit from it.
The pieces that make it work
Six things come together. The cache batch is the headline, but the rest of the system matters too.
One CacheRequest, whole subtree
build_tree_with_cache creates a single IUIAutomationCacheRequest, adds seven UIProperty values, sets TreeScope::Subtree, and fetches the element plus every descendant in one COM round-trip. Source: crates/terminator/src/platforms/windows/tree_builder.rs, line 398 onward.
Seven batched properties
ControlType, Name, BoundingRectangle, IsEnabled, IsKeyboardFocusable, HasKeyboardFocus, AutomationId. Everything an agent needs to decide what to click is pre-loaded before the locator chain runs.
TreeScope::Subtree (value 7)
Element (1) plus Children (2) plus Descendants (4) combined. One scope flag is the difference between walking the tree live and reading it from local memory.
yield_every_n_elements = 50
Default setting in TreeBuilderConfig. Every 50 elements the walker sleeps 1 ms so the target app's UI thread can service its own message pump. No freezing Excel while you index it.
get_cached_* reads
After the cache build, every descendant property is read locally with get_cached_name, get_cached_control_type, and so on. A counter in TreeBuildingContext tracks cache_hits you can log at the end of a workflow.
MCP agent, fifteen tools
terminator-mcp-agent exposes get_window_tree, click_element, activate_element, validate_element, execute_sequence, execute_browser_script, and nine more. Each tool is a typed function an AI coding assistant can call over MCP.
The step-by-step, in order
Your agent issues an MCP call
Claude Code or Cursor sends a JSON-RPC message like click_element { selector: { role: 'Button', name: 'Save' } } to the terminator-mcp-agent binary.
The Rust core builds a cached tree
build_tree_with_cache runs. One create_cache_request, seven add_property calls, TreeScope::Subtree, then find_first_build_cache. A log line starting with [CACHED_TREE] tells you how long it took.
The selector engine walks the local cache
Role, Name, and (if needed) AutomationId match against the cached subtree. No new COM calls. Name is tried first, AutomationId only when name is empty.
The action runs against the live element
Click, type, focus, or activate fire through the real COM handle. The cached tree is strictly for discovery; the live handle is used for the action itself.
The MCP response carries the result
Success, failure, or structured return value flows back through JSON-RPC to the AI assistant, which writes the next step of the workflow.
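The first step above can be sketched as a JSON-RPC 2.0 envelope. MCP tool invocations use the `tools/call` method with a tool name and an arguments object; the argument shape below is copied from the click_element example in this article and has not been checked against the tool's actual schema, and `tool_call` is a hypothetical helper.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, one per call


def tool_call(name, arguments):
    """Build an MCP tools/call request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument shape assumed from the article's example, not from the real schema.
request = tool_call(
    "click_element",
    {"selector": {"role": "Button", "name": "Save"}},
)
wire = json.dumps(request)
print(wire)
```

The response travels back in the same envelope style, keyed by the matching `id`, which is how the assistant pairs a result with the call that produced it.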
Verify it yourself
This is all public and MIT licensed. If you want to confirm the CacheRequest pattern is real before you install anything, a shallow clone and three greps are enough.
The [CACHED_TREE] log line is the timing you can trust. Run Terminator against your own Windows apps with RUST_LOG=info and you will see how long the one-shot subtree build took before a single locator ever ran.
The takeaway for choosing a Windows automation tool
If a human is driving the automation, an RPA studio is fine. If you want an AI coding assistant to drive it, you need four things the studios do not give you: a typed SDK, a structured tree instead of pixels, a cheap per-iteration cost, and an MCP server the assistant can already speak to. Terminator ships all four, and the CacheRequest-batched tree build is what keeps the cost cheap enough to loop.
Frequently asked questions
Why pick a developer framework over Power Automate Desktop or UiPath Studio?
Because AI coding assistants write code, not flowchart tiles. Power Automate Desktop and UiPath Studio expect a human to drag activities onto a canvas. An agent driving your desktop needs a typed SDK it can import, call, and test. Terminator is a library (Rust core with Node.js and Python bindings) plus a Model Context Protocol server, so Claude Code, Cursor, and Codex can call functions like locator('Edit', 'Address').click() the same way they already call Playwright.
What makes the accessibility tree faster than screenshots or coordinates?
Two things. First, the Windows UI Automation (UIA) API returns structured data. You get role, name, AutomationId, and bounding box as text, not pixels, so there is nothing to OCR and nothing to re-detect after a theme change. Second, Terminator fetches those properties in bulk. In crates/terminator/src/platforms/windows/tree_builder.rs the build_tree_with_cache path creates one IUIAutomationCacheRequest, calls add_property for seven properties (ControlType, Name, BoundingRectangle, IsEnabled, IsKeyboardFocusable, HasKeyboardFocus, AutomationId), sets the tree scope to Subtree, then issues a single find_first_build_cache. After that, every descendant is read locally via get_cached_* with zero further COM round-trips.
Is the UIA CacheRequest really a single round-trip?
Yes. IUIAutomationElement::FindFirstBuildCache is a documented Microsoft API that performs the cross-process walk once and returns a detached cached tree. Windows accessibility is cross-process IPC, so without the cache each property access is its own COM call from your process to the target app. A mid-sized window has hundreds of elements and seven properties per element; that is potentially thousands of IPC calls. The CacheRequest collapses that to one call plus local reads. Terminator logs this phase with [CACHED_TREE] at the info level so you can verify the timing yourself.
What selector strategies does Terminator support on Windows?
The three that map directly to UIA: ControlType (role), Name (accessible name), and AutomationId (the stable developer-assigned ID, when present). The engine tries name match first and only loads AutomationId when name is empty, which keeps the common path cheap. Selectors chain, so locator({ role: 'window', name: 'Notepad' }).locator({ role: 'edit' }) narrows scope at each step instead of walking the whole tree again.
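A rough Python sketch of that matching order and of scoped chaining, run over an in-memory tree. This is one reading of the behaviour described here, not Terminator's selector engine: `Node`, `matches`, and `locate` are hypothetical names, and treating AutomationId as the fallback label when the name is empty is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    role: str
    name: str = ""
    automation_id: str = ""
    children: list = field(default_factory=list)


def descendants(node):
    """Yield every node below `node`, depth first."""
    for child in node.children:
        yield child
        yield from descendants(child)


def matches(node, role, name=None):
    if node.role.lower() != role.lower():
        return False
    if name is None:
        return True
    # Name is compared first; AutomationId is consulted only if name is empty.
    label = node.name if node.name else node.automation_id
    return label == name


def locate(scope, role, name=None) -> Optional[Node]:
    """Return the first match inside `scope`, so chained calls narrow the search."""
    return next((n for n in descendants(scope) if matches(n, role, name)), None)


desktop = Node("Pane", "Desktop", children=[
    Node("Window", "Calculator", children=[Node("Edit", "", "CalcInput")]),
    Node("Window", "Notepad", children=[Node("Edit", "Text editor")]),
])

window = locate(desktop, "window", "Notepad")
edit = locate(window, "edit")  # searches only inside the Notepad window
print(edit.name)               # Text editor
```

Because the second `locate` starts from the Notepad node, the Calculator subtree is never visited on that step.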
How does Terminator avoid freezing the UI thread while walking the tree?
The tree builder has a yield_every_n_elements field, defaulted to 50, that inserts a 1 ms sleep after every N elements processed. That lets the target app's UI thread service its own message pump while Terminator walks the subtree. Screenshot-based agents do not hit this problem because they do not touch the UI thread, but they also do not know what they are clicking.
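The yielding idea can be sketched in a few lines of Python. This is an illustration of the technique under the defaults quoted above (pause 1 ms after every 50 elements), not the Rust implementation; `walk_with_yield` and its parameters are hypothetical names, and the `sleep` argument is injectable so the pauses can be counted.

```python
import time


def walk_with_yield(root, children_of, yield_every_n_elements=50,
                    pause_s=0.001, sleep=time.sleep):
    """Depth-first walk that pauses briefly after every N visited elements."""
    visited = []
    stack = [root]
    while stack:
        node = stack.pop()
        visited.append(node)
        if yield_every_n_elements and len(visited) % yield_every_n_elements == 0:
            sleep(pause_s)  # give the target app's UI thread room to run
        # Reverse so children are visited in their original order.
        stack.extend(reversed(children_of(node)))
    return visited


# Toy tree: nodes are the integers 0..119 arranged as a binary heap.
def heap_children(n):
    return [c for c in (2 * n + 1, 2 * n + 2) if c < 120]


pauses = []
nodes = walk_with_yield(0, heap_children, sleep=pauses.append)
print(len(nodes), len(pauses))  # 120 2
```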
Does the MCP agent expose the same primitives?
Yes. The terminator-mcp-agent binary registers fifteen tools that an AI coding assistant can call directly, including get_window_tree, click_element, activate_element, validate_element, open_application, and execute_sequence. These are the same primitives the SDK exposes, so the agent and your own code share a model. You can also hand a TypeScript snippet to execute_browser_script and run it inside the active Chrome tab via chrome.debugger, which makes a single tool call span the desktop and the web.
What performance budget should I expect?
Terminator's own benchmark file (crates/terminator/src/platforms/windows_benchmarks.rs) defines thresholds for 26 scenarios covering Calculator, Notepad, Excel, Chrome, GitHub, Amazon, YouTube, and Slack. Excellent is 50 ms or less and up to 50 elements. Good is 51 to 150 ms and up to 150 elements. Fair is 151 to 300 ms and up to 300 elements. Anything over 300 ms is Poor. The throughput metric the benchmarks report is elements per millisecond.
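As a worked example of those bands, here is a small Python sketch. It rates by elapsed time only and ignores the element-count bounds, which is a simplification; `rate` and `throughput` are hypothetical helpers and are not taken from windows_benchmarks.rs.

```python
def rate(elapsed_ms: float) -> str:
    """Map a tree-build time to the bands quoted above (time only)."""
    if elapsed_ms <= 50:
        return "Excellent"
    if elapsed_ms <= 150:
        return "Good"
    if elapsed_ms <= 300:
        return "Fair"
    return "Poor"


def throughput(element_count: int, elapsed_ms: float) -> float:
    """Elements per millisecond, the metric the article says is reported."""
    return element_count / elapsed_ms


# Example: a 240-element window indexed in 120 ms.
print(rate(120), throughput(240, 120))  # Good 2.0
```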
How does this compare to AutoHotkey or pywinauto?
AutoHotkey is a scripting language that sends keyboard and mouse events. It is fast and ubiquitous, but it does not give an AI agent a structured view of the screen. Pywinauto is Python-only and also speaks UIA, but it does not use CacheRequest by default; each property access is a separate COM call. Terminator is cross-language (Rust core, Node and Python bindings), MCP-native, and wraps the cache pattern for you.
Can I verify the CacheRequest code myself?
Clone the repo and look at crates/terminator/src/platforms/windows/tree_builder.rs. The function is build_tree_with_cache. The property list sits in a [UIProperty; 7] array around line 404. The find_first_build_cache call is around line 436. Run an example workflow with RUST_LOG=info and you will see a log line starting with [CACHED_TREE] that tells you how long the build took.
Is Terminator open source and MIT licensed?
Yes. The repo is mediar-ai/terminator on GitHub. Crates, extension, MCP agent, and the Node and Python bindings are all MIT. You can read the exact COM initialization path, the selector engine, and the extension bridge without signing an enterprise contract.