Automation on Windows is slow because of IPC
Every tutorial on automation on Windows teaches you how to click a button. None of them tell you why the button took two seconds to resolve. The answer is not the accessibility API; it is the number of times your process asked another process a question. This page is about the single Windows UI Automation call that took Terminator from 6.5 seconds to 200 milliseconds on a 245-element window, and why that call is the reason Claude Code can drive desktop apps in real time.
What every guide to automation on Windows gets wrong
Search the top ten results for automation on Windows and you will find a lot of Task Scheduler screenshots, a couple of AutoHotkey snippets, and a marketing page for Power Automate Desktop. Not one of them mentions UI Automation by name. The API is treated as a black box labeled "find the button."
Under the hood, every one of those tools is talking to the same COM interface, IUIAutomation, introduced in Windows 7 and stable across every version since. And every one of them is paying the same cost: each property you read on a UI element is a cross-process function call, because the target app lives in its own process, and your process does not share memory with it.
The naive way to walk a window tree is to recurse from the root, and for each element read its control type, name, bounds, enabled state, focus state, and a few more. That is roughly fifteen COM calls per element. On a 245-element window (your average Office dialog), that is 3,675 round trips across the process boundary before you have even found the Save button. Measured cost on a real machine: 6.5 seconds.
Where the wall-clock time actually goes
The accessibility API is fast. The IPC is not. Here is the math, out loud, for a 245-element window.
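The arithmetic can be worked through as a small calculation. This is a sketch that uses only the figures stated on this page (245 elements, roughly fifteen reads per element, 6.5 seconds naive, 200 milliseconds cached). The per-call cost it prints is derived on the assumption that essentially all of the naive wall-clock time is spent on round trips, so treat it as an estimate, not a measurement. The function names are illustrative.

```rust
// Figures quoted on this page.
const ELEMENTS: u32 = 245;
const CALLS_PER_ELEMENT: u32 = 15; // "roughly fifteen COM calls per element"
const NAIVE_SECONDS: f64 = 6.5;
const CACHED_SECONDS: f64 = 0.2;

/// Total cross-process calls for a naive walk.
fn naive_round_trips(elements: u32, calls_per_element: u32) -> u32 {
    elements * calls_per_element
}

/// Average cost per round trip, ASSUMING the whole wall-clock time is IPC.
fn implied_ms_per_call(total_seconds: f64, round_trips: u32) -> f64 {
    total_seconds * 1000.0 / round_trips as f64
}

/// Ratio between the naive and cached wall-clock times.
fn speedup(naive_seconds: f64, cached_seconds: f64) -> f64 {
    naive_seconds / cached_seconds
}

fn main() {
    let trips = naive_round_trips(ELEMENTS, CALLS_PER_ELEMENT);
    println!("round trips: {trips}");
    println!(
        "implied cost per round trip: {:.2} ms",
        implied_ms_per_call(NAIVE_SECONDS, trips)
    );
    println!("speedup: {:.1}x", speedup(NAIVE_SECONDS, CACHED_SECONDS));
}
```

With these inputs the sketch reports 3,675 round trips, about 1.77 ms per round trip, and a 32.5x speedup, which sits at the low end of the 30-50x range quoted from the source comment.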
The single call that collapses the tax
UIA has a feature most tutorials skip: IUIAutomationCacheRequest. You tell UIA which properties you want ahead of time, you set the tree scope once, and you issue a single find_first_build_cache call. UIA walks the target process's tree in that process, assembles a snapshot with every property you asked for, and hands it back in one hop. From that point forward, every get_cached_* read is an in-process lookup against the snapshot. The boundary is crossed exactly once.
The cache pipeline, one IPC call
“Performance improvement: ~30-50x faster for large trees (e.g., 6.5s -> 200ms for 245 elements)”
Comment at the top of build_tree_with_cache in crates/terminator/src/platforms/windows/tree_builder.rs at line 386
The anchor: build_tree_with_cache, seven properties, one call
This is the core of Terminator's Windows backend. It lives in crates/terminator/src/platforms/windows/tree_builder.rs, starting at line 388. Every word inside this function is verifiable with grep.
The exact seven properties
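Adding more properties to a CacheRequest costs nothing measurable; the single IPC call dominates. These are the seven Terminator asks for, in order: ControlType, Name, BoundingRectangle, IsEnabled, IsKeyboardFocusable, HasKeyboardFocus, and AutomationId.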
Each one maps to a field on UIElementAttributes. The tree builder reads them through get_cached_control_type, get_cached_name, get_cached_bounding_rectangle, and friends. Nothing in that loop crosses a process boundary.
Side by side: naive versus cached
The difference between Windows automation that feels instant and automation that freezes the agent is the difference between the two approaches: one round trip per property read, or one round trip for the whole tree.
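The contrast can be reproduced in a toy model that needs no Windows APIs. This is only a sketch: `RemoteApp`, `Node`, and every method below are hypothetical stand-ins, not the `uiautomation` crate or Terminator's code, and a counter stands in for real IPC. The naive walk reads three properties per element, fewer than the fifteen discussed above, to keep the example short.

```rust
use std::cell::Cell;

/// One UI element in the toy model (not a real UIA type).
#[derive(Clone)]
struct Node {
    name: String,
    enabled: bool,
    children: Vec<Node>,
}

/// Stand-in for the target application's process.
/// Every method that "crosses the boundary" bumps `round_trips`.
struct RemoteApp {
    root: Node,
    round_trips: Cell<u32>,
}

impl RemoteApp {
    fn new(root: Node) -> Self {
        RemoteApp { root, round_trips: Cell::new(0) }
    }
    fn cross_boundary(&self) {
        self.round_trips.set(self.round_trips.get() + 1);
    }
    // Naive path: one round trip per property read.
    fn live_name(&self, node: &Node) -> String {
        self.cross_boundary();
        node.name.clone()
    }
    fn live_enabled(&self, node: &Node) -> bool {
        self.cross_boundary();
        node.enabled
    }
    fn live_children<'a>(&self, node: &'a Node) -> &'a [Node] {
        self.cross_boundary();
        &node.children
    }
    // Cached path: one round trip returns a copy of the whole subtree.
    fn snapshot(&self) -> Node {
        self.cross_boundary();
        self.root.clone()
    }
}

/// Collect names of enabled elements with live (counted) reads.
fn naive_enabled_names(app: &RemoteApp, node: &Node, out: &mut Vec<String>) {
    let name = app.live_name(node);
    if app.live_enabled(node) {
        out.push(name);
    }
    for child in app.live_children(node) {
        naive_enabled_names(app, child, out);
    }
}

/// Same walk against a local snapshot: no boundary crossings.
fn cached_enabled_names(node: &Node, out: &mut Vec<String>) {
    if node.enabled {
        out.push(node.name.clone());
    }
    for child in &node.children {
        cached_enabled_names(child, out);
    }
}

/// A flat window: one root plus `child_count` children.
fn build_window(child_count: usize) -> Node {
    let children = (0..child_count)
        .map(|i| Node { name: format!("item-{i}"), enabled: i % 2 == 0, children: Vec::new() })
        .collect();
    Node { name: "window".to_string(), enabled: true, children }
}

fn main() {
    let app = RemoteApp::new(build_window(244)); // 245 elements in total

    let mut naive = Vec::new();
    naive_enabled_names(&app, &app.root, &mut naive);
    let naive_trips = app.round_trips.get();

    app.round_trips.set(0);
    let snap = app.snapshot();
    let mut cached = Vec::new();
    cached_enabled_names(&snap, &mut cached);
    let cached_trips = app.round_trips.get();

    assert_eq!(naive, cached); // both walks see the same tree
    println!("naive: {naive_trips} round trips, cached: {cached_trips} round trip");
}
```

Both walks return the same answer; only the number of boundary crossings differs (735 versus 1 in this model). The real saving depends on how expensive each crossing is on the machine in question.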
How the IPC call actually travels
The sequence is short because we want it to be. One hop out, one hop back, the whole tree comes with it.
build_tree_with_cache on a 245-element window
Watch the walk in four frames
Inside a single build_tree_with_cache call
Frame 1: the request
Terminator builds a CacheRequest, adds the seven UIProperty values, sets TreeScope::Subtree once, and creates a true_condition that matches every element.
Verify against the source
The claims on this page are grep-verifiable. Clone the repo and run these commands. If any line returns something different, the page is wrong; file an issue.
The six functions that make the cache work
Each card is a real symbol in the Windows UI Automation crate or Terminator's tree builder. Search for it in your copy of the repo.
find_first_build_cache
The single UIA method call that does all the heavy lifting. Accepts a TreeScope, a condition, and a CacheRequest, and returns a root element with every descendant pre-fetched.
create_cache_request
Allocates the request. Cheap. You add one UIProperty at a time; the cost of the request scales with element count, not property count.
set_tree_scope(Subtree)
Scope value 7 = Element | Children | Descendants. Without this, get_cached_children returns nothing and recursion falls back to live COM calls.
get_cached_control_type
In-process read from the snapshot. Zero IPC. Every get_cached_* accessor is the same shape.
get_cached_children
Returns the pre-loaded child array. No COM traversal, no waiting on a remote process. The recursion is plain Rust iteration.
build_node_from_cached_element
The recursive function that walks the snapshot and produces the final UINode. Every field on UIElementAttributes comes from a get_cached_* read.
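The recursive step can be pictured as a plain conversion from a cached element to an output node. The sketch below is an assumption-heavy illustration, not Terminator's code: `CachedElement`, `Attributes`, `UiNode`, and `build_node` are hypothetical names, and the field layout is invented for the example. It only mirrors the mapping described on this page, where the seven cached properties become role, name, bounds, enabled state, focus state, and an element ID.

```rust
/// Hypothetical stand-in for a cached UIA element carrying the seven properties.
#[derive(Clone, Debug)]
struct CachedElement {
    control_type: String,
    name: String,
    bounding_rectangle: (f64, f64, f64, f64), // x, y, width, height
    is_enabled: bool,
    is_keyboard_focusable: bool,
    has_keyboard_focus: bool,
    automation_id: String,
    children: Vec<CachedElement>,
}

/// Hypothetical output attributes (the real UIElementAttributes may differ).
#[derive(Debug, PartialEq)]
struct Attributes {
    role: String,
    name: String,
    bounds: (f64, f64, f64, f64),
    enabled: bool,
    focusable: bool,
    focused: bool,
    id: String,
}

#[derive(Debug, PartialEq)]
struct UiNode {
    attributes: Attributes,
    children: Vec<UiNode>,
}

/// Recursively convert the snapshot. Every read here is a local field access.
fn build_node(el: &CachedElement) -> UiNode {
    UiNode {
        attributes: Attributes {
            role: el.control_type.clone(),
            name: el.name.clone(),
            bounds: el.bounding_rectangle,
            enabled: el.is_enabled,
            focusable: el.is_keyboard_focusable,
            focused: el.has_keyboard_focus,
            id: el.automation_id.clone(),
        },
        children: el.children.iter().map(build_node).collect(),
    }
}

/// Number of nodes in the converted tree.
fn count(node: &UiNode) -> usize {
    1 + node.children.iter().map(count).sum::<usize>()
}

/// Helper that fills in placeholder values for a childless element.
fn leaf(control_type: &str, name: &str, automation_id: &str) -> CachedElement {
    CachedElement {
        control_type: control_type.into(),
        name: name.into(),
        bounding_rectangle: (0.0, 0.0, 80.0, 24.0),
        is_enabled: true,
        is_keyboard_focusable: true,
        has_keyboard_focus: false,
        automation_id: automation_id.into(),
        children: Vec::new(),
    }
}

fn main() {
    let mut window = leaf("Window", "Save As", "dlg");
    window.children = vec![
        leaf("Button", "Save", "btnSave"),
        leaf("Button", "Cancel", "btnCancel"),
    ];
    let tree = build_node(&window);
    println!("{} nodes, root role = {}", count(&tree), tree.attributes.role);
}
```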
Three numbers that matter
1 COM round trip from Terminator to the target process, no matter how many elements live inside the window.
7 UIProperty values pre-fetched on every node in the subtree, set once at CacheRequest construction.
~200 ms observed wall-clock for a 245-element window after caching. Same tree, no cache: 6.5 seconds.
Terminator versus naive automation on Windows
| Feature | Traditional automation on Windows | Terminator |
|---|---|---|
| Reads UI Automation tree via single CacheRequest | No, per-property COM calls | Yes, 7 properties in one find_first_build_cache |
| Tree scope set once | Per-node scope on every descent | TreeScope::Subtree at request construction |
| Wall-clock for a 245-node window | ~6.5 seconds | ~200 milliseconds |
| Falls back gracefully when cache fails | Retries same slow path | Logs and calls build_ui_node_tree_configurable |
| Exposes the primitive to AI coding assistants | UI only | MCP tool get_window_tree |
| Code-first SDKs on top | Drag-and-drop canvas | Rust, TypeScript, Python, MCP |
| Open source license | Proprietary | MIT on GitHub at mediar-ai/terminator |
The five-step version of everything above
The app lives in its own process
Windows accessibility is cross-process by design. Every property read on a UI element crosses a COM boundary. This is the root cause of slow automation on Windows.
UIA ships a batching primitive called CacheRequest
You add properties, you set scope, you call find_first_build_cache once. The server walks its own tree locally and returns a serialized snapshot. Documented since Windows 7.
Terminator wraps it in build_tree_with_cache
Seven properties, TreeScope::Subtree, one find_first_build_cache. The function is at crates/terminator/src/platforms/windows/tree_builder.rs line 388.
Every read after that is in-process
get_cached_control_type, get_cached_name, get_cached_bounding_rectangle all read from the snapshot. build_node_from_cached_element walks it recursively in pure Rust.
The agent turns the tree into action
When an AI coding assistant calls the get_window_tree MCP tool, this cached path runs. If caching fails on a weird app, the engine falls back to the recursive path at engine.rs line 3978.
Frequently asked questions
Why is automation on Windows slower than automation in a browser?
Browser automation runs inside the browser process. The DevTools Protocol reads the DOM locally; every getBoundingClientRect is an in-process call. Windows UI Automation is the opposite: every target app is a separate process, and every property read (ControlType, Name, BoundingRectangle, IsEnabled) is a COM call across a process boundary. A naive walk of a 245-element window at roughly fifteen reads per element issues more than 3,000 of those calls, and the cost is real. Terminator measured 6.5 seconds for that shape of tree without caching.
What is the specific optimization Terminator uses?
A single UI Automation CacheRequest with TreeScope::Subtree. The function is build_tree_with_cache in crates/terminator/src/platforms/windows/tree_builder.rs at line 388. It adds seven properties to the cache (ControlType, Name, BoundingRectangle, IsEnabled, IsKeyboardFocusable, HasKeyboardFocus, AutomationId), sets the scope to Subtree (which is Element plus Children plus Descendants, value 7), then calls find_first_build_cache once. After that, every get_cached_control_type, get_cached_name, or get_cached_bounding_rectangle call is a pure in-process lookup with zero COM traffic.
How much does the cached approach actually save?
The comment at the top of build_tree_with_cache in tree_builder.rs is specific: '30-50x faster for large trees (e.g., 6.5s -> 200ms for 245 elements)'. The engine's tree builder tries the cached path first and only falls back to the recursive per-property approach if caching fails. See crates/terminator/src/platforms/windows/engine.rs at line 3966 for the fallback branch.
Why do other tools for automation on Windows not use CacheRequest?
Most consumer tools for automation on Windows are not bottlenecked on tree reads because they do not walk the tree. Power Automate Desktop opens a dedicated UIA connection per selector and memoizes the result; AutoHotkey mostly cares about a single control under the cursor; RPA canvases re-scan only the recorded region. Terminator is different because AI agents need the full tree to choose a target, and they need it fast enough that the agent does not time out. That pushes the IPC batching problem onto the critical path.
Which UIProperty values are in the cache request?
Exactly seven: UIProperty::ControlType, UIProperty::Name, UIProperty::BoundingRectangle, UIProperty::IsEnabled, UIProperty::IsKeyboardFocusable, UIProperty::HasKeyboardFocus, UIProperty::AutomationId. This is the minimum set the tree builder needs to produce a UINode with role, name, bounds, enabled state, focus state, and a stable element ID. Adding more properties to the cache request is cheap; the cost of the single IPC call scales with element count, not property count.
What is TreeScope::Subtree and why does it matter?
TreeScope is a UIA enum that controls how deep a search runs. Element is value 1 (just this node), Children is 2, Descendants is 4, and Subtree is 7 (the bitwise OR of all three). Terminator sets Subtree on the cache request, which tells UIA to pre-load every descendant of the root window. Without that, get_cached_children would return an empty iterator and every recursion would have to cross the COM boundary again.
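The flag arithmetic is easy to check by hand. The sketch below models the three values with a hypothetical `Scope` newtype (it is not the `TreeScope` type from any crate) and shows that OR-ing Element, Children, and Descendants gives 7, and that Children alone does not include Descendants.

```rust
use std::ops::BitOr;

/// Hypothetical bit-flag model of the scope values described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Scope(u32);

impl Scope {
    const ELEMENT: Scope = Scope(1);
    const CHILDREN: Scope = Scope(2);
    const DESCENDANTS: Scope = Scope(4);
    const SUBTREE: Scope = Scope(1 | 2 | 4);

    /// True when every bit of `flag` is set in `self`.
    fn contains(self, flag: Scope) -> bool {
        (self.0 & flag.0) == flag.0
    }
}

impl BitOr for Scope {
    type Output = Scope;
    fn bitor(self, rhs: Scope) -> Scope {
        Scope(self.0 | rhs.0)
    }
}

fn main() {
    let combined = Scope::ELEMENT | Scope::CHILDREN | Scope::DESCENDANTS;
    println!("combined = {}, subtree = {}", combined.0, Scope::SUBTREE.0);
    println!(
        "Children alone covers descendants: {}",
        Scope::CHILDREN.contains(Scope::DESCENDANTS)
    );
}
```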
Does the cache go stale during a long automation?
Yes, the cache is a snapshot. Once a UI mutation happens (a dialog opens, a control changes value), the cached nodes no longer reflect the live tree. Terminator's action tools rebuild the tree before and after each mutation when you pass ui_diff_before_after:true. The cost is amortized: one IPC call per rebuild, not 15 per element per rebuild.
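A before/after comparison of two snapshots might look something like the sketch below. This is not Terminator's diff implementation. `diff_snapshots` is a hypothetical helper, and each snapshot is reduced to a flat list of element names so the idea fits in a few lines.

```rust
use std::collections::BTreeSet;

/// Returns (added, removed): names only in `after`, and names only in `before`.
fn diff_snapshots(before: &[&str], after: &[&str]) -> (Vec<String>, Vec<String>) {
    let b: BTreeSet<&str> = before.iter().copied().collect();
    let a: BTreeSet<&str> = after.iter().copied().collect();
    let added = a.difference(&b).map(|s| s.to_string()).collect();
    let removed = b.difference(&a).map(|s| s.to_string()).collect();
    (added, removed)
}

fn main() {
    // Snapshot taken before the action (one round trip in the cached model).
    let before = ["Save", "Cancel", "File name"];
    // The action opens a confirmation dialog, so the first snapshot is now stale.
    // A second snapshot is taken after the action (a second round trip).
    let after = ["Save", "Cancel", "File name", "Confirm overwrite"];

    let (added, removed) = diff_snapshots(&before, &after);
    println!("added: {added:?}, removed: {removed:?}");
}
```

The point of the model is that neither snapshot is ever updated in place; a change is detected only by taking a fresh one and comparing.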
What happens if the cache request fails?
The engine falls back to the recursive per-property path. See crates/terminator/src/platforms/windows/engine.rs at line 3978: on Err, the code logs 'Cached approach failed, falling back to recursive' and proceeds with build_ui_node_tree_configurable. The fallback path uses batched children reads with a configurable timeout_per_operation_ms (default 50ms) and yields the CPU every N elements to keep the host responsive.
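The try-cached-then-fall-back shape can be sketched with a `Result` and two closures. The names `build_tree` and `PathTaken` are hypothetical and the tree is reduced to a list of strings; only the log message is taken from the description above. The actual code in engine.rs may be structured differently.

```rust
/// Which builder produced the tree.
#[derive(Debug, PartialEq)]
enum PathTaken {
    Cached,
    Recursive,
}

/// Try the cached builder first; on Err, log and use the recursive builder.
fn build_tree<C, R>(cached: C, recursive: R) -> (PathTaken, Vec<String>)
where
    C: FnOnce() -> Result<Vec<String>, String>,
    R: FnOnce() -> Vec<String>,
{
    match cached() {
        Ok(tree) => (PathTaken::Cached, tree),
        Err(reason) => {
            eprintln!("Cached approach failed, falling back to recursive: {reason}");
            (PathTaken::Recursive, recursive())
        }
    }
}

fn main() {
    // Happy path: the cached builder succeeds, so the fallback never runs.
    let ok = build_tree(|| Ok(vec!["Window".to_string()]), || Vec::new());

    // Failure path: the cached builder reports an error, so the fallback runs.
    let failed = build_tree(
        || Err("provider rejected the cache request".to_string()),
        || vec!["Window".to_string()],
    );

    println!("{:?} then {:?}", ok.0, failed.0);
}
```

Passing the slow path as a closure means it is only evaluated when the cached path actually fails.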
Does this matter for short scripts or only for agent workflows?
Both. A one-shot script that reads a single edit field makes one IPC call with or without caching. But any script that needs to find a specific control (searching by role and name, iterating siblings, validating a workflow) is walking the tree. The moment you walk more than 50 elements, the difference between caching and not caching shows up as real wall-clock seconds. For agent workflows where the model inspects the tree on every turn, caching is what makes sub-second turns possible.