The Microsoft UI Automation tool that pre-fetches a whole subtree in one COM call

Most UIA wrappers crawl the accessibility tree by reading one property at a time. Each read is a COM round-trip. Terminator builds an IUIAutomationCacheRequest with seven UIProperty fields, sets TreeScope::Subtree, and reads everything below with zero follow-up IPC. The whole pattern is 70 lines of MIT-licensed Rust.

IUIAutomationCacheRequest, TreeScope::Subtree = 7, MIT
Matthew Diakonov
8 min read
Wraps the official IUIAutomation COM API in safe Rust
Pre-fetches seven UIProperty fields per node in one IPC call
TreeScope::Subtree = Element (1) | Children (2) | Descendants (4) = 7
Open source: tree_builder.rs lines 398 to 466, MIT licensed

What the SERP keeps missing

Search "microsoft ui automation tool" and you get the same five pages: the Microsoft Learn UI Automation Overview, the Win32 UI Automation reference (entry-uiauto-win32), the Wikipedia article, a TestComplete marketing page, and a TestArchitect doc. They all explain that UIA is the successor to MSAA, that it exposes control patterns, and that screen readers and RPA tools use it. Then they stop.

None of them touch the only question that matters when you actually ship something on top of UIA: how do you avoid making one COM call per property per element? Microsoft documents IUIAutomationCacheRequest in the Win32 reference because UIA is unusable at scale without it. But "use a cache request" in isolation is not a working pattern. The rest of this page is the working pattern, copied straight from an MIT-licensed Rust implementation that drives Notepad, Chrome, Excel, and Slack from the same locator string.

The anchor: tree_builder.rs lines 398 to 466

This is the actual code. It sits at the bottom of the call stack whenever you call desktop.getWindowTree(processName) from the Node.js or Python SDK. There is no second implementation hiding somewhere; this is it.

crates/terminator/src/platforms/windows/tree_builder.rs

One find_first_build_cache call materializes every node beneath the root with all seven properties pre-populated. Then the recursive walker reads only cached fields:

tree_builder.rs (build_node_from_cached_element), 1 IPC

Cache built in 142ms, then 3,847 elements walked in 11ms. No second IPC call.

A typical [CACHED_TREE] log line, captured with Notepad, one Word window, and one Edge window open

Naive UIA versus cached UIA, side by side

The snippet below is what every introductory UIA tutorial shows you: one COM round-trip per property per node. tree_builder.rs replaces that whole loop with a single find_first_build_cache call followed by reads of cached fields.

Why a Microsoft UI Automation tool lives or dies on caching

// Naive UIA wrapper: one COM round-trip per property per node.
// Illustrative only: `walker.descendants(root)` stands in for a
// TreeWalker loop (get_first_child / get_next_sibling), which also pays IPC.
for element in walker.descendants(root) {
    let ct = element.get_control_type()?;         // IPC
    let name = element.get_name()?;               // IPC
    let rect = element.get_bounding_rectangle()?; // IPC
    let id = element.get_automation_id()?;        // IPC
    let enabled = element.is_enabled()?;          // IPC
    // 1000 nodes x 5 reads = 5000 IPC calls, before counting navigation
}
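You can model the gap without Windows at all. The sketch below is a hypothetical, platform-neutral simulation, not UIA code. FakeProvider, read_property, build_cache, and naive_walk are invented names, and each provider call only increments a counter where a real cross-process round-trip would happen. The naive walk mirrors the five-read loop above, and build_cache stands in for one find_first_build_cache over TreeScope::Subtree.

```rust
use std::cell::Cell;

/// Properties read per node, matching the five-read naive loop above.
const PROPS_PER_NODE: usize = 5;

/// One node's property values after they have crossed the (simulated) boundary.
#[derive(Clone, Debug, PartialEq)]
struct Snapshot {
    index: usize,
    props: [u32; PROPS_PER_NODE],
}

/// Hypothetical stand-in for a UIA provider living in another process.
/// read_property and build_cache each count as one round-trip.
struct FakeProvider {
    round_trips: Cell<usize>,
    node_count: usize,
}

impl FakeProvider {
    fn new(node_count: usize) -> Self {
        Self { round_trips: Cell::new(0), node_count }
    }

    fn charge_round_trip(&self) {
        self.round_trips.set(self.round_trips.get() + 1);
    }

    /// Deterministic fake property value for (node, property).
    fn value(node: usize, prop: usize) -> u32 {
        (node * PROPS_PER_NODE + prop) as u32
    }

    /// Naive access: one round-trip per property per node.
    fn read_property(&self, node: usize, prop: usize) -> u32 {
        self.charge_round_trip();
        Self::value(node, prop)
    }

    /// Cached access: one round-trip returns every property of every node.
    fn build_cache(&self) -> Vec<Snapshot> {
        self.charge_round_trip();
        (0..self.node_count)
            .map(|index| Snapshot {
                index,
                props: std::array::from_fn(|prop| Self::value(index, prop)),
            })
            .collect()
    }
}

/// The naive walker asks the provider for each property individually.
fn naive_walk(provider: &FakeProvider) -> Vec<Snapshot> {
    (0..provider.node_count)
        .map(|index| Snapshot {
            index,
            props: std::array::from_fn(|prop| provider.read_property(index, prop)),
        })
        .collect()
}
```

For 1,000 nodes the naive walk records 5,000 round-trips and the cached walk records one, and both produce identical snapshots. A real TreeWalker loop also pays a call for each navigation step, so the real naive count is higher than this model shows.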

Inputs, hub, outputs

Terminator does not invent a new accessibility surface. It is a thin, opinionated wrapper around what Windows already gives you, shaped into one locator API that downstream callers can drive in any language.

Diagram: How the Windows accessibility surface flows through Terminator. Inputs (IUIAutomation, Win32 EnumWindows, the IAccessible bridge, the Chrome extension) feed Terminator, which serves the Node.js SDK, the Python SDK, the Rust crate, and the MCP agent.

The seven UIProperty fields it caches

Every cell here corresponds to one entry in the properties array on line 404 of tree_builder.rs. If you fork Terminator and add a new selector kind, this is the place you extend.

ControlType

Button, Edit, ComboBox, TabItem, DataGrid; mapped from string roles in utils.rs::map_generic_role_to_win_roles.

Name

The accessible name UIA exposes. Cached so you can match against name:Save without a second COM call.

BoundingRectangle

Left, top, width, height in physical pixels. Used for spatial selectors like rightof: and below:.

IsEnabled

Greyed-out controls are filtered before any pattern call so you do not waste an Invoke on a dead button.

IsKeyboardFocusable

When include_all_bounds is false, only focusable nodes get bounds, which keeps the cache lean for big apps.

HasKeyboardFocus

Lets save_focus_state and restore_focus_state preserve keyboard context across an automation step.

AutomationId

The stable id WPF and WinUI assign to controls; queried by id: and nativeid: selector prefixes.
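Here is a minimal sketch of how the seven cached fields, and two of the rules above, might look once the data is local. CachedProp, Rect, bounds_to_keep, is_right_of, and is_below are hypothetical names. The bounds rule follows the IsKeyboardFocusable card. The two spatial checks are one plausible reading of rightof: and below:, not Terminator's definitions.

```rust
/// The seven properties the article lists for tree_builder.rs's cache request.
/// Local stand-in enum, not the uiautomation crate's UIProperty.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CachedProp {
    ControlType,
    Name,
    BoundingRectangle,
    IsEnabled,
    IsKeyboardFocusable,
    HasKeyboardFocus,
    AutomationId,
}

const CACHED_PROPS: [CachedProp; 7] = [
    CachedProp::ControlType,
    CachedProp::Name,
    CachedProp::BoundingRectangle,
    CachedProp::IsEnabled,
    CachedProp::IsKeyboardFocusable,
    CachedProp::HasKeyboardFocus,
    CachedProp::AutomationId,
];

/// Bounding rectangle in physical pixels.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    left: i32,
    top: i32,
    width: i32,
    height: i32,
}

/// Hypothetical bounds rule: keep a rectangle only for keyboard-focusable
/// nodes, unless the caller asked for every node's bounds.
fn bounds_to_keep(rect: Rect, is_focusable: bool, include_all_bounds: bool) -> Option<Rect> {
    if include_all_bounds || is_focusable {
        Some(rect)
    } else {
        None
    }
}

/// Hypothetical spatial check: `a` starts at or past the right edge of `b`.
fn is_right_of(a: Rect, b: Rect) -> bool {
    a.left >= b.left + b.width
}

/// Hypothetical spatial check: `a` starts at or past the bottom edge of `b`.
fn is_below(a: Rect, b: Rect) -> bool {
    a.top >= b.top + b.height
}
```

Because every input here comes from the cache, evaluating a spatial selector across thousands of nodes is arithmetic on local structs rather than thousands of BoundingRectangle round-trips.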

How the cached walk is actually built

Five things tree_builder.rs does, in order

1

Initialize UIA in MTA mode

engine.rs creates the client with UIAutomation::new_direct() in a COM multithreaded apartment (CoInitializeEx with COINIT_MULTITHREADED), so the same UIA instance can be queried from any tokio task. The wrapper hands back a ThreadSafeWinUIElement(Arc<UIElement>) so the locator engine and tree builder can share it without per-call locking.

2

Build a CacheRequest with seven UIProperty fields

create_cache_request returns an IUIAutomationCacheRequest. Terminator calls add_property seven times, with the exact fields the selector engine and the layout calculator need. Adding more is cheap; the trade-off is just memory in the cache.

3

Set TreeScope::Subtree

Element=1, Children=2, Descendants=4, all OR'd to 7. This is the difference between caching one node and caching the entire descendant tree. The constant is documented in a comment in the source so you do not have to memorize it.

4

Call find_first_build_cache once

This is the only round-trip you pay for the entire tree walk. The cached_root that comes back has every cached property and every cached child pre-populated. Terminator times this separately from the tree-walk so you can see exactly how much of a slow page is COM IPC versus your own logic.

5

Read with get_cached_*

build_node_from_cached_element calls get_cached_control_type, get_cached_name, get_cached_bounding_rectangle, is_cached_keyboard_focusable, and recurses with get_cached_children. None of those touch the wire. The selector engine then evaluates its boolean predicate against the cached UINode tree.
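Once the cache is populated, steps 4 and 5 reduce to plain recursion. In this sketch, CachedElement stands in for the element find_first_build_cache returns, build_node for build_node_from_cached_element, and find_first for a selector pass. All names here, including the el helper, are hypothetical, and only two of the seven properties are modeled to keep it short. Nothing in the sketch performs IPC, and that is the point: after the single fetch, the walk and the match are local work.

```rust
/// Hypothetical model of a cached element: every property and every child
/// was filled in by the single bulk fetch, so reading them costs no IPC.
struct CachedElement {
    control_type: String,
    name: String,
    children: Vec<CachedElement>, // analogue of get_cached_children()
}

/// Hypothetical output node, loosely shaped like the article's UINode.
#[derive(Debug, PartialEq)]
struct UiNode {
    role: String,
    name: String,
    children: Vec<UiNode>,
}

/// Recursive walk over the cached tree; `visited` counts converted elements.
fn build_node(el: &CachedElement, visited: &mut usize) -> UiNode {
    *visited += 1;
    UiNode {
        role: el.control_type.clone(),
        name: el.name.clone(),
        children: el.children.iter().map(|c| build_node(c, visited)).collect(),
    }
}

/// Depth-first search for the first node matching a predicate.
fn find_first<'a, F: Fn(&UiNode) -> bool>(node: &'a UiNode, pred: &F) -> Option<&'a UiNode> {
    if pred(node) {
        return Some(node);
    }
    node.children.iter().find_map(|c| find_first(c, pred))
}

/// Small helper that keeps test trees readable.
fn el(control_type: &str, name: &str, children: Vec<CachedElement>) -> CachedElement {
    CachedElement { control_type: control_type.into(), name: name.into(), children }
}
```

In a real build, the only expensive line would be the bulk fetch that produces the CachedElement root. Everything after it scales with local memory access, not with COM latency.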

Numbers from the source

7 UIProperty fields cached per node
1 find_first_build_cache call per tree walk
~70 lines of cached tree builder
753 lines of selector.rs grammar above it

The selector engine on top of the cached tree contributes another 753 lines (selector.rs). Twelve atomic prefixes (role:, name:, id:, classname:, visible:, process:, and the spatial family) compose with &&, ||, and ! into a flattened predicate that runs against each cached element.
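The sketch below shows what compiling &&, ||, and ! into one predicate can look like, using a small recursive-descent parser and evaluator. It is hypothetical and is not selector.rs. It supports only role:, name:, and id: atoms. It assumes values contain no spaces, because tokens are split on whitespace after the operators are padded. It also assumes ! binds tighter than &&, which binds tighter than ||.

```rust
/// Hypothetical node exposing only the fields this sketch's atoms read.
struct Node<'a> { role: &'a str, name: &'a str, id: &'a str }

#[derive(Debug, PartialEq)]
enum Expr {
    Atom(String, String), // (prefix, value), e.g. ("role", "Button")
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Pads operators with spaces, then splits on whitespace.
fn tokenize(src: &str) -> Vec<String> {
    src.replace('(', " ( ")
        .replace(')', " ) ")
        .replace('!', " ! ")
        .replace("&&", " && ")
        .replace("||", " || ")
        .split_whitespace()
        .map(str::to_string)
        .collect()
}

struct Parser {
    toks: Vec<String>,
    pos: usize,
}

impl Parser {
    /// Consumes the next token if it equals `t`.
    fn eat(&mut self, t: &str) -> bool {
        let hit = self.toks.get(self.pos).map(String::as_str) == Some(t);
        if hit { self.pos += 1; }
        hit
    }

    // or := and ("||" and)*
    fn or_expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.and_expr()?;
        while self.eat("||") {
            lhs = Expr::Or(Box::new(lhs), Box::new(self.and_expr()?));
        }
        Ok(lhs)
    }

    // and := unary ("&&" unary)*
    fn and_expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.unary()?;
        while self.eat("&&") {
            lhs = Expr::And(Box::new(lhs), Box::new(self.unary()?));
        }
        Ok(lhs)
    }

    // unary := "!" unary | "(" or ")" | prefix:value
    fn unary(&mut self) -> Result<Expr, String> {
        if self.eat("!") {
            return Ok(Expr::Not(Box::new(self.unary()?)));
        }
        if self.eat("(") {
            let inner = self.or_expr()?;
            return if self.eat(")") { Ok(inner) } else { Err("expected ')'".to_string()) };
        }
        let tok = self.toks.get(self.pos).cloned().ok_or("unexpected end of selector")?;
        self.pos += 1;
        match tok.split_once(':') {
            Some((prefix, value)) => Ok(Expr::Atom(prefix.to_string(), value.to_string())),
            None => Err(format!("expected prefix:value, found '{}'", tok)),
        }
    }
}

fn parse(src: &str) -> Result<Expr, String> {
    let mut p = Parser { toks: tokenize(src), pos: 0 };
    let expr = p.or_expr()?;
    match p.toks.get(p.pos) {
        None => Ok(expr),
        Some(extra) => Err(format!("unexpected trailing '{}'", extra)),
    }
}

/// Evaluates the predicate against one node; unknown prefixes never match here.
fn eval(expr: &Expr, node: &Node) -> bool {
    match expr {
        Expr::Atom(prefix, value) => match prefix.as_str() {
            "role" => node.role.eq_ignore_ascii_case(value),
            "name" => node.name == value.as_str(),
            "id" => node.id == value.as_str(),
            _ => false,
        },
        Expr::Not(inner) => !eval(inner, node),
        Expr::And(a, b) => eval(a, node) && eval(b, node),
        Expr::Or(a, b) => eval(a, node) || eval(b, node),
    }
}
```

Parsing once and evaluating the resulting tree against every cached element keeps the per-node cost to a few string comparisons.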

What it looks like from the SDK

You never write the cache request yourself. The SDK exposes a Playwright-style locator and a one-call tree dump. Both go through the cached builder above.

example.ts

What you actually see when you run it

terminal

A checklist for any UIA tool author

If you are writing your own Microsoft UI Automation tool, the difference between something that drives Excel in a second and something that hangs for thirty is whether all five of these are true.

Things to confirm before you ship

  • Build an IUIAutomationCacheRequest with only the properties your selectors actually read, not every property UIA defines.
  • Set TreeScope::Subtree (value 7), not Element, so descendants come back in the same call.
  • Call find_first_build_cache or find_all_build_cache; plain find_first takes no cache request, so nothing gets cached.
  • Read with get_cached_*; a stray get_control_type after caching pays the IPC anyway.
  • Recurse via get_cached_children; FindFirst on a child triggers a fresh server query.

Terminator versus the closed-source RPA stack

Feature | Typical UIA-based RPA tool | Terminator
Pre-fetches entire subtree in one IPC call | One COM call per property per element | find_first_build_cache + TreeScope::Subtree
Open MIT-licensed source you can audit | Closed-source RPA suite | tree_builder.rs, ~70 lines, on GitHub
Selector grammar with &&, ||, !, and parentheses | One id, one XPath, one role | role:Button && (name:OK || name:Confirm)
Same API for Win32, WPF, UWP, WinUI 3 | Often only one framework family | Whatever IUIAutomation exposes, Terminator drives
Driven from TypeScript, Python, Rust, MCP | Single SDK or vendor scripting language | @mediar-ai/terminator, terminator-py, terminator-rs, MCP agent
Falls back to OCR + Gemini vision via unified index | Pixel matching is a separate product | click_by_index covers UIA, OCR, Gemini, and DOM

Why this matters for AI coding agents

Terminator's MCP server gives Claude Code, Cursor, Windsurf, and VS Code agents a deterministic Microsoft UI Automation tool. The agent does not eyeball a screenshot and guess; it runs the cached tree builder, evaluates a boolean selector against the returned UINode tree, and fires an IUIAutomation Invoke pattern on the chosen element. Same loop, fewer tokens, fewer hallucinations, no flake from a stale screenshot.

One install: claude mcp add terminator "npx -y terminator-mcp-agent@latest"

The wider Microsoft UI Automation ecosystem

Terminator is one entry in a long list of UIA-based tools, each with a different shape. Inspect.exe and AccEvent come from the Windows SDK and are read-only. FlaUI is the canonical .NET wrapper. Power Automate Desktop, UiPath, and Automation Anywhere are commercial RPA suites that wrap UIA inside a flow editor. Code-first wrappers in other languages cover the rest of the space.

Inspect.exe, AccEvent, FlaUInspect, UI Automation .NET, FlaUI, uiautomation (Python), Python-UIAutomation-for-Windows, WinAppDriver, Power Automate Desktop, TestComplete, TestArchitect, Terminator

What separates them is the shape of the API on top: a flow editor, a C# library, a Python module, a CLI, an MCP server. What unites them is the COM surface underneath. The faster they handle that surface, the less wall-clock time your automation spends waiting on IPC.

Driving Microsoft UI Automation at scale?

Talk to the team. We will walk through the cache builder, the selector grammar, and how to wire it into your stack.

Frequently asked questions

What is Microsoft UI Automation, and what is a Microsoft UI Automation tool?

Microsoft UI Automation (UIA) is the COM-based accessibility framework that ships with Windows, exposed through interfaces like IUIAutomation, IUIAutomationElement, IUIAutomationCacheRequest, and a set of control patterns (Invoke, Value, Toggle, ExpandCollapse, and so on). A Microsoft UI Automation tool is anything that drives that API to inspect or control desktop UI: Inspect.exe and AccEvent from the Windows SDK for inspection, FlaUInspect for trees, and code-level wrappers like UIA in .NET, the C++ headers, the Python uiautomation package, FlaUI for C#, and Terminator for Rust, TypeScript, and Python. Terminator is unusual in that it pre-fetches an entire subtree of properties in one COM call and gives you a Playwright-shaped locator API across every Win32, WPF, UWP, and WinUI surface.

Why is per-property COM access the bottleneck in most Microsoft UI Automation tools?

Every IUIAutomationElement property read crosses a COM apartment boundary and, in the cross-process case, is marshaled through the OS to the provider. A single Name access can take milliseconds. If you walk a 1000-element tree and read seven properties per element, that is 7000 IPC calls. Microsoft documents IUIAutomationCacheRequest in its UI Automation client reference precisely because it anticipates this. Terminator's tree_builder.rs uses a CacheRequest configured with TreeScope::Subtree, so the entire descendant tree is materialized in one find_first_build_cache call. Subsequent reads call get_cached_control_type, get_cached_name, and get_cached_bounding_rectangle, all of which read from the local cache without any IPC.

What exactly does TreeScope::Subtree mean?

TreeScope is a bitmask defined by IUIAutomation. Element = 1, Children = 2, Descendants = 4. Subtree is the OR of all three, value 7. When you pass TreeScope::Subtree to set_tree_scope on the cache request, the UIA core fills the cache for the root element and every node beneath it in one server round-trip. Terminator documents the constants in a comment on line 424 of tree_builder.rs so you can audit it. After the call, get_cached_children works without contacting the UIA provider again.
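The bitmask arithmetic is small enough to check directly. This sketch uses local constants with the values given above. ELEMENT, CHILDREN, DESCENDANTS, SUBTREE, includes, and describe are illustrative names, not the uiautomation crate's TreeScope type.

```rust
/// TreeScope flag values as stated above. Local constants for illustration,
/// not the uiautomation crate's TreeScope enum.
const ELEMENT: u32 = 1;
const CHILDREN: u32 = 2;
const DESCENDANTS: u32 = 4;
const SUBTREE: u32 = ELEMENT | CHILDREN | DESCENDANTS;

/// True when every bit of `flag` is set in `scope`.
fn includes(scope: u32, flag: u32) -> bool {
    (scope & flag) == flag
}

/// Names of the scope bits that are set, in ascending bit order.
fn describe(scope: u32) -> Vec<&'static str> {
    [(ELEMENT, "Element"), (CHILDREN, "Children"), (DESCENDANTS, "Descendants")]
        .iter()
        .filter(|(bit, _)| includes(scope, *bit))
        .map(|(_, name)| *name)
        .collect()
}
```

describe(SUBTREE) reports all three bits. The single cache request relies on exactly that: the root, its children, and the rest of the descendants are all in scope for the one call.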

Which UIProperty values does Terminator pre-fetch by default?

Seven, listed verbatim in the properties array starting at line 404 of tree_builder.rs: UIProperty::ControlType, UIProperty::Name, UIProperty::BoundingRectangle, UIProperty::IsEnabled, UIProperty::IsKeyboardFocusable, UIProperty::HasKeyboardFocus, and UIProperty::AutomationId. Those are the fields the selector engine and the layout calculator actually read for every node. If you need more, string_to_ui_property in utils.rs maps another 25 names (ClassName, HelpText, ValueValue, IsPassword, IsOffscreen, the LegacyIAccessible* family) to UIProperty variants you can wire into the cache request.

How does Terminator handle Win32 controls that pre-date UIA?

Two layers. First, EnumWindows from user32.dll lists every top-level HWND, and GetWindowThreadProcessId maps each one to a PID before any UIA call happens. This lives in applications.rs and is 10x to 100x faster than walking the desktop UIA tree to discover windows. Second, for the controls themselves, Terminator goes through the IAccessible (MSAA) bridge that Microsoft provides for legacy Win32 and MFC apps, and it degrades to keyboard or pointer simulation only when no pattern is available. When WindowPattern is missing, Terminator falls back to ShowWindow with SW_MAXIMIZE; see lines 959 to 979 of element.rs.

Does this work outside Windows?

The platform layer is split. crates/terminator/src/platforms/windows uses IUIAutomation. crates/terminator/src/platforms/macos uses the Accessibility API. The selector engine, the cache strategy, and the locator API are platform-neutral. The same role:Button && name:Save selector compiles on both. Linux through AT-SPI2 is in the core crate but the npm and pip binaries currently ship Windows only.

How is this different from Inspect.exe or FlaUI?

Inspect.exe is a viewer. It shows you the live UIA tree but you do not script it. FlaUI is a great C# wrapper but it is a library you embed in C# tests. Terminator is a multi-language framework: a Rust core, a NAPI Node.js binding, a PyO3 Python binding, an MCP server for AI agents like Claude Code and Cursor, and a deterministic workflow SDK. The cache-first tree builder, the boolean selector grammar, and the unified vision/OCR/DOM index system are the same across all of them.

Where can I read the source?

It is MIT licensed at github.com/mediar-ai/terminator. The cached tree builder is crates/terminator/src/platforms/windows/tree_builder.rs lines 388 to 469. The control type and UIProperty mapping is crates/terminator/src/platforms/windows/utils.rs lines 115 to 210. The selector parser is crates/terminator/src/selector.rs, 753 lines. The pattern dispatch (Invoke, Value, Toggle, ExpandCollapse, Window, Selection, Scroll, RangeValue) lives across crates/terminator/src/platforms/windows/element.rs.

terminator: Desktop automation SDK
© 2026 terminator. All rights reserved.