What is UI automation? It is pattern invocation, not pixel clicking

Every explainer you will read on this topic says the same thing: "software that simulates a user's mouse and keyboard." That is a definition from 2005. Modern UI automation does not move the mouse. It talks to the application's accessibility provider and calls a typed method on a control. Here is what that looks like in the source of a working framework.

Matthew Diakonov · 11 min read
  • invoke() is 22 lines at element.rs:838
  • click() is a 5-phase cascade at element.rs:666
  • 8 distinct UIA patterns wired into the SDK
  • ~5ms per action, no screenshot, no OCR, no LLM

The one-sentence answer

UI automation is a program that drives an application's graphical interface without a human present: it finds a control, reads its state, acts on it, and reads the results back. The part every other guide glosses over is how. On modern Windows and macOS, "drives" almost never means "moves the mouse." It means "calls a method on a COM object that represents the control." The mouse-moving version still exists, but it is the fallback path, not the default.

Terminator is useful here because it makes the distinction visible in the API. There is an invoke() function that takes the pattern path, and a click() function that takes the mouse path. Reading both is the fastest way to understand what UI automation actually is.

The two mechanisms, side by side

Same goal, "press the Save button." Two entirely different implementations.

A click is not one thing

The framework validates the element, reads its bounding rectangle, computes a click point, and dispatches a real Win32 SendInput event. The cursor moves. The OS input queue sees a mouse event that is indistinguishable from a human click. This is the fallback path.

  • SendInput with MOUSEEVENTF_LEFTDOWN / LEFTUP
  • Cursor physically moves
  • Fails if the window is occluded
  • Fails if the coordinates shift mid-flight

The 22-line proof: invoke()

Every guide on this topic is prose. Here is the definition, verbatim from the Rust source of a working open-source framework. Notice what is not in it: no coordinates, no bounds, no cursor, no SendInput.

crates/terminator/src/platforms/windows/element.rs

get_pattern returns a reference to the app's automation provider for that control. invoke_pat.invoke() is a COM vtable call that lands inside the app's UI thread, right next to the code that runs when a human clicks. That is why UI automation is, mechanically, pattern invocation.

1.2ms: invoke() finished against a 4000-element UIA tree, no mouse moved, no cursor change, selector 'role:Button && name:Save' against Notepad (synthetic benchmark on a 2023 Surface Laptop 5).

The accessibility tree is the real API surface

A UI automation framework is, at heart, a client of an accessibility API. Everything else is sugar. The diagram below is the data flow that runs every time you call desktop.locator(...).first(3000).

Selector → Accessibility tree → Pattern dispatch

[Diagram: client bindings (TypeScript, Python, Rust, MCP) feed selectors into the UIA tree hub, which dispatches typed pattern calls (InvokePattern, TogglePattern, ValuePattern), with SendInput as the mouse-path fallback.]

The hub is where every client eventually lands. A selector string is parsed, matched against the live UIA tree, and the matched element is handed a pattern call. The SendInput fallback on the right is the mouse path, reached only when the element reports no useful pattern.
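The hub's job (parse a selector, match it against the tree, hand back an element) is small enough to sketch. Below is a toy TypeScript model of the 'role:X && name:Y' grammar resolved against a mock tree; the Element shape and helper names are invented for illustration, not Terminator's internals.

```typescript
// Toy model of the selector-resolution hub: parse, match, return the element.
// The Element shape and all function names here are illustrative.
interface Element {
  role: string;
  name: string;
  children: Element[];
}

// Parse "role:Button && name:Save" into key/value predicates.
function parseSelector(selector: string): Array<[string, string]> {
  return selector.split("&&").map((clause) => {
    const [key, value] = clause.trim().split(":");
    return [key, value];
  });
}

// Breadth-first walk of a (mock) accessibility tree for the first match.
function findFirst(root: Element, selector: string): Element | null {
  const preds = parseSelector(selector);
  const queue: Element[] = [root];
  while (queue.length > 0) {
    const el = queue.shift()!;
    if (preds.every(([k, v]) => el[k as "role" | "name"] === v)) return el;
    queue.push(...el.children);
  }
  return null;
}

// A mock tree standing in for the live UIA tree.
const tree: Element = {
  role: "Window",
  name: "Untitled - Notepad",
  children: [
    { role: "Button", name: "Cancel", children: [] },
    { role: "Button", name: "Save", children: [] },
  ],
};

console.log(findFirst(tree, "role:Button && name:Save")?.name); // "Save"
```

The real framework matches against a live COM tree instead of an in-memory object, but the control flow is the same: resolve first, act second.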

Eight patterns, one API

Windows UI Automation exposes about twenty patterns. Terminator wires eight of them into the SDK, which cover almost every interaction you can do with a desktop app. Grep element.rs for UI[A-Z][a-zA-Z]+Pattern and you get 27 hits across these eight types.

Invoke

Buttons, toolbar items, menu entries. The generic 'do the thing' pattern. invoke() on line 838 of element.rs. The answer to 99% of 'click this' requests when the element exposes it.

Toggle

Checkboxes, switches, menu items with a checked state. set_toggled() reads the current state and calls TogglePattern::toggle() until it matches the target. Idempotent.
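That read-then-toggle loop is worth seeing as code. A sketch of idempotent toggling over a mock tri-state control; the Toggleable interface and the setToggled name here are illustrative, not Terminator's API.

```typescript
// Illustrative model of an idempotent set_toggled(): read the state,
// toggle only while it differs from the target, with a cap for
// tri-state controls that cycle on -> off -> indeterminate.
type ToggleState = "on" | "off" | "indeterminate";

interface Toggleable {
  state: ToggleState;
  toggle(): void; // advances to the next state, like TogglePattern::Toggle
}

function setToggled(el: Toggleable, target: boolean): void {
  const want: ToggleState = target ? "on" : "off";
  // A tri-state control cycles through at most 3 states; cap the loop.
  for (let i = 0; i < 3 && el.state !== want; i++) {
    el.toggle();
  }
  if (el.state !== want) throw new Error("control never reached target state");
}

// A mock tri-state checkbox.
const checkbox: Toggleable = {
  state: "indeterminate",
  toggle() {
    const cycle: ToggleState[] = ["on", "off", "indeterminate"];
    this.state = cycle[(cycle.indexOf(this.state) + 1) % 3];
  },
};

setToggled(checkbox, true); // indeterminate -> on
setToggled(checkbox, true); // already on: no toggle fires, idempotent
```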

SelectionItem

Radio buttons, single-select list items, tree nodes. set_selected(true) calls AddToSelection. The sibling items auto-deselect, because the container owns selection state.

Value

Single-line text fields. set_value("19.99") calls ValuePattern::SetValue. Atomic. Replaces the whole contents. Bypasses the IME, so international input works without focus juggling.

RangeValue

Sliders, spinners, progress bars with a writable value. Range checks are enforced at the provider side, so a cap of 100 really means 100.

ExpandCollapse

Combo box drop-downs, tree items, carets on accordions. Expanding fires a real state change that the app can observe, unlike a click on a chevron glyph.

Scroll

Scrollable containers. ScrollPattern::SetScrollPercent(horiz, vert) lets you jump the viewport without synthesizing a hundred WM_MOUSEWHEEL events.

Window

Top-level windows: minimize, maximize, restore, close. WindowPattern::SetWindowVisualState drives the same transitions as Win32's ShowWindow, but through the app's UIA provider.
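Eight affordances reduce to a lookup. Here is a sketch of the kind of role-to-pattern dispatch table the article later suggests copying; the table contents and role names are illustrative, not lifted from element.rs.

```typescript
// Illustrative role -> preferred-UIA-pattern dispatch table.
// A real client would query the element for pattern support at runtime;
// this sketch keys on control role for brevity.
const patternFor: Record<string, string> = {
  Button:      "InvokePattern",
  MenuItem:    "InvokePattern",
  CheckBox:    "TogglePattern",
  RadioButton: "SelectionItemPattern",
  Edit:        "ValuePattern",
  Slider:      "RangeValuePattern",
  ComboBox:    "ExpandCollapsePattern",
  Pane:        "ScrollPattern",
  Window:      "WindowPattern",
};

// Fall through to the mouse path when no typed pattern applies.
function dispatch(role: string): string {
  return patternFor[role] ?? "SendInput fallback";
}

console.log(dispatch("Button"));     // InvokePattern
console.log(dispatch("CustomDraw")); // SendInput fallback
```

The `?? "SendInput fallback"` line is the whole article in miniature: patterns first, mouse only when nothing typed is on offer.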

And the 5-phase fallback: click()

When the pattern path is unavailable, or when you explicitly want a real mouse event (games, DRM'd software, hover-triggered affordances), there is click(). The shape is deliberately different. Read the phase comments.

crates/terminator/src/platforms/windows/element.rs

Five phases, in the order they run

How a mouse-path click resolves

Phase 1. Validate

validate_clickable() on element.rs line 389 runs three boolean checks: is_visible (including multi-monitor occlusion), is_enabled, and ensure_in_viewport (which scrolls the ancestor if needed). If any check fails, the function returns before a single byte of input is dispatched.

Phase 2. Determine coordinates

determine_click_coordinates() on line 415 asks UIA for GetClickablePoint first. UIA gives back the app's own recommended target point, which respects large click targets, hit-test regions, and compound controls. If GetClickablePoint returns None or errors, the code falls back to the bounding-rectangle center.

Phase 3. Capture pre-state

Before clicking, the function stores the current window title and bounding rectangle. That is how phase 5 can produce a real verification signal instead of a 'probably worked' shrug.

Phase 4. SendInput

execute_mouse_click calls send_mouse_click in input.rs line 38. That function converts the float screen coordinate to SendInput's 0-65535 normalized space, assembles three INPUT structs (move, down, up), and dispatches them in one SendInput call. The cursor moves. The OS input queue sees a real mouse event. The app cannot distinguish this from a human click.

Phase 5. Diff the world

Post-click, the function re-reads the window title and bounds, computes window_title_changed and bounds_changed booleans, and embeds both in the ClickResult. Higher-level code (including the MCP server) uses those flags to decide whether to retry or advance. Most other frameworks stop at phase 4.
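The five phases compose into a small pipeline. Below is a pure-logic sketch over a mock element; every type and helper is invented for illustration, and the dispatch step is simulated with a callback instead of a real SendInput.

```typescript
// Mock of the 5-phase mouse-path click. All names are illustrative.
interface MockElement {
  visible: boolean;
  enabled: boolean;
  bounds: { x: number; y: number; w: number; h: number };
  windowTitle: string;
}

interface ClickResult {
  x: number;
  y: number;
  windowTitleChanged: boolean;
  boundsChanged: boolean;
}

function click(el: MockElement, sendInput: (el: MockElement) => void): ClickResult {
  // Phase 1: validate before dispatching a single byte of input.
  if (!el.visible || !el.enabled) throw new Error("element not clickable");

  // Phase 2: compute the click point; this sketch uses the
  // bounding-rectangle center (a real client asks GetClickablePoint first).
  const x = el.bounds.x + el.bounds.w / 2;
  const y = el.bounds.y + el.bounds.h / 2;

  // Phase 3: capture pre-state for the post-click diff.
  const before = { title: el.windowTitle, bounds: { ...el.bounds } };

  // Phase 4: dispatch the click (simulated via callback here).
  sendInput(el);

  // Phase 5: diff the world and report it, instead of assuming success.
  return {
    x,
    y,
    windowTitleChanged: el.windowTitle !== before.title,
    boundsChanged:
      el.bounds.x !== before.bounds.x || el.bounds.y !== before.bounds.y ||
      el.bounds.w !== before.bounds.w || el.bounds.h !== before.bounds.h,
  };
}

// Usage: the "app" reacts to the click by retitling its window.
const el: MockElement = {
  visible: true,
  enabled: true,
  bounds: { x: 100, y: 200, w: 80, h: 30 },
  windowTitle: "Untitled - Notepad",
};
const result = click(el, (e) => { e.windowTitle = "document.txt - Notepad"; });
console.log(result.windowTitleChanged); // true
```

Phase 5 is the part most frameworks skip: returning booleans a caller can branch on is what lets higher-level code retry instead of shrug.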

The SendInput at the bottom of the stack

Phase 4 lands here. input.rs is the single source of truth for every synthetic mouse event in the framework. Both Desktop.click_at_coordinates and UIElement.click funnel through it.

crates/terminator/src/platforms/windows/input.rs

The 65535 magic number is SendInput's absolute coordinate scale. A click at x=960 on a 1920-wide screen becomes abs_x = (960 * 65535) / 1920 = 32767. The OS input driver walks that back out to pixels and injects the event at the right place. The app has no way to tell it did not come from a human.
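That arithmetic fits in one function. A sketch of the pixel-to-normalized conversion, with Math.floor standing in for the integer division in the worked example; the real conversion lives in input.rs.

```typescript
// Convert an absolute pixel coordinate into SendInput's 0-65535
// normalized space. Math.floor mirrors the integer division in the
// worked example: (960 * 65535) / 1920 = 32767.
function toNormalized(pixel: number, screenExtent: number): number {
  return Math.floor((pixel * 65535) / screenExtent);
}

console.log(toNormalized(960, 1920));  // 32767: mid-screen on a 1920-wide display
console.log(toNormalized(0, 1920));    // 0: left edge
console.log(toNormalized(1919, 1920)); // 65500: last pixel column
```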

One selector, two mechanisms, same API

From the SDK side, you just pick which function to call: .invoke() for the pattern path, .click() for the mouse path. The same locator resolves for both.

example.ts

What happens when you call .invoke()

Five hops. None of them touch a pixel.

invoke() on a Save button, step by step

SDK → Rust core: locator('role:Button && name:Save').invoke()
Rust core → UIA tree: find_element(role=Button, name="Save") returns an IUIAutomationElement handle
Rust core → UIA tree: get_pattern<UIInvokePattern> returns the InvokePattern vtable
Rust core → target app: invoke_pat.invoke() (COM); the button click handler runs
Rust core → SDK: Ok(())

The old and the new definition, side by side

Every "what is UI automation" guide online is still wedded to the RPA-era definition. This is what it costs you.

The mental model every other guide skips

// The old RPA definition: UI automation = move mouse, click pixels.
record_workflow()                 // screen recorder
playback(workflow)                // replay deltas against the pixel grid
// Breaks the moment the window moves, the DPI changes, or a colleague
// picks a different theme.

A real run, in a real terminal

terminator-cli

Same target, same selector, two mechanisms, two sets of logs. That is UI automation.

Numbers from the source

22
lines for the whole invoke() function
5
phases in click(), from validate to post-diff
8
distinct UIA patterns wired into the SDK
65535
SendInput's normalized coordinate scale

Small enough to read over a coffee. MIT licensed, so you can copy the pattern dispatch table into your own project if you do not want to depend on Terminator.

Where Terminator sits in the category

The landscape of "UI automation" tools is wide: browser drivers (Playwright, Selenium), record-and-replay suites (TestComplete, Ranorex), RPA platforms (UiPath, Power Automate Desktop), and accessibility-API wrappers (FlaUI, WinAppDriver, Terminator). The column that matters most for this article is "what does a click actually do."

| Feature | A typical record-and-replay UI automation tool | Terminator |
| Primary mechanism for 'click' | Synthetic mouse event on a recorded pixel coordinate | InvokePattern.invoke() through COM; mouse is a fallback |
| Input dispatched to the app | WM_MOUSEMOVE + WM_LBUTTONDOWN + WM_LBUTTONUP via SendInput | Typed pattern call on IUIAutomationElement; app receives the action directly |
| Survives window movement, DPI change, theme swap | No, coordinates invalidate | Yes, the selector resolves against the accessibility tree |
| Works when target window is partially occluded | No | Yes, patterns do not care about z-order |
| AI coding-agent ready | Needs a vision model to find pixel coordinates | MCP server wraps the same API so Claude, Cursor, VS Code drive it |
| Source code you can read | Black-box SaaS | MIT, mediar-ai/terminator on GitHub |

Why this matters if you work with AI coding agents

Claude Code, Cursor, Windsurf, and VS Code all ship an MCP client. Wire Terminator into it and the agent gets a typed interface to your entire desktop: the same selector grammar you just read, the same invoke() and click() paths, the same accessibility tree. The agent does not have to screenshot, guess, or call a vision model for the common case. It emits a locator string and dispatches a pattern.

One command: claude mcp add terminator "npx -y terminator-mcp-agent@latest"

Evaluating UI automation for a legacy Windows stack?

Bring your app. We will write the selectors, wire the patterns, and show you where invoke() works and where click() takes over, live on your screen.

Frequently asked questions

What is UI automation, in one sentence?

UI automation is a program that interacts with an application's graphical user interface the way a human would: find a control, read its state, click it, type into it, and read the results back. Mechanically, modern UI automation does not usually move the mouse. It asks the operating system's accessibility layer for a handle to the control, then calls a typed pattern on that handle (Invoke for buttons, Toggle for checkboxes, Value for text fields). A synthetic mouse click is the fallback, not the default.

So UI automation is just RPA?

No. RPA is a superset that bundles scheduling, credential management, centralized logging, and a visual designer around UI automation. The UI automation piece is the core engine: the library that knows how to find a button named 'Save' and press it. You can use UI automation without RPA (any Selenium test suite, a Playwright script, or Terminator called from a Node app). You cannot have RPA without UI automation.

How does a UI automation framework actually click a button?

Two paths. Path one, the preferred one, is pattern invocation. The framework asks the accessibility provider 'does this element support the Invoke pattern?' If yes, it calls InvokePattern.invoke() through COM on Windows, or performs the NSAccessibilityPressAction on macOS. The application receives the action directly in its message loop. No mouse moves. Path two is a synthetic mouse event: the framework reads the element's bounding rectangle, computes a click point, and dispatches a Win32 SendInput call with MOUSEEVENTF_LEFTDOWN and MOUSEEVENTF_LEFTUP. Terminator ships both as separate functions: invoke() and click(). See crates/terminator/src/platforms/windows/element.rs, lines 838 and 666 respectively.

Why would I ever want the mouse path if the pattern path is cleaner?

Three reasons. First, many legacy Win32 controls do not expose InvokePattern; old Delphi and MFC apps bridge into UIA through IAccessible and only report 'DoDefaultAction,' which is less reliable than a real click. Second, games, DRM-protected software, and some remote-desktop viewers intentionally ignore pattern invocations to block automation. A real mouse event lands at the OS input queue and is indistinguishable from a human. Third, some controls have invisible behavior on hover that only a physical click triggers (tooltips, drag-to-initiate affordances). Terminator's click() exists for those cases. Its 5 phases, visible at element.rs line 666, are: validate_clickable, determine_click_coordinates, pre-state capture, execute_mouse_click, post-state diff.

What patterns does a typical UI automation framework use?

The Windows UI Automation API exposes around 20 control patterns. Terminator actively uses eight: InvokePattern (buttons), TogglePattern (checkboxes, switches), SelectionItemPattern (radio buttons, list items), ValuePattern (single-line text fields), RangeValuePattern (sliders, progress bars), ExpandCollapsePattern (tree items, combo boxes), ScrollPattern (scrollable containers), and WindowPattern (top-level windows). Each one maps to a specific UX affordance. You can count the usages in crates/terminator/src/platforms/windows/element.rs with grep -oE 'UI[A-Z][a-zA-Z]+Pattern'. The answer at the time of writing is 27 occurrences across 8 distinct patterns.

How is this different from Playwright or Selenium?

Playwright and Selenium speak CDP and WebDriver, protocols that live inside the browser process. They know CSS selectors, DOM nodes, and JavaScript execution contexts. They cannot see a non-browser window. Terminator speaks the OS accessibility API, which is the same layer screen readers use. It sees every top-level HWND on Windows, including Electron apps, native Win32 dialogs, WPF forms, UWP apps, and browsers (with a thin Chrome extension). The selector grammar is Playwright-shaped (locator chains with >>), but the find path goes through UIAutomation::GetRootElement() instead of page.querySelector().

Do I need to install a separate 'UI automation' component on Windows?

No. UI Automation (UIA) has shipped with Windows since Windows XP SP3 (2008). The COM DLL UIAutomationCore.dll ships with every version of Windows currently supported by Microsoft. Accessibility tools (Narrator, NVDA, Accessibility Insights), automated tests (WinAppDriver, FlaUI), and automation frameworks (Terminator, TestComplete, UiPath) all use the same API. You do not 'install UI automation.' You import a binding and call it.

Why does Terminator run 100x faster than Claude computer-use or ChatGPT Agents?

Because Terminator does not screenshot, does not OCR by default, and does not call an LLM in its hot path. A click takes roughly 5ms: find_elements resolves the selector against the cached UIA tree, bounds() reads the BoundingRectangle property, SendInput dispatches the event. Screenshot-based agents do: capture frame (30-200ms), ship to vision model (500-2000ms), receive coordinates, dispatch click (5ms), verify with another screenshot. Terminator uses AI only as a recovery path, when a selector fails to resolve.
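The gap in that answer is simple addition. Illustrative arithmetic using the low ends of the figures quoted above; the breakdown is a sketch, not a measured benchmark.

```typescript
// Best-case latency budget per action, in milliseconds, built from the
// figures in the answer above. Purely illustrative arithmetic.
const patternPathMs = 5; // selector resolve + pattern call, no pixels

const screenshotAgentMs =
  30 +  // capture frame (low end of the 30-200ms range)
  500 + // vision model round-trip (low end of the 500-2000ms range)
  5 +   // dispatch the click
  30;   // verify with another screenshot

console.log(screenshotAgentMs);                             // 565
console.log(Math.round(screenshotAgentMs / patternPathMs)); // 113: roughly 100x, best case
```

At the high ends of those ranges the ratio climbs well past 400x, which is why "100x faster" is a conservative headline.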

How do I try this myself?

Three install paths. Claude Code MCP: claude mcp add terminator "npx -y terminator-mcp-agent@latest". Node: npm install @mediar-ai/terminator, then const desktop = new Desktop(); const button = await desktop.locator('role:Button && name:Save').first(3000); await button.invoke(). Python: pip install terminator-py, same shape. MCP hooks Terminator into Claude, Cursor, VS Code, and Windsurf so the AI coding assistant can drive your desktop directly.

terminator · Desktop automation SDK
© 2026 terminator. All rights reserved.