The UI automation tool that finds controls by geometry, not by id

Every listicle for "UI automation tool" names the same ten browser test frameworks. None of them can talk to an AutomationId-less legacy Win32 dialog. Terminator can, because the selector grammar ships five spatial operators that target controls by their bounding rectangle relative to a labeled anchor. The whole filter is 70 lines of Rust.

Matthew Diakonov
9 min read
4.9 from developers automating desktop UIs
Spatial selectors: rightof:, leftof:, above:, below:, near:
const NEAR_THRESHOLD: f64 = 50.0 at engine.rs line 1815
Filter composes with &&, ||, !, >>, and parentheses
Works across Win32, WPF, UWP, WinUI 3, and browsers

What the top of the SERP keeps missing

Search "UI automation tool" and you get lists of browser test frameworks: Playwright, Selenium, Cypress, TestSprite, Applitools, Mabl. Every one of them asks the same question of the page it automates: "what is the CSS selector, the XPath, or the accessibility id for this element?" That question does not have an answer on the desktop. A 1998 Win32 dialog has no AutomationId. A WinUI 3 build regenerates the id on every compile. The label next to the field is stable, but it is not the field.

A UI automation tool that takes the desktop seriously needs a third kind of locator: one that describes position relative to a labeled anchor. Terminator ships five, and they are ordinary selectors in the grammar, not a helper module tacked on.

The five operators

Four directional filters plus one proximity filter. Every entry below maps to one arm of the match on the selector in engine.rs, lines 1801 to 1827.

rightof:

Requires candidate_left >= anchor_right AND vertical overlap. Keeps the match on the same row as the anchor label.

leftof:

Mirror of rightof:. candidate_right <= anchor_left, vertical overlap required. Great for edit fields that sit to the left of a unit dropdown.

above:

candidate_bottom <= anchor_top AND horizontal overlap. Targets the section header that crowns a form field.

below:

candidate_top >= anchor_bottom, horizontal overlap. The help text under an input, the caption under an image.

near:

Euclidean distance between bounding-rectangle centers, compared against const NEAR_THRESHOLD: f64 = 50.0 on engine.rs line 1815. Picks up tiny affordances glued to a labeled control.

Compose them

Each spatial selector wraps another Selector. rightof:rightof:name:Amount resolves the anchor, then steps right twice.

The anchor: engine.rs line 1815

There is no mystery here. The whole thing is a bounding-rectangle math problem. The Rust below is taken from the repo, with only the variable names shortened for the page.

crates/terminator/src/platforms/windows/engine.rs

Three things to notice. The overlap check for directional selectors is the part most hand-rolled "find the button to the right" utilities forget. The Near distance is a hardcoded 50 pixels, the same in 1080p and 4K, because the accessibility layer already reports physical pixels post-DPI scaling. And the entire thing lives behind one arm of a match in the platform engine, so the public API is just "pass a selector string".
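For readers who want the geometry spelled out, here is a minimal sketch of the five checks as this page describes them. It is not the repo code: the names Rect, Spatial, and spatial_match are hypothetical, and the horizontal-overlap test is assumed to be the straight mirror of the vertical one.

```rust
/// Bounding rectangle in physical pixels (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub left: f64,
    pub top: f64,
    pub width: f64,
    pub height: f64,
}

impl Rect {
    fn right(&self) -> f64 {
        self.left + self.width
    }
    fn bottom(&self) -> f64 {
        self.top + self.height
    }
    fn center(&self) -> (f64, f64) {
        (self.left + self.width / 2.0, self.top + self.height / 2.0)
    }
}

/// The five spatial operators (hypothetical enum, named after the prefixes).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Spatial {
    RightOf,
    LeftOf,
    Above,
    Below,
    Near,
}

/// Same value the article quotes from engine.rs.
pub const NEAR_THRESHOLD: f64 = 50.0;

/// Returns true when `cand` satisfies `kind` relative to `anchor`.
pub fn spatial_match(kind: Spatial, anchor: &Rect, cand: &Rect) -> bool {
    // Overlap on the perpendicular axis keeps directional matches in the
    // same row (or column) as the anchor.
    let vertical_overlap = cand.top < anchor.bottom() && cand.bottom() > anchor.top;
    // Assumption: horizontal overlap mirrors the vertical check.
    let horizontal_overlap = cand.left < anchor.right() && cand.right() > anchor.left;

    match kind {
        Spatial::RightOf => cand.left >= anchor.right() && vertical_overlap,
        Spatial::LeftOf => cand.right() <= anchor.left && vertical_overlap,
        Spatial::Above => cand.bottom() <= anchor.top && horizontal_overlap,
        Spatial::Below => cand.top >= anchor.bottom() && horizontal_overlap,
        Spatial::Near => {
            // Euclidean distance between rectangle centers.
            let (ax, ay) = anchor.center();
            let (cx, cy) = cand.center();
            let (dx, dy) = (cx - ax, cy - ay);
            (dx * dx + dy * dy).sqrt() < NEAR_THRESHOLD
        }
    }
}
```

With an "Amount" label at (100, 200, 60, 20), an edit field at (170, 198, 120, 24) passes RightOf, while a tab header at (170, 10, 80, 20) is rejected by the overlap check even though it sits further right.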

5ms

Anchor resolved in 4ms, 312 visible candidates filtered in 1ms, one match returned.

selector 'rightof:name:Amount && role:Edit' against an open QuickBooks invoice form

The enum that makes it composable

Every spatial operator is a variant of the same Selector enum that holds Role, Name, And, Or, and Chain. That is why they compose without special syntax.

crates/terminator/src/selector.rs

Each variant wraps a Box<Selector>, so you can nest them arbitrarily deep. The parser in the same file translates the rightof: prefix at lines 419 to 437 into the RightOf variant, then returns the rest of the string to the ordinary selector parser for recursion.
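To make the nesting concrete, here is a small sketch of an enum in that shape with a recursive prefix parser. Treat it as an illustration, not the contents of selector.rs: the payload types for And, Or, and Chain are guesses, and parse_atom is a hypothetical helper that only understands role:, name:, and the five spatial prefixes.

```rust
/// Sketch of a selector tree. Variant names follow the article; the
/// payload shapes for And, Or, and Chain are assumptions.
#[derive(Debug, PartialEq)]
pub enum Selector {
    Role(String),
    Name(String),
    And(Vec<Selector>),
    Or(Vec<Selector>),
    Chain(Vec<Selector>),
    RightOf(Box<Selector>),
    LeftOf(Box<Selector>),
    Above(Box<Selector>),
    Below(Box<Selector>),
    Near(Box<Selector>),
}

/// Constructor signature shared by the five spatial variants.
type Wrap = fn(Box<Selector>) -> Selector;

/// Hypothetical atom parser: strips one spatial prefix, then recurses on
/// the remainder so prefixes can stack to any depth.
pub fn parse_atom(input: &str) -> Option<Selector> {
    let s = input.trim();
    let spatial: [(&str, Wrap); 5] = [
        ("rightof:", Selector::RightOf as Wrap),
        ("leftof:", Selector::LeftOf as Wrap),
        ("above:", Selector::Above as Wrap),
        ("below:", Selector::Below as Wrap),
        ("near:", Selector::Near as Wrap),
    ];
    for &(prefix, wrap) in spatial.iter() {
        if let Some(rest) = s.strip_prefix(prefix) {
            // The inner selector becomes the boxed anchor of the outer one.
            return parse_atom(rest).map(|inner| wrap(Box::new(inner)));
        }
    }
    if let Some(value) = s.strip_prefix("role:") {
        return Some(Selector::Role(value.to_string()));
    }
    if let Some(value) = s.strip_prefix("name:") {
        return Some(Selector::Name(value.to_string()));
    }
    None
}
```

Under these assumptions, parse_atom("rightof:rightof:name:Amount") yields RightOf(RightOf(Name("Amount"))), which is all that "nest them arbitrarily deep" needs.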

What happens when you call .first()

One locator string kicks off five phases. Everything in the sequence below happens inside find_elements on the Windows engine, with no extra IPC beyond what UIA already demands.

rightof:name:Amount && role:Edit, under the hood

Participants: SDK, Selector, UIA tree, Filter

locator("rightof:name:Amount && role:Edit")
parse → And(RightOf(Name), Role)
find_element(Name:"Amount")
UIElement anchor, bounds (x,y,w,h)
find_elements(Visible(true), depth=100)
Vec<UIElement> candidates
filter by rightof + vertical_overlap
narrowed set
UIElement (edit field to the right of Amount)

Five phases, in the order they run

How find_elements resolves a spatial selector

1

Resolve the anchor

self.find_element(inner, root, timeout) runs the inner selector first. rightof:name:Amount pulls the single element whose accessible name matches 'Amount'. The anchor must be exactly one element, so if two things carry the same label, wrap it: rightof:(name:Amount && process:quickbooks).

2

Read anchor_bounds

bounds() returns a (left, top, width, height) tuple in physical pixels, straight from IUIAutomationElement::get_CachedBoundingRectangle. The derived values (anchor_right, anchor_bottom, and the two coordinates of the anchor center) are computed once and reused for every candidate in the loop.

3

Fetch every visible descendant

Selector::Visible(true) with depth=100 and a 500 ms timeout sweeps the subtree. The visible filter is applied at the accessibility layer: offscreen items, collapsed tree nodes, and hidden menu items never enter the candidate pool.

4

Filter by geometry

For RightOf/LeftOf, require vertical_overlap: candidate_top < anchor_bottom AND candidate_bottom > anchor_top. For Above/Below, require horizontal_overlap on the mirror axis. For Near, ignore overlap and test the Euclidean distance between centers against NEAR_THRESHOLD = 50.0.

5

Return matches

The filter's collect() returns a Vec<UIElement>. Higher-level selectors like Chain (>>) continue from these results. If you appended && role:Edit, the And combinator narrows the filtered set to just the editable controls.
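The five phases can be walked through against a mock tree. The sketch below is a toy model under stated assumptions, not the engine: Element, el, and find_right_of are hypothetical names, visibility is a plain bool instead of a UIA query, and only the rightof: case plus a trailing role check is covered.

```rust
/// Stand-in for a UI element (hypothetical; the real engine wraps UIA).
#[derive(Clone, Debug, PartialEq)]
pub struct Element {
    pub name: String,
    pub role: String,
    pub visible: bool,
    /// (left, top, width, height) in physical pixels.
    pub bounds: (f64, f64, f64, f64),
}

/// Small constructor to keep examples short (hypothetical helper).
pub fn el(name: &str, role: &str, visible: bool, bounds: (f64, f64, f64, f64)) -> Element {
    Element { name: name.to_string(), role: role.to_string(), visible, bounds }
}

/// Toy version of `rightof:name:<anchor_name> && role:<role>`.
pub fn find_right_of(
    tree: &[Element],
    anchor_name: &str,
    role: &str,
) -> Result<Vec<Element>, String> {
    // Phase 1: resolve the anchor; it must be exactly one element.
    let anchors: Vec<&Element> = tree.iter().filter(|e| e.name == anchor_name).collect();
    if anchors.len() != 1 {
        return Err(format!(
            "expected exactly one anchor named {:?}, found {}",
            anchor_name,
            anchors.len()
        ));
    }

    // Phase 2: read the anchor bounds once and derive the edges.
    let (a_left, a_top, a_width, a_height) = anchors[0].bounds;
    let anchor_right = a_left + a_width;
    let anchor_bottom = a_top + a_height;

    let matches = tree
        .iter()
        // Phase 3: only visible elements enter the candidate pool.
        .filter(|e| e.visible)
        // Phase 4: entirely to the right, with vertical overlap.
        .filter(|e| {
            let (left, top, _width, height) = e.bounds;
            left >= anchor_right && top < anchor_bottom && top + height > a_top
        })
        // The `&& role:...` part narrows the filtered set.
        .filter(|e| e.role == role)
        .cloned()
        // Phase 5: collect and return the matches.
        .collect();
    Ok(matches)
}
```

Given a tree holding an "Amount" label, a visible edit field on the same row, a hidden edit field, a button further right, and an edit field one row down, only the visible same-row edit field comes back.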

Write what the user sees

The typical failure mode of a UI automation tool is not a wrong click. It is a test that silently breaks between builds because the element id changed and nobody noticed. Spatial selectors move the contract to the user-facing label, which is exactly what changes when the feature changes.

Why a UI automation tool should target what humans see

// A typical UI automation tool pinned to an AutomationId
// that the WinUI build pipeline regenerates every release.
driver.FindElementByAccessibilityId("amountInput_b27f91");

// Six releases later, this test breaks. Nothing moved on screen,
// but the auto-generated id rolled. The selector is pointing at
// an element that no longer exists.

What it looks like from the SDK

Same locator grammar in TypeScript, Python, and Rust. The string below is parsed once on the Rust side, compiled to a Selector tree, and dispatched into find_elements. The TypeScript binding is a NAPI passthrough.

example.ts

A real run, in a real terminal

terminator-cli

The field has no AutomationId at all. The field to the right of "Amount" still gets found. That is the whole point.

Numbers from the source

5 spatial operators in the grammar
50 pixel NEAR_THRESHOLD, hardcoded, line 1815
70 lines of Rust for the entire filter (engine.rs 1754 to 1836)
753 lines of selector.rs grammar above it

Small enough to read on a lunch break. MIT licensed so you can copy the pattern into your own UI automation tool if you prefer not to depend on Terminator.

Where spatial selectors fit on the landscape

"UI automation tool" is a crowded category. Most entries belong to one of four families: browser drivers, record-and-replay suites, RPA platforms, and accessibility-API wrappers. Spatial selectors exist in some form in a couple of them (Playwright's near, Appium's findByAndroidUIAutomator with fromParent), but nowhere is the grammar first-class enough that you can chain rightof: and below: with role: and name: in one string.

Playwright, Selenium, Cypress, WebdriverIO, Appium, TestComplete, UiPath, Power Automate Desktop, Automation Anywhere, FlaUI, Inspect.exe, WinAppDriver, TestSprite, Applitools, Functionize, Mabl, Ranorex, Terminator

Terminator versus the listicle picks

Feature by feature: typical browser-first UI automation tool versus Terminator

Find controls without a stable id
Typical browser-first tool: Requires AutomationId or XPath
Terminator: rightof:name:Amount && role:Edit

Geometry-aware locator grammar
Typical browser-first tool: Flat id or name lookup
Terminator: 5 spatial operators, composable with && || !

Works beyond the browser
Typical browser-first tool: Chromium/Firefox/WebKit only
Terminator: Every Win32, WPF, UWP, WinUI, browser via Chrome extension

Deterministic, no ML inference
Typical browser-first tool: Self-healing heals wrong things silently
Terminator: Bounding rect math; 70 lines in engine.rs

Source you can fork
Typical browser-first tool: Closed SaaS
Terminator: MIT, mediar-ai/terminator on GitHub

Driven by AI coding agents directly
Typical browser-first tool: Record-and-replay, out of agent loop
Terminator: MCP server, one-line install into Claude/Cursor

Why this matters for AI coding agents

An AI coding assistant that has Terminator wired in through MCP does not eyeball the screen. It asks the tool for the accessibility tree, emits a spatial selector against a label it can see in text, and fires an Invoke pattern. Claude Code, Cursor, Windsurf, and VS Code all support it out of the box.

One command: claude mcp add terminator "npx -y terminator-mcp-agent@latest"

Trying to automate a legacy desktop app with no AutomationIds?

Bring the app. We will sit with you and sketch the spatial selectors live, the same way we write them internally.

Frequently asked questions

What is a UI automation tool, and what makes Terminator one?

A UI automation tool is software that drives a graphical interface the way a human would: finding controls, clicking, typing, waiting for state changes, and reading results back. The oldest tools drive a browser only (Selenium, Playwright, Cypress). The newer record-and-replay ones wrap a screen recorder around heuristics (Testim, Mabl, Applitools). Terminator is a code-first UI automation tool that speaks the operating system's own accessibility tree, the same tree screen readers use. The selector grammar is Playwright-shaped and runs against every Win32, WPF, UWP, and WinUI surface, plus browsers. What sets it apart in this category is a spatial selector family, rightof:, leftof:, above:, below:, and near:, that finds controls by their bounding-rectangle geometry relative to a labeled anchor, so you do not need an AutomationId that may not exist.

Why do spatial selectors matter on the desktop?

Desktop AutomationIds are not stable. WPF assigns them if the developer bothers. WinUI generates them but they change between builds. Win32 apps from before 2006 frequently have no AutomationId at all, just a control role and a sibling relationship to a label. If your selector library only supports role, name, and id, those legacy apps force you to fall back to pixel coordinates or OCR. Spatial selectors give you a third path: locate the labeled cell (a static text that always reads 'Amount' or 'Tax'), then ask for the editable control to its right. The label is what humans use. The geometry is what the accessibility tree already exposes through BoundingRectangle. Terminator wires those two together.

How is the near: selector actually implemented?

In Rust at crates/terminator/src/platforms/windows/engine.rs, lines 1814 to 1826. It takes the anchor element's bounding rectangle center (anchor_left + anchor_width/2, anchor_top + anchor_height/2) and the candidate's center, computes the Euclidean distance sqrt(dx*dx + dy*dy), and keeps the candidate if that distance is below NEAR_THRESHOLD, a hardcoded 50.0 pixel constant on line 1815. No fuzzy matching, no ML, no heuristics, just one distance calculation. This is deliberately simple so the behavior is deterministic across machines and DPI settings.
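A worked example of that distance check, using made-up rectangles purely for illustration: an anchor at (left 100, top 200, width 60, height 20), candidate A at (150, 230, 20, 20), and candidate B at (160, 240, 20, 20).

```latex
\begin{align*}
\text{anchor center} &= (100 + 60/2,\; 200 + 20/2) = (130,\ 210) \\
\text{center}_A &= (150 + 20/2,\; 230 + 20/2) = (160,\ 240) \\
d_A &= \sqrt{(160-130)^2 + (240-210)^2} = \sqrt{1800} \approx 42.4 < 50 \quad \text{(kept)} \\
\text{center}_B &= (160 + 20/2,\; 240 + 20/2) = (170,\ 250) \\
d_B &= \sqrt{(170-130)^2 + (250-210)^2} = \sqrt{3200} \approx 56.6 \geq 50 \quad \text{(rejected)}
\end{align*}
```

Ten pixels of diagonal drift is enough to flip the result, which is worth keeping in mind when a near: selector stops matching after a layout tweak.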

What about rightof:, leftof:, above:, below:?

Directional spatial selectors have two conditions each. For rightof:, candidate_left must be greater than or equal to anchor_right (the candidate sits entirely to the right), and there must be vertical overlap: candidate_top less than anchor_bottom and candidate_bottom greater than anchor_top. The vertical overlap check is what makes rightof: useful in practice; without it, a tab header twelve rows above the anchor would qualify. Leftof: mirrors it. Above: and below: swap axes: the candidate must be entirely above or below the anchor, and there must be horizontal overlap. All four land on the same filter loop in engine.rs starting at line 1754.

How do spatial selectors compose with the rest of the grammar?

They are first-class selectors, so they chain with >>, combine with && and ||, and nest inside parentheses. A selector like window:Calculator >> rightof:name:Three && role:Button && name:Four is legal: scope to the Calculator window, find a button whose name is Four, and require that it sit to the right of the Three label. The parser in selector.rs handles this through a Shunting Yard pass that recognizes rightof: as an atomic selector before the boolean operators are applied. You can also nest them: rightof:rightof:name:Amount resolves the anchor twice, so you land two columns over.
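For anyone unfamiliar with the Shunting Yard idea, here is a compact sketch of how a selector string could be turned into postfix order. It is not the parser from selector.rs. The precedence table (! tightest, then &&, then ||, with >> loosest) is an assumption inferred from the Calculator example, tokenize and to_rpn are hypothetical names, and the tokenizer does not handle a parenthesized anchor such as rightof:(...) or operator characters inside a name.

```rust
/// Assumed binding strength; higher binds tighter.
fn precedence(op: &str) -> u8 {
    match op {
        "!" => 4,
        "&&" => 3,
        "||" => 2,
        ">>" => 1,
        _ => 0,
    }
}

/// Splits a selector string into operator tokens and atoms. A prefixed
/// atom such as `rightof:name:Three` stays in one piece.
pub fn tokenize(input: &str) -> Vec<String> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut atom = String::new();
    let mut i = 0;
    while i < chars.len() {
        let two: String = chars[i..chars.len().min(i + 2)].iter().collect();
        let op = if two == "&&" || two == "||" || two == ">>" {
            Some(two)
        } else if chars[i] == '!' || chars[i] == '(' || chars[i] == ')' {
            Some(chars[i].to_string())
        } else {
            None
        };
        match op {
            Some(op) => {
                // An operator ends whatever atom was being collected.
                if !atom.trim().is_empty() {
                    tokens.push(atom.trim().to_string());
                }
                atom.clear();
                i += op.len(); // operators are ASCII, so bytes == chars
                tokens.push(op);
            }
            None => {
                atom.push(chars[i]);
                i += 1;
            }
        }
    }
    if !atom.trim().is_empty() {
        tokens.push(atom.trim().to_string());
    }
    tokens
}

/// Classic shunting-yard: infix tokens in, postfix (RPN) tokens out.
pub fn to_rpn(tokens: &[String]) -> Vec<String> {
    let mut output: Vec<String> = Vec::new();
    let mut stack: Vec<String> = Vec::new();
    for tok in tokens {
        match tok.as_str() {
            // Prefix `!` and `(` wait on the stack for their operand.
            "(" | "!" => stack.push(tok.clone()),
            ")" => {
                while let Some(top) = stack.pop() {
                    if top == "(" {
                        break;
                    }
                    output.push(top);
                }
            }
            "&&" | "||" | ">>" => {
                // Left-associative: pop operators that bind at least as tightly.
                while stack
                    .last()
                    .map_or(false, |top| top != "(" && precedence(top) >= precedence(tok))
                {
                    output.push(stack.pop().unwrap());
                }
                stack.push(tok.clone());
            }
            _ => output.push(tok.clone()),
        }
    }
    while let Some(top) = stack.pop() {
        output.push(top);
    }
    output
}
```

Under the assumed precedence, the Calculator selector comes out as window:Calculator, rightof:name:Three, role:Button, &&, name:Four, &&, >>, so both && nodes are built before the >> scopes them to the window.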

Is this the same as Playwright's near() locator?

Conceptually similar, mechanically different. Playwright's :near(selector, maxDistance) layout selector operates on the browser DOM via getBoundingClientRect. It is bounded to the page. Terminator's near: operates on the Windows UI Automation tree, which spans every top-level HWND on the desktop, including other processes. You can have an anchor in one application's window and resolve candidates inside the same application's dialog. The implementation under the hood is one IUIAutomation tree query filtered by IUIAutomationElement bounding rectangles, not document.querySelectorAll.

What other selector prefixes does Terminator support?

Twelve atomic prefixes: role:, name:, text:, id:, nativeid:, classname:, nth:, visible:, process:, window:, pos:, and the spatial family rightof:, leftof:, above:, below:, near:. They combine with && (and), || (or), ! (not), >> (descendant), and parentheses. The complete grammar is in crates/terminator/src/selector.rs, 753 lines. Unsafe wildcard and regex are intentionally absent because the accessibility tree gives you case-insensitive substring matching by default.

What about macOS and Linux?

The selector enum and parser are platform-neutral and compile everywhere the core crate does. The spatial filter you see in engine.rs is Windows-only today because the macOS AX backend and the AT-SPI2 Linux backend expose a different rectangle API. The published npm and pip binaries are Windows-only. The Rust crate builds on macOS, but spatial selectors there currently fall through to the default element finder.

How do I try it?

Three install paths. For MCP agents (Claude Code, Cursor, VS Code, Windsurf): claude mcp add terminator "npx -y terminator-mcp-agent@latest". For Node: npm install @mediar-ai/terminator. For Python: pip install terminator-py. Then: const fld = desktop.locator("rightof:name:Amount && role:Edit"); await fld.first(3000).then(el => el.typeText("19.99")). The same locator string works from TypeScript, Python, Rust, and any MCP client.

© 2026 terminator. All rights reserved.