Windows software automation, written as a query language

Record-and-replay macros pin you to the exact layout of the machine they were recorded on. Coordinate clicks break on DPI changes. Most Windows software automation tools leave you with one of the two. The alternative is a selector grammar you write by hand: window:Login >> role:Edit && rightof:(name:Username). Terminator ships one.

Matthew Diakonov
10 min read
4.9 from dozens of design partners
&&, ||, ! parsed with Shunting Yard at selector.rs line 216
rightof: and near: filter against UIA bounds, not screen pixels
NEAR_THRESHOLD = 50.0 pixels at engine.rs line 1815

Windows automation has two usual modes. Both are bad.

Open any guide to Windows software automation on the web and you will meet the same two options. The first is a macro recorder: you press record, click through the task once, and the tool plays the clicks back. The recording is fragile because every click is anchored to a window position, a control index, or a screen pixel. The first thing that shifts breaks the recording: a Windows update, a new DPI, a theme change, a colleague opening a second monitor.

The second is scripting with coordinates. AutoHotkey, AutoIt, PyAutoGUI, and a dozen older tools let you write a .ahk or .py file that calls Click 620, 485. You get source control back but you buy it with hard-coded numbers that encode the test machine's layout.

There is a third option. Treat the desktop as a query surface. Every Windows app with an accessibility story, which is most of them, publishes a live tree of its controls to the UI Automation COM API: names, roles, AutomationIds, bounding boxes. The tree already describes the element you want to click. You just need a language to ask for it.

The selector grammar, at a glance

role:Button   name:Save   &&   ||   !   >>   ..   rightof:   leftof:   above:   below:   near:   has:   nth:0   visible:true   process:chrome   window:Calculator   classname:Edit   id:submit   nativeid:42   text:Open

The selector enum: one type, every way to pick an element

Before the parser runs, there is the target data structure it has to produce. Terminator's Selector enum is the entire surface of the language. Every string the user writes compiles to one of these variants or a nested combination of them.
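As a rough mental model only, a few of the variants named on this page can be mirrored in Python. The class names follow the article; the field names and the Python shape are assumptions, not the crate's actual definition in selector.rs.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

# Illustrative mirror of some Selector variants named in this article.
# Field names are assumptions made for this sketch.

@dataclass(frozen=True)
class Role:
    role: str
    name: Optional[str] = None

@dataclass(frozen=True)
class Name:
    value: str

@dataclass(frozen=True)
class And:
    parts: tuple

@dataclass(frozen=True)
class Or:
    parts: tuple

@dataclass(frozen=True)
class Chain:
    parts: tuple

@dataclass(frozen=True)
class Not:
    inner: object

@dataclass(frozen=True)
class RightOf:
    inner: object

def children(node):
    """Direct sub-selectors of a node; leaves have none."""
    if isinstance(node, (And, Or, Chain)):
        return list(node.parts)
    if isinstance(node, (Not, RightOf)):
        return [node.inner]
    return []

def depth(node) -> int:
    """Nesting depth of a selector tree, to show that variants compose."""
    return 1 + max((depth(c) for c in children(node)), default=0)

# role:Edit && rightof:(name:Username), written out as a tree
ast = And((Role("Edit"), RightOf(Name("Username"))))
```

Every string in the grammar ends up as a value like ast: either one leaf variant or a nested combination of them.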

crates/terminator/src/selector.rs

How a selector becomes an element

Terminator selector engine, inputs to outputs. The inputs are the kinds of strings a workflow author writes. In the middle sits the selector engine, compiled once and reused for every query. The outputs are what the engine does with the AST to produce real elements.

Inputs:
role:Button && name:Save
window:X >> role:Y && rightof:(name:Z)
role:MenuItem && !name:Recent
role:ListItem && has:(name:Unread)

Outputs:
parse_boolean_expression()
UI Automation tree walk
Bounding-box geometry filter
Has() descendant scan
UIElement[] back to caller

Parsing: Shunting Yard, not regex

The parser in crates/terminator/src/selector.rs tokenizes the input, assigns each operator a precedence (Or = 1, And = 2, Not = 3), and uses Shunting Yard to produce a parse tree. Parentheses are first-class, so role:Button && (name:Save || name:OK) resolves its subgroup before the outer AND.
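The precedence rule is easier to see in a small runnable sketch. This is a generic Shunting Yard in Python that uses the same precedence values (Or = 1, And = 2, Not = 3); it is not a port of parse_boolean_expression. The tuple-based tree, the regex tokenizer, and the function names are all assumptions made for illustration, and atoms containing spaces are not handled.

```python
import re

PRECEDENCE = {"||": 1, "&&": 2, "!": 3}  # Or = 1, And = 2, Not = 3
TOKEN = re.compile(r"\s*(&&|\|\||!|\(|\)|[^\s()!&|]+)")

def tokenize(text):
    """Split into operators, parentheses, and atoms such as role:Button."""
    return TOKEN.findall(text)

def apply(op, out):
    """Pop operand(s) from the output stack and push the combined node."""
    if op == "!":
        out.append(("not", out.pop()))
    else:
        right, left = out.pop(), out.pop()
        out.append(("and" if op == "&&" else "or", left, right))

def parse(text):
    out, ops = [], []
    for tok in tokenize(text):
        if tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                apply(ops.pop(), out)
            ops.pop()  # discard the matching "("
        elif tok == "!":
            ops.append(tok)  # prefix operator: nothing to reduce yet
        elif tok in PRECEDENCE:
            # Reduce anything that binds at least as tightly first.
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                apply(ops.pop(), out)
            ops.append(tok)
        else:
            out.append(tok)
    while ops:
        apply(ops.pop(), out)
    return out[0]
```

With this sketch, parse("role:Button && (name:Save || name:OK)") yields an "and" node whose second child is the "or" group, which is the grouping described above.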

The tokenizer has one quirk worth calling out. Inside a text: selector, parentheses and commas are treated as literal content, because visible text on screen frequently contains them. The comment at line 103 uses a real Reddit-style label as the example: text:RPA Hospital (MGP)? : r/foo. No other selector prefix gets this treatment.
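A minimal sketch of that quirk, assuming a character-by-character tokenizer. The function below is hypothetical and much simpler than the real one: it assumes that inside a text: token only && and || end the token, and it does not handle nested forms such as rightof:(...).

```python
def tokenize(selector: str) -> list[str]:
    """Toy tokenizer: parentheses and commas are literal inside text: tokens."""
    tokens, current, i = [], "", 0

    def flush():
        nonlocal current
        if current.strip():
            tokens.append(current.strip())
        current = ""

    while i < len(selector):
        ch = selector[i]
        two = selector[i:i + 2]
        # Assumption: the guard looks at how the token being built starts.
        in_text = current.lstrip().startswith("text:")
        if two in ("&&", "||"):
            flush()
            tokens.append(two)
            i += 2
            continue
        if not in_text and ch in "(),!":
            flush()
            tokens.append("||" if ch == "," else ch)  # a comma means OR
        else:
            current += ch
        i += 1
    flush()
    return tokens
```

The label from the comment survives as a single token, while the same punctuation outside text: is split into delimiters and an OR.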

crates/terminator/src/selector.rs
50.0 px

const NEAR_THRESHOLD: f64 = 50.0; // the one pixel constant in the spatial filter

crates/terminator/src/platforms/windows/engine.rs line 1815

The anchor fact: NEAR_THRESHOLD = 50.0

Every automation author eventually wants to say "click the textbox to the right of the Username label." Competitors do this with pixel offsets against a template image, or with point-and-click designers that produce anchor rules nobody can read. Terminator turns it into a selector: role:Edit && rightof:(name:Username). The parser produces And(vec![Role{Edit}, RightOf(Box::new(Name("Username")))]). The engine does the geometry.

All five positional selectors live in one match arm of find_elements. The anchor is resolved first and its bounds are read from UI Automation, not from screen pixels. Then every visible element becomes a candidate, the anchor is filtered out by id, and the remaining candidates are matched against the anchor's bounding box with vertical or horizontal overlap checks. The near: selector uses one constant: const NEAR_THRESHOLD: f64 = 50.0, Euclidean distance between element centers, at line 1815.
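The near: rule described above can be sketched in a few lines of Python. The Element class and the near function are hypothetical stand-ins for UIA elements and the engine's match arm; only the 50.0 threshold, the center-to-center Euclidean distance, the visibility requirement, and the anchor exclusion come from the text.

```python
import math
from dataclasses import dataclass

NEAR_THRESHOLD = 50.0  # pixels, the constant quoted above

@dataclass(frozen=True)
class Element:
    id: str
    x: float
    y: float
    width: float
    height: float
    visible: bool = True

    def center(self):
        return (self.x + self.width / 2, self.y + self.height / 2)

def near(anchor, candidates):
    """Visible candidates whose center is closer than the threshold."""
    ax, ay = anchor.center()
    hits = []
    for c in candidates:
        if not c.visible or c.id == anchor.id:
            continue  # skip hidden elements and the anchor itself
        cx, cy = c.center()
        if math.hypot(cx - ax, cy - ay) < NEAR_THRESHOLD:
            hits.append(c)
    return hits
```

A field whose center sits 40 pixels from the label's center matches; one at exactly 50 pixels does not, because this sketch uses a strict comparison.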

crates/terminator/src/platforms/windows/engine.rs
50.0 px: NEAR_THRESHOLD at engine.rs line 1815
5 spatial selectors (rightof, leftof, above, below, near)
3 operator precedence levels (Or=1, And=2, Not=3)
Distinct Selector enum variants, all in selector.rs

From string to element

1. Selector string: role:Button && rightof:(name:Save)
2. Tokenizer: emits Token::Selector, And, Or, Not, (, )
3. Shunting Yard: builds Selector AST
4. Engine (find_elements): matches UIA tree + geometry
5. UIElement[]: returned to script or MCP call

What you actually write

Python, because it is the shortest way to read the grammar. The same strings work unchanged in the Node SDK, the Rust SDK, and any MCP client (Claude Code, Cursor, Windsurf).

windows_automation_snippets.py

AutoHotkey vs a selector

The canonical Save dialog click, as traditional Windows software automation writes it, and as a single Terminator selector.

Click the Save button in a Save As dialog

; AutoHotkey v2, classic Windows software automation
; Find "Save" button in the Notepad save dialog.

CoordMode "Mouse", "Window"

if WinWait("Save As", , 5) {
    WinActivate "Save As"

    ; Option A: brittle pixel coordinates
    Click 620, 485

    ; Option B: ControlClick by ClassNN, not portable across Windows
    ; builds since the ClassNN index can shift.
    ControlClick "Button1", "Save As"

    ; Option C: loop through controls, grep ClassNN, pick one.
    ; You own the search.
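    ; AutoHotkey v2 has no range syntax, so use Loop with A_Index.
    Loop 20 {
        ctrl := "Button" A_Index
        ; ControlGetText throws if the control does not exist; try skips it.
        try {
            if ControlGetText(ctrl, "Save As") = "Save" {
                ControlClick ctrl, "Save As"
                break
            }
        }
    }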
}
56% fewer lines

Every prefix in the grammar

Each tile below is a single token the parser recognizes. Combine them with &&, ||, !, and >> to form any query the Windows accessibility tree can answer.

role:

Matches by UI Automation ControlType. role:Button, role:Edit, role:MenuItem, role:TabItem, role:ToggleSwitch. Role strings follow the UIA canonical names.

name:

Accessible name (the label a screen reader would read). Case-insensitive substring by default. name:Save matches Save, Save As..., Save Now.

text:

Visible text content, case-sensitive, substring. The tokenizer treats ( ) , as literal characters inside text: so selectors survive awkward UI labels like text:RPA Hospital (MGP)? : r/foo.

id: and nativeid:

Accessibility ID and OS-level AutomationId. id: is the cross-platform name, nativeid: is the Windows-only exact AutomationId. Use when the name changes across locales.

process: and window:

Scope selectors. process:chrome limits the search to a specific process. window:Calculator scopes to one top-level window. Pair with >> to cascade.

rightof: / leftof: / above: / below: / near:

Spatial filters evaluated against the anchor's bounding box. Require vertical or horizontal overlap; near: uses a 50.0-pixel Euclidean threshold (engine.rs line 1815).

has: and ..

has:(inner) returns containers whose descendants match the inner selector, similar to Playwright's :has(). The .. selector navigates to a parent element.

nth:, visible:, classname:

Match the N-th element (zero-indexed), filter by on-screen visibility, or match by UIA class name. Useful when the tree has many same-role siblings.
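The difference between name: and text: matching, and the zero-indexed nth:, can be sketched as plain string and list operations. The helper names are hypothetical; the matching rules are the ones stated in the tiles above.

```python
def matches_name(accessible_name: str, wanted: str) -> bool:
    """name: is a case-insensitive substring match by default."""
    return wanted.lower() in accessible_name.lower()

def matches_text(visible_text: str, wanted: str) -> bool:
    """text: is a case-sensitive substring match."""
    return wanted in visible_text

def nth(matches: list, n: int):
    """nth: picks the N-th match, zero-indexed; None if out of range."""
    return matches[n] if 0 <= n < len(matches) else None
```

So name:save would match a button labelled Save As..., while text:save would not.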

Coordinate script vs selector query

Same intent, two mental models. Here is what the coordinate approach actually commits to memory, to disk, and to your teammates.

You encode the layout of one specific machine into your script. Every value is a pixel, a ClassNN suffix, or an opaque recorded blob. Changes to DPI, theme, locale, Windows version, or even window size can break any of them, and you debug by re-recording.

  • Click X,Y hard-codes DPI and screen size
  • ClassNN indices shift on new Windows builds
  • Recorded UIA blobs are not human-editable
  • No boolean logic: one path per element

Five steps from selector string to clicked element

1. Tokenize the string

selector.rs line 94. The tokenizer emits Token::Selector for anything that is not an operator, plus Token::And (&&), Token::Or (|| and comma), Token::Not (!), Token::LParen, Token::RParen. text: selectors escape parentheses.

2. Parse with Shunting Yard

selector.rs line 216. parse_boolean_expression pops operators by precedence (Or=1, And=2, Not=3) and nests sub-selectors inside Selector::And, Selector::Or, Selector::Not. Descendant >> is handled separately and builds a Selector::Chain.

3. Resolve atomic selectors

Each leaf token becomes a concrete Selector variant. role:Button becomes Selector::Role{role:"Button", name:None}. rightof:(name:Username) recursively parses the inside and wraps it in Selector::RightOf(Box::new(...)).

4. Walk the UIA tree

engine.rs find_elements dispatches on the selector variant. Role/Name/Id/ClassName walk the cached UI Automation tree. Process and Window scope the root. And/Or/Not intersect, union, and exclude match sets.

5. Run the geometry filter

For RightOf, LeftOf, Above, Below, Near, the anchor is resolved first. All visible candidates are collected, the anchor is excluded by id, and the remaining candidates are filtered by bounding-box overlap. Near uses the 50.0-pixel Euclidean threshold.
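Step 4 says And/Or/Not intersect, union, and exclude match sets. A minimal sketch of that idea over a toy element list follows. The tuple tree, the ELEMENTS data, exact role comparison, and treating Not as "everything except the matches" are assumptions made for this example, not a description of engine.rs.

```python
ELEMENTS = [
    {"id": 1, "role": "Button", "name": "Save"},
    {"id": 2, "role": "Button", "name": "Cancel"},
    {"id": 3, "role": "Edit", "name": "File name"},
]

def match_atom(atom, el):
    """Match one leaf such as role:Button or name:Save against an element."""
    prefix, _, value = atom.partition(":")
    if prefix == "role":
        return el["role"] == value  # exact match, an assumption of this sketch
    if prefix == "name":
        return value.lower() in el["name"].lower()
    raise ValueError(f"unsupported prefix in this sketch: {prefix}")

def evaluate(node, elements):
    """Return the set of element ids matched by a tuple-based selector tree."""
    if isinstance(node, str):
        return {el["id"] for el in elements if match_atom(node, el)}
    op, *args = node
    sets = [evaluate(a, elements) for a in args]
    if op == "and":
        return set.intersection(*sets)
    if op == "or":
        return set.union(*sets)
    if op == "not":
        return {el["id"] for el in elements} - sets[0]
    raise ValueError(f"unknown operator: {op}")
```

For example, ("and", "role:Button", ("not", "name:Cancel")) evaluates to the id of the Save button only.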

Feature by feature

Feature: Selector as a string you can commit to git
Typical Windows automation tool: Recorded UIA blobs in a proprietary repository
Terminator: role:Button && name:Save && rightof:(name:Username)

Feature: Boolean operators on element predicates
Typical Windows automation tool: Not supported; one selector per element
Terminator: &&, ||, ! with explicit precedence (Or=1, And=2, Not=3)

Feature: Spatial targeting without coordinates
Typical Windows automation tool: Absolute X,Y click, or click-by-coordinate-offset
Terminator: rightof:, leftof:, above:, below:, near: via UIA bounds

Feature: Descendant chain
Typical Windows automation tool: Nested UI Spy paths or flat WinTitle match
Terminator: >> operator, parses into Selector::Chain(Vec<Selector>)

Feature: Parser
Typical Windows automation tool: String templates
Terminator: Tokenizer + Shunting Yard (selector.rs line 216)

Feature: Grammar documented and testable
Typical Windows automation tool: Closed format
Terminator: selector_tests.rs with dozens of parse cases

Feature: Works across apps in one expression
Typical Windows automation tool: Per-application configs
Terminator: process:, window:, and classname: compose freely

Feature: License
Typical Windows automation tool: Proprietary or EULA-locked
Terminator: MIT, github.com/mediar-ai/terminator

Verify every anchor fact

Every file name, line number, and constant on this page comes from the MIT-licensed repo. Clone it, grep, read.

50.0 px

The near: threshold, hardcoded as const NEAR_THRESHOLD: f64 = 50.0 at engine.rs line 1815.

The variants in the Selector enum cover roles, ids, text, spatial anchors, boolean combinators, chains, and tree navigation.

3

Operator precedence levels. Or = 1, And = 2, Not = 3. Set at operator_precedence() in selector.rs.

Want Windows software automation that survives a DPI change?

Bring a workflow on your machine. We will rewrite its clicks as Terminator selectors in 20 minutes, on your actual apps.

Frequently asked questions

What makes this a selector language and not just a string matcher?

A selector language has a grammar, a tokenizer, an operator precedence table, and a parse tree. Terminator has all four. The tokenizer in crates/terminator/src/selector.rs emits Token::Selector, Token::And, Token::Or, Token::Not, Token::LParen, Token::RParen. The parser uses the Shunting Yard algorithm (parse_boolean_expression at line 216) with a precedence of 1 for Or, 2 for And, 3 for Not, so role:Button && !name:Cancel || name:Back parses as (Button AND NOT Cancel) OR Back. The output is a Selector enum with variants Role, Id, Name, Text, Chain, And(Vec), Or(Vec), Not(Box), RightOf(Box), LeftOf(Box), Above(Box), Below(Box), Near(Box), Has(Box), Parent, Nth, Visible, Process, ClassName, LocalizedRole, and more. This is a compiler front end, not a regex.

What is the anchor fact in the spatial filter and where does the number live?

The near: selector fires when the Euclidean distance between the anchor's center and the candidate's center is strictly less than 50.0 pixels. That constant is defined on a single line: const NEAR_THRESHOLD: f64 = 50.0 in crates/terminator/src/platforms/windows/engine.rs, at line 1815 inside the Selector::Near arm of find_elements. The rightof: and leftof: filters require vertical bounding-box overlap (candidate_top < anchor_bottom && candidate_bottom > anchor_top) and a horizontal gap (candidate_left >= anchor_right for rightof:). The above: and below: filters mirror that logic on the other axis: horizontal overlap plus a vertical gap. All four read bounds from UI Automation, not from screen pixels, so they survive DPI changes.
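The directional checks reduce to two interval-overlap tests plus a gap test. The sketch below writes them out in Python. The right_of conditions are the ones quoted in this answer; left_of, above, and below are mirrored by assumption, and the Bounds class is a hypothetical stand-in for a UIA bounding rectangle.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bounds:
    left: float
    top: float
    right: float
    bottom: float

def overlaps_vertically(c: Bounds, a: Bounds) -> bool:
    return c.top < a.bottom and c.bottom > a.top

def overlaps_horizontally(c: Bounds, a: Bounds) -> bool:
    return c.left < a.right and c.right > a.left

def right_of(c: Bounds, a: Bounds) -> bool:
    # Same row as the anchor, starting at or past its right edge.
    return overlaps_vertically(c, a) and c.left >= a.right

def left_of(c: Bounds, a: Bounds) -> bool:
    # Assumed mirror of right_of.
    return overlaps_vertically(c, a) and c.right <= a.left

def below(c: Bounds, a: Bounds) -> bool:
    # Assumed mirror on the other axis: same column, starting under the anchor.
    return overlaps_horizontally(c, a) and c.top >= a.bottom

def above(c: Bounds, a: Bounds) -> bool:
    return overlaps_horizontally(c, a) and c.bottom <= a.top
```

A textbox on the same row as a Username label passes right_of; a textbox one row lower fails it even if it is further right, because the vertical overlap test rejects it.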

How do chained selectors like window:Notepad >> role:Edit work?

The >> operator splits a selector into a Selector::Chain(Vec<Selector>) at parse time. During execution, each part is resolved against the result of the previous part as its root. So window:Notepad >> role:Edit first finds the Notepad top-level window, then searches within that window for an Edit control. Chains are parsed before boolean operators, which means window:Notepad >> (role:Button && name:OK) works and the boolean part applies only within the Notepad scope.
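A toy version of that scoping, assuming a simple in-memory tree instead of UI Automation. Node, matches, and resolve_chain are hypothetical names; splitting on >> with str.split is a shortcut that would not survive parenthesized groups, which the real parser handles.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    role: str
    name: str = ""
    children: list = field(default_factory=list)

def descendants(node):
    """Yield every node below the given one, depth first."""
    for child in node.children:
        yield child
        yield from descendants(child)

def matches(node, part):
    prefix, _, value = part.partition(":")
    if prefix == "window":
        return node.role == "Window" and value.lower() in node.name.lower()
    if prefix == "role":
        return node.role == value
    if prefix == "name":
        return value.lower() in node.name.lower()
    raise ValueError(f"unsupported prefix in this sketch: {prefix}")

def resolve_chain(root, selector):
    """Resolve each part against the results of the previous part."""
    roots = [root]
    for part in (p.strip() for p in selector.split(">>")):
        roots = [d for r in roots for d in descendants(r) if matches(d, part)]
    return roots

desktop = Node("Desktop", "", [
    Node("Window", "Untitled - Notepad", [Node("Edit", "Text editor")]),
    Node("Window", "Calculator", [Node("Edit", "Display")]),
])
```

Here resolve_chain(desktop, "window:Notepad >> role:Edit") returns only the Notepad edit control, while role:Edit on its own returns both.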

What about escaping commas and parentheses in names?

The tokenizer has special handling for text: selectors. When the current token starts with text:, both parentheses and commas are treated as literal characters, because visible text on screen frequently contains them. See the in_text_selector guard at selector.rs line 103. The comment explicitly cites a Reddit-style selector: text:RPA Hospital (MGP)? : r/foo. For every other selector prefix, ( and ) are parser delimiters and , means OR.

Why use the accessibility tree instead of pixel matching or a vision model?

The accessibility tree is already a structured representation of what is on the screen, with names, roles, IDs, and bounds, maintained by every Windows application that implements UI Automation (which is most of them). A pixel matcher breaks on DPI changes, theme changes, font smoothing. A vision model breaks on latency, cost, and hallucinations. Terminator does support OCR and pixel fallbacks for apps that expose nothing to UIA, but the selector language targets the tree first. You read the Windows UI Automation tree with Accessibility Insights or Inspect.exe, write the selector that points at the element, and the same selector works on your coworker's machine, on a CI runner, and in a Windows Sandbox.

Can I combine spatial and logical selectors?

Yes. role:Edit && rightof:(name:Username) finds an edit field that is both of role Edit AND to the right of an element named Username. The AND branches are flattened during parse (apply_operator at selector.rs line 283 merges nested Selector::And), so any number of predicates can compose. role:Edit && rightof:(name:Username) && !visible:false && process:chrome is a single conjunction.
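Flattening means a && b && c becomes one And with three children instead of a nested pair. A sketch of that merge step, using tuples as a stand-in for Selector::And; make_and is a hypothetical helper, not the crate's apply_operator.

```python
def make_and(left, right):
    """Combine two operands into one flat ("and", ...) node."""
    parts = []
    for side in (left, right):
        if isinstance(side, tuple) and side and side[0] == "and":
            parts.extend(side[1:])  # merge a nested And into this one
        else:
            parts.append(side)
    return ("and", *parts)

nested_input = make_and(make_and("role:Edit", "rightof:(name:Username)"), "process:chrome")
```

The result is a single conjunction with three predicates. Non-And operands, such as an Or group, are kept intact as one child.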

How does this compare to AutoHotkey, AutoIt, Power Automate Desktop, and UiPath for targeting elements?

AutoHotkey v2 uses WinTitle syntax and ControlClick/ControlGet, which pin to window titles, class names, or ahk_id handles. AutoIt has AutoItX with similar primitives. Power Automate Desktop records clicks into opaque UIA selectors stored in its repository, which are visual-designer-only and not copy-pasteable across projects. UiPath has Full Selectors, Fuzzy Selectors, and Anchor Base activities, which are spatial but drag-and-drop only. Terminator's selectors are a string grammar you can type, commit to git, diff across versions, and chain as data. They also compile at runtime, so a running MCP agent can build them from user speech without a code change.

What if two selectors match the same element more than once?

The engine deduplicates by element id. In the positional filter at engine.rs line 1774, the anchor is explicitly skipped (if candidate.id() == anchor_id { return false }), so rightof:(name:Username) does not return the Username label itself. For non-spatial queries, Nth(N) picks the N-th match (role:Button >> nth:0 is the first button). Or(Vec) returns all matches of any inner selector, deduplicated at collection time.
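Deduplicating an Or result by id can be sketched as follows. The dict-based elements and the order-preserving behavior are assumptions made for this example; the text above only states that matches are deduplicated by element id.

```python
def dedupe_by_id(elements):
    """Keep the first occurrence of each element id, preserving order."""
    seen, unique = set(), []
    for el in elements:
        if el["id"] not in seen:
            seen.add(el["id"])
            unique.append(el)
    return unique

def union(*match_lists):
    """Collect matches from several inner selectors, then deduplicate."""
    return dedupe_by_id([el for matches in match_lists for el in matches])
```

If two branches of an Or both match the same button, it appears once in the combined result.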

Is the grammar stable enough to build workflows on top of?

The parser and the Selector enum live in the core terminator crate that the Rust, Node, Python, and MCP bindings all depend on. It is MIT licensed. The boolean operators && || ! and the positional prefixes rightof: leftof: above: below: near: have been stable for over a year. Nth, Has, Parent, And, Or, and Not were added incrementally and remain backward compatible. The test file selector_tests.rs has dozens of cases covering the parser, including the legacy role|name pipe syntax which still works.

© 2026 terminator. All rights reserved.