Windows software automation, written as a query language
Record-and-replay macros pin you to the exact layout of the machine they were recorded on. Coordinate clicks break on DPI changes. Most Windows software automation tools leave you with one of the two. The alternative is a selector grammar you write by hand: window:Login >> role:Edit && rightof:(name:Username). Terminator ships one.
Windows automation has two usual modes. Both are bad.
Open any guide to Windows software automation on the web and you will meet the same two options. The first is a macro recorder: you press record, click through the task once, and the tool plays the clicks back. The recording is fragile because every click is anchored to a window position, a control index, or a screen pixel. The first thing that shifts breaks the recording: a Windows update, a new DPI, a theme change, a colleague opening a second monitor.
The second is scripting with coordinates. AutoHotkey, AutoIt, PyAutoGUI, and a dozen older tools let you write a .ahk or .py file that calls Click 620, 485. You get source control back but you buy it with hard-coded numbers that encode the test machine's layout.
There is a third option. Treat the desktop as a query surface. Every Windows app with an accessibility story, which is most of them, publishes a live tree of its controls to the UI Automation COM API: names, roles, AutomationIds, bounding boxes. The tree already describes the element you want to click. You just need a language to ask for it.
The selector grammar, at a glance
The selector enum: one type, every way to pick an element
Before the parser runs, there is the target data structure it has to produce. Terminator's Selector enum is the entire surface of the language. Every string the user writes compiles to one of these variants or a nested combination of them.
How a selector becomes an element
Left column: the kinds of strings a workflow author writes. Middle: the selector engine, compiled once, reused for every query. Right column: what the engine does with the AST to produce real elements.
Terminator selector engine, inputs to outputs
Parsing: Shunting Yard, not regex
The parser in crates/terminator/src/selector.rs tokenizes the input, assigns each operator a precedence (Or = 1, And = 2, Not = 3), and uses Shunting Yard to produce a parse tree. Parentheses are first-class, so role:Button && (name:Save || name:OK) resolves its subgroup before the outer AND.
The tokenizer has one quirk worth calling out. Inside a text: selector, parentheses and commas are treated as literal content, because visible text on screen frequently contains them. The comment at line 103 uses a real Reddit-style label as the example: text:RPA Hospital (MGP)? : r/foo. No other selector prefix gets this treatment.
“const NEAR_THRESHOLD: f64 = 50.0; // the one pixel constant in the spatial filter”
crates/terminator/src/platforms/windows/engine.rs line 1815
The anchor fact: NEAR_THRESHOLD = 50.0
Every automation author eventually wants to say "click the textbox to the right of the Username label." Competitors do this with pixel offsets against a template image, or with point-and-click designers that produce anchor rules nobody can read. Terminator turns it into a selector: role:Edit && rightof:(name:Username). The parser produces And(vec![Role{Edit}, RightOf(Box::new(Name("Username")))]). The engine does the geometry.
All five positional selectors live in one match arm of find_elements. The anchor is resolved first and its bounds are read from UI Automation, not from screen pixels. Then every visible element becomes a candidate, the anchor is filtered out by id, and the remaining candidates are matched against the anchor's bounding box with vertical or horizontal overlap checks. The near: selector uses one constant: const NEAR_THRESHOLD: f64 = 50.0, Euclidean distance between element centers, at line 1815.
From string to element
Selector string
role:Button && rightof:(name:Save)
Tokenizer
emits Token::Selector, And, Or, Not, (, )
Shunting Yard
builds Selector AST
Engine: find_elements
matches UIA tree + geometry
UIElement[]
returned to script or MCP call
What you actually write
Python, because it is the shortest way to read the grammar. The same strings work unchanged in the Node SDK, the Rust SDK, and any MCP client (Claude Code, Cursor, Windsurf).
AutoHotkey vs a selector
The canonical Save dialog click, as traditional Windows software automation writes it, and as a single Terminator selector.
Click the Save button in a Save As dialog
; AutoHotkey v2, classic Windows software automation
; Find "Save" button in the Notepad save dialog.
CoordMode "Mouse", "Window"
if WinWait("Save As", , 5) {
WinActivate "Save As"
; Option A: brittle pixel coordinates
Click 620, 485
; Option B: ControlClick by ClassNN, not portable across Windows
; builds since the ClassNN index can shift.
ControlClick "Button1", "Save As"
; Option C: loop through controls, grep ClassNN, pick one.
; You own the search.
for i in 1..20 {
ctrl := "Button" i
if ControlGetText(ctrl, "Save As") = "Save" {
ControlClick ctrl, "Save As"
break
}
}
}Every prefix in the grammar
Each tile below is a single token the parser recognizes. Combine them with &&, ||, !, and >> to form any query the Windows accessibility tree can answer.
role:
Matches by UI Automation ControlType. role:Button, role:Edit, role:MenuItem, role:TabItem, role:ToggleSwitch. Role strings follow the UIA canonical names.
name:
Accessible name (the label a screen reader would read). Case-insensitive substring by default. name:Save matches Save, Save As..., Save Now.
text:
Visible text content, case-sensitive, substring. The tokenizer treats ( ) , as literal characters inside text: so selectors survive awkward UI labels like text:RPA Hospital (MGP)? : r/foo.
id: and nativeid:
Accessibility ID and OS-level AutomationId. id: is the cross-platform name, nativeid: is the Windows-only exact AutomationId. Use when the name changes across locales.
process: and window:
Scope selectors. process:chrome limits the search to a specific process. window:Calculator scopes to one top-level window. Pair with >> to cascade.
rightof: / leftof: / above: / below: / near:
Spatial filters evaluated against the anchor's bounding box. Require vertical or horizontal overlap; near: uses a 50.0-pixel Euclidean threshold (engine.rs line 1815).
has: and ..
has:(inner) returns containers whose descendants match the inner selector (Playwright :has()). The .. selector navigates to a parent element.
nth:, visible:, classname:
Match the N-th element (zero-indexed), filter by on-screen visibility, or match by UIA class name. Useful when the tree has many same-role siblings.
Coordinate script vs selector query
Same intent, two mental models. Flip the toggle to see what each approach actually commits to memory, to disk, and to your teammates.
You encode the layout of one specific machine into your script. Every value is a pixel, a ClassNN suffix, or an opaque recorded blob. Changes to DPI, theme, locale, Windows version, or even window size can break any of them, and you debug by re-recording.
- Click X,Y hard-codes DPI and screen size
- ClassNN indices shift on new Windows builds
- Recorded UIA blobs are not human-editable
- No boolean logic: one path per element
Five steps from selector string to clicked element
Tokenize the string
selector.rs line 94. The tokenizer emits Token::Selector for anything that is not an operator, plus Token::And (&&), Token::Or (|| and comma), Token::Not (!), Token::LParen, Token::RParen. text: selectors escape parentheses.
Parse with Shunting Yard
selector.rs line 216. parse_boolean_expression pops operators by precedence (Or=1, And=2, Not=3) and nests sub-selectors inside Selector::And, Selector::Or, Selector::Not. Descendant >> is handled separately and builds a Selector::Chain.
Resolve atomic selectors
Each leaf token becomes a concrete Selector variant. role:Button becomes Selector::Role{role:"Button", name:None}. rightof:(name:Username) recursively parses the inside and wraps it in Selector::RightOf(Box::new(...)).
Walk the UIA tree
engine.rs find_elements dispatches on the selector variant. Role/Name/Id/ClassName walk the cached UI Automation tree. Process and Window scope the root. And/Or/Not intersect, union, and exclude match sets.
Run the geometry filter
For RightOf, LeftOf, Above, Below, Near, the anchor is resolved first. All visible candidates are collected, the anchor is excluded by id, and the remaining candidates are filtered by bounding-box overlap. Near uses the 50.0-pixel Euclidean threshold.
Feature by feature
| Feature | Typical Windows automation tool | Terminator |
|---|---|---|
| Selector as a string you can commit to git | Recorded UIA blobs in a proprietary repository | role:Button && name:Save && rightof:(name:Username) |
| Boolean operators on element predicates | Not supported; one selector per element | &&, ||, ! with explicit precedence (Or=1, And=2, Not=3) |
| Spatial targeting without coordinates | Absolute X,Y click, or click-by-coordinate-offset | rightof:, leftof:, above:, below:, near: via UIA bounds |
| Descendant chain | Nested UI Spy paths or flat WinTitle match | >> operator, parses into Selector::Chain(Vec<Selector>) |
| Parser | String templates | Tokenizer + Shunting Yard (selector.rs line 216) |
| Grammar documented and testable | Closed format | selector_tests.rs with dozens of parse cases |
| Works across apps in one expression | Per-application configs | process:, window:, and classname: compose freely |
| License | Proprietary or EULA-locked | MIT, github.com/mediar-ai/terminator |
Verify every anchor fact
Every file name, line number, and constant on this page comes from the MIT-licensed repo. Clone it, grep, read.
The near: threshold, hardcoded as const NEAR_THRESHOLD: f64 = 50.0 at engine.rs line 1815.
Variants in the Selector enum. Covers roles, ids, text, spatial anchors, boolean combinators, chains, and tree navigation.
Operator precedence levels. Or = 1, And = 2, Not = 3. Set at operator_precedence() in selector.rs.
Want Windows software automation that survives a DPI change?
Bring a workflow on your machine. We will rewrite its clicks as Terminator selectors in 20 minutes, on your actual apps.
Frequently asked questions
What makes this a selector language and not just a string matcher?
A selector language has a grammar, a tokenizer, an operator precedence table, and a parse tree. Terminator has all four. The tokenizer in crates/terminator/src/selector.rs emits Token::Selector, Token::And, Token::Or, Token::Not, Token::LParen, Token::RParen. The parser uses the Shunting Yard algorithm (parse_boolean_expression at line 216) with a precedence of 1 for Or, 2 for And, 3 for Not, so role:Button && !name:Cancel || name:Back parses as (Button AND NOT Cancel) OR Back. The output is a Selector enum with variants Role, Id, Name, Text, Chain, And(Vec), Or(Vec), Not(Box), RightOf(Box), LeftOf(Box), Above(Box), Below(Box), Near(Box), Has(Box), Parent, Nth, Visible, Process, ClassName, LocalizedRole, and more. This is a compiler front end, not a regex.
What is the anchor fact in the spatial filter and where does the number live?
The near: selector fires when the Euclidean distance between the anchor's center and the candidate's center is strictly less than 50.0 pixels. That constant is defined on a single line: const NEAR_THRESHOLD: f64 = 50.0 in crates/terminator/src/platforms/windows/engine.rs, at line 1815 inside the Selector::Near arm of find_elements. The rightof: and leftof: filters require vertical bounding-box overlap (candidate_top < anchor_bottom && candidate_bottom > anchor_top) and a horizontal gap (candidate_left >= anchor_right). Above: and below: mirror that logic horizontally. All four read bounds from UI Automation, not from screen pixels, so they survive DPI changes.
How do chained selectors like window:Notepad >> role:Edit work?
The >> operator splits a selector into a Selector::Chain(Vec<Selector>) at parse time. During execution, each part is resolved against the result of the previous part as its root. So window:Notepad >> role:Edit first finds the Notepad top-level window, then searches within that window for an Edit control. Chains are parsed before boolean operators, which means window:Notepad >> (role:Button && name:OK) works and the boolean part applies only within the Notepad scope.
What about escaping commas and parentheses in names?
The tokenizer has special handling for text: selectors. When the current token starts with text:, both parentheses and commas are treated as literal characters, because visible text on screen frequently contains them. See the in_text_selector guard at selector.rs line 103. The comment explicitly cites a Reddit-style selector: text:RPA Hospital (MGP)? : r/foo. For every other selector prefix, ( and ) are parser delimiters and , means OR.
Why use the accessibility tree instead of pixel matching or a vision model?
The accessibility tree is already a structured representation of what is on the screen, with names, roles, IDs, and bounds, maintained by every Windows application that implements UI Automation (which is most of them). A pixel matcher breaks on DPI changes, theme changes, font smoothing. A vision model breaks on latency, cost, and hallucinations. Terminator does support OCR and pixel fallbacks for apps that expose nothing to UIA, but the selector language targets the tree first. You read the Windows UI Automation tree with Accessibility Insights or Inspect.exe, write the selector that points at the element, and the same selector works on your coworker's machine, on a CI runner, and in a Windows Sandbox.
Can I combine spatial and logical selectors?
Yes. role:Edit && rightof:(name:Username) finds an edit field that is both of role Edit AND to the right of an element named Username. The AND branches are flattened during parse (apply_operator at selector.rs line 283 merges nested Selector::And), so any number of predicates can compose. role:Edit && rightof:(name:Username) && !visible:false && process:chrome is a single conjunction.
How does this compare to AutoHotkey, AutoIt, Power Automate Desktop, and UiPath for targeting elements?
AutoHotkey v2 uses WinTitle syntax and ControlClick/ControlGet, which pin to window titles, class names, or ahk_id handles. AutoIt has AutoItX with similar primitives. Power Automate Desktop records clicks into opaque UIA selectors stored in its repository, which are visual-designer-only and not copy-pasteable across projects. UiPath has Full Selectors, Fuzzy Selectors, and Anchor Base activities, which are spatial but drag-and-drop only. Terminator's selectors are a string grammar you can type, commit to git, diff across versions, and chain as data. They also compile at runtime, so a running MCP agent can build them from user speech without a code change.
What if two selectors match the same element more than once?
The engine deduplicates by element id. In the positional filter at engine.rs line 1774, the anchor is explicitly skipped (if candidate.id() == anchor_id { return false }), so rightof:(name:Username) does not return the Username label itself. For non-spatial queries, Nth(N) picks the N-th match (role:Button,nth:0 is the first button). Or(Vec) returns all matches of any inner selector, deduplicated at collection time.
Is the grammar stable enough to build workflows on top of?
The parser and the Selector enum live in the core terminator crate that the Rust, Node, Python, and MCP bindings all depend on. It is MIT licensed. The boolean operators && || ! and the positional prefixes rightof: leftof: above: below: near: have been stable for over a year. Nth, Has, Parent, And, Or, and Not were added incrementally and remain backward compatible. The test file selector_tests.rs has dozens of cases covering the parser, including the legacy role|name pipe syntax which still works.