Selenium UI automation, extended to every native app on your desktop

Selenium taught a generation of engineers to pick UI elements by id, name, and text, and to chain locators through descendants. Terminator keeps that mental model and ports it off the browser. One selector language covers Chrome, Excel, Slack, Finder, and the title bar of the window you are reading this in. The selector engine behind it is 753 lines of Rust running against the OS accessibility tree.

Matthew Diakonov, Written with AI

Published April 22, 2026 · 9 min read


Single selector file: crates/terminator/src/selector.rs (753 lines)

Five spatial operators, like Selenium 4's relative locators but for any window

Same >> descendant chaining you know from Playwright

Your Selenium skills still work outside the browser

role:, name:, id:, classname:, text: all port straight over

Chain locators with >> the way you chain Playwright

Five spatial operators that work outside the DOM

One language across Chrome, Excel, Slack, Finder


The browser was always a subset

Most guides to Selenium UI automation assume the thing you are automating is a web page. They show you how to install a WebDriver, pick a By strategy, wait for an element, and click a button. That works fine as long as every control your user touches lives inside a Chromium process. The moment a Save As dialog appears, the moment the user switches to Slack, the moment the test needs to drag a file onto the app icon in the Dock, Selenium has nothing to say about it.

The underlying reason is architectural. Selenium WebDriver speaks the W3C WebDriver protocol, and that protocol was designed to drive a rendered DOM inside a browser engine. A driver executable (chromedriver, geckodriver, safaridriver) sits between the test and the browser and translates commands. Everything outside the browser process is invisible.
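At the protocol level the limitation is easy to see. The W3C WebDriver specification defines exactly five location strategies for its Find Element command: css selector, link text, partial link text, tag name, and xpath, and every one of them queries the DOM. The Python sketch below is illustrative, not Selenium's own code: the hypothetical find_element_body helper builds the JSON body for POST /session/{session id}/element and, in the same spirit as Selenium's Python bindings, rewrites By.ID, By.NAME, and By.CLASS_NAME into CSS selectors because the protocol has no strategy of its own for them.

```python
# Illustrative sketch: how a By-style locator becomes the body of a W3C
# WebDriver "Find Element" request. Not Selenium's actual implementation.
import json

# The complete set of location strategies in the W3C WebDriver spec.
# Every one of them is a query over the DOM of the current page.
W3C_STRATEGIES = {"css selector", "link text", "partial link text", "tag name", "xpath"}


def find_element_body(by: str, value: str) -> dict:
    """Build the JSON body for POST /session/{session id}/element (hypothetical helper)."""
    # Locators with no protocol strategy of their own are rewritten as CSS.
    if by == "id":
        by, value = "css selector", f'[id="{value}"]'
    elif by == "name":
        by, value = "css selector", f'[name="{value}"]'
    elif by == "class name":
        by, value = "css selector", f".{value}"
    if by not in W3C_STRATEGIES:
        raise ValueError(f"no W3C location strategy for {by!r}")
    return {"using": by, "value": value}


print(json.dumps(find_element_body("id", "email")))
# {"using": "css selector", "value": "[id=\"email\"]"}
```

There is simply nothing to send for "the Save button in the OS file dialog": a strategy such as "role" is rejected with a ValueError, which is the protocol-level version of the point above.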

Terminator takes the same mental model (locators, roles, names, chaining) and points it at the layer below the browser: the OS accessibility tree. Windows UI Automation and the macOS Accessibility API both expose every window, control, label, and text field, with stable role and name fields. A screen reader can see all of it. So can an automation script.

753 lines

“The selector grammar you wrote for Selenium already matches your native desktop.”

crates/terminator/src/selector.rs, 32-variant Selector enum

Where the Selenium mental model stops

These are all real UI surfaces a browser-only runner cannot reach. Every one of them is a plain window in the OS accessibility tree.

Native Save As dialog
macOS menu bar
Windows taskbar
File Explorer / Finder
Excel cell grid
Slack desktop app
Photoshop toolbar
VS Code command palette
IntelliJ context menu
System Preferences pane

Side by side: a login flow, then a desktop handoff

Left: a Selenium test that can log in but cannot go any further than the rendered DOM. Right: a Terminator script that signs in, then pivots into Slack, in the same process.

Same locator style, different reach

# Selenium WebDriver
# This only works if the UI is a web page rendered inside a browser
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://app.example.com")

# Fill a login form
driver.find_element(By.ID, "email").send_keys("me@example.com")
driver.find_element(By.ID, "password").send_keys("hunter2")
driver.find_element(
    By.CSS_SELECTOR, "button[type=submit]"
).click()

# You cannot reach a native Save As dialog
# You cannot reach Slack, Excel, VS Code, or the menu bar
# You cannot reach the app's own title bar buttons


How the selector engine is put together

Three inputs feed into one parser. A prefix selector like role: or name: goes through a straightforward atomic parser. A chained expression with >> splits on the operator and recurses. A boolean expression with &&, ||, or ! runs through a hand-rolled tokenizer and a recursive-descent parser that produces an AST. All three paths produce the same Selector enum, which the locator engine then walks against the accessibility tree.
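A minimal Python sketch of that routing, under simplifying assumptions: a reduced set of prefixes, no quoting, and >> never appearing inside a value. Atom, Chain, and BooleanExpr are hypothetical stand-ins for Selector enum variants, and BooleanExpr only carries the raw text that would be handed to the tokenizer and recursive-descent parser, which this sketch does not implement.

```python
# Sketch of the routing step: three input shapes, one AST type.
# Assumptions: reduced prefix set, no quoting, ">>" never inside a value.
# Class names are hypothetical, not Terminator's Rust types.
from dataclasses import dataclass
from typing import Tuple, Union

PREFIXES = {"role", "name", "id", "classname", "text", "window"}
BOOLEAN_OPS = ("&&", "||", "!")


@dataclass(frozen=True)
class Atom:
    prefix: str  # e.g. "role"
    value: str   # e.g. "Button"


@dataclass(frozen=True)
class Chain:
    # Each step is searched for inside the matches of the step before it.
    steps: Tuple["Selector", ...]


@dataclass(frozen=True)
class BooleanExpr:
    # Placeholder: raw text destined for the tokenizer and expression parser.
    source: str


Selector = Union[Atom, Chain, BooleanExpr]


def parse_atomic(text: str) -> Atom:
    """Prefix path: split "prefix:value" once on the first colon."""
    prefix, sep, value = text.strip().partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"not a prefix selector: {text!r}")
    return Atom(prefix, value.strip())


def parse(text: str) -> Selector:
    text = text.strip()
    if ">>" in text:
        # Chained expression: split on the operator and recurse per step.
        return Chain(tuple(parse(step) for step in text.split(">>")))
    if any(op in text for op in BOOLEAN_OPS):
        # Boolean expression: handed to the expression parser (not shown).
        return BooleanExpr(text)
    return parse_atomic(text)


print(parse("window:Calculator >> role:Button && name:Seven"))
# Chain(steps=(Atom(prefix='window', value='Calculator'), BooleanExpr(source='role:Button && name:Seven')))
```

Splitting on >> before checking for boolean operators matches the Calculator selector used later in this article, where a boolean expression sits inside one step of a chain; whether selector.rs orders the checks the same way is an assumption.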

What the parser does with a string

The five positional selectors, parsed line by line

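Selenium 4's relative locators (above, below, to_left_of, to_right_of, near) cover the same relations, but only for elements inside a web page. Here each operator takes another selector as its anchor and returns elements whose bounds fall in the corresponding spatial region, in any window the accessibility tree exposes.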


A practical example. You are writing a test for a settings page where the Email label drifts up and down as other form rows appear or collapse. In an XPath-based Selenium test you would write something like following-sibling::input[1] and hope nobody reshuffles the DOM. In Terminator the selector is rightof:name:Email, and it keeps working as the row moves, because it matches on where the label is drawn rather than on sibling order in the tree.
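Here is a Python sketch of how bounds-based filtering can work, assuming every element carries a bounding rectangle (UI Automation and the macOS Accessibility API both report one). The Rect and Element types, the rule that rightof: and leftof: require vertical overlap (and above: and below: horizontal overlap), the 50-pixel near: radius, and closest-first ordering are illustrative assumptions, not Terminator's actual rules.

```python
# Sketch: filtering candidates by their bounds relative to an anchor.
# Rect/Element, the overlap rules, and the 50 px `near` radius are
# illustrative assumptions, not Terminator's actual geometry rules.
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    @property
    def right(self) -> float:
        return self.x + self.w

    @property
    def bottom(self) -> float:
        return self.y + self.h

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


@dataclass(frozen=True)
class Element:
    role: str
    name: str
    bounds: Rect


def rows_overlap(a: Rect, b: Rect) -> bool:
    return a.y < b.bottom and b.y < a.bottom


def cols_overlap(a: Rect, b: Rect) -> bool:
    return a.x < b.right and b.x < a.right


# One predicate per positional operator: is `r` in the region relative to `anchor`?
SPATIAL = {
    "rightof": lambda anchor, r: r.x >= anchor.right and rows_overlap(anchor, r),
    "leftof": lambda anchor, r: r.right <= anchor.x and rows_overlap(anchor, r),
    "above": lambda anchor, r: r.bottom <= anchor.y and cols_overlap(anchor, r),
    "below": lambda anchor, r: r.y >= anchor.bottom and cols_overlap(anchor, r),
    "near": lambda anchor, r: math.dist(anchor.center, r.center) <= 50,
}


def resolve(op: str, anchor_name: str, elements: List[Element]) -> List[Element]:
    """Evaluate e.g. rightof:name:Email, closest candidates first."""
    anchor = next(e for e in elements if e.name == anchor_name)
    hits = [e for e in elements if e is not anchor and SPATIAL[op](anchor.bounds, e.bounds)]
    return sorted(hits, key=lambda e: math.dist(anchor.bounds.center, e.bounds.center))


def form(offset: float) -> List[Element]:
    """The same settings form, shifted down by `offset` px when a row appears."""
    return [
        Element("Text", "Email", Rect(10, 100 + offset, 60, 20)),
        Element("Edit", "EmailField", Rect(80, 98 + offset, 200, 24)),
        Element("Text", "Phone", Rect(10, 140 + offset, 60, 20)),
        Element("Edit", "PhoneField", Rect(80, 138 + offset, 200, 24)),
    ]


for offset in (0, 40):
    print(offset, resolve("rightof", "Email", form(offset))[0].name)
# prints "0 EmailField", then "40 EmailField"
```

Because the match depends on where the label is drawn rather than on its position among its siblings, shifting the whole form down by 40 pixels does not change the answer.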

What transfers from your Selenium test suite

Two lists. Everything on the left you already know. Everything on the right is new, and costs about an afternoon to learn.

Carries over from Selenium

role= becomes role:
id= becomes id:
name= becomes name:
className= becomes classname:
text= becomes text:
descendant chaining via >>
first(), all(), timeout()
type_text, click, press_key

New in Terminator

rightof:<selector>
leftof:<selector>
above:<selector>
below:<selector>
near:<selector>
&&, ||, ! with parentheses
has:<selector> (Playwright :has() style)
.. for parent navigation
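has: and .. are easiest to picture on a tree. The Python sketch below uses a hypothetical Node type: has: keeps the candidates that contain at least one matching descendant, the same idea as CSS and Playwright :has(), and .. steps from each match to its parent. Matching is reduced to a single role predicate for brevity, and none of these names come from Terminator's API.

```python
# Sketch of has:<selector> and ".." on a toy accessibility tree.
# Node and the helper names are hypothetical, not Terminator's types.
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass(eq=False)  # eq=False: nodes compare and hash by identity
class Node:
    role: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self

    def descendants(self) -> Iterator["Node"]:
        for kid in self.children:
            yield kid
            yield from kid.descendants()


Predicate = Callable[[Node], bool]


def role(r: str) -> Predicate:
    return lambda n: n.role == r


def has(candidates: List[Node], inner: Predicate) -> List[Node]:
    """has:<selector>: keep candidates with at least one matching descendant."""
    return [c for c in candidates if any(inner(d) for d in c.descendants())]


def parent(nodes: List[Node]) -> List[Node]:
    """..: step from each node to its parent, skipping the root and duplicates."""
    out: List[Node] = []
    for n in nodes:
        if n.parent is not None and n.parent not in out:
            out.append(n.parent)
    return out


# Toy tree: a window with two groups, only one of which holds a button.
save = Node("Button", "Save")
actions = Node("Group", "Actions").add(save)
help_box = Node("Group", "Help").add(Node("Text", "Need help?"))
window = Node("Window", "Settings").add(actions, help_box)

groups = [n for n in window.descendants() if n.role == "Group"]
print([g.name for g in has(groups, role("Button"))])  # ['Actions']
print([p.name for p in parent([save])])               # ['Actions']
```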

Six ways the locator story diverges

A quick tour of the design choices. Some are direct ports, some only make sense once you are outside the browser.

Same prefix grammar

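id:, name:, and classname: are direct analogs of Selenium's By.ID, By.NAME, and By.CLASS_NAME, and text: stands in for By.LINK_TEXT. role: has no By.* counterpart, but it is the same idea as Playwright's getByRole. If you can read a Selenium test today, you can read a Terminator selector tomorrow.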

Same descendant chaining

The >> operator walks the accessibility tree the way Playwright's >> walks the DOM. window:Calculator >> role:Button && name:Seven is the calculator app's seven key.
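A Python sketch of how such a chain can be evaluated, assuming each step is a predicate and each later step searches only the subtrees under the previous step's matches. Treating window:Calculator as role Window plus name Calculator is an assumption, and the Node type and helpers are hypothetical. The toy desktop holds a second window with its own Seven button to show that the chain scopes the search.

```python
# Sketch of evaluating "window:Calculator >> role:Button && name:Seven"
# on a toy tree. Node and the helpers are hypothetical stand-ins.
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass(eq=False)  # identity comparison, like distinct UI elements
class Node:
    role: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)

    def descendants(self) -> Iterator["Node"]:
        for kid in self.children:
            yield kid
            yield from kid.descendants()


Predicate = Callable[[Node], bool]


def role(r: str) -> Predicate:
    return lambda n: n.role == r


def name(v: str) -> Predicate:
    return lambda n: n.name == v


def both(a: Predicate, b: Predicate) -> Predicate:
    """The && operator: both sides must match the same node."""
    return lambda n: a(n) and b(n)


def chain(roots: List[Node], *steps: Predicate) -> List[Node]:
    """`s1 >> s2 >> ...`: each step searches only below the previous matches."""
    current = roots
    for step in steps:
        matches: List[Node] = []
        for node in current:
            for d in node.descendants():
                if step(d) and d not in matches:
                    matches.append(d)
        current = matches
    return current


seven = Node("Button", "Seven")
calculator = Node("Window", "Calculator", [Node("Group", "Number pad", [seven, Node("Button", "Eight")])])
other = Node("Window", "Keypad demo", [Node("Button", "Seven")])  # same name, wrong window
desktop = Node("Desktop", "", [calculator, other])

result = chain([desktop], both(role("Window"), name("Calculator")), both(role("Button"), name("Seven")))
print(result == [seven])  # True
```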

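Five spatial operators, any window

rightof:, leftof:, above:, below:, near: mirror the relations in Selenium 4's relative locators, which only work on elements inside a web page. Here they are evaluated against the bounding rectangle that UI Automation and the macOS Accessibility API report for each element, so they work in native windows too.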

Real boolean expressions

role:Button && !name:Cancel is one string, parsed by a hand-rolled tokenizer and a recursive-descent expression parser. No manual collection filtering, no XPath gymnastics.

Every native window

Chrome, Excel, Slack, VS Code, File Explorer, Finder, the OS menu bar. Anything the accessibility API exposes is reachable with one selector language.

No driver executables

Selenium needs chromedriver, geckodriver, safaridriver. Terminator talks to UIA on Windows and AX on macOS directly from Rust. Zero WebDriver processes on your machine.

What a selector string actually does

A small tour through the grammar. Every command on the left is a valid Terminator selector; the output is the element it resolves to in the accessibility tree.


Feature matrix, item by item

The rows are the capabilities most developers pick a UI automation tool for. The answers are honest: where Selenium can do something, the table says so.

Feature	Selenium	Terminator
Pick elements by accessibility role	No By.* strategy for roles; only explicit role attributes via CSS or XPath	Yes, via role: prefix
Pick elements by id and name	Yes, via By.ID / By.NAME	Yes, via id: and name: prefixes
Chain locators through descendants	Yes, via nested find_element calls	Yes, via >> operator
Boolean expressions in a single selector	Only inside XPath (and, or, not()) or CSS (:not(), comma lists)	Yes (&&, ||, !, parentheses)
Spatial operators (above, below, near)	Yes, via Selenium 4 relative locators, inside the page only	Yes, five built-in, in any window
Targets a native file dialog	No, the dialog is outside the DOM	Yes, it is just another window
Targets an Excel cell or a Slack DM	No	Yes
Requires a driver executable per browser	Yes (chromedriver, geckodriver, etc.)	No, uses OS accessibility APIs directly
Keeps the user's browser cookies and sessions	Not by default; each session starts with a fresh profile	Yes, attaches to the running session
Blocks the user's mouse and keyboard	Yes in many modes	No, runs through accessibility API

Moving an existing Selenium test across

Five steps, half a day

Install the SDK for your language

pip install terminator-py on Python 3.10+, npm install @mediar-ai/terminator on Node.js, or cargo add terminator-rs in Rust. Same selector language across all three.

Replace webdriver.Chrome() with Desktop()

desktop = terminator.Desktop() gives you a handle on the whole accessibility tree. desktop.open_url() takes over from driver.get(), and it attaches to the default browser without spawning a fresh profile.

Translate your By.* locators into prefix selectors

By.ID becomes id:, By.NAME becomes name:, By.CLASS_NAME becomes classname:. For complex paths, build a chain with >> instead of nested find_element calls.
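The mechanical part of this step can be scripted. to_prefix and to_chain below are hypothetical helpers, sketched in Python: they rewrite Selenium (by, value) pairs using the mapping above and join nested lookups with >>. The string keys are the values of Selenium's By constants, so the sketch needs no Selenium import. Mapping By.LINK_TEXT to text: follows this article's FAQ; CSS selectors and XPath are rejected rather than guessed at, since they have no prefix equivalent.

```python
# Hypothetical migration helper: Selenium (by, value) pairs -> Terminator
# selector strings. Keys are the values of Selenium's By constants
# (By.ID == "id", By.NAME == "name", By.CLASS_NAME == "class name",
# By.LINK_TEXT == "link text"), so no Selenium import is needed here.
from typing import Iterable, Tuple

PREFIX_FOR = {
    "id": "id",
    "name": "name",
    "class name": "classname",
    "link text": "text",
}


def to_prefix(by: str, value: str) -> str:
    """Translate one locator; refuse strategies that need a manual rewrite."""
    try:
        return f"{PREFIX_FOR[by]}:{value}"
    except KeyError:
        raise ValueError(f"no prefix equivalent for By strategy {by!r}; rewrite by hand") from None


def to_chain(path: Iterable[Tuple[str, str]]) -> str:
    """Nested find_element calls, outermost first, become one >> chain."""
    return " >> ".join(to_prefix(by, value) for by, value in path)


print(to_chain([("id", "login-form"), ("name", "email")]))
# id:login-form >> name:email
```

Treat the output as a first draft to check against the live accessibility tree, since what a browser exposes there does not always mirror the DOM attributes a Selenium locator targeted.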

Add spatial selectors where the web version needed fragile XPath

Anywhere your Selenium test used following-sibling::input[1], try rightof:name:Label instead. It reads better, it survives DOM restructures that leave the layout alone, and it works in native apps too.

Keep the rest of the test harness

pytest, Jest, Mocha, xUnit, and the Page Object Model all still apply. Terminator is a driver layer, not a framework, so your assertion library and reporting stack stay the same.
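One way to keep that separation explicit is a page object that stores selector strings and calls whatever driver it is handed. In this Python sketch, DesktopDriver is a hypothetical protocol whose click and type_text methods borrow the names listed earlier in this article; their signatures are assumptions, not Terminator's API. A recording fake lets the page object run under pytest without a desktop.

```python
# Sketch: a page object that depends only on a driver interface.
# DesktopDriver's method signatures are assumptions, not Terminator's API.
from typing import List, Protocol, Tuple


class DesktopDriver(Protocol):
    # Hypothetical interface; a real adapter would wrap the SDK.
    def click(self, selector: str) -> None: ...
    def type_text(self, selector: str, text: str) -> None: ...


class LoginPage:
    # Selectors live in one place, as in a Selenium page object.
    EMAIL = "role:Edit && name:Email"
    PASSWORD = "role:Edit && name:Password"
    SUBMIT = "role:Button && name:Submit"

    def __init__(self, driver: DesktopDriver) -> None:
        self.driver = driver

    def log_in(self, email: str, password: str) -> None:
        self.driver.type_text(self.EMAIL, email)
        self.driver.type_text(self.PASSWORD, password)
        self.driver.click(self.SUBMIT)


class RecordingDriver:
    """Test double that records calls instead of touching the desktop."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, ...]] = []

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))

    def type_text(self, selector: str, text: str) -> None:
        self.calls.append(("type_text", selector, text))


def test_log_in_sends_expected_actions() -> None:
    driver = RecordingDriver()
    LoginPage(driver).log_in("me@example.com", "hunter2")
    assert len(driver.calls) == 3
    assert driver.calls[-1] == ("click", LoginPage.SUBMIT)
```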

Why a real parser, not a regex

A common shortcut in locator libraries is to treat && as a string split and move on. That falls apart the first time someone writes a selector with nested parentheses or a not-operator that binds tighter than an or-operator. Terminator instead runs every non-trivial selector through a real tokenizer and a real recursive-descent expression parser.


The practical payoff: a selector like (role:Button && !name:Cancel) || classname:PrimaryAction parses correctly the first time, and the same string round-trips through serialization for logs and test reports.
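The precedence rules are where a string split goes wrong. Below is a compact Python sketch of a recursive-descent parser for the same operators, assuming the conventional precedence (! binds tightest, then &&, then ||) and using a regex tokenizer in place of a hand-rolled one. The tuple AST and the to_str serializer are hypothetical; selector.rs may differ in detail.

```python
# Sketch of a recursive-descent parser for selector booleans.
# Assumed precedence: "!" binds tightest, then "&&", then "||".
# A regex tokenizer stands in for a hand-rolled one; AST nodes are tuples:
# ("atom", text) | ("not", n) | ("and", l, r) | ("or", l, r)
import re
from typing import List, Optional, Tuple

TOKEN = re.compile(r"\s*(&&|\|\||!|\(|\)|[^\s&|!()]+)")
Node = Tuple


def tokenize(text: str) -> List[str]:
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:  # e.g. a lone "&" or "|"
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens, self.i = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.i += 1
        return tok

    def parse_or(self) -> Node:  # lowest precedence
        node = self.parse_and()
        while self.peek() == "||":
            self.take()
            node = ("or", node, self.parse_and())
        return node

    def parse_and(self) -> Node:
        node = self.parse_not()
        while self.peek() == "&&":
            self.take()
            node = ("and", node, self.parse_not())
        return node

    def parse_not(self) -> Node:  # highest precedence
        if self.peek() == "!":
            self.take()
            return ("not", self.parse_not())
        return self.parse_primary()

    def parse_primary(self) -> Node:
        tok = self.take()
        if tok == "(":
            node = self.parse_or()
            self.take(")")
            return node
        if tok in ("&&", "||", ")"):
            raise SyntaxError(f"unexpected {tok!r}")
        return ("atom", tok)


def parse(text: str) -> Node:
    p = Parser(tokenize(text))
    node = p.parse_or()
    if p.peek() is not None:
        raise SyntaxError(f"trailing token {p.peek()!r}")
    return node


def to_str(node: Node) -> str:
    """Serialize, parenthesizing any binary expression nested in another operator."""
    def wrap(child: Node) -> str:
        s = to_str(child)
        return f"({s})" if child[0] in ("and", "or") else s

    if node[0] == "atom":
        return node[1]
    if node[0] == "not":
        return "!" + wrap(node[1])
    op = " && " if node[0] == "and" else " || "
    return op.join(wrap(child) for child in node[1:])


ast = parse("(role:Button && !name:Cancel) || classname:PrimaryAction")
print(to_str(ast))  # (role:Button && !name:Cancel) || classname:PrimaryAction
```

to_str parenthesizes any && or || nested inside another operator, so this particular selector comes back character for character; for other inputs, such as a && b && c, the string can change while still parsing to the same tree.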

Numbers from the actual repo

Read from wc -l crates/terminator/src/selector.rs and the Selector enum definition.

753 lines in selector.rs

32 variants in the Selector enum

5 positional operators

543 lines of unit tests

Install it, in any of four languages

one of these, your pick

Bringing a Selenium test suite to the rest of your desktop?

Book 20 minutes with our team. We will walk through your existing locators and sketch the Terminator equivalents on the spot.

Frequently asked questions

Why does Selenium only work inside a browser?

Selenium WebDriver was built on top of the W3C WebDriver protocol, which is implemented by browser engines (Chromium, Gecko, WebKit) through their driver executables (chromedriver, geckodriver, safaridriver). That protocol describes how to drive a rendered DOM, not a native window, so a Selenium session literally cannot see a native menu bar, a file dialog, a taskbar, or an app written in Cocoa, Win32, Qt, or WinUI. The rendered DOM is its entire world model.

What does Terminator use instead of WebDriver?

Native OS accessibility APIs. On Windows that is UI Automation (UIA), the same API screen readers use to traverse the whole desktop. On macOS it is the Accessibility API (AX). Both expose every window, every control, every text field, every button, in a structured tree with role, name, id, and value fields. Terminator's Rust core, in crates/terminator/src, wraps those APIs and then runs a Selenium-shaped selector language on top. You write role:Button && name:Send; Terminator walks the UIA or AX tree and finds it.

Do my Selenium locator skills transfer?

Most of them. If you already think in role, name, id, class name, descendant combinators, and text matches, you are most of the way to a Terminator selector. The prefix grammar in crates/terminator/src/selector.rs accepts name:, id:, classname:, and text: as direct analogs of By.NAME, By.ID, By.CLASS_NAME, and By.LINK_TEXT, plus role:, which has no By.* counterpart but matches how Playwright's getByRole thinks. The >> operator chains locators exactly like Playwright's >> chaining, which is itself a descendant combinator. What does not transfer: CSS selectors, XPath on HTML elements, and anything that relied on shadow DOM.

What can Terminator do that Selenium cannot?

Reach outside the browser, first of all, plus a richer selector string. selector.rs lines 419 to 437 parse five positional operators, rightof:, leftof:, above:, below:, and near:, each of which takes another selector and returns elements spatially related to that anchor. Selenium 4's relative locators (above, below, to_left_of, to_right_of, near) express the same relations, but only for elements inside a web page. Terminator evaluates them against element bounds in the accessibility tree, so near:text:Cancel also works in a native dialog. Terminator also supports boolean selectors, so (role:Button && !name:Cancel) || classname:Submit is a single string.

Can I automate both the browser and the rest of the app in one script?

Yes, and that is usually the reason to pick Terminator over Selenium. A common flow: open a desktop client such as Slack or Notion, copy a link, open Chrome with that link, fill a form, return to the desktop client, and paste the result into a message. Selenium can only handle the browser steps in the middle. Terminator can do every step in one script, because every target is just a role or name in the accessibility tree regardless of which process owns the window. The selectors window:Slack and window:Chrome do not care that one is an Electron app and the other is a browser.

How is this different from pyautogui or image-based runners like Sikuli?

Image-based runners match screenshots and click pixel coordinates. They break whenever a UI theme changes, the display DPI shifts, fonts hint differently, or a scrollbar steals two pixels. Terminator never reaches for pixels by default. It reads role, name, id, and bounds out of the accessibility tree, so a button that repaints its background is still the same Button node with the same name. Pixels are available as a last-resort pos:x,y selector, but the documented pattern is to build on the accessibility layer.
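The failure mode is easy to reproduce with arithmetic. In this toy Python sketch, a dialog's bounds scale with the display's DPI factor: a click position recorded at 100% lands on empty space at 150%, while a lookup by role and name still finds the button and can target its current center. Element, layout, hit, and find are hypothetical, and real DPI scaling is more involved than multiplying every coordinate.

```python
# Toy model: a recorded pixel click vs. a role/name lookup after a DPI change.
# Element and layout() are hypothetical; real scaling is more involved.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Element:
    role: str
    name: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def layout(scale: float) -> List[Element]:
    """Physical-pixel bounds of one small dialog at a given scale factor."""
    def box(role: str, name: str, x: float, y: float, w: float, h: float) -> Element:
        return Element(role, name, x * scale, y * scale, w * scale, h * scale)

    return [box("Text", "Save changes?", 20, 20, 200, 20), box("Button", "Save", 140, 60, 60, 24)]


def hit(elements: List[Element], point: Tuple[float, float]) -> Optional[Element]:
    """Pixel-style targeting: whatever happens to be under the point."""
    return next((e for e in elements if e.contains(*point)), None)


def find(elements: List[Element], role: str, name: str) -> Optional[Element]:
    """Accessibility-style targeting: match on role and name, ignore position."""
    return next((e for e in elements if e.role == role and e.name == name), None)


recorded = layout(1.0)[1].center()                    # (170.0, 72.0), captured at 100%
print(hit(layout(1.5), recorded))                     # None: the click lands on empty space
print(find(layout(1.5), "Button", "Save").center())   # (255.0, 108.0)
```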

Does it work on a headless CI agent?

Yes on Windows, with an active user session. UIA requires a logged-in desktop to inspect, so you run it on a Windows VM that auto-logs-in, not on a hosted Linux GitHub Actions runner. Our own examples folder uses this exact pattern on Windows 11 VMs provisioned via Vagrant, and the MCP agent ships windows-x86_64 binaries out of the box. macOS support requires Accessibility permission granted to the parent process; Linux uses AT-SPI2 at the Rust level.

How big is the selector engine and can I read the source?

The engine is a single Rust file, crates/terminator/src/selector.rs, 753 lines. It contains the Selector enum (32 variants), a hand-rolled tokenizer that recognizes && and || and ! and parentheses, and a recursive-descent parser that builds a Selector AST. The positional operators each take five to seven lines to parse. The boolean expression parser sits at lines 216 to 330. 543 lines of unit tests live next to it in selector_tests.rs. Public mirror: github.com/mediar-ai/terminator.