Selenium UI automation, extended to every native app on your desktop
Selenium taught a generation of engineers to pick UI elements by role, name, and id, and to chain locators through descendants. Terminator keeps that mental model and ports it off the browser. One selector language covers Chrome, Excel, Slack, Finder, and the title bar of the window you are reading this in. The selector engine is 753 lines of Rust on top of the OS accessibility tree.
The browser was always a subset
Most guides about this topic assume the thing you are automating is a web page. They show you how to install a WebDriver, pick a By strategy, wait for an element, and click a button. That works fine as long as every control your user touches lives inside a Chromium process. The moment a Save As dialog appears, the moment the user switches to Slack, the moment the test needs to drag a file onto the app icon in the Dock, Selenium has nothing to say about it.
The underlying reason is architectural. Selenium WebDriver speaks the W3C WebDriver protocol, and that protocol was designed to drive a rendered DOM inside a browser engine. A driver executable (chromedriver, geckodriver, safaridriver) sits between the test and the browser and translates commands. Everything outside the browser process is invisible.
Terminator takes the same mental model (locators, roles, names, chaining) and points it at the layer below the browser: the OS accessibility tree. Windows UI Automation and the macOS Accessibility API both expose every window, every control, every label, and every text field, with stable role and name fields. A screen reader can see all of it. So can an automation script.
“The selector grammar you wrote for Selenium already matches your native desktop.”
crates/terminator/src/selector.rs, 32-variant Selector enum
Where the Selenium mental model stops
These are all real UI surfaces a browser-only runner cannot reach. Every one of them is a plain window in the OS accessibility tree.
Side by side: a login flow, then a desktop handoff
Left: a Selenium test that can log in but cannot go any further than the rendered DOM. Right: a Terminator script that signs in, then pivots into Slack, in the same process.
Same locator style, different reach
# Selenium WebDriver
# This only works if the UI is a web page rendered inside a browser
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://app.example.com")
# Fill a login form
driver.find_element(By.ID, "email").send_keys("me@example.com")
driver.find_element(By.ID, "password").send_keys("hunter2")
driver.find_element(
By.CSS_SELECTOR, "button[type=submit]"
).click()
# You cannot reach a native Save As dialog
# You cannot reach Slack, Excel, VS Code, or the menu bar
# You cannot reach the app's own title bar buttons
How the selector engine is put together
Three inputs feed into one parser. A prefix selector like role: or name: goes through a straightforward atomic parser. A chained expression with >> splits on the operator and recurses. A boolean expression with &&, ||, or ! runs through a hand-rolled tokenizer and a recursive-descent parser that produces an AST. All three paths produce the same Selector enum, which the locator engine then walks against the accessibility tree.
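The three-way dispatch described above can be sketched in a few lines of Python. This is an illustration only, not the Rust implementation: the `Atom`, `Chain`, and `BoolExpr` classes, the `PREFIXES` tuple, and `parse_selector` are hypothetical names, and the boolean branch is left unparsed here to keep the routing visible.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    kind: str   # prefix such as "role" or "name"
    value: str  # text after the colon


@dataclass(frozen=True)
class Chain:
    parts: Tuple["Selector", ...]  # selectors applied left to right


@dataclass(frozen=True)
class BoolExpr:
    source: str  # raw boolean expression, not parsed in this sketch


Selector = Union[Atom, Chain, BoolExpr]

# Assumed subset of prefixes, taken from the ones named in this article.
PREFIXES = ("role", "name", "id", "classname", "text", "window")


def parse_selector(text: str) -> Selector:
    """Route a selector string to one of three paths, all yielding a Selector."""
    text = text.strip()
    if ">>" in text:
        # Chained expression: split on the operator and recurse on each part.
        return Chain(tuple(parse_selector(part) for part in text.split(">>")))
    if any(op in text for op in ("&&", "||", "!", "(")):
        # Boolean expression: a real engine would tokenize and parse this.
        return BoolExpr(text)
    kind, sep, value = text.partition(":")
    if not sep or kind not in PREFIXES:
        raise ValueError(f"unrecognized selector: {text!r}")
    return Atom(kind, value.strip())


if __name__ == "__main__":
    print(parse_selector("window:Calculator >> role:Button && name:Seven"))
```

Splitting on >> before looking for boolean operators means each step of a chain can itself be a boolean expression, which matches how the calculator example later in this article reads.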
What the parser does with a string
The five positional selectors, parsed line by line
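This is the part of the grammar with no XPath or CSS counterpart. Selenium 4's relative locators offer similar operators, but only for DOM elements inside a browser. Each operator takes another selector as its anchor and returns elements whose bounds fall in the corresponding spatial region.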
A practical example. You are writing a test for a settings page where the Email label drifts up and down as other form rows appear or collapse. In Selenium you would write something like following-sibling::input[1] and hope nobody reshuffles the DOM. In Terminator the selector is rightof:name:Email, and the anchor survives layout changes because the accessibility tree reports each control's bounds, so the engine can always work out which control sits to the right of the label.
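To make the idea of a bounds-based anchor concrete, here is a minimal Python sketch of a right-of filter over bounding boxes. The `Element` class and `right_of` function are hypothetical, and the geometric rule (starts at or past the anchor's right edge, overlaps it vertically, nearest first) is an assumption for illustration; the article does not spell out the exact rule Terminator applies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    role: str
    name: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def right_of(anchor: Element, candidates: List[Element]) -> List[Element]:
    """Candidates that sit to the right of the anchor on the same row."""
    anchor_right = anchor.x + anchor.width
    anchor_bottom = anchor.y + anchor.height
    hits = [
        el for el in candidates
        if el is not anchor
        and el.x >= anchor_right            # starts past the anchor's right edge
        and el.y < anchor_bottom            # overlaps the anchor vertically
        and el.y + el.height > anchor.y
    ]
    # Nearest element first, measured by horizontal gap.
    return sorted(hits, key=lambda el: el.x - anchor_right)


if __name__ == "__main__":
    label = Element("Text", "Email", 20, 100, 60, 20)
    field = Element("Edit", "Email address", 100, 98, 200, 24)
    save = Element("Button", "Save", 100, 300, 80, 24)
    print(right_of(label, [label, field, save]))

    # The row drifts 80 pixels down; the same query still finds the field.
    label2 = Element("Text", "Email", 20, 180, 60, 20)
    field2 = Element("Edit", "Email address", 100, 178, 200, 24)
    print(right_of(label2, [label2, field2, save]))
```

Because the query is expressed relative to the anchor's current bounds, moving the whole row does not change the result, which is the property the Email example relies on.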
What transfers from your Selenium test suite
Two lists. Everything on the left you already know. Everything on the right is new, and costs about an afternoon to learn.
Carries over from Selenium
- role= becomes role:
- id= becomes id:
- name= becomes name:
- className= becomes classname:
- text= becomes text:
- descendant chaining via >>
- first(), all(), timeout()
- type_text, click, press_key
New in Terminator
- rightof:<selector>
- leftof:<selector>
- above:<selector>
- below:<selector>
- near:<selector>
- &&, ||, ! with parentheses
- has:<selector> (Playwright :has() style)
- .. for parent navigation
Six ways the locator story diverges
A tour of the design choices. Some are direct ports, some only make sense once you are outside the browser.
Same prefix grammar
id:, name:, classname:, and text: map onto Selenium's By.* locator families, and role: matches what you would otherwise write as a [role=...] CSS selector. If you can read a Selenium test today, you can read a Terminator selector tomorrow.
Same descendant chaining
The >> operator walks the accessibility tree the way Playwright's >> walks the DOM. window:Calculator >> role:Button && name:Seven is the calculator app's seven key.
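A toy version of descendant chaining, assuming a simplified in-memory tree rather than a real accessibility API. The `Node` class, the dict-shaped steps, and `resolve_chain` are hypothetical names used only to show how each step of a chain narrows the search to the descendants of the previous matches.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    role: str
    name: str
    children: List["Node"] = field(default_factory=list)


def descendants(node: Node) -> Iterator[Node]:
    """Yield every node below `node`, depth first."""
    for child in node.children:
        yield child
        yield from descendants(child)


def matches(node: Node, step: Dict[str, str]) -> bool:
    """A step such as {"role": "Button", "name": "Seven"} requires all keys to match."""
    return all(getattr(node, key) == value for key, value in step.items())


def resolve_chain(root: Node, steps: List[Dict[str, str]]) -> List[Node]:
    """Apply each step to the descendants of the nodes matched by the previous step."""
    current = [root]
    for step in steps:
        found: List[Node] = []
        for scope in current:
            for node in descendants(scope):
                if matches(node, step) and not any(node is seen for seen in found):
                    found.append(node)
        current = found
    return current


if __name__ == "__main__":
    seven = Node("Button", "Seven")
    calculator = Node("Window", "Calculator", [Node("Pane", "Keys", [seven, Node("Button", "Eight")])])
    notes = Node("Window", "Notes", [Node("Button", "Seven")])
    desktop = Node("Desktop", "Desktop", [calculator, notes])

    # Rough equivalent of: window:Calculator >> role:Button && name:Seven
    hits = resolve_chain(desktop, [
        {"role": "Window", "name": "Calculator"},
        {"role": "Button", "name": "Seven"},
    ])
    print(hits)
```

The second window also contains a button named Seven, but the chain never looks inside it because the first step already scoped the search to the Calculator window.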
Five new spatial operators
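rightof:, leftof:, above:, below:, near: have no counterpart in the classic By.* locators. Selenium 4's relative locators come closest, and they stop at the edge of the browser. The operators exist because screen readers need to describe layout spatially, so the accessibility tree preserves what the DOM does not.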
Real boolean expressions
role:Button && !name:Cancel is one string, parsed by a hand-rolled tokenizer and a recursive-descent expression parser. No manual collection filtering, no XPath gymnastics.
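Once a boolean selector has been parsed, matching an element is a small recursive walk. The sketch below assumes a nested-tuple AST and dict-shaped elements, both hypothetical stand-ins for the real Selector enum and accessibility nodes, and evaluates role:Button && !name:Cancel by hand.

```python
from typing import Dict, Tuple

Ast = Tuple  # ("atom", "role:Button"), ("not", ast), ("and", a, b), ("or", a, b)


def evaluate(ast: Ast, element: Dict[str, str]) -> bool:
    """Return True if the element satisfies the boolean selector AST."""
    kind = ast[0]
    if kind == "atom":
        key, _, expected = ast[1].partition(":")
        return element.get(key) == expected
    if kind == "not":
        return not evaluate(ast[1], element)
    if kind == "and":
        return evaluate(ast[1], element) and evaluate(ast[2], element)
    if kind == "or":
        return evaluate(ast[1], element) or evaluate(ast[2], element)
    raise ValueError(f"unknown node kind: {kind!r}")


if __name__ == "__main__":
    # Hand-built AST for: role:Button && !name:Cancel
    selector = ("and", ("atom", "role:Button"), ("not", ("atom", "name:Cancel")))
    elements = [
        {"role": "Button", "name": "OK"},
        {"role": "Button", "name": "Cancel"},
        {"role": "Edit", "name": "Email"},
    ]
    print([el["name"] for el in elements if evaluate(selector, el)])
```

The filtering that a Selenium test would do in a list comprehension over find_elements results is folded into the selector itself.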
Every native window
Chrome, Excel, Slack, VS Code, File Explorer, Finder, the OS menu bar. Anything the accessibility API exposes is reachable with one selector language.
No driver executables
Selenium needs chromedriver, geckodriver, safaridriver. Terminator talks to UIA on Windows and AX on macOS directly from Rust. Zero WebDriver processes on your machine.
What a selector string actually does
A small tour through the grammar. Every command on the left is a valid Terminator selector; the output is the element it resolves to in the accessibility tree.
Feature matrix, item by item
The rows are the capabilities most developers pick a UI automation tool for. Ticks are honest: when Selenium can do something, the table says so.
| Feature | Selenium | Terminator |
|---|---|---|
| Pick elements by accessibility role | Partly, via a [role=...] CSS selector (there is no By.role) | Yes, via role: prefix |
| Pick elements by id and name | Yes, via By.id / By.name | Yes, via id: and name: prefixes |
| Chain locators through descendants | Yes, via nested find_element calls on a WebElement | Yes, via >> operator |
| Boolean expressions in a single selector | No, requires manual code filtering | Yes (&&, ||, !, parentheses) |
| Spatial operators (above, below, near) | Browser only, via Selenium 4 relative locators | Yes, five built-in, on any window |
| Targets a native file dialog | No, the dialog is outside the DOM | Yes, it is just another window |
| Targets an Excel cell or a Slack DM | No | Yes |
| Requires a driver executable per browser | Yes (chromedriver, geckodriver, etc.) | No, uses OS accessibility APIs directly |
| Keeps the user's browser cookies and sessions | Not by default, spawns a fresh profile | Yes, attaches to the running session |
| Blocks the user's mouse and keyboard | Yes in many modes | No, runs through accessibility API |
Moving an existing Selenium test across
Five steps, half a day
Install the SDK for your language
pip install terminator-py on Python 3.10+, npm install @mediar-ai/terminator on Node.js, or cargo add terminator-rs in Rust. Same selector language across all three.
Replace webdriver.Chrome() with Desktop()
desktop = terminator.Desktop() gives you a handle on the whole accessibility tree. desktop.open_url() still works, and it attaches to the default browser without spawning a fresh profile.
Translate your By.* locators into prefix selectors
By.ID becomes id:, By.NAME becomes name:, By.CLASS_NAME becomes classname:. For complex paths, build a chain with >> instead of nested find_element calls.
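A small helper can do the mechanical part of this translation. The sketch below is an assumption-laden illustration: `BY_TO_PREFIX`, `to_prefix_selector`, and `chain` are hypothetical names, and the keys are the string values behind Selenium's Python constants By.ID, By.NAME, and By.CLASS_NAME ("id", "name", "class name"), written as literals so the block runs without Selenium installed.

```python
# String values of By.ID, By.NAME and By.CLASS_NAME in Selenium's Python bindings.
BY_TO_PREFIX = {
    "id": "id",
    "name": "name",
    "class name": "classname",
}


def to_prefix_selector(by: str, value: str) -> str:
    """Translate a (By strategy, value) pair into a prefix selector string."""
    try:
        prefix = BY_TO_PREFIX[by]
    except KeyError:
        # CSS selectors and XPath have no prefix equivalent; rewrite those by hand.
        raise ValueError(f"no prefix selector for strategy {by!r}") from None
    return f"{prefix}:{value}"


def chain(*selectors: str) -> str:
    """Join selectors with >> in place of nested find_element calls."""
    return " >> ".join(selectors)


if __name__ == "__main__":
    print(to_prefix_selector("id", "email"))
    print(chain("window:Calculator", to_prefix_selector("name", "Seven")))
```

Raising on unsupported strategies, rather than guessing, keeps the CSS and XPath locators visible as the ones that need a manual rewrite.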
Add spatial selectors where the web version needed fragile XPath
Anywhere your Selenium test did following-sibling::input[1], try rightof:name:Label instead. It reads better, it survives DOM restructures, and it works in native apps too.
Keep the rest of the test harness
pytest, jest, mocha, XUnit, Page Object Model, all still apply. Terminator is a driver layer, not a framework, so your assertion library and reporting stack stay the same.
Why a real parser, not a regex
A common shortcut in locator libraries is to treat && as a string split and move on. That falls apart the first time someone writes a selector with nested parentheses or a not-operator that binds tighter than an or-operator. Terminator instead runs every non-trivial selector through a real tokenizer and a real recursive-descent expression parser.
The practical payoff: a selector like (role:Button && !name:Cancel) || classname:PrimaryAction parses correctly the first time, and the same string round-trips through serialization for logs and test reports.
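For readers who have not written one, this is roughly what a hand-rolled tokenizer plus recursive-descent parser looks like. It is a Python sketch of the general technique, not a port of selector.rs: the `tokenize`, `Parser`, and `parse` names and the nested-tuple AST are assumptions, and atoms are simplified to runs of characters without whitespace. Precedence follows the usual convention, with ! binding tightest, then &&, then ||.

```python
from typing import List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Split a selector into operators, parentheses, and atom strings."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif text.startswith(("&&", "||"), i):
            tokens.append(text[i:i + 2])
            i += 2
        elif ch in "!()":
            tokens.append(ch)
            i += 1
        else:
            j = i
            while (j < len(text) and not text[j].isspace()
                   and text[j] not in "!()"
                   and not text.startswith(("&&", "||"), j)):
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> Optional[str]:
        token = self.peek()
        self.pos += 1
        return token

    def parse_or(self) -> Tuple:       # lowest precedence
        node = self.parse_and()
        while self.peek() == "||":
            self.take()
            node = ("or", node, self.parse_and())
        return node

    def parse_and(self) -> Tuple:
        node = self.parse_not()
        while self.peek() == "&&":
            self.take()
            node = ("and", node, self.parse_not())
        return node

    def parse_not(self) -> Tuple:      # highest precedence
        if self.peek() == "!":
            self.take()
            return ("not", self.parse_not())
        return self.parse_primary()

    def parse_primary(self) -> Tuple:
        token = self.take()
        if token == "(":
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if token is None or token in ("&&", "||", ")"):
            raise ValueError(f"unexpected token: {token!r}")
        return ("atom", token)


def parse(text: str) -> Tuple:
    parser = Parser(tokenize(text))
    node = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"trailing token: {parser.peek()!r}")
    return node


if __name__ == "__main__":
    print(parse("(role:Button && !name:Cancel) || classname:PrimaryAction"))
```

Each precedence level gets its own method, which is why nested parentheses and a tightly binding ! need no special cases: a naive split on && would have no way to express either.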
Numbers from the actual repo
Read from wc -l crates/terminator/src/selector.rs and the Selector enum definition.
Install it, in any of four languages
Frequently asked questions
Why does Selenium only work inside a browser?
Selenium WebDriver was built on top of the W3C WebDriver protocol, which is implemented by browser engines (Chromium, Gecko, WebKit) through their driver executables (chromedriver, geckodriver, safaridriver). That protocol describes how to drive a rendered DOM, not a native window, so a Selenium session literally cannot see a native menu bar, a file dialog, a taskbar, or an app written in Cocoa, Win32, Qt, or WinUI. The rendered DOM is its entire world model.
What does Terminator use instead of WebDriver?
Native OS accessibility APIs. On Windows that is UI Automation (UIA), the same API screen readers use to traverse the whole desktop. On macOS it is the Accessibility API (AX). Both expose every window, every control, every text field, every button, in a structured tree with role, name, id, and value fields. Terminator's Rust core, in crates/terminator/src, wraps those APIs and then runs a Selenium-shaped selector language on top. You write role:Button and name:Send; Terminator walks the UIA or AX tree and finds it.
Do my Selenium locator skills transfer?
Most of them. If you already think in role, name, id, class name, descendant combinators, and text matches, you are 80 percent of the way to a Terminator selector. The prefix grammar in crates/terminator/src/selector.rs accepts role:, name:, id:, classname:, and text:. The last four are direct analogs to By.name, By.id, By.className, and By.linkText; role: corresponds to a [role=...] CSS selector, because Selenium has no By.role. The >> operator chains locators exactly like Playwright's >> chaining, which is itself a descendant combinator. What does not transfer: CSS selectors, XPath on HTML elements, and anything that relied on shadow DOM.
What can Terminator do that Selenium cannot?
Five positional operators that work outside the browser. selector.rs lines 419 to 437 parse rightof:, leftof:, above:, below:, and near:, each of which takes another selector and returns elements spatially related to the anchor in the accessibility tree. Selenium 4's relative locators offer a browser-only version of the same idea, so they cannot express near:text:Cancel against a native window. The accessibility tree can, because screen readers need spatial relationships to describe layout out loud. Terminator also supports boolean selectors, so role:Button && !name:Cancel || classname:Submit is a single string.
Can I automate both the browser and the rest of the app in one script?
Yes, and that is usually the reason to pick Terminator over Selenium. A common flow: open a desktop client such as Slack or Notion, copy a link, open Chrome with that link, fill a form, return to the desktop client, and paste the result into a message. Selenium can handle only the browser steps, opening Chrome and filling the form. Terminator can do the whole flow in one script because every target is just a role or name in the accessibility tree, regardless of which process owns the window. The selectors window:Slack and window:Chrome do not care that one is Electron and one is a native Chromium session.
How is this different from pyautogui or image-based runners like Sikuli?
Image-based runners match screenshots and click pixel coordinates. They break whenever a UI theme changes, the display DPI shifts, fonts hint differently, or a scrollbar steals two pixels. Terminator never reaches for pixels by default. It reads role, name, id, and bounds out of the accessibility tree, so a button that repaints its background is still the same Button node with the same name. Pixels are available as a last-resort pos:x,y selector, but the documented pattern is to build on the accessibility layer.
Does it work on a headless CI agent?
Yes on Windows, with an active user session. UIA requires a logged-in desktop to inspect, so you run it on a Windows VM that auto-logs-in, not on a hosted Linux GitHub Actions runner. Our own examples folder uses this exact pattern on Windows 11 VMs provisioned via Vagrant, and the MCP agent ships windows-x86_64 binaries out of the box. macOS support requires Accessibility permission granted to the parent process; Linux uses AT-SPI2 at the Rust level.
How big is the selector engine and can I read the source?
The engine is a single Rust file, crates/terminator/src/selector.rs, 753 lines. It contains the Selector enum (32 variants), a hand-rolled tokenizer that recognizes && and || and ! and parentheses, and a recursive-descent parser that builds a Selector AST. The positional operators each take five to seven lines to parse. The boolean expression parser sits at lines 216 to 330. 543 lines of unit tests live next to it in selector_tests.rs. Public mirror: github.com/mediar-ai/terminator.