Automation UI testing, past the browser
Every guide for "automation ui testing" points at one kind of framework: Selenium, Playwright, Cypress, WebdriverIO, TestCafe, Katalon. All of them drive a web page in a browser. All useful. All missing a category. This page is about the other half of UI testing: native desktop apps, cross-app workflows, and the parts of your product that do not live inside a Chromium tab. Same wait_for, validate, and locator shape you already know, but backed by the OS accessibility tree instead of the DOM.
The whole wait-and-assert model in one file
This is the actual source of the async wait primitive in Terminator. Four conditions, a 100 millisecond poll loop, and a typed AutomationError::Timeout on miss. No hidden auto-wait chain. No global implicit timeout. If you have written Playwright, this looks like the explicit, minimal version of its auto-wait. The difference is that element.is_visible() asks Windows UIA (or macOS AX), not the DOM.
One locator, four runtimes
The locator is the user-facing surface. Under it, four platform adapters resolve selectors against whatever subsystem actually knows about UI elements on the box. You do not pick an adapter explicitly. Scoping a locator to a process is enough to route it.
Selector routing by process
Every UI test flake, promoted to a type
UI tests fail in predictable ways: an element is there but zero size, there but disabled, there but obscured, there but still animating, not there yet at all. Most frameworks surface these as a single TimeoutError with a stringified message. Terminator gives each one a named variant. Your retry and recovery code matches on the variant, not on a regex over the message.
The testing primitives, one card each
These six things are what differentiate a desktop-and-browser UI test framework from a browser-only one. None of them are product-specific. They are the surface you need whenever UI is the unit under test.
Exists, Visible, Enabled, Focused
Locator::wait_for takes a WaitCondition enum with exactly four variants. That mirrors the four questions a UI test actually asks. No string matchers, no fuzzy 'state' objects. Each condition is a one-method call on the resolved element.
100 ms poll, explicit timeout
The polling loop sleeps for 100 ms between checks. Timeout is the locator's default or the per-call argument. No global implicit wait.
validate() never throws on miss
validate() returns Ok(Some(element)) on hit, Ok(None) on ElementNotFound or Timeout, Err only for platform faults. Write assertions with if let, not try/catch.
18 typed error variants
AutomationError ships ElementNotVisible, ElementObscured, ElementNotStable, ElementDetached, ElementNotEnabled, UIAutomationAPIError with a COM code, plus the usual Timeout and InvalidSelector. Every flake has a name.
Selectors survive resize and theme
role:Button && name:Save is stable across window sizes, DPI, light/dark themes, RTL locales. Coordinates are not.
Scoped to one process
process:chrome roots a locator inside a running executable. window:Calculator narrows to a single top-level window. Tests do not leak into other apps.
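The validate() contract from the cards above can be sketched as a discriminated union. `AutomationError` here is a hypothetical TypeScript stand-in for the Rust enum; the variant names come from this page, and nothing else is verified SDK surface.

```typescript
// Sketch of the validate() contract: a miss is a value, not an exception.
// Variant names mirror this page; the union itself is a stand-in.
type AutomationError =
  | { kind: "ElementNotFound" }
  | { kind: "Timeout"; ms: number }
  | { kind: "ElementObscured" }
  | { kind: "UIAutomationAPIError"; comCode: number };

type Element = { name: string };

// null on ElementNotFound or Timeout; only platform-side errors escape.
function validate(result: Element | AutomationError): Element | null {
  if ("kind" in result) {
    if (result.kind === "ElementNotFound" || result.kind === "Timeout") {
      return null; // assertion-style miss: handle with if, not try/catch
    }
    throw new Error(`platform-side error: ${result.kind}`);
  }
  return result;
}
```

Your test body stays flat: one `if (validate(...) === null)` per assertion, no nested catch blocks.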
The moment browser-only stops being enough
This is the test that breaks most setups. The seed data is in an Excel workbook. The primary flow is in Chrome. The confirmation lives in a native admin tool. Compare what a browser-only framework can reach with what a desktop-aware one can.
Order-refund test: seed in Excel, act in Chrome, confirm in native admin
Playwright / Cypress / Selenium can reach the Chrome step. They cannot read the Excel cell and cannot click the native confirm button. Teams glue a second tool (Winium, AppiumDesktop, AutoIt) for the native parts. That second tool has a different selector language, a different error surface, and its own flake profile. The test becomes three tests taped together with a shell script.
- Cannot read a cell from a running Excel workbook
- Cannot drive a native admin tool
- Two frameworks, two selector languages, two flake surfaces
Same test, two runtimes
```ts
// test.spec.ts
// Works only if the UI under test is a web page in a browser.
import { test, expect } from '@playwright/test';

test('order flow', async ({ page }) => {
  await page.goto('/orders');
  // Cannot read the Excel cell: the workbook is a native app.
  // Cannot click the Refund button in a desktop-only admin tool.
  await page.getByPlaceholder('Search orders').fill('OR-4213');
  await expect(page.getByText('Paid')).toBeVisible();
  await expect(page.getByRole('button', { name: 'Refund' }))
    .toBeEnabled();
});
```

A full cross-app test, top to bottom
This file runs in Vitest (or Jest, Mocha, anything that exposes test and expect). One Desktop instance, three apps, one assertion. No extra setup beyond npm install @mediar-ai/terminator.
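A sketch of the shape such a file takes. The `Desktop` and `Locator` classes below are tiny inline stubs with a canned accessibility tree, standing in for @mediar-ai/terminator; `locator()` and `first()` are inferred from this page, and no real apps are touched.

```typescript
// Stub shapes standing in for the real SDK. The canned tree below
// replaces three live apps (Excel, Chrome, a native admin tool).
type El = { name: string; invoke(): void };

class Locator {
  constructor(private el: El | null) {}
  async first(_timeoutMs: number): Promise<El> {
    if (!this.el) throw new Error("Timeout"); // typed variant in the real SDK
    return this.el;
  }
}

class Desktop {
  // Canned accessibility tree: selector string -> element.
  private tree = new Map<string, El>([
    ["process:EXCEL >> role:DataItem && name:A2", { name: "OR-4213", invoke() {} }],
    ["process:chrome >> role:Button && name:Refund", { name: "Refund", invoke() {} }],
    ["process:AdminDesktop >> role:Text && name:Refunded", { name: "Refunded", invoke() {} }],
  ]);
  locator(selector: string): Locator {
    return new Locator(this.tree.get(selector) ?? null);
  }
}

// One Desktop instance, three "apps", one flow: seed, act, confirm.
async function orderRefundTest(): Promise<string> {
  const desktop = new Desktop();
  const seed = await desktop.locator("process:EXCEL >> role:DataItem && name:A2").first(5000);
  const refund = await desktop.locator("process:chrome >> role:Button && name:Refund").first(5000);
  refund.invoke();
  const confirm = await desktop.locator("process:AdminDesktop >> role:Text && name:Refunded").first(5000);
  return `${seed.name}: ${confirm.name}`;
}
```

The point of the sketch is the selector shape: every app, web or native, is addressed by the same `process: >> role: && name:` string against one root object.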
What it looks like when it runs
Selector resolves are on the order of milliseconds. wait_for returns the instant the condition becomes true. The whole test finishes in about a second on a warm Windows session.
The six moves in every desktop UI test
These are the steps, in order. If a step fails, the failure is a typed variant and your retry logic matches on it. No try/catch/stringify dance.
Write a locator
desktop.locator('process:EXCEL >> role:DataItem && name:A2'). No coordinates, no screenshot anchors. Use the Accessibility Insights picker on Windows or Accessibility Inspector on macOS to discover the role and name.
Resolve with a timeout
Every .first(timeoutMs) or .all(timeoutMs) call takes an explicit timeout in milliseconds. No implicit wait. If the element is not there in time, the call fails with Timeout, a typed variant you can handle.
Wait for the right condition
Use wait_for(WaitCondition::Visible, 5_000) before clicking a button that animates in. Use wait_for(WaitCondition::Enabled, 5_000) before clicking a submit that is gated on validation. Poll interval is 100 ms.
Assert with validate
validate() returns Ok(None) on ElementNotFound or Timeout instead of throwing. Pair it with an assert: expect(await locator.validate(5_000)).not.toBeNull(). Your test runner records a single assertion.
Act, and pick invoke over click
For buttons that live off-screen or behind a virtual scroll, invoke() calls the OS accessibility default action directly. No mouse move, no viewport visibility required. Faster and flake-resistant.
Traverse into the next process
One Desktop object is a global locator root. To continue into a native admin tool after the browser, scope to its process name and keep going. Selectors look identical to the web ones.
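Putting the error model to work: below is a sketch of step-level recovery that matches on a typed variant instead of regexing a message. `AutomationFault` and the action callbacks are hypothetical; the variant names and the fixes (invoke() when obscured, back off when not stable) come from this page.

```typescript
// Typed flake recovery. Matching on a named variant replaces the
// try/catch/stringify dance; the names mirror this page's error list.
type FlakeKind = "ElementObscured" | "ElementNotStable" | "Timeout";

class AutomationFault extends Error {
  constructor(readonly kind: FlakeKind) { super(kind); }
}

async function actWithRecovery(
  click: () => Promise<void>,
  invoke: () => Promise<void>,
  retries = 3,
): Promise<string> {
  for (let i = 0; i < retries; i++) {
    try {
      await click();
      return "clicked";
    } catch (e) {
      if (!(e instanceof AutomationFault)) throw e;
      switch (e.kind) {
        case "ElementObscured":
          // Something covers the pixel: fire the default action instead.
          await invoke();
          return "invoked";
        case "ElementNotStable":
          // Bounds still animating: back off one poll interval, retry.
          await new Promise((r) => setTimeout(r, 100));
          continue;
        case "Timeout":
          throw e; // a hard miss is not worth retrying
      }
    }
  }
  throw new AutomationFault("Timeout");
}
```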
Terminator vs a browser-only UI test framework
Point-by-point differences that matter when the UI under test is not purely web. Same shape, wider reach.
| Feature | Browser-only (Selenium, Playwright, Cypress) | Terminator |
|---|---|---|
| Target surface | A single browser tab, DOM only | Every running process the OS exposes. Browser tabs, native apps, system dialogs, the taskbar |
| Selector language | CSS + ARIA + text + XPath | role, name, id, nativeid, classname, text, visible, process, window, plus rightof, leftof, above, below, near |
| Wait primitives | toBeVisible, toBeEnabled, toBeFocused, toHaveText, auto-waiting in actions | wait_for(Exists \| Visible \| Enabled \| Focused), validate() for assertion-style Ok(None)-on-miss, 100 ms poll |
| Typed flake errors | TimeoutError, generic Error | 18 variants including ElementNotVisible, ElementObscured, ElementNotStable, ElementDetached, ElementNotEnabled |
| Cross-app flows in one test | Impossible. A second tool (Winium, AppiumDesktop, AutoIt) is required for the native side | One Desktop instance covers Chrome, Excel, Outlook, a custom admin tool, and a system dialog in a single test |
| Invoke vs click | Click only. If the element is off-screen, scroll first | invoke() calls the accessibility API default action without requiring viewport visibility. Faster and more deterministic |
| State primitives on controls | check() for checkboxes, selectOption() for selects | setSelected(true) works for checkboxes AND radio buttons (Windows UIA quirk: radios often ignore invoke()) |
| Speed per action | 10 to 50 ms per action on a local browser | Bounded by CPU, not LLM inference: ~1 ms selector resolve on the accessibility tree, no screenshot parsing |
| Runs in CI without a display server | Yes, headless Chromium | Only with a real Windows session: UIA needs a desktop. Run on a Windows VM or a CI Windows image |
One selector language, every prefix you need
These are every prefix the selector engine understands. You will use role: and name: 99% of the time. The positional ones (rightof:, near:) save you on ambiguous layouts.
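As a sketch of how the grammar decomposes — assuming only what this page shows (`>>` separates descent steps, `&&` joins predicates inside a step, `prefix:value` names one predicate), not the real parser in crates/terminator/src/selector.rs:

```typescript
// Minimal parse of the selector grammar on this page. Parsing only;
// matching against an accessibility tree is elided.
interface Predicate { prefix: string; value: string }
type Step = Predicate[]; // one descent step = conjunction of predicates

function parseSelector(selector: string): Step[] {
  return selector.split(">>").map((step) =>
    step.split("&&").map((pred) => {
      const i = pred.indexOf(":");
      if (i < 0) throw new Error(`missing prefix in "${pred.trim()}"`);
      return { prefix: pred.slice(0, i).trim(), value: pred.slice(i + 1).trim() };
    }),
  );
}
```

Reading `process:EXCEL >> role:DataItem && name:A2` through this lens: descend into the EXCEL process, then match an element that is both a DataItem and named A2.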
“Every line number, WaitCondition variant, error type, and selector prefix on this page is grep-able in a fresh clone of mediar-ai/terminator. No invented specs.”
github.com/mediar-ai/terminator
Specific assertions a desktop UI test needs to express
This is a concrete checklist of things that are awkward or impossible in a browser-only framework and natural in a desktop-aware one. Each item maps to one or two calls in the SDK.
- Wait for a button to become visible on a modal that animates in
- Assert a native menu item is enabled before invoking it
- Wait for focus to land on a specific field after keyboard navigation
- Catch ElementObscured when a tooltip covers the element and retry
- Catch ElementNotStable while a toast is sliding in and re-poll
- Move one test across Excel, Chrome, and a native admin tool
- Scope a locator to a single process or window
- Invoke a button that is off-screen without first scrolling
- Use setSelected(true) on radio buttons that ignore invoke() on Windows
- Read and assert on a cell value from a running Excel workbook
Apps you can target today
Anything whose window the OS exposes accessibility information for. That is most production Windows apps. Electron apps in particular tend to have well-labeled roles and names, which makes them easy selector targets.
Why this framing matters
UI test automation has been split into two tribes for a decade. Browser people have Playwright and friends. Native people have Appium, Winium, WinAppDriver, and a queue of legacy RPA vendors. Neither side gives you a single language for the whole surface. Teams pay for that split with parallel test infrastructures, two on-call rotations for test flakiness, and scenarios that never get automated because they cross the boundary.
Terminator is a developer framework for building desktop automation. It is not a consumer app. It gives existing AI coding assistants, and your existing test runner, the ability to control your entire OS, not just a browser tab. Like Playwright, but for every app on your desktop.
If your product is a pure web app, keep using Playwright. If any part of your workflow leaves the tab, add Terminator to the same test file.
100 — the poll interval, in milliseconds, of Locator::wait_for
Fixed in crates/terminator/src/locator.rs. Every 100 ms, the loop checks your condition against the live accessibility tree. Timeout is yours to pass in.
Have a UI test that leaves the browser tab?
Show us the scenario and we will map it to a single Terminator spec that spans every app it touches.
Frequently asked questions
What do most guides on 'automation ui testing' cover that this page does not?
They cover browser UI testing, narrowly. Selenium, Playwright, Cypress, Puppeteer, Katalon, TestCafe, WebdriverIO. All of them drive a web DOM through a browser driver (CDP, WebDriver-BiDi, or Selenium's JSON Wire). They are the right answer when the whole UI you need to test is a web page. This page covers the other half of automation UI testing: the UI that is not in a browser. Installed desktop apps, cross-app workflows, in-app dialogs that an embedded webview cannot reach, native admin tools, Excel, Outlook, PowerPoint, the Windows settings UI. For that, you need to drive the OS accessibility tree, not the DOM. Terminator is the Playwright-shaped framework for doing that. Same locator, wait_for, validate, click, type, invoke primitives; a different runtime under the hood.
What are the WaitConditions and how many are there?
Exactly four, defined as a Rust enum in crates/terminator/src/locator.rs. They are Exists, Visible, Enabled, and Focused. Locator::wait_for takes one of them and a Duration timeout. It polls with a 100 millisecond interval and checks the condition against the resolved element (element.is_visible(), element.is_enabled(), element.is_focused()) on each tick. The wait returns a typed Ok(element) when the condition becomes true, or AutomationError::Timeout if the deadline passes. That is the full surface. No fluent chain with 20 matchers, no implicit wait, no polling interval you need to tune. If you know Playwright's auto-wait, this is the explicit, minimal version of it.
How is this different from Appium, Winium, or WinAppDriver?
Those are WebDriver-protocol clients for native automation. They work, but the selectors are verbose WebDriver-style (By.AccessibilityId, By.Name, XPath over the UIA tree) and the API shape is older: desired capabilities, sessions, a Selenium-style findElement. Terminator is Playwright-shaped and code-first: a single Desktop() object, chained locators, async wait_for, typed errors, and one selector string that is readable at a glance (window:Calculator >> role:Button && name:Seven). Terminator also gives you an MCP server and a TypeScript workflow SDK with Zod schemas on top of the same primitives, which lets you reuse test steps as deterministic automations. Finally, Terminator is MIT-licensed and has no remote driver dependency: the SDK talks to UIA over COM directly from your test process.
Can the same test run against both a web app and a native app?
Yes. That is the anchor use case. A single Desktop instance scoped by process name gives you a locator root for any running program. A typical cross-app test looks like: 1) process:EXCEL to read a seed value from an open workbook; 2) process:chrome to drive the browser admin UI; 3) process:AdminDesktop to confirm the side-effect in the native dashboard. Same locator shape, same waitFor conditions, same typed errors, no second framework. Browser DOM access is available too through Terminator's Chrome extension bridge, which exposes executeBrowserScript() on any element resolved inside a Chrome process. That is useful when the accessibility tree is thin in a custom web widget and you need to fall back to the DOM.
Which flake errors does Terminator surface, and how do I handle ElementObscured or ElementNotStable?
The full list is in crates/terminator/src/errors.rs under the AutomationError enum: ElementNotFound, Timeout, PermissionDenied, PlatformError, UnsupportedOperation, UnsupportedPlatform, InvalidArgument, Internal, InvalidSelector, UIAutomationAPIError (with COM code + retryability flag), ElementDetached, ElementNotVisible, ElementNotEnabled, ElementNotStable, ElementObscured, ScrollFailed, OperationCancelled, VerificationFailed. Eighteen variants. Two of them map directly to the most common sources of UI test flake. ElementNotStable fires when the target's bounding box is still animating (a toast sliding in, a list reordering); the fix is to wait longer or use invoke() which does not need stable bounds. ElementObscured fires when another element is on top at click time (an overlay, a tooltip, a banner); the fix is to dismiss the covering element or press Escape and retry. Because these are typed, your test code matches on them explicitly instead of parsing a stringified Error.
What does invoke() do that click() does not?
click() synthesizes a mouse event at the element's center point. That requires the element to be visible in the viewport, its bounds to be stable, and nothing to be obscuring the pixel. invoke() calls the accessibility API's default action on the element directly. On Windows UIA that is the InvokePattern.Invoke method; on macOS AX it is the AXPress or role-appropriate action. It works even if the element is off-screen or partially obscured, it does not require a mouse move, and it is faster because it skips the hit-test. It is less faithful to a real user (no mouse, no focus-ring toggle), so prefer click() for user-facing smoke tests and invoke() for deep functional tests where you just need the action to fire. The one gotcha: radio buttons on Windows often ignore Invoke; use setSelected(true) instead.
Does this replace Playwright for browser tests?
No, and it is not meant to. Playwright is the best-in-class browser UI test runner with a mature fixture model, tracing, and a deep toolbox around it (component tests, API mocking, codegen, visual comparison). If your whole product under test is a web page, stay in Playwright. Terminator is the right tool when the product under test is a native app, or when a single scenario spans a browser plus a native app. You can also run both in the same test file: Playwright for the web surface, Terminator for the desktop surface. They do not fight. Terminator's selectors deliberately mirror Playwright's for cognitive load reasons. A Playwright user reading a Terminator test understands it on sight.
What does a single selector look like on Windows vs macOS?
Selectors are cross-platform by design. role:Button && name:Save reads the Windows UIA LocalizedControlType + Name on Windows, and the macOS AX Role + AXTitle on macOS. The Rust selector engine in crates/terminator/src/selector.rs normalizes both into the same matcher tree. The platform-specific adapters (crates/terminator/src/platforms/) translate the matchers into UIA property conditions or AX attribute queries. You write one test; it runs on both hosts where the app under test runs. The main caveat is that native widgets have different canonical roles on the two platforms (TextBox on Windows vs AXTextField on macOS). Use role:Edit as the cross-platform shim, or fall back to nativeid: when you need pinpoint precision, which is rare for well-labeled apps.
How fast is a selector resolve compared to a Playwright locator?
On Windows, a cached accessibility tree walk from a process root to a leaf button is ~1 ms for apps with a few thousand UIA nodes (Notepad, Calculator) and 10 to 50 ms for larger ones (Notion, VS Code). Playwright's locator resolve on a busy web page is comparable, 10 to 50 ms. The two are in the same order of magnitude. The place where the approaches diverge is screenshot-based computer use tools (Claude computer use, ChatGPT Agents, BrowserUse, Perplexity Comet). Those run an LLM against a screenshot for every action. That is seconds, not milliseconds. Terminator's README claims 100x the throughput over that category, which is consistent with the difference between a tree walk and an LLM inference call.
Does it run in CI?
Yes, on a Windows CI image with a real logged-in desktop session, or a self-hosted Windows VM. Windows UIA requires a desktop because it queries the Win32 accessibility APIs that only exist when a session is interactive. GitHub Actions windows-latest runners provide this. Headless CI on Linux does not work for Windows UIA automation (that is a category error, not a Terminator limitation). For pure browser tests in CI, keep using Playwright on a Linux runner. For desktop and cross-app tests, schedule them on Windows, treat them as integration tests rather than unit tests, and budget them accordingly. macOS AX in CI requires granting Accessibility permission to the runner process, which is scriptable via tccutil on GitHub Actions macOS runners with sudo access.