Automation UI testing, past the browser
Every guide for "automation ui testing" points at one kind of framework: Selenium, Playwright, Cypress, WebdriverIO, TestCafe, Katalon. All of them drive a web page in a browser. All useful. All missing a category. This page is about the other half of UI testing: native desktop apps, cross-app workflows, and the parts of your product that do not live inside a Chromium tab. Same wait_for, validate, and locator shape you already know, but backed by the OS accessibility tree instead of the DOM.
The whole wait-and-assert model in one file
This is the actual source of the async wait primitive in Terminator. Four conditions, a 100 millisecond poll loop, and a typed AutomationError::Timeout on miss. No hidden auto-wait chain. No global implicit timeout. If you have written Playwright, this looks like the explicit, minimal version of its auto-wait. The difference is that element.is_visible() asks Windows UIA (or macOS AX), not the DOM.
One locator, four runtimes
The locator is the user-facing surface. Under it, four platform adapters resolve selectors against whatever subsystem actually knows about UI elements on the box. You do not pick an adapter explicitly. Scoping a locator to a process is enough to route it.
Selector routing by process
Every UI test flake, promoted to a type
UI tests fail in predictable ways: an element is there but zero size, there but disabled, there but obscured, there but still animating, not there yet at all. Most frameworks surface these as a single TimeoutError with a stringified message. Terminator gives each one a named variant. Your retry and recovery code matches on the variant, not on a regex over the message.
The testing primitives, one card each
These six things are what differentiate a desktop-and-browser UI test framework from a browser-only one. None of them are product-specific. They are the surface you need whenever UI is the unit under test.
Exists, Visible, Enabled, Focused
Locator::wait_for takes a WaitCondition enum with exactly four variants. That mirrors the four questions a UI test actually asks. No string matchers, no fuzzy 'state' objects. Each condition is a one-method call on the resolved element.
100 ms poll, explicit timeout
The polling loop sleeps for 100 ms between checks. Timeout is the locator's default or the per-call argument. No global implicit wait.
validate() never throws on miss
validate() returns Ok(Some(element)) on hit, Ok(None) on ElementNotFound or Timeout, Err only for platform faults. Write assertions with if let, not try/catch.
18 typed error variants
AutomationError ships ElementNotVisible, ElementObscured, ElementNotStable, ElementDetached, ElementNotEnabled, UIAutomationAPIError with a COM code, plus the usual Timeout and InvalidSelector. Every flake has a name.
Selectors survive resize and theme
role:Button && name:Save is stable across window sizes, DPI, light/dark themes, RTL locales. Coordinates are not.
Scoped to one process
process:chrome roots a locator inside a running executable. window:Calculator narrows to a single top-level window. Tests do not leak into other apps.
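The validate() contract from the cards above can be sketched as a discriminated union. `AutomationError` here is a hypothetical TypeScript stand-in for the Rust enum; the variant names come from this page, and nothing else is verified SDK surface.

```typescript
// Sketch of the validate() contract: a miss is a value, not an exception.
// Variant names mirror this page; the union itself is a stand-in.
type AutomationError =
  | { kind: "ElementNotFound" }
  | { kind: "Timeout"; ms: number }
  | { kind: "ElementObscured" }
  | { kind: "UIAutomationAPIError"; comCode: number };

type Element = { name: string };

// null on ElementNotFound or Timeout; only platform-side errors escape.
function validate(result: Element | AutomationError): Element | null {
  if ("kind" in result) {
    if (result.kind === "ElementNotFound" || result.kind === "Timeout") {
      return null; // assertion-style miss: handle with if, not try/catch
    }
    throw new Error(`platform-side error: ${result.kind}`);
  }
  return result;
}
```

Your test body stays flat: one `if (validate(...) === null)` per assertion, no nested catch blocks.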
The moment browser-only stops being enough
This is the test that breaks most setups. The seed data is in an Excel workbook. The primary flow is in Chrome. The confirmation lives in a native admin tool. Compare what a browser-only framework can reach with what a desktop-aware one can.
Order-refund test: seed in Excel, act in Chrome, confirm in native admin
Playwright / Cypress / Selenium can reach the Chrome step. They cannot read the Excel cell and cannot click the native confirm button. Teams glue a second tool (Winium, AppiumDesktop, AutoIt) for the native parts. That second tool has a different selector language, a different error surface, and its own flake profile. The test becomes three tests taped together with a shell script.
- Cannot read a cell from a running Excel workbook
- Cannot drive a native admin tool
- Two frameworks, two selector languages, two flake surfaces
Same test, two runtimes
```ts
// test.spec.ts
// Works only if the UI under test is a web page in a browser.
import { test, expect } from '@playwright/test';

test('order flow', async ({ page }) => {
  await page.goto('/orders');
  // Cannot read the Excel cell: the workbook is a native app.
  // Cannot click the Refund button in a desktop-only admin tool.
  await page.getByPlaceholder('Search orders').fill('OR-4213');
  await expect(page.getByText('Paid')).toBeVisible();
  await expect(page.getByRole('button', { name: 'Refund' }))
    .toBeEnabled();
});
```

A full cross-app test, top to bottom
This file runs in Vitest (or Jest, Mocha, anything that exposes test and expect). One Desktop instance, three apps, one assertion. No extra setup beyond npm install @mediar-ai/terminator.
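A sketch of the shape such a file takes. The `Desktop` and `Locator` classes below are tiny inline stubs with a canned accessibility tree, standing in for @mediar-ai/terminator; `locator()` and `first()` are inferred from this page, and no real apps are touched.

```typescript
// Stub shapes standing in for the real SDK. The canned tree below
// replaces three live apps (Excel, Chrome, a native admin tool).
type El = { name: string; invoke(): void };

class Locator {
  constructor(private el: El | null) {}
  async first(_timeoutMs: number): Promise<El> {
    if (!this.el) throw new Error("Timeout"); // typed variant in the real SDK
    return this.el;
  }
}

class Desktop {
  // Canned accessibility tree: selector string -> element.
  private tree = new Map<string, El>([
    ["process:EXCEL >> role:DataItem && name:A2", { name: "OR-4213", invoke() {} }],
    ["process:chrome >> role:Button && name:Refund", { name: "Refund", invoke() {} }],
    ["process:AdminDesktop >> role:Text && name:Refunded", { name: "Refunded", invoke() {} }],
  ]);
  locator(selector: string): Locator {
    return new Locator(this.tree.get(selector) ?? null);
  }
}

// One Desktop instance, three "apps", one flow: seed, act, confirm.
async function orderRefundTest(): Promise<string> {
  const desktop = new Desktop();
  const seed = await desktop.locator("process:EXCEL >> role:DataItem && name:A2").first(5000);
  const refund = await desktop.locator("process:chrome >> role:Button && name:Refund").first(5000);
  refund.invoke();
  const confirm = await desktop.locator("process:AdminDesktop >> role:Text && name:Refunded").first(5000);
  return `${seed.name}: ${confirm.name}`;
}
```

The point of the sketch is the selector shape: every app, web or native, is addressed by the same `process: >> role: && name:` string against one root object.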
What it looks like when it runs
Selector resolves are on the order of milliseconds. wait_for returns the instant the condition becomes true. The whole test finishes in about a second on a warm Windows session.
The six moves in every desktop UI test
These are the steps, in order. If a step fails, the failure is a typed variant and your retry logic matches on it. No try/catch/stringify dance.
Write a locator
desktop.locator('process:EXCEL >> role:DataItem && name:A2'). No coordinates, no screenshot anchors. Use the Accessibility Insights picker on Windows or Accessibility Inspector on macOS to discover the role and name.
Resolve with a timeout
Every .first(timeoutMs) or .all(timeoutMs) call takes an explicit timeout in milliseconds. No implicit wait. If the element is not there in time, the call fails with Timeout, a typed variant you can handle.
Wait for the right condition
Use wait_for(WaitCondition::Visible, 5_000) before clicking a button that animates in. Use wait_for(WaitCondition::Enabled, 5_000) before clicking a submit that is gated on validation. Poll interval is 100 ms.
Assert with validate
validate() returns Ok(None) on ElementNotFound or Timeout instead of throwing. Pair it with an assert: expect(await locator.validate(5_000)).not.toBeNull(). Your test runner records a single assertion.
Act, and pick invoke over click
For buttons that live off-screen or behind a virtual scroll, invoke() calls the OS accessibility default action directly. No mouse move, no viewport visibility required. Faster and flake-resistant.
Traverse into the next process
One Desktop object is a global locator root. To continue into a native admin tool after the browser, scope to its process name and keep going. Selectors look identical to the web ones.
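Putting the error model to work: below is a sketch of step-level recovery that matches on a typed variant instead of regexing a message. `AutomationFault` and the action callbacks are hypothetical; the variant names and the fixes (invoke() when obscured, back off when not stable) come from this page.

```typescript
// Typed flake recovery. Matching on a named variant replaces the
// try/catch/stringify dance; the names mirror this page's error list.
type FlakeKind = "ElementObscured" | "ElementNotStable" | "Timeout";

class AutomationFault extends Error {
  constructor(readonly kind: FlakeKind) { super(kind); }
}

async function actWithRecovery(
  click: () => Promise<void>,
  invoke: () => Promise<void>,
  retries = 3,
): Promise<string> {
  for (let i = 0; i < retries; i++) {
    try {
      await click();
      return "clicked";
    } catch (e) {
      if (!(e instanceof AutomationFault)) throw e;
      switch (e.kind) {
        case "ElementObscured":
          // Something covers the pixel: fire the default action instead.
          await invoke();
          return "invoked";
        case "ElementNotStable":
          // Bounds still animating: back off one poll interval, retry.
          await new Promise((r) => setTimeout(r, 100));
          continue;
        case "Timeout":
          throw e; // a hard miss is not worth retrying
      }
    }
  }
  throw new AutomationFault("Timeout");
}
```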
Terminator vs a browser-only UI test framework
Point-by-point differences that matter when the UI under test is not purely web. Same shape, wider reach.
| Feature | Browser-only (Selenium, Playwright, Cypress) | Terminator |
|---|---|---|
| Target surface | A single browser tab, DOM only | Every running process the OS exposes. Browser tabs, native apps, system dialogs, the taskbar |
| Selector language | CSS + ARIA + text + XPath | role, name, id, nativeid, classname, text, visible, process, window, plus rightof, leftof, above, below, near |
| Wait primitives | toBeVisible, toBeEnabled, toBeFocused, toHaveText, auto-waiting in actions | wait_for(Exists \| Visible \| Enabled \| Focused), validate() for assertion-style Ok(None)-on-miss, 100 ms poll |
| Typed flake errors | TimeoutError, generic Error | 18 variants including ElementNotVisible, ElementObscured, ElementNotStable, ElementDetached, ElementNotEnabled |
| Cross-app flows in one test | Impossible. A second tool (Winium, AppiumDesktop, AutoIt) is required for the native side | One Desktop instance covers Chrome, Excel, Outlook, a custom admin tool, and a system dialog in a single test |
| Invoke vs click | Click only. If the element is off-screen, scroll first | invoke() calls the accessibility API default action without requiring viewport visibility. Faster and more deterministic |
| State primitives on controls | check() for checkboxes, selectOption() for selects | setSelected(true) works for checkboxes AND radio buttons (Windows UIA quirk: radios often ignore invoke()) |
| Speed per action | 10 to 50 ms per action on a local browser | Bounded by CPU, not LLM inference: ~1 ms selector resolve on the accessibility tree, no screenshot parsing |
| Runs in CI without a display server | Yes, headless Chromium | Only with a real Windows session: UIA needs a desktop. Run on a Windows VM or a CI Windows image |
One selector language, every prefix you need
These are every prefix the selector engine understands. You will use role: and name: 99% of the time. The positional ones (rightof:, near:) save you on ambiguous layouts.
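As a sketch of how the grammar decomposes — assuming only what this page shows (`>>` separates descent steps, `&&` joins predicates inside a step, `prefix:value` names one predicate), not the real parser in crates/terminator/src/selector.rs:

```typescript
// Minimal parse of the selector grammar on this page. Parsing only;
// matching against an accessibility tree is elided.
interface Predicate { prefix: string; value: string }
type Step = Predicate[]; // one descent step = conjunction of predicates

function parseSelector(selector: string): Step[] {
  return selector.split(">>").map((step) =>
    step.split("&&").map((pred) => {
      const i = pred.indexOf(":");
      if (i < 0) throw new Error(`missing prefix in "${pred.trim()}"`);
      return { prefix: pred.slice(0, i).trim(), value: pred.slice(i + 1).trim() };
    }),
  );
}
```

Reading `process:EXCEL >> role:DataItem && name:A2` through this lens: descend into the EXCEL process, then match an element that is both a DataItem and named A2.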
“Every line number, WaitCondition variant, error type, and selector prefix on this page is grep-able in a fresh clone of mediar-ai/terminator. No invented specs.”
github.com/mediar-ai/terminator
Specific assertions a desktop UI test needs to express
This is a concrete checklist of things that are awkward or impossible in a browser-only framework and natural in a desktop-aware one. Each item maps to one or two calls in the SDK.
- Wait for a button to become visible on a modal that animates in
- Assert a native menu item is enabled before invoking it
- Wait for focus to land on a specific field after keyboard navigation
- Catch ElementObscured when a tooltip covers the element and retry
- Catch ElementNotStable while a toast is sliding in and re-poll
- Move one test across Excel, Chrome, and a native admin tool
- Scope a locator to a single process or window
- Invoke a button that is off-screen without first scrolling
- Use setSelected(true) on radio buttons that ignore invoke() on Windows
- Read and assert on a cell value from a running Excel workbook
Apps you can target today
Anything whose window the OS exposes accessibility information for. That is most production Windows apps. Electron apps in particular tend to have well-labeled roles and names, which makes them easy selector targets.
Why this framing matters
UI test automation has been split into two tribes for a decade. Browser people have Playwright and friends. Native people have Appium, Winium, WinAppDriver, and a queue of legacy RPA vendors. Neither side gives you a single language for the whole surface. Teams pay for that split with parallel test infrastructures, two on-call rotations for test flakiness, and scenarios that never get automated because they cross the boundary.
Terminator is a developer framework for building desktop automation. It is not a consumer app. It gives existing AI coding assistants, and your existing test runner, the ability to control your entire OS, not just a browser tab. Like Playwright, but for every app on your desktop.
If your product is a pure web app, keep using Playwright. If any part of your workflow leaves the tab, add Terminator to the same test file.
100 — the poll interval, in milliseconds, of Locator::wait_for
Fixed in crates/terminator/src/locator.rs. Every 100 ms, the loop checks your condition against the live accessibility tree. Timeout is yours to pass in.
Have a UI test that leaves the browser tab?
Show us the scenario and we will map it to a single Terminator spec that spans every app it touches.
Frequently asked questions
What do most guides on 'automation ui testing' cover that this page does not?
They cover browser UI testing, narrowly. Selenium, Playwright, Cypress, Puppeteer, Katalon, TestCafe, WebdriverIO. All of them drive a web DOM through a browser driver (CDP, WebDriver-BiDi, or Selenium's JSON Wire). They are the right answer when the whole UI you need to test is a web page. This page covers the other half of automation UI testing: the UI that is not in a browser. Installed desktop apps, cross-app workflows, in-app dialogs that an embedded webview cannot reach, native admin tools, Excel, Outlook, PowerPoint, the Windows settings UI. For that, you need to drive the OS accessibility tree, not the DOM. Terminator is the Playwright-shaped framework for doing that. Same locator, wait_for, validate, click, type, invoke primitives; a different runtime under the hood.
What are the WaitConditions and how many are there?
Exactly four, defined as a Rust enum in crates/terminator/src/locator.rs. They are Exists, Visible, Enabled, and Focused. Locator::wait_for takes one of them and a Duration timeout. It polls with a 100 millisecond interval and checks the condition against the resolved element (element.is_visible(), element.is_enabled(), element.is_focused()) on each tick. The wait returns a typed Ok(element) when the condition becomes true, or AutomationError::Timeout if the deadline passes. That is the full surface. No fluent chain with 20 matchers, no implicit wait, no polling interval you need to tune. If you know Playwright's auto-wait, this is the explicit, minimal version of it.
How is this different from Appium, Winium, or WinAppDriver?
Those are WebDriver-protocol clients for native automation. They work, but the selectors are verbose WebDriver-style (By.AccessibilityId, By.Name, XPath over the UIA tree) and the API shape is older: desired capabilities, sessions, a Selenium-style findElement. Terminator is Playwright-shaped and code-first: a single Desktop() object, chained locators, async wait_for, typed errors, and one selector string that is readable at a glance (window:Calculator >> role:Button && name:Seven). Terminator also gives you an MCP server and a TypeScript workflow SDK with Zod schemas on top of the same primitives, which lets you reuse test steps as deterministic automations. Finally, Terminator is MIT-licensed and has no remote driver dependency: the SDK talks to UIA over COM directly from your test process.
Can the same test run against both a web app and a native app?
Yes. That is the anchor use case. A single Desktop instance scoped by process name gives you a locator root for any running program. A typical cross-app test looks like: 1) process:EXCEL to read a seed value from an open workbook; 2) process:chrome to drive the browser admin UI; 3) process:AdminDesktop to confirm the side-effect in the native dashboard. Same locator shape, same waitFor conditions, same typed errors, no second framework. Browser DOM access is available too through Terminator's Chrome extension bridge, which exposes executeBrowserScript() on any element resolved inside a Chrome process. That is useful when the accessibility tree is thin in a custom web widget and you need to fall back to the DOM.
Which flake errors does Terminator surface, and how do I handle ElementObscured or ElementNotStable?
The full list is in crates/terminator/src/errors.rs under the AutomationError enum: ElementNotFound, Timeout, PermissionDenied, PlatformError, UnsupportedOperation, UnsupportedPlatform, InvalidArgument, Internal, InvalidSelector, UIAutomationAPIError (with COM code + retryability flag), ElementDetached, ElementNotVisible, ElementNotEnabled, ElementNotStable, ElementObscured, ScrollFailed, OperationCancelled, VerificationFailed. Eighteen variants. Two of them map directly to the most common sources of UI test flake. ElementNotStable fires when the target's bounding box is still animating (a toast sliding in, a list reordering); the fix is to wait longer or use invoke() which does not need stable bounds. ElementObscured fires when another element is on top at click time (an overlay, a tooltip, a banner); the fix is to dismiss the covering element or press Escape and retry. Because these are typed, your test code matches on them explicitly instead of parsing a stringified Error.
What does invoke() do that click() does not?
click() synthesizes a mouse event at the element's center point. That requires the element to be visible in the viewport, its bounds to be stable, and nothing to be obscuring the pixel. invoke() calls the accessibility API's default action on the element directly. On Windows UIA that is the InvokePattern.Invoke method; on macOS AX it is the AXPress or role-appropriate action. It works even if the element is off-screen or partially obscured, it does not require a mouse move, and it is faster because it skips the hit-test. It is less faithful to a real user (no mouse, no focus-ring toggle), so prefer click() for user-facing smoke tests and invoke() for deep functional tests where you just need the action to fire. The one gotcha: radio buttons on Windows often ignore Invoke; use setSelected(true) instead.
Does this replace Playwright for browser tests?
No, and it is not meant to. Playwright is the best-in-class browser UI test runner with a mature fixture model, tracing, and a deep toolbox around it (component tests, API mocking, codegen, visual comparison). If your whole product under test is a web page, stay in Playwright. Terminator is the right tool when the product under test is a native app, or when a single scenario spans a browser plus a native app. You can also run both in the same test file: Playwright for the web surface, Terminator for the desktop surface. They do not fight. Terminator's selectors deliberately mirror Playwright's for cognitive load reasons. A Playwright user reading a Terminator test understands it on sight.
What does a single selector look like on Windows vs macOS?
Selectors are cross-platform by design. role:Button && name:Save reads the Windows UIA LocalizedControlType + Name on Windows, and the macOS AX Role + AXTitle on macOS. The Rust selector engine in crates/terminator/src/selector.rs normalizes both into the same matcher tree. The platform-specific adapters (crates/terminator/src/platforms/) translate the matchers into UIA property conditions or AX attribute queries. You write one test; it runs on both hosts where the app under test runs. The main caveat is that native widgets have different canonical roles on the two platforms (TextBox on Windows vs AXTextField on macOS). Use role:Edit as the cross-platform shim, or fall back to nativeid: when you need pinpoint precision, which is rare for well-labeled apps.
How fast is a selector resolve compared to a Playwright locator?
On Windows, a cached accessibility tree walk from a process root to a leaf button is ~1 ms for apps with a few thousand UIA nodes (Notepad, Calculator) and 10 to 50 ms for larger ones (Notion, VS Code). Playwright's locator resolve on a busy web page is comparable, 10 to 50 ms. The two are in the same order of magnitude. The place where the approaches diverge is screenshot-based computer use tools (Claude computer use, ChatGPT Agents, BrowserUse, Perplexity Comet). Those run an LLM against a screenshot for every action. That is seconds, not milliseconds. Terminator's README claims 100x the throughput over that category, which is consistent with the difference between a tree walk and an LLM inference call.
Does it run in CI?
Yes, on a Windows CI image with a real logged-in desktop session, or a self-hosted Windows VM. Windows UIA requires a desktop because it queries the Win32 accessibility APIs that only exist when a session is interactive. GitHub Actions windows-latest runners provide this. Headless CI on Linux does not work for Windows UIA automation (that is a category error, not a Terminator limitation). For pure browser tests in CI, keep using Playwright on a Linux runner. For desktop and cross-app tests, schedule them on Windows, treat them as integration tests rather than unit tests, and budget them accordingly. macOS AX in CI requires granting Accessibility permission to the runner process, which is scriptable via tccutil on GitHub Actions macOS runners with sudo access.