Desktop testing · deep dive

Most automation tools for testing desktop applications break on release day. Here is the 67-line function that prevents it.

Pick any list of desktop testing tools and you will read the same three things: it has self-healing AI, it has visual recognition, it has a low-code recorder. None of those answer the only mechanical question that matters when a test is red on a Wednesday morning: does the tool address the same on-screen element by the same identifier today as it did yesterday? Terminator's answer is a BLAKE3 hash over four accessibility properties. Same element, same hash, even after the application is killed and relaunched as a new OS process. The hash function is 67 lines, the test that proves stability is 30, and they are both in the public Rust source.

Matthew Diakonov
8 min read

The maintenance trap is mechanical, not philosophical

Industry surveys put 50% to 70% of QA effort into fixing tests that worked the day before. The vendor pages blame “flaky tests”, a phrase that obscures the actual cause. There is nothing flaky happening at runtime. What is happening is that the tool's identity scheme is too tightly coupled to something the application changes routinely. If the identity scheme is a screen coordinate, a single button-position tweak invalidates it. If it is a screenshot, a theme or DPI change invalidates it. If it is a positional path like Window/Toolbar/Item[3] through the UI tree, reordering two siblings invalidates it.

The fix is not to layer self-healing on top of a fragile scheme. The fix is to start from a scheme that does not depend on the things the app changes for cosmetic and structural reasons.

Same scenario, two identity schemes

// A typical desktop test using a coordinate-based or DOM-path tool.
// On Tuesday this passed. The user's app shipped a refactor on Tuesday
// night that nudged the toolbar by 18px and renamed an internal handler.
// On Wednesday morning, every test in the suite is red.

await driver.click({ x: 412, y: 88 });             // toolbar moved
await driver.findByPath("Window/Toolbar/Item[3]"); // index changed
await driver.findByXPath("//Pane[1]/Button[2]");   // pane reordered
await driver.findByImage("save_button.png");       // theme update broke it

  • Layout shift breaks coordinate clicks
  • Theme change breaks screenshot match
  • Sibling reorder breaks positional path
  • Tests must be re-recorded after every visual change
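The contrast between those identity schemes fits in a few lines. This is an illustrative Python sketch, not Terminator code; the field names and the "refactor" values are invented for the example:

```python
# Illustrative sketch (not Terminator source): three identity schemes
# applied to the same Save button before and after a cosmetic refactor.
def coord_key(el):            # coordinate-based identity
    return (el["x"], el["y"])

def path_key(el):             # positional-path identity
    return el["tree_path"]

def property_key(el):         # content-derived identity
    return (el["automation_id"], el["role"], el["name"])

before = {"x": 412, "y": 88, "tree_path": "Window/Toolbar/Item[3]",
          "automation_id": "SaveButton", "role": "Button", "name": "Save"}
# Tuesday-night refactor: toolbar nudged 18px down, a sibling inserted.
after = {"x": 412, "y": 106, "tree_path": "Window/Toolbar/Item[4]",
         "automation_id": "SaveButton", "role": "Button", "name": "Save"}

print(coord_key(before) == coord_key(after))        # False: coordinates broke
print(path_key(before) == path_key(after))          # False: path broke
print(property_key(before) == property_key(after))  # True: properties held
```

Only the content-derived key survives the release, because nothing it reads was touched by the refactor.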

The four properties Terminator hashes

Every desktop UI element exposed by the OS accessibility tree carries a small set of identity properties. On Windows, those come from the UI Automation (UIA) COM API. Terminator picks four of them, in priority order, and hashes the concatenation:

automation_id

The string a developer assigns to the element in code (e.g. "SaveButton"). The most stable property when present.

role

The accessibility control type: Button, Edit, CheckBox. Skipped when Custom.

name

The accessible label users see, e.g. "Save". The same string screen readers announce.

classname

The Win32 class name (Button, Edit). Stable per widget kind.

The function pushes whichever of those four are non-empty into a single string, hashes it with BLAKE3, and returns the first 8 bytes as a u64. If all four are empty (a rare case for anything a real test targets), it falls back to the element's bounding rectangle, and only as a final resort to the in-memory pointer (which is explicitly documented as NOT stable across sessions). The whole function:

// crates/terminator/src/platforms/windows/utils.rs, lines 21 to 88
// Generate a stable element ID based on element properties.
pub fn generate_element_id(
    element: &uiautomation::UIElement,
) -> Result<usize, AutomationError> {
    let automation_id = element.get_automation_id().ok().filter(|s| !s.is_empty());
    let role         = element.get_control_type().ok()
        .filter(|t| *t != ControlType::Custom);
    let name         = element.get_name().ok().filter(|s| !s.is_empty());
    let class_name   = element.get_classname().ok().filter(|s| !s.is_empty());

    let mut to_hash = String::new();
    if let Some(id) = automation_id    { to_hash.push_str(&id); }
    if let Some(r)  = role             { to_hash.push_str(&r.to_string()); }
    if let Some(n)  = name             { to_hash.push_str(&n); }
    if let Some(cn) = class_name       { to_hash.push_str(&cn); }

    if to_hash.is_empty() {
        if let Ok(rect) = element.get_bounding_rectangle() {
            to_hash.push_str(&format!(
                "{}:{}:{}:{}",
                rect.get_left(), rect.get_top(),
                rect.get_width(), rect.get_height(),
            ));
        }
    }

    let hash = blake3::hash(to_hash.as_bytes());
    Ok(hash.as_bytes()[0..8]
        .try_into()
        .map(u64::from_le_bytes)
        .unwrap() as usize)
}

Two things are worth noticing. First, the hash is content-derived, not session-derived; nothing in the input depends on a process ID, a window handle, or a memory address (except the explicit fallback path, which is documented as session-only). Second, BLAKE3 is deterministic for a given input, so equal inputs produce equal hashes regardless of which OS process is running, which CPU is executing it, or how many days have passed since the last run.
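A minimal sketch of that determinism, using Python's stdlib BLAKE2b as a stand-in for BLAKE3 (the digests differ, but the property being demonstrated, determinism over content, holds for both), including the first-8-bytes, little-endian conversion the Rust code performs:

```python
import hashlib

# Sketch of the content-derived ID. BLAKE2b stands in for BLAKE3 here
# because BLAKE3 is not in the Python standard library.
def element_id(automation_id, role, name, classname):
    # Concatenate whichever properties are non-empty, as the Rust code does.
    to_hash = "".join(p for p in (automation_id, role, name, classname) if p)
    digest = hashlib.blake2b(to_hash.encode()).digest()
    # First 8 bytes, little-endian, as the 64-bit ID (mirrors the Rust code).
    return int.from_bytes(digest[:8], "little")

# "Session 1" and "session 2": different OS processes, same properties.
id_run_1 = element_id("SaveButton", "Button", "Save", "Button")
id_run_2 = element_id("SaveButton", "Button", "Save", "Button")
assert id_run_1 == id_run_2  # content-derived, so deterministic across runs
```

No process ID, timestamp, or pointer enters the function, so there is nothing session-specific for the hash to vary over.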

The test that fails the build if stability ever regresses

Claims about test stability tend to be marketing prose. This one is a #[tokio::test] in the repository. It is rigged to fail loudly the moment anyone changes the identity scheme in a way that breaks cross-restart stability.

// crates/terminator/src/tests/id_stability_tests.rs
// Verifies that generate_element_id returns the SAME hash after Notepad
// is killed and restarted. If this assertion ever fires, a regression
// has occurred in the identity scheme.
#[tokio::test]
#[ignore]
async fn test_element_id_stability_across_restarts() -> Result<(), AutomationError> {
    let get_notepad_document_hash = || -> Result<usize, AutomationError> {
        let (_guard, desktop, notepad_app) = setup_notepad();
        let document_selector = Selector::Role {
            role: "document".to_string(),
            name: None,
        };
        let doc_element = desktop.engine.find_element(
            &document_selector, Some(&notepad_app), None,
        )?;
        let doc_impl = doc_element
            .as_any()
            .downcast_ref::<WindowsUIElement>()
            .ok_or_else(|| AutomationError::PlatformError(
                "Failed to downcast UIElement".to_string(),
            ))?;
        generate_element_id(&doc_impl.element.0)
    };

    // Launch Notepad, hash the document, kill the process.
    let hash1 = get_notepad_document_hash()?;
    thread::sleep(Duration::from_millis(500));

    // Launch a NEW Notepad instance, hash the document again.
    let hash2 = get_notepad_document_hash()?;

    // Same element across two distinct OS processes -> same ID.
    assert_eq!(
        hash1, hash2,
        "The element ID should be stable when the application is restarted. \
         If this fails, a regression has occurred."
    );
    Ok(())
}

Two distinct OS processes, two separate UIAutomation interrogations, the same 64-bit ID. That assertion is what lets a test author write role:Button && name:Save today and have it still match the same element on the next CI run, on a fresh Windows VM, after the user's editor has been killed and respawned by the test harness.

What happens when your test asks for an element

  1. Selector parsing: Your selector string (for example process:notepad >> role:Button && name:Save) is parsed into a tree of clauses. No coordinates, no images, no XPath.

  2. Tree walk via UIA: Terminator walks the Windows UI Automation tree under the chosen process or window, asking each node for its automation_id, role, name, and classname.

  3. Filter by clause: Each clause (role:..., name:..., id:..., classname:...) is matched against those properties. Substring match by default, no wildcards.

  4. Hash the survivors: For each candidate element, generate_element_id concatenates the four properties and runs BLAKE3 over them, producing a stable 64-bit fingerprint.

  5. Return the element: The first match (or the nth, if you used nth:N) is wrapped in a UIElement. The fingerprint is what makes this same element addressable on the next run.
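The parse, filter, and hash steps can be reduced to a short sketch. This is a hypothetical Python illustration, not the Rust implementation; the UIA tree walk is replaced by a hard-coded candidate list, and BLAKE2b stands in for BLAKE3:

```python
import hashlib

def matches(selector, el):
    # "role:Button && name:Save" -> [("role", "Button"), ("name", "Save")]
    clauses = [c.split(":", 1) for c in selector.split(" && ")]
    # Substring match by default, as described above.
    return all(value in el.get(prop, "") for prop, value in clauses)

def element_id(el):
    keys = ("automation_id", "role", "name", "classname")
    to_hash = "".join(el.get(k, "") for k in keys)
    # BLAKE2b stands in for BLAKE3 (standard-library availability).
    return int.from_bytes(hashlib.blake2b(to_hash.encode()).digest()[:8], "little")

# Stand-in for the UIA tree walk: two candidate elements under one window.
tree = [
    {"automation_id": "OpenButton", "role": "Button", "name": "Open", "classname": "Button"},
    {"automation_id": "SaveButton", "role": "Button", "name": "Save", "classname": "Button"},
]
survivors = [el for el in tree if matches("role:Button && name:Save", el)]
print(element_id(survivors[0]))  # stable fingerprint for the Save button
```

Both candidates pass the role clause; only one passes the name clause, and its fingerprint is the same on every run.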

How this changes the comparison with the older toolchain

A lot of the desktop test category was shaped before the accessibility tree was a reliable surface to bind tests to. SikuliX targeted images. Older record-and-replay tools targeted coordinates and window handles. Newer entrants layered self-healing on top of those. None of them changed the underlying identity scheme. Terminator did. The practical differences look like this:

| Feature | Conventional desktop test tool | Terminator |
| --- | --- | --- |
| Element identity scheme | Coordinate, DOM path, or screenshot match | BLAKE3 hash of automation_id + role + name + classname |
| ID stability across app restarts | Often regenerated; brittle on process recycle | Same hash, asserted by id_stability_tests.rs |
| Reaction to a 1-pixel layout shift | Image match misses; coordinates miss | Unchanged (no spatial dependency) |
| Authoring surface | Proprietary recorder or low-code IDE | TypeScript / Rust / Python SDK + MCP server |
| How a test gets fixed when it breaks | QA engineer opens the IDE, re-records | AI agent reads the typecheck error, edits the workflow file |
| License | Mostly proprietary, often per-seat | MIT, source on GitHub (mediar-ai/terminator) |

The fair caveat: if your target application does not implement the accessibility API at all (some legacy custom-drawn frameworks), all four input properties may be empty, and Terminator falls back to bounding-rectangle hashing, which is more stable than raw pixels but not as stable as a real automation_id. In that scenario, image-based tools and Terminator end up in roughly the same place. Where the accessibility tree is populated (which is the case for almost every modern WinForms, WPF, WinUI, Electron, Qt, and UWP app), Terminator's hash gives you a deterministic identifier that older tools cannot.

What that one mechanism unlocks for a test suite

A stable element ID is unglamorous on its own. The interesting consequences show up in second-order behaviour:

What stable IDs make possible

  • Snapshot-based assertions: capture the IDs of every important control once, diff against them on every release, fail the build only when an actual identity change occurs.
  • Cache locators across runs: a test that hits 60 elements does not need to re-walk the UIA tree 60 times if the IDs were captured during the previous run.
  • Run the same suite on a teammate's machine, on a fresh Azure Windows VM, on the CI runner, and get matching IDs for every element you care about.
  • Tag flaky-looking failures as either real (the hash changed) or environmental (the hash matched but the click was preempted), instead of bucketing both into 'flaky'.
  • Hand the failure log to an AI coding assistant via MCP, including the exact (selector, expected_hash, observed_hash) tuple, and let it propose a one-line workflow patch.

Going from this page to a passing desktop test

  1. Install: npm install @mediar-ai/terminator @mediar-ai/workflow zod

  2. Locate: Write a selector like role:Button && name:Save inside your target window.

  3. Act: Drive click, typeText, invoke, setSelected on the element.

  4. Verify: Read the element back and assert against name, value, or its hashed ID.

The Node.js, Python, and MCP packages currently ship Windows binaries. macOS support exists at the Rust layer (cargo add terminator-rs), and Linux support uses AT-SPI2. If you want an AI coding assistant to author the suite, point it at terminator-mcp-agent and use the bundled typecheck_workflow tool to validate workflows before running them.

The honest tradeoffs

A page that pretends a tool has no downsides is useless to anyone actually picking one. Here are the places where this approach loses, and why someone might still pick a different tool.

Tradeoff #1

You write code, not a recorder script.

If your QA team only writes plain-English instructions and clicks "record", this is the wrong shape. testRigor or Katalon Studio fit that workflow better. Terminator is a developer framework: TypeScript, Python, or Rust source files with selectors and assertions. The MCP server narrows the gap because an AI assistant can author tests for non-coders, but the artifact is still code.

Tradeoff #2

Custom-rendered apps without accessibility expose less to bind to.

If you are testing a game built in a custom engine, or an old MFC app that never wired up accessibility, the four hash inputs will mostly be empty and you will fall back to bounding-rectangle hashing. SikuliX or vision-AI tools are specifically built for that case and will outperform here.

Tradeoff #3

Cross-platform packaging is uneven.

The npm and pip packages currently ship Windows binaries only. macOS works against the Rust crate today, but you will be writing Rust or building bindings yourself if you target both. TestComplete and Test Studio give you packaged cross-platform support out of the box.


The point of all this

When someone shopping for a desktop testing tool asks “which one is best?”, they are usually about to be handed a list of brand names with feature checkmarks. The checkmarks rarely tell you which tool will still match the same on-screen element after your team ships its next refactor. That property is decided at the layer below all the marketing, specifically at the point where the framework decides what counts as the same element.

Terminator's answer is in utils.rs starting at line 21, in 67 lines of Rust, and is verified by a tokio test in the same crate. You can read both files in under five minutes. Whatever tool you pick after reading this, ask its docs the same question and read the answer at the same level of detail. If the answer is missing, that is its own answer.

4 properties hashed · 8 bytes of BLAKE3 used as the ID · 0 session-bound inputs · 67 lines in the function

Talking through how this would land in your test suite

Book 30 minutes if you want to walk through whether the hash-based identity scheme fits your application's accessibility coverage. We will look at your actual app, not a slide deck.

Questions readers actually ask

Why do most desktop test suites need re-recording after every release?

Because the test tool's identity scheme is bound to something the app changes routinely: pixel coordinates, screenshots of buttons, or fragile DOM-style paths through the accessibility tree. A single layout tweak invalidates one of those, and the test fails to find its target. Terminator decouples identity from layout entirely. The element's ID is a BLAKE3 hash over four properties exposed by the OS accessibility API: automation_id, role, name, classname. None of those four change when a designer moves a button 18 pixels to the right.

Where is the actual hash function defined?

In crates/terminator/src/platforms/windows/utils.rs, function generate_element_id, lines 21 to 88. It concatenates the four properties into a single string, runs blake3::hash over the bytes, and takes the first 8 bytes interpreted as a little-endian u64. If all four properties are missing (which is rare for any element a real test would target), it falls back to bounding-rectangle coordinates, then to the Arc pointer as a last resort.

How is the stability claim actually verified?

There is a test in crates/terminator/src/tests/id_stability_tests.rs called test_element_id_stability_across_restarts. It launches Notepad as a new OS process, locates the document element, hashes it, kills the process, launches a fresh Notepad, finds the document element again, hashes it, and asserts the two hashes are equal. If a refactor ever changes the identity scheme in a way that breaks cross-restart stability, that test fails in CI before the change ships.

Does this work cross-platform or just on Windows?

Windows is the primary target with full feature support, including the hashing scheme described here. macOS support exists at the core Rust level via the Accessibility API (with permissions). Linux uses AT-SPI2. The Node.js, Python, and MCP packages currently ship Windows binaries only, so if your test target is a macOS-only app, build against the Rust crate directly rather than the npm/pip packages.

What does an AI coding assistant do with this?

Terminator ships an MCP server (terminator-mcp-agent) that exposes the desktop automation primitives plus a typecheck_workflow tool. An assistant like Claude Code or Cursor can author a test as a TypeScript workflow file, ask the MCP server to typecheck it, and only then run it. When something breaks, the failure comes back as a structured object with file, line, code, and message, not a stack trace, which is easier for the assistant to repair without escalating to a human.

When does the hash actually change?

It changes when one of the four input properties changes: automation_id is renamed in the app's source, the control type is altered (rare), the accessible name is rewritten (for example, a button label change from "Save" to "Save file"), or the underlying Win32 classname is replaced. None of those happen from layout, theme, font, or window-size changes. They only happen when a developer intentionally edits a property the accessibility API reads, which is exactly when a test SHOULD be re-examined.
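That sensitivity can be sketched directly (BLAKE2b standing in for BLAKE3, and the property dict is hypothetical): a layout tweak leaves the ID alone because position is never an input, while an accessible-name edit flips it.

```python
import hashlib

# Sketch of which edits flip the ID. Position fields exist on the element
# but are deliberately excluded from the hash inputs.
def element_id(props):
    hashed = ("automation_id", "role", "name", "classname")  # layout is NOT hashed
    to_hash = "".join(props.get(k, "") for k in hashed)
    return int.from_bytes(hashlib.blake2b(to_hash.encode()).digest()[:8], "little")

base = {"automation_id": "SaveButton", "role": "Button",
        "name": "Save", "classname": "Button", "x": 412, "y": 88}

moved   = {**base, "x": 430, "y": 106}   # designer nudged the button
renamed = {**base, "name": "Save file"}  # developer rewrote the label

print(element_id(base) == element_id(moved))    # True: position never enters the hash
print(element_id(base) == element_id(renamed))  # False: a hashed property changed
```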

Is there a free or open-source version?

Yes. The whole framework is MIT-licensed at github.com/mediar-ai/terminator. The Rust crate (terminator-rs), the Node.js package (@mediar-ai/terminator), the Python package (terminator-py), the MCP agent (terminator-mcp-agent), and the workflow SDK (@mediar-ai/workflow) are all installable from public registries. There is no per-seat license, no proprietary recorder, and no separate enterprise build.
