Test automation for desktop applications, with a 10ms grace window written into the element finder

Every guide on desktop regression testing talks about auto-healing selectors as a marketing bullet. This one hands you the millisecond number. Terminator races the primary selector against every alternative in parallel tokio tasks, and even when an alternative wins first, the primary gets a 10 millisecond grace window to catch up. That discipline is the reason a suite that lands on Terminator stops silently drifting to weaker selectors between runs.

utils.rs:2402 · 10ms grace · select_all race · selectors_tried JSON · MIT
Matthew Diakonov · 11 min read
4.9 rating, from teams running desktop regression suites on Terminator
  • Primary + alternatives raced in parallel via futures::future::select_all
  • 10 millisecond grace window for primary-selector preference
  • fallback_selectors tried sequentially only after the race fails
  • selectors_tried JSON array returned on total failure for flake diagnosis

The problem with a flat selector list

Pick any popular desktop automation library and look at how fallback selectors work. You get a list. The tool walks it in order, tries the first, times out, tries the second, times out, and so on. On a healthy run this is fine. On a flaky run, where one in ten launches the primary takes 2.5 seconds to resolve while an alternative resolves in 200 milliseconds, the serial walk is wasting almost all of its budget on the primary before getting to the alternative that would have worked. Worse, every regression suite using that tool is paying that flake tax on every affected run.

The opposite extreme is just as bad. Race everything in parallel and accept whoever wins, and the selector your test records as "the one that worked" jitters between runs. Your primary selector is the one you actually want used, for readability and for long-term maintenance. If it happens to take three milliseconds longer than an alternative on Tuesday, you do not want Tuesday's test logs to quietly rewrite themselves around the alternative. The logs must look the same as Monday's.

The fix Terminator ships is a parallel race with a primary preference window. Ten milliseconds.

The race, visualized

Three kinds of inputs flow into the element finder. The primary lives at index 0 of the race and is the only one allowed to hold priority. Alternatives are raced against it in parallel. Fallbacks sit outside the race and wait their turn.

find_element_with_fallbacks, the three input lanes:

  primary ------------+
  alternatives -------+--> select_all + 10ms grace --> (element, selector) --> successful_selector
  fallback_selectors ----> sequential pass, only after the race fails --> ElementNotFound on total failure

The race, in code

The function is find_element_with_fallbacks in crates/terminator-mcp-agent/src/utils.rs. The relevant slice is lines 2400 to 2447. The grace branch is the one that does not appear in any other open source desktop test tool I have read.

crates/terminator-mcp-agent/src/utils.rs

Three things to flag. First, the grace is asymmetric: only the primary gets a second chance, alternatives do not. Second, the inner locator timeout inside the grace is 1 millisecond, not 10, because the outer tokio::time::timeout already caps the whole branch at 10ms. Third, task.abort() on the losers is synchronous and non-optional, so the race cannot leak background probes after a winner is declared.


"Wired an existing desktop regression suite into Terminator, watched the successful_selector field stay pinned to the primary selector across 80 runs despite the UI under test shipping a minor label tweak mid-week. The alternatives caught the tweaked label without the suite rewriting itself."

(internal dogfood on a WPF expense-report app)

Seven phases of a single find

What happens from when you call a step to when you get an element

1. Primary task spawned

tokio::spawn wraps a locator.first(timeout=3s) for the primary selector. The task carries an in_current_span() so its tracing attributes line up with the step it belongs to. This is index 0 in the race.

2. Alternative tasks spawned

Each comma-separated alternative selector becomes its own tokio::spawn, same 3s budget, same span context. They are queued behind the primary in the completed_tasks vector so the enumeration later identifies which task finished.

3. select_all drives the race

futures::future::select_all returns the first task to complete along with its index. If the primary (index 0) resolves first, its element wins immediately. If any other task resolves first, the grace branch triggers.

4. 10 millisecond grace window

tokio::time::timeout(Duration::from_millis(10), ...) re-runs the primary locator with an inner Duration::from_millis(1) timeout. If the primary returns an element inside that 10ms outer window, it is preferred over the alternative that just won. Otherwise the alternative is accepted.

5. Losers get task.abort()

The remaining spawned tasks are aborted synchronously. This is why the race does not leak tokio tasks or continue probing accessibility APIs after a winner is declared. Determinism on the happy path, no leaked work on the failure path.

6. Sequential fallbacks if everyone loses

If primary and every alternative fail, fallback_selectors are tried one at a time. This is where raw ID selectors or geometry selectors live: not fast enough to race fairly, but worth a shot as last-resort recovery. Failures here accumulate into the errors vector.

7. ElementNotFound with selectors_tried

On total failure, get_selectors_tried_all concatenates primary + alternatives + fallbacks and stuffs them into the JSON error. The MCP response is McpError::invalid_params with the full diagnostic payload. Test harnesses read this without parsing stack traces.

The three input lanes

Each step in a Terminator test declares up to three selector lanes. They are not interchangeable. Mixing them blurs the mental model the grace window is built on.

primary

Your canonical selector. Lives at index 0 of the race. Always wins on a tie thanks to the 10ms grace branch at utils.rs line 2425.

alternatives

Comma-separated semantic equivalents. Raced in parallel tokio tasks. Example: 'role:Button|name:Submit, role:Button|name:Save, role:Button|name:Confirm'.

fallback_selectors

Last-resort sequential list. For when every race candidate failed. Typically raw ID (#12345) or spatial (near:name:Amount).

selectors_tried

JSON array returned on failure. Built by get_selectors_tried_all in helpers.rs line 103. Aggregates primary + alternatives + fallbacks in declaration order.

retries

Outer loop around the race. 250ms sleep between attempts. A step with retries=3 can spend up to 9 seconds of find time, plus 750ms of sleep, before giving up.

UIAutomationAPIError

System-level COM failure short-circuits the whole race. Remaining tasks are aborted. Payload flags is_retryable based on the COM error code.

The failure payload

When the race fails and the sequential fallback pass fails, Terminator does not just throw a generic timeout. The error builder in helpers.rs concatenates every selector it touched into a selectors_tried array and pairs it with four concrete diagnostic suggestions. A test-harness integration reads this JSON and knows what to try next without a human parsing a stack trace.

crates/terminator-mcp-agent/src/helpers.rs

What an actual run log looks like

terminator-mcp-agent trace

Read the line that says primary after grace. That is the 10ms window at work. Without it, the successful_selector would have been role:Button|name:Save, because the alternative literally resolved first. Over 80 runs of the same test across a week, the successful_selector field stays pinned to role:Button|name:Submit instead of jittering between the two. The flake histogram downstream becomes actionable.

Numbers straight from the source

10ms
grace window for primary-selector preference
3000ms
default per-selector find budget
250ms
sleep between outer retry attempts
2402
utils.rs line where select_all drives the race

What this buys you, step by step

Side effects of the race + grace topology

  • Flaky primary selectors stop stealing the whole 3s timeout budget. Alternatives resolve in parallel, so the total find latency is capped at the slowest single candidate, not the sum.
  • Close races resolve deterministically in favor of the primary. successful_selector does not jitter between runs when the UI is a few milliseconds quicker at an alternative.
  • Last-resort recovery paths (raw IDs, spatial selectors) live in fallback_selectors and do not pollute the race. They only fire when the semantic equivalents have all genuinely failed.
  • On total failure, the ElementNotFound JSON ships with a selectors_tried array and four suggestions. Test-harness dashboards get a machine-readable flake story without parsing stack traces.
  • System-wide COM failures cancel the race instead of burning through every selector one by one. A broken UIA service is a test-harness-level retry, not a step-level retry.
  • The same topology runs against Win32, WPF, WinUI 3, UWP, Electron, macOS AX, and Chrome through the bundled extension bridge. One race implementation, every desktop UI grammar.

Side by side with the typical approach

Feature by feature: a typical desktop test tool with fallback selectors vs. Terminator.

Race topology for flake resistance
  Typical: serial. The primary selector times out first, then the second is tried, then the third. Cumulative latency on every flake.
  Terminator: parallel. Primary + N alternatives raced via futures::future::select_all. Total latency capped at the slowest single candidate.

Selector priority after a close race
  Typical: whichever candidate returned first wins. The suite silently drifts to the next-best selector between runs.
  Terminator: 10ms grace window. The primary is preferred even if an alternative won by a few milliseconds. successful_selector stays stable.

Separation of "acceptable equivalent" and "last resort"
  Typical: one flat list. A raw ID or vision-based rescue has the same priority as a proper role|name match.
  Terminator: alternatives raced in parallel; fallback_selectors tried sequentially only after everyone fails.

Telemetry on a failure
  Typical: stack trace or timeout exception. You reverse-engineer which selectors the tool actually tried.
  Terminator: selectors_tried JSON array + four suggestions, emitted as McpError::invalid_params. Machine-readable flake diagnosis.

System-level COM failure handling
  Typical: same retry loop as element-not-found. Budget burned trying 4 more selectors against a broken UIA service.
  Terminator: UIAutomationAPIError short-circuits the race. Remaining tasks aborted. The is_retryable flag tells the harness whether to retry the whole test.

Desktop scope
  Typical: browser-first tools miss it entirely; commercial desktop test tools often cover only Windows.
  Terminator: the same race across Win32, WPF, UWP, WinUI 3, Electron, Chrome via extension, and macOS AX.

Why this matters more than it sounds

Most desktop regression suites fall apart on the third week, not the first. The first week looks perfect because you tuned selectors against a clean build. By week three, the app has shipped a handful of minor label changes and async loading tweaks. The suite still passes, but the logs show different successful_selector values on different runs, and now you cannot tell whether last night's failure was a regression or just the suite drifting. The 10ms grace is a small piece of code, lines 2420 to 2447 of one Rust file, that turns that fuzzy log into a sharp signal.

Install into Claude Code, Cursor, VS Code, or Windsurf in one line: claude mcp add terminator "npx -y terminator-mcp-agent@latest". After that, every find call the agent makes goes through the race.

Have a desktop regression suite whose successful_selector field is jittering between runs?

Bring 5 flaky steps. On a 20-minute call we will wire them into Terminator, run the suite live, and show the race + grace behavior pinning each step to its primary selector.

Frequently asked questions

What makes test automation for desktop applications harder than web test automation?

A browser test has a DOM. The DOM is inspectable, the selectors are stable across runs, and every framework agrees on the grammar. Desktop apps do not share a model. Win32, WPF, WinUI 3, UWP, Electron, Qt, Swing, a 2004 VB6 line-of-business tool, and a modern macOS SwiftUI app all sit on different accessibility substrates. Terminator flattens that into one selector grammar through the OS accessibility APIs (UIAutomation on Windows, AX on macOS) and then adds a parallel-race element finder at crates/terminator-mcp-agent/src/utils.rs line 2306 that is the reason desktop test suites stop getting flaky. Primary, alternatives, and fallbacks are all declared up front on each step, raced in parallel, and the primary keeps its priority through a 10 millisecond grace window even when an alternative wins.

Why is a 10 millisecond grace period important?

Without it, the first alternative to resolve wins the race and becomes the selector the step records. If you listed alternatives in the order of 'what I want first, then acceptable fallbacks second', your actual test suite starts drifting to the second choice whenever the UI is a few milliseconds quicker at producing it, which happens constantly because accessibility trees are populated asynchronously. The 10ms grace at utils.rs line 2425 is a `tokio::time::timeout(Duration::from_millis(10), ...)` that gives the primary one more shot with a 1ms inner locator timeout. If the primary lands inside the grace window, it is preferred. This is what keeps the successful_selector field in the tool response deterministic across reruns.

How are alternatives different from fallbacks?

Alternatives are raced in parallel with the primary using tokio::spawn and futures::future::select_all (line 2402). They are for the case where 'the Submit button is sometimes labeled Save and sometimes Save Changes' and you want any of them to match. Fallbacks are tried sequentially only after every alternative has already failed (line 2472). They are for the case where 'if nothing above works, last-resort try a raw numeric ID like #18234'. The separation matters because parallel races should contain semantic equivalents, while sequential fallbacks are allowed to be progressively weaker or slower. You do not want a raw ID winning a race against a role|name selector.

What does the error payload look like when every selector fails?

An McpError::invalid_params with error_type='ElementNotFound'. The JSON includes 'selectors_tried' (the full concatenated list from get_selectors_tried_all in helpers.rs line 103), the original underlying error, and four suggestions: call get_window_tree to refresh, check name/role, use a numeric ID, or call validate_element which never throws. That payload is generated at helpers.rs lines 178 to 191. For test harness integrations, this gives you a machine-readable failure diagnosis without parsing stack traces. A flake report becomes: 'step X failed, tried [selectors], got [reason], suggestion [action]'.

What happens when the underlying Windows UIA API itself fails, not just the selector?

A UIAutomationAPIError short-circuits the entire race. At utils.rs line 2451, if any task returns that variant, remaining tasks are aborted immediately and the error propagates. The rationale is that a COM-level failure is system-wide: trying three more selectors against a broken UIA service is pointless and wastes the timeout budget. The error payload sets is_retryable based on the COM error code, so a test runner can decide to retry the whole test after a short delay versus failing out. Most transient COM errors (like 0x80010001 RPC_E_CALL_REJECTED) are retryable.

What is the default timeout and how does it interact with retries?

The find timeout defaults to 3000 milliseconds (utils.rs line 2316, get_timeout helper). On the write side, find_and_execute_with_retry_with_fallback adds an outer retry loop with a 250ms sleep between attempts (line 2617). Each inner find call gets its own 3s budget. So a step with retries=3 and default timeout can spend up to 9 seconds plus 750ms of sleep trying to find its element before giving up. That budget is what you tune when a step lives in a launch-heavy path where the target only appears after a slow modal.

Can I see which selector actually won in each test step?

Yes. Every successful find returns a tuple of (element, successful_selector). The MCP tool responses surface this as the 'selector_used' field in the result JSON. If you aggregate those across a run, you get a histogram: primary hit rate vs alternative hit rate vs fallback hit rate. A sudden jump in alternative hits is the canonical signal that the app under test changed and your primary selector is rotting. Terminator's telemetry spans also record verification.method so you can filter by 'direct_property_read' versus 'window_scoped_search' after a rerun.

Does this work on macOS desktop apps too?

Yes. The race is implemented above the platform adapter, in the MCP agent crate. On macOS, locator.first() resolves through the AX (NSAccessibility) adapter in the terminator-macos crate. The race topology, the 10ms grace window, the abort discipline, and the selectors_tried diagnostic are identical. Linux (AT-SPI) support is partial and depends on the specific toolkit of the app under test.

How do I install this for a desktop regression suite?

Three options. For direct TypeScript: npm install @mediar-ai/terminator and call desktop.findElement with selector/alternatives/fallback_selectors/retries parameters. For an AI coding assistant running your tests: claude mcp add terminator "npx -y terminator-mcp-agent@latest", then Claude Code or Cursor can drive any desktop app and will see the race behavior through the returned successful_selector and selectors_tried fields on every step. For Python: pip install terminator and use the Desktop class with the same argument shape.

terminator · Desktop automation SDK
© 2026 terminator. All rights reserved.