Test automation UI: stop the flake at record time
Every other guide on "test automation ui" treats flakes as a replay-time problem and sells you the cure: auto-wait, retries, AI self-healing locators, visual fallback. Useful. Downstream. This page covers the move you can make one step earlier. Terminator's workflow recorder refuses to serialize the three classes of ephemeral data that create most UI-test flakes (relative timestamps, Windows COM null dressings, browser plumbing panes) into the selector string. The selector you record is the selector that still matches six months from now.
The filter set, straight from the source
This is the actual static HashSet the recorder loads once at startup. Every captured Name, Text, and Value field is lowercased, trimmed, and matched against it. If a field is in the set, serde's skip_serializing_if drops the field from the recorded event. The selector builder then has nothing to encode, so the selector falls back to role or native ID, both stable.
Ten patterns that rot a selector overnight
Slack, Teams, Gmail, Outlook, Notion, Linear: every modern app renders timestamps relative to now. "Posted 5 mins ago" is convenient for humans and lethal for a recorded selector. Six minutes later the match is gone. Terminator ships a dedicated, grep-able filter that catches all ten common dressings of relative time and refuses to encode any of them into the selector name: attribute.
Raw accessibility tree in, stable selector out
The recorder sits between the noisy accessibility tree and the clean selector that lands in your test file. Three filter layers run in sequence. What comes out the other side is deterministic: same click, same tree, same selector, every run.
Three filters, one clean selector
What the raw tree actually looks like
Here is a real UIA dump shape from Chrome, the kind you see in Accessibility Insights. Eight nodes between the window and the button you clicked. Six of them are ephemeral. A naive recorder walks this tree top to bottom and encodes all eight.
What the recorder emits after filtering
Same click. Same tree. The three filter layers collapse the chain from eight nodes down to three. Every remaining node is a property that an app is contractually obligated to expose consistently across runs (executable name, page title, aria label).
Values the recorder will never write into a selector
Every one of these appears as a literal substring in events.rs. Lowercased, trimmed, and pattern-matched on every captured Name, Text, and Value. Any field containing one of these is dropped from the emitted selector.
The parent-chain filter, walked level by level
This is the logic the recorder applies to every ancestor when building the selector chain. It is not a one-shot check on the leaf; each grandparent, great-grandparent, and window-level node re-runs the same four predicates. An ancestor whose name was a timestamp an hour ago would silently break replay even if the leaf itself is fine.
Six filter layers, one card each
Every rule below is in one file: crates/terminator-workflow-recorder/src/events.rs. No YAML config, no hidden heuristics. Each layer runs before the selector is stringified, so none of the rot below ever lands in your recorded test file.
NULL_LIKE_VALUES (~20 patterns)
The constant HashSet compiled once on startup. Every captured Name, Text, Value, and hierarchy field is lowercased, trimmed, and checked against the set. If it hits, the field serializes as None via serde's skip_serializing_if. Includes both generic placeholders (null, undefined, n/a) and Windows-specific COM output (bstr(), variant(), variant(empty)).
contains_relative_time (10 patterns)
Pure string check. ' ago', 'just now', 'yesterday', 'today', 'last week', 'last month', and four suffixes for minute/hour shorthand (' min', ' mins', ' hr', ' hrs'). Applied to leaf elements and to every ancestor during chain construction.
Browser implementation-detail filter
Lowercased substring match against 'legacy window', 'chrome_widgetwin', 'intermediate d3d window'. These are real Win32 parent windows Chromium ships but the user never sees them. Dropping them halves the depth of most Chrome selectors.
Redundant window-title pane filter
If the parent role is Pane and the name ends with ' - google chrome', ' - mozilla firefox', or ' - microsoft edge', it is dropped. The process: prefix already pins which browser is running; including the tab title twice in the chain is pure noise.
is_empty_string (serde guard)
Used on 30+ fields across KeyboardEvent, ClipboardEvent, HotkeyEvent, DragDropEvent. Whitespace-only strings, empty BSTR wrappers, and the full NULL_LIKE_VALUES set all resolve to None in the recorded JSON. No phantom empty strings leak into the replay.
Tested with 60+ assertions
events.rs ships a dedicated test block at line 1765+ with explicit cases for 'BSTR()', 'VARIANT(EMPTY)', 'Variant(Empty)' mixed case, ' null ' trimmed, 'nullify' not matching (substring protection), 'unknown value' kept. Regression-proof.
Same click. Different recorders. Different half-life.
// The selector a naive recorder emits. Captured on 2026-04-20.
// Every level of this chain is a tripwire for the next replay.
await desktop.locator(
'window:Gmail - Google Chrome' + // title changes on tab switch
' >> pane:Chrome_WidgetWin_1' + // suffix increments per window
' >> pane:Intermediate D3D Window' + // gone when GPU accel is off
' >> pane:Gmail - Google Chrome' + // duplicate of the window
' >> document:Gmail' +
' >> group:Emails, 5 mins ago' + // literally wrong in 6 min
' >> group:variant()' + // COM placeholder, not a name
' >> button:Archive'
).first(5000);
// Six of the eight nodes in this chain are guaranteed to rot.
// The test will pass in CI today. It will fail at 3am on Tuesday
// and the error will say "ElementNotFound", not "this recorder
// encoded ephemeral state five selectors ago".What happens between the click and the emitted selector
Five steps, in order, inside the recorder. Every one is a pure function of the accessibility tree at that moment. There is no model, no prompt, no "heal later" placeholder. The output is a deterministic selector or nothing.
User clicks a button
Windows UIA fires a raw focus or invoke event. The recorder picks up the event and resolves the element it fired on. At this point the tree is 'dirty': it contains every internal pane, every timestamp, every unpopulated COM field.
Strip the leaf
Before any selector emission, the leaf's Name, Value, and Text fields run through is_empty_string. Anything in NULL_LIKE_VALUES becomes None. Anything containing a relative-time pattern on the text field marks the whole element as not-encodable (the recorder uses a role-only selector instead).
Walk the parent chain
build_parent_hierarchy walks up from the leaf to the first Window role. At each step it checks is_internal_element: browser plumbing, redundant window-title panes, and any relative-time string cause the parent to be skipped (but the walk continues up).
Build the chained selector
build_chained_selector joins the surviving ancestors with >>. It skips the first Window role because process: already targets that window. Result: a 3 to 4 element chain where every node is stable across runs.
Emit as TypeScript or YAML workflow
The clean selector is written to the recorded workflow file. Replaying a month later against a new Chromium version, new DirectComposition behavior, new tab titles still hits the same leaf because nothing ephemeral was encoded.
Record-time filtering vs run-time self-healing
Both approaches address UI test flake. One addresses it before the selector exists; the other after it has failed. They are complementary, but the upstream move buys you more.
| Feature | AI self-healing (run-time) | Terminator recorder |
|---|---|---|
| Where flake gets addressed | At replay time: auto-wait, retries, AI self-healing, locator rewrites after N failures | At record time: ephemeral content is filtered out of the accessibility tree before the selector string is constructed |
| Relative timestamps in selectors | Encoded as literal text, then replaced by an LLM when the match fails | contains_relative_time() filters 10 patterns: ' ago', 'just now', 'yesterday', 'today', 'last week', 'last month', ' min', ' mins', ' hr', ' hrs' |
| Windows COM placeholder strings | Not addressed. 'bstr()' and 'variant()' look like valid Name attributes to a generic recorder | NULL_LIKE_VALUES set with ~20 entries including 'bstr()', 'variant()', 'variant(empty)', '<unknown>', '(null)' |
| Browser chrome artifacts | 'Chrome_WidgetWin_1', 'Intermediate D3D Window' and the duplicate window-title Pane all get written into the selector chain | All three are matched by explicit substrings during parent hierarchy walk; the node is kept in the tree but dropped from the chain |
| Selector depth | As deep as the accessibility tree (often 8-12 for Electron apps) | Typically 3-4 after filtering: process + window or document + role+name leaf |
| Replay-month-later success rate | Degrades with time as timestamps drift and Chromium internals churn | Flat. Nothing in the emitted selector decays on a fixed clock |
| Where you find the rule | In a vendor's self-healing heuristics, usually closed source | crates/terminator-workflow-recorder/src/events.rs, lines 8 to 81 and 696 to 716. Grep in a fresh clone |
“Every pattern, line number, and filter function on this page is grep-able in a fresh clone of mediar-ai/terminator. The NULL_LIKE_VALUES HashSet, the contains_relative_time function, and the parent hierarchy filter are all in events.rs between lines 8 and 716.”
github.com/mediar-ai/terminator
Concrete things that never enter a recorded selector
Read this as the inverse of a test-flake postmortem. Each bullet is a class of string that, in a less careful recorder, lands in your selector, passes once, and times out three weeks later. None of them reach the emit step in Terminator.
Filtered at record time
- Button names like 'variant()' or 'bstr()' from apps that expose raw COM VARIANTs
- Group names like 'Emails, 5 mins ago' or 'Posted yesterday' that rewrite on every load
- Windows panes named 'Chrome_WidgetWin_1' (suffix increments per window opened this session)
- Panes named 'Intermediate D3D Window' that only exist when DirectComposition is on
- Legacy Win32 stub windows that Chromium keeps for accessibility fallback
- Panes that duplicate the window title ('Gmail - Google Chrome' appearing twice in the chain)
- Anything matching ' - google chrome', ' - mozilla firefox', ' - microsoft edge' at end of a Pane name
- Whitespace-only Name fields, <empty>, (unknown), N/A, and 17 other null dressings
- Name fields ending in ' min', ' mins', ' hr', ' hrs' (common in Slack, Teams, Gmail timestamps)
- Relative-time strings at every level of the ancestor chain, re-checked per parent
Total ephemeral-content patterns the recorder refuses to emit
About 20 in NULL_LIKE_VALUES (null-dressings and COM placeholders), 10 in contains_relative_time, plus the three browser-chrome substrings. All defined as plain Rust constants in one file.
Why this is the missing angle on "test automation ui"
The current consensus on test automation UI is that flakes are a fact of life and the best you can do is automate around them: retries, waits, AI-generated locator rewrites, visual fallbacks. Every one of those is a product someone can sell you, which is why every article reads the same way. It is also why nobody writes about the other path: refusing to encode the flake sources in the first place.
Terminator is a developer framework, not a consumer recorder. The filter set lives as plain Rust constants in one file. You can read them, grep them, fork them, add your own patterns for a proprietary app. There is no vendor-locked "stability engine"between you and your selector. There is just a small, deterministic function and the thirty patterns it refuses to serialize.
If you already have a test automation UI stack built on self-healing, keep it. Add Terminator to the record side. The downstream heuristics get a cleaner input to start from, and the easiest flakes never need healing at all.
Have a recorded UI test suite that quietly rots?
Walk us through one flaky selector and we will trace which filter layer would have dropped it at record time.
Frequently asked questions
What do the top SERP results on 'test automation ui' miss?
They converge on one shape: a run-time flake-mitigation playbook. Auto-wait, retries, AI self-healing locators, visual regression fallback, capping UI tests at 5-10% of the suite. All of that is fine advice. None of it answers an earlier question: why is the flake in the selector to begin with? This page covers the upstream move. Before any wait or heal logic runs, Terminator's workflow recorder refuses to serialize flake sources into the selector string. Relative timestamps, Windows COM placeholders, and browser-plumbing panes never enter the chain, so nothing downstream has to detect and patch them.
Where is this filtering logic defined, exactly?
crates/terminator-workflow-recorder/src/events.rs in the mediar-ai/terminator repository. Lines 8 through 36 define NULL_LIKE_VALUES, the LazyLock HashSet of about 20 null-dressing patterns including Windows-specific bstr() and variant(). Lines 39 through 65 define is_empty_string, the serde skip predicate that runs on 30+ fields. Lines 67 through 81 define contains_relative_time, the 10-pattern string check. Lines 696 through 716 define is_internal_element inside the parent hierarchy walker, which drops chrome_widgetwin, intermediate d3d window, legacy window, and the three browser-title Panes. Every rule has an accompanying assertion in the test block starting around line 1765.
Why can't a recorder just emit raw coordinates or an XPath and be done?
Both rot faster than accessibility selectors do. Coordinates break at the first window resize, DPI change, monitor swap, or animated sidebar toggle. XPaths over the raw UIA tree break as soon as Chromium ships a new DirectComposition layer or reorders its shadow windows (it does, almost every major release). Accessibility tree selectors made of process + role + name are the most stable encoding available on Windows, but only if the Name field itself is stable. That is what the null-like and relative-time filters protect: the stability of the one attribute the selector actually locks onto.
What is a 'variant()' or 'bstr()' value and why would it ever be in a Name field?
Windows UIA exposes element attributes as VARIANTs, a COM container type that can hold strings (as BSTRs), integers, booleans, etc. When an app registers a UIA property but never populates it, the UIA client sees an empty VARIANT. Depending on the language binding, that serializes as the literal string 'variant()', 'variant(empty)', or 'bstr()'. A naive recorder writes these into the selector as if they were human-readable names. The match succeeds once (because the unpopulated VARIANT is reproducibly empty), then fails silently the moment the app fills the field with a real value. NULL_LIKE_VALUES catches all three dressings in lowercase, case-insensitively.
Does run-time retry and wait still matter if the recorder is this careful?
Yes, both systems work together. Terminator ships a Playwright-shaped wait_for primitive with four conditions (Exists, Visible, Enabled, Focused) and an AutomationError enum with 18 typed variants for ElementDetached, ElementObscured, ElementNotStable, and the other real-world flake modes. Those handle the flakes that no recorder could have predicted: a modal animating in, a dropdown obscuring the target, an async load still in flight. The recorder's job is to not create preventable flakes. The runtime's job is to absorb the unpreventable ones.
Does this only apply to Chrome? What about Slack, Excel, or a custom app?
The browser-chrome filter (chrome_widgetwin, intermediate d3d window, legacy window, the three ' - browser name' Pane suffixes) is Chromium-specific because Chromium is the one app family that reliably ships accessibility-invisible parent containers. The null-like filter and relative-time filter are universal. Slack, Teams, and Gmail all embed timestamps like '5 mins ago' inside group names; the filter catches all of them. Excel and Outlook expose a large number of UIA nodes with unpopulated COM VARIANTs; NULL_LIKE_VALUES catches those. Any Electron app effectively gets both the browser-chrome filter (because it is Chromium under the hood) and the universal filters.
How is this different from self-healing locators that the SERP pages describe?
Self-healing runs after a failure. An AI model looks at the old locator, the current DOM, and a prior screenshot, then proposes a new locator. It is reactive and probabilistic; each re-heal slightly redefines what the test is testing. The recorder filter is preventive and deterministic. Given the same accessibility tree, the same 30 patterns get dropped and the same selector gets emitted. You can read the rules, grep them, fork them. There is no model drift, no opaque reasoning, no quiet behavior change from one agent version to the next. Both approaches can coexist: the recorder gives you the cleanest possible starting selector, and self-heal is your fallback for the remaining, genuinely irreducible flakes.
Can I add my own patterns for a proprietary app?
Yes. NULL_LIKE_VALUES is a LazyLock HashSet initialized from a fixed array in the source; patching the array and rebuilding is a one-line change. contains_relative_time is a plain Rust function with no external config, so adding project-specific cases ('submitted last quarter', 'this sprint', internal ticket ID shorthand) is straightforward. If you do not want to fork, the recorder also exposes a pre-emit hook (in progress) that lets you reject or rewrite names without recompiling the crate. Watch the workflow-recorder changelog.
What happens to an element whose Name is entirely null-like? Is the whole event dropped?
No. The element is kept in the recording but the null-like field serializes as None (via serde's skip_serializing_if). Selector construction then falls back to role and position, or to the element's nativeid if one is exposed. The recorded event still lands in the workflow file; only the poisoned attribute is suppressed. This matters because many UIA apps expose genuine information in the ClassName or AutomationId but put junk in Name. Dropping the event entirely would lose automatable state; dropping only the bad attribute preserves it.
Does this apply to browser tests too, or only desktop tests?
It applies wherever the Terminator recorder is driving, which includes browser-rooted recordings (via Terminator's Chrome extension bridge) and pure desktop recordings. Inside a browser tab, the universal filters still fire against ARIA labels that sometimes leak '2 min ago' from client-side components. The browser-chrome filter additionally removes the Chromium shell Panes that sit between a document and the OS window. The result is a selector chain that walks directly from process:chrome.exe into the document role and then to the leaf, skipping everything in between.