UI automation testing that survives a layout shift

Most guides on UI automation testing assume your test runner stores coordinates and complains when those coordinates change. Terminator takes the opposite stance: before any two snapshots are compared, the element IDs and the bounds rectangles are stripped out. A Reply button that drifts 118 pixels down between runs produces zero diff. The test passes because nothing real changed.

Matthew Diakonov
8 min read
Single Rust file: crates/terminator/src/ui_tree_diff.rs
Two regexes strip every volatile field before diffing
Same selector covers Chrome, Slack, Excel in one test

The thing every UI test framework gets wrong

A UI test fails for one of two reasons. Either the product is broken and the assertion correctly catches it, or the product is fine and the assertion is too strict about something the user does not see. The second category is what makes UI automation testing the slowest, flakiest, most-disliked layer in any test pyramid.

Most of that pain is coordinate noise. A new banner pushes the form down 60 pixels. A skeleton loader holds the layout for 200ms and steals the viewport. A modal animates open and the snapshot lands half-rendered. The product is fine. The screenshot diff is red.

The fix in most frameworks is a long config of ignore regions, mask rectangles, animation freezing, and per-test viewport pinning. The fix in Terminator is one file, two regexes, and a snapshot format that treats the coordinate field as an optional annotation rather than a load-bearing comparison key.

0 diff

Bounds-only changes should not produce a diff.

crates/terminator/src/ui_tree_diff.rs, line 210, in test_simple_ui_tree_diff_yaml_bounds_change_no_diff

The actual code

Two regular expressions, run on the snapshot string before the line diff. That is the whole trick.

ui_tree_diff.rs

The first regex, #[\w\-]+, removes volatile element IDs like #12345 or #abc-def. Those IDs are assigned by the OS at element creation and reset on every relaunch, so they always change between runs. The second regex, bounds: \[[^\]]+\],?\s*, removes the coordinate rectangle bounds: [x,y,w,h] wherever it appears in the line. After both passes run, only the stable parts of the tree remain: role, name, value, focusability, and hierarchy.
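To make the two passes concrete, here is a minimal TypeScript sketch of the same cleaning step. It is a port for illustration, not the crate's Rust source: the function names are hypothetical, and the ID pattern is written with the leading space shown in the FAQ below, so the separator before the ID disappears along with it.

```typescript
// Patterns as quoted in the article; function names are illustrative only.
const ID_PATTERN = / #[\w\-]+/g;
const BOUNDS_PATTERN = /bounds: \[[^\]]+\],?\s*/g;

// Pass 1: drop volatile element IDs such as " #12345" or " #abc-def".
function stripIds(text: string): string {
  return text.replace(ID_PATTERN, "");
}

// Pass 2: drop the coordinate rectangle "bounds: [x,y,w,h]" and its trailing comma.
function stripBounds(text: string): string {
  return text.replace(BOUNDS_PATTERN, "");
}

// Both passes, in the order the article describes.
function stripVolatile(text: string): string {
  return stripBounds(stripIds(text));
}

const raw = "- [Button] Submit #id123 (bounds: [10,20,100,30], focusable)";
console.log(stripIds(raw));      // - [Button] Submit (bounds: [10,20,100,30], focusable)
console.log(stripVolatile(raw)); // - [Button] Submit (focusable)
```

Role, name, and the focusable flag survive both passes; only the ID and the rectangle are gone.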

What that looks like on a real snapshot

A comment thread in a real app. The container grows by 118 pixels between runs, so the Reply button shifts down by exactly that much. Watch what gets compared.

Snapshot diff with and without bounds stripping

The two snapshots below differ only in the bounds rectangles. A naive line diff would flag both lines as changed, fail the test, and route a developer to investigate.

Run 1:
- [Group] Comment from flappy-goose (bounds: [26,472,617,367], focusable)
  - [Button] Reply (bounds: [128,961,82,34], focusable)

Run 2:
- [Group] Comment from flappy-goose (bounds: [26,472,617,485], focusable)
  - [Button] Reply (bounds: [128,1079,82,34], focusable)

  • Group height changes from 367 to 485
  • Reply button y changes from 961 to 1079
  • Naive diff: 2 lines added, 2 lines removed
  • Test verdict: fail
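The same two runs can be checked in a few lines. This is a sketch under assumptions: changedLines is a hypothetical positional comparison, not a real diff algorithm, and the cleaning function is a TypeScript port of the two regexes rather than the crate's code.

```typescript
// Port of the two cleaning regexes; names here are illustrative.
const ID_PATTERN = / #[\w\-]+/g;
const BOUNDS_PATTERN = /bounds: \[[^\]]+\],?\s*/g;

function stripVolatile(text: string): string {
  return text.replace(ID_PATTERN, "").replace(BOUNDS_PATTERN, "");
}

// Counts line positions that differ between two snapshots (positional, not LCS-based).
function changedLines(before: string, after: string): number {
  const a = before.split("\n");
  const b = after.split("\n");
  let changed = Math.abs(a.length - b.length);
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) changed++;
  }
  return changed;
}

const run1 = [
  "- [Group] Comment from flappy-goose (bounds: [26,472,617,367], focusable)",
  "  - [Button] Reply (bounds: [128,961,82,34], focusable)",
].join("\n");

const run2 = [
  "- [Group] Comment from flappy-goose (bounds: [26,472,617,485], focusable)",
  "  - [Button] Reply (bounds: [128,1079,82,34], focusable)",
].join("\n");

const naive = changedLines(run1, run2);                                 // 2
const cleaned = changedLines(stripVolatile(run1), stripVolatile(run2)); // 0
console.log({ naive, cleaned });
```

With stripping, both runs reduce to the same two lines, so there is nothing left to flag.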

Where layout-shift noise comes from

Almost everything that breaks a brittle UI test falls into one of these buckets. None of them describe a product regression. All of them are coordinate or identifier noise.

  • synthetic ids
  • absolute coordinates
  • z-index reflow
  • responsive breakpoints
  • scroll position
  • modal animations
  • ad slots
  • dynamic class hashes
  • react keys
  • live region updates

The pipeline, end to end

Every test run feeds raw accessibility-tree text from any source into one cleaning step, then into one differ. The output is either None (test passes) or a string of +/- lines.

ui_tree_diff.rs pipeline

Inputs: Windows UIA, macOS AX, workflow recorder, MCP server
Step: ui_tree_diff.rs (clean, then line diff)
Outputs: None, or a list of + added and - removed lines
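The clean-then-diff pipeline can be sketched end to end. The crate itself uses Rust's similar crate for the line diff; the version below swaps in a small hand-written longest-common-subsequence diff, models Rust's None as null, and uses hypothetical function names throughout.

```typescript
// Cleaning step: a port of the two regexes quoted in the article.
const ID_PATTERN = / #[\w\-]+/g;
const BOUNDS_PATTERN = /bounds: \[[^\]]+\],?\s*/g;

function stripVolatile(text: string): string {
  return text.replace(ID_PATTERN, "").replace(BOUNDS_PATTERN, "");
}

// Line diff built on a longest-common-subsequence table.
// Returns null when the inputs match, otherwise "+ "/"- " prefixed lines.
function lineDiff(before: string, after: string): string | null {
  const a = before.split("\n");
  const b = after.split("\n");
  // lcs[i][j] holds the LCS length of a[i..] and b[j..].
  const lcs: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    lcs.push(new Array<number>(b.length + 1).fill(0));
  }
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const out: string[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push("- " + a[i]);
      i++;
    } else {
      out.push("+ " + b[j]);
      j++;
    }
  }
  while (i < a.length) out.push("- " + a[i++]);
  while (j < b.length) out.push("+ " + b[j++]);
  return out.length === 0 ? null : out.join("\n");
}

// The whole pipeline: clean both snapshots, then diff what is left.
function uiTreeDiff(before: string, after: string): string | null {
  return lineDiff(stripVolatile(before), stripVolatile(after));
}
```

Because the source of the snapshot text never enters the picture, the same function would accept trees captured from UIA, AX, a recorder, or an MCP tool, as long as they share the compact format.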

What still gets caught

Stripping volatile fields does not mean stripping signal. Anything a user can perceive still ends up in the diff.

Diff still flags

  • A button renamed from Submit to Send
  • A checkbox losing its enabled state
  • A list item that disappears between runs
  • A new dialog appearing on top of the window
  • A Text node whose value changes from Loading to Done
  • A focusable element losing the focusable flag
  • Tree shape changes (an extra parent or sibling)
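A quick way to convince yourself of the list above is to pair each kind of change with some bounds and ID noise and check that the cleaned snapshots still differ. The snapshot lines below are made-up examples in the compact format, and the cleaning function is a TypeScript port of the two regexes, not the crate's code.

```typescript
const ID_PATTERN = / #[\w\-]+/g;
const BOUNDS_PATTERN = /bounds: \[[^\]]+\],?\s*/g;

function stripVolatile(text: string): string {
  return text.replace(ID_PATTERN, "").replace(BOUNDS_PATTERN, "");
}

// [label, before, after]: each pair mixes a perceivable change with volatile noise.
const cases: Array<[string, string, string]> = [
  [
    "button renamed",
    "- [Button] Submit #a1 (bounds: [10,20,100,30], focusable)",
    "- [Button] Send #b2 (bounds: [10,80,100,30], focusable)",
  ],
  [
    "focusable flag lost",
    "- [Button] Reply (bounds: [128,961,82,34], focusable)",
    "- [Button] Reply (bounds: [128,1079,82,34])",
  ],
  [
    "text value changed",
    "- [Text] Loading (bounds: [0,0,50,10])",
    "- [Text] Done (bounds: [0,0,50,10])",
  ],
  [
    "list item removed",
    "- [ListItem] A #x1\n- [ListItem] B #x2",
    "- [ListItem] A #y1",
  ],
];

// Labels of the changes that are still visible after stripping.
const survivors = cases
  .filter(([, before, after]) => stripVolatile(before) !== stripVolatile(after))
  .map(([label]) => label);

console.log(survivors.length); // 4
```

All four changes remain visible after cleaning, while a pair that differs only in IDs and bounds would compare equal.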

Compared to a coordinate-aware test runner

Feature | Pixel or DOM-bounds runners | Terminator
Stores absolute pixel positions | Yes | No, stripped before diff
Stores synthetic element IDs | Yes (or HTML id attribute) | No, stripped before diff
Compares role and name | Sometimes (DOM only) | Yes, on every node
Crosses app boundaries | No (browser-only) | Yes (Chrome, Slack, Excel in one selector)
Survives 118px button move | Test fails | Zero diff, test passes
Detects renamed Submit to Send | Yes (if selector targets text) | Yes, surfaces as +/- diff line

A test you can write today

From the SDK, snapshot the accessibility tree before and after an action, then call the differ. The same code runs against Slack, Notion, Excel, or a Chrome tab.

send-message.test.ts
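A minimal sketch of the shape such a test can take, written against a fake window so it runs without the SDK. Only the method name snapshotYaml() comes from this article; FakeChatWindow, send, and addedLines are hypothetical, the fake is synchronous where a real SDK call may well be asynchronous, and addedLines is a simple set difference rather than the crate's differ.

```typescript
// Stand-in for an SDK locator; only snapshotYaml() is named in the article.
interface SnapshotSource {
  snapshotYaml(): string;
}

// Fake chat window: every sent message pushes the Send button 40px down.
class FakeChatWindow implements SnapshotSource {
  private messages: string[] = [];

  send(text: string): void {
    this.messages.push(text);
  }

  snapshotYaml(): string {
    const lines = ["- [Window] Chat (bounds: [0,0,400,600])"];
    this.messages.forEach((m, i) => {
      lines.push(`  - [Text] ${m} (bounds: [10,${100 + i * 40},380,30])`);
    });
    const sendY = 100 + this.messages.length * 40;
    lines.push(`  - [Button] Send (bounds: [10,${sendY},80,30], focusable)`);
    return lines.join("\n");
  }
}

function stripVolatile(text: string): string {
  return text.replace(/ #[\w\-]+/g, "").replace(/bounds: \[[^\]]+\],?\s*/g, "");
}

// Cleaned lines present after the action but not before.
function addedLines(before: string, after: string): string[] {
  const seen = new Set(stripVolatile(before).split("\n"));
  return stripVolatile(after).split("\n").filter((line) => !seen.has(line));
}

const app = new FakeChatWindow();
const before = app.snapshotYaml();
app.send("hello");
const after = app.snapshotYaml();

const added = addedLines(before, after);
console.log(added); // [ "  - [Text] hello ()" ]
```

The Send button moved 40 pixels, yet the only reported change is the new Text node, which is the thing the test actually cares about.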

The original test, in source

The 118-pixel claim is not marketing. It is a unit test that ships with the crate and fails the build if the behavior regresses.

From the test suite

- [Group] Comment from flappy-goose (bounds: [26,472,617,367], focusable)
  - [Button] Reply (bounds: [128,961,82,34], focusable)
---
- [Group] Comment from flappy-goose (bounds: [26,472,617,485], focusable)
  - [Button] Reply (bounds: [128,1079,82,34], focusable)

Verify it yourself in 30 seconds

The repo is mediar-ai/terminator. Clone it and run the targeted test.

zsh

cargo test --package terminator ui_tree_diff
118px

Vertical drift the Reply button absorbs in the unit test without producing a single diff line.

2

Regular expressions in the cleaning pass. One for #id, one for bounds:.

Unit tests in ui_tree_diff.rs lock the behavior in place.

How to wire it into your test pipeline

The differ is plain Rust, exposed through the TypeScript and Python SDKs and the MCP server. Pick your harness; the cleaning step is identical.

1. Install the SDK or MCP server

One command, depending on your stack.

2. Snapshot before the action

Call locator(...).snapshotYaml() or the MCP tool capture_tree. Both return the compact YAML that ui_tree_diff.rs already knows how to clean.

3. Run the user action

Click, type, drag, scroll. The selector engine in selector.rs accepts boolean expressions like role:Button && name:Send and the chained form window:Slack >> role:Button.

4. Snapshot after the action

Same call, second time. You now have two compact-YAML strings.

5. Diff and assert

simple_ui_tree_diff(before, after) returns Ok(None) when only volatile fields differ. Treat None as pass and any returned string as the failure message.
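The final step maps cleanly onto an assertion helper. A minimal sketch, assuming the binding you use surfaces Rust's Ok(None) as null and a diff as a plain string; assertNoUiDiff is a hypothetical name, not part of any SDK.

```typescript
// null means only volatile fields differed; any string is the failure message.
function assertNoUiDiff(diff: string | null): void {
  if (diff !== null) {
    throw new Error("UI tree changed:\n" + diff);
  }
}

// Passes silently: nothing a user could perceive changed.
assertNoUiDiff(null);

// Fails, carrying the +/- lines into the test report.
try {
  assertNoUiDiff("- [Button] Submit (focusable)\n+ [Button] Send (focusable)");
} catch (err) {
  console.log((err as Error).message);
}
```

Putting the diff text in the error message means the test report shows exactly which nodes changed, with no screenshot to squint at.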

Install

Three flavors. Same Rust core, different surface.

install

Frequently asked questions

Why do most UI automation tests break when a button moves a few pixels?

Pixel-based runners compare screenshots and report any positional change as a regression. DOM and accessibility-tree runners that include coordinate fields in their snapshot comparison see the same moved bounds as a diff. Terminator's ui_tree_diff.rs strips both element IDs and the bounds: [x,y,w,h] tuple before line-diffing, so a button that drifts 118 pixels down produces no output. The test passes because nothing semantically changed.

What exactly does Terminator's ui_tree_diff.rs throw away before comparing?

Two regexes do the work. The first, ` #[\w\-]+`, removes volatile element IDs like #12345 or #abc-def-123. The second, `bounds: \[[^\]]+\],?\s*`, removes any bounds rectangle in the form bounds: [10,20,100,30]. The remaining structure (role, name, focusable flag, hierarchy) is what gets diffed line by line using the similar crate, which is the Rust analogue of Python's difflib.

Is this only for Windows or does it work on macOS too?

Terminator runs against Windows UI Automation and the macOS Accessibility API. The diff layer in ui_tree_diff.rs operates on serialized accessibility-tree text, so the same bounds-stripping behavior applies on both platforms. The selector engine in selector.rs is also platform-neutral.

How is this different from Playwright trace viewer or Cypress snapshot tests?

Playwright and Cypress operate on the browser DOM. Their snapshots are HTML or screenshots and are limited to what runs inside Chromium or WebKit. Terminator operates on the OS accessibility tree, so the same selectors and the same diff machinery cover Chrome, Excel, Slack, Photoshop, and any other native app the OS exposes via UIA or AX. There is no DOM, so there is no need to skip animations, debounce hover states, or pin the viewport.

What format does the snapshot use?

Two formats are supported, detected from the input. JSON snapshots get parsed and have the id and element_id fields removed via tree walk. Compact YAML snapshots (lines that look like '- [Button] Submit #id123 (bounds: [10,20,100,30], focusable)') get cleaned with the two regexes above. Both paths converge on the line-diff step, so you can choose the format your test pipeline already serializes.
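For the JSON path, the tree walk can be sketched as a recursive filter. This is an illustration, not the crate's Rust code: the Json type and function name are hypothetical, the sample trees are made up, and comparing pretty-printed output is an assumption about how the cleaned trees reach the line diff.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// The two fields the article says are removed from JSON snapshots.
const VOLATILE_KEYS = new Set(["id", "element_id"]);

// Returns a copy of the tree with volatile keys dropped at every depth.
function stripVolatileJson(node: Json): Json {
  if (Array.isArray(node)) {
    return node.map((child) => stripVolatileJson(child));
  }
  if (node !== null && typeof node === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(node)) {
      if (!VOLATILE_KEYS.has(key)) {
        out[key] = stripVolatileJson(value);
      }
    }
    return out;
  }
  return node;
}

const run1 = JSON.parse(
  '{"role":"Group","id":"12345","children":[{"role":"Button","name":"Reply","element_id":"a-1"}]}'
) as Json;
const run2 = JSON.parse(
  '{"role":"Group","id":"67890","children":[{"role":"Button","name":"Reply","element_id":"b-2"}]}'
) as Json;

// Pretty-print so each field lands on its own line, ready for a line diff.
const text1 = JSON.stringify(stripVolatileJson(run1), null, 2);
const text2 = JSON.stringify(stripVolatileJson(run2), null, 2);
console.log(text1 === text2); // true
```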

Where can I read the actual code?

The differ lives at crates/terminator/src/ui_tree_diff.rs in mediar-ai/terminator on GitHub. The unit test test_simple_ui_tree_diff_yaml_bounds_change_no_diff at lines 202-212 specifically asserts that a Reply button moving from y:961 to y:1079 (118 pixels down) produces no diff. Run cargo test --package terminator ui_tree_diff to see it pass locally.

Does it ignore everything that changes? How will I catch real bugs?

Only volatile fields are stripped: synthetic IDs that change every run and absolute coordinate rectangles. Role, name, value, focusability, and tree shape all stay in the diff. So a button renamed from Submit to Send, a checkbox that loses its enabled state, or a missing list item all show up as added or removed lines prefixed with + or -. The test_simple_ui_tree_diff_yaml_with_changes test demonstrates this for both a window title swap and a button label swap.
