UI automation testing that survives a layout shift
Most guides on UI automation testing assume your test runner stores coordinates and complains when those coordinates change. Terminator takes the opposite stance: before any two snapshots are compared, the element IDs and the bounds rectangles are stripped out. A Reply button that drifts 118 pixels down between runs produces zero diff. The test passes because nothing real changed.
The thing every UI test framework gets wrong
A UI test fails for one of two reasons. Either the product is broken and the assertion correctly catches it, or the product is fine and the assertion is too strict about something the user does not see. The second category is what makes UI automation testing the slowest, flakiest, most-disliked layer in any test pyramid.
Most of that pain is coordinate noise. A new banner pushes the form down 60 pixels. A skeleton loader holds the layout for 200ms and steals the viewport. A modal animates open and the snapshot lands half-rendered. The product is fine. The screenshot diff is red.
The fix in most frameworks is a long config of ignore regions, mask rectangles, animation freezing, and per-test viewport pinning. The fix in Terminator is one file, two regexes, and a snapshot format that treats the coordinate field as an optional annotation rather than a load-bearing comparison key.
“Bounds-only changes should not produce a diff.”
crates/terminator/src/ui_tree_diff.rs, line 210, in test_simple_ui_tree_diff_yaml_bounds_change_no_diff
The actual code
Two regular expressions, run on the snapshot string before the line diff. That is the whole trick.
The first regex, #[\w\-]+, removes volatile element IDs like #12345 or #abc-def. Those IDs are assigned by the OS at element creation and reset on every relaunch, so they always change between runs. The second regex, bounds: \[[^\]]+\],?\s*, removes the coordinate rectangle bounds: [x,y,w,h] wherever it appears in the line. After both passes run, only the stable parts of the tree remain: role, name, value, focusability, and hierarchy.
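As a rough illustration of the cleaning pass, here is a minimal Python sketch that applies the two patterns quoted above to a compact-YAML line. It is not the Rust source: the function name `strip_volatile` is hypothetical, and the patterns are copied from the text, so the crate may differ in detail (for example in how it handles surrounding whitespace).

```python
import re

# Patterns as quoted in the text above; the Rust source may differ in detail.
ID_PATTERN = re.compile(r"#[\w\-]+")
BOUNDS_PATTERN = re.compile(r"bounds: \[[^\]]+\],?\s*")


def strip_volatile(snapshot: str) -> str:
    """Remove element IDs and bounds rectangles from a compact-YAML snapshot."""
    without_ids = ID_PATTERN.sub("", snapshot)
    return BOUNDS_PATTERN.sub("", without_ids)


line = "- [Button] Submit #id123 (bounds: [10,20,100,30], focusable)"
# Role, name, and the focusable flag survive; the ID and rectangle do not.
print(strip_volatile(line))
```

Running the ID pattern first and the bounds pattern second mirrors the order described above. Because neither pattern overlaps the other's match, the order should not affect the result in this sketch.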
What that looks like on a real snapshot
A comment thread in a real app. The container grows by 118 pixels between runs, so the Reply button shifts down by exactly that much. Watch what gets compared.
Snapshot diff with and without bounds stripping
The two snapshots below differ only in the bounds rectangles. A naive line diff would flag both lines as changed, fail the test, and route a developer to investigate.
Run 1:
- [Group] Comment from flappy-goose (bounds: [26,472,617,367], focusable)
- [Button] Reply (bounds: [128,961,82,34], focusable)
Run 2:
- [Group] Comment from flappy-goose (bounds: [26,472,617,485], focusable)
- [Button] Reply (bounds: [128,1079,82,34], focusable)
- Group height changes from 367 to 485
- Reply button y changes from 961 to 1079
- Naive diff: 2 lines added, 2 lines removed
- Test verdict: fail
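The naive verdict listed above can be reproduced with a plain line diff and no cleaning step. The sketch below uses Python's `difflib` as a stand-in differ (an assumption; the crate uses Rust's similar) and counts how many lines a coordinate-aware comparison would report as changed.

```python
import difflib

run_1 = [
    "- [Group] Comment from flappy-goose (bounds: [26,472,617,367], focusable)",
    "- [Button] Reply (bounds: [128,961,82,34], focusable)",
]
run_2 = [
    "- [Group] Comment from flappy-goose (bounds: [26,472,617,485], focusable)",
    "- [Button] Reply (bounds: [128,1079,82,34], focusable)",
]

# Compare whole lines with the bounds still in place.
matcher = difflib.SequenceMatcher(a=run_1, b=run_2, autojunk=False)
removed = added = 0
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        removed += i2 - i1  # lines present only in run 1
        added += j2 - j1    # lines present only in run 2

print(f"{added} added, {removed} removed")  # prints: 2 added, 2 removed
```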
Where layout-shift noise comes from
Almost everything that breaks a brittle UI test falls into one of these buckets. None of them describe a product regression. All of them are coordinate noise.
The pipeline, end to end
Every test run feeds raw accessibility-tree text from any source into one cleaning step, then into one differ. The output is either None (test passes) or a string of +/- lines.
ui_tree_diff.rs pipeline
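The clean-then-diff flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a port of ui_tree_diff.rs: the names `clean` and `ui_tree_diff` are hypothetical, `difflib` stands in for the similar crate, and the exact output formatting of the real differ may differ.

```python
import difflib
import re
from typing import Optional

# Patterns as quoted in the article; treat them as assumptions about the source.
ID_PATTERN = re.compile(r"#[\w\-]+")
BOUNDS_PATTERN = re.compile(r"bounds: \[[^\]]+\],?\s*")


def clean(snapshot: str) -> str:
    """Strip element IDs and bounds rectangles from compact-YAML text."""
    return BOUNDS_PATTERN.sub("", ID_PATTERN.sub("", snapshot))


def ui_tree_diff(before: str, after: str) -> Optional[str]:
    """Return None when the cleaned trees match, else '-'/'+' prefixed lines."""
    old_lines = clean(before).splitlines()
    new_lines = clean(after).splitlines()
    if old_lines == new_lines:
        return None
    changes = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changes += [f"- {line}" for line in old_lines[i1:i2]]
        changes += [f"+ {line}" for line in new_lines[j1:j2]]
    return "\n".join(changes)


run_1 = (
    "- [Group] Comment from flappy-goose (bounds: [26,472,617,367], focusable)\n"
    "- [Button] Reply (bounds: [128,961,82,34], focusable)"
)
run_2 = (
    "- [Group] Comment from flappy-goose (bounds: [26,472,617,485], focusable)\n"
    "- [Button] Reply (bounds: [128,1079,82,34], focusable)"
)
print(ui_tree_diff(run_1, run_2))  # bounds-only change: prints None

renamed = run_2.replace("Reply", "Respond")  # hypothetical label change
print(ui_tree_diff(run_1, renamed))  # one '-' line and one '+' line for the button
```

The second call shows the other branch: once a name changes, the cleaned lines no longer match and the button appears as a removed line followed by an added line.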
What still gets caught
Stripping volatile fields does not mean stripping signal. Anything a user can perceive still ends up in the diff.
Diff still flags
- A button renamed from Submit to Send
- A checkbox losing its enabled state
- A list item that disappears between runs
- A new dialog appearing on top of the window
- A Text node whose value changes from Loading to Done
- A focusable element losing the focusable flag
- Tree shape changes (an extra parent or sibling)
Compared to a coordinate-aware test runner
| Feature | Pixel or DOM-bounds runners | Terminator |
|---|---|---|
| Stores absolute pixel positions | Yes | No, stripped before diff |
| Stores synthetic element IDs | Yes (or HTML id attribute) | No, stripped before diff |
| Compares role and name | Sometimes (DOM only) | Yes, on every node |
| Crosses app boundaries | No (browser-only) | Yes (Chrome, Slack, Excel in one selector) |
| Survives 118px button move | Test fails | Zero diff, test passes |
| Detects renamed Submit to Send | Yes (if selector targets text) | Yes, surfaces as +/- diff line |
A test you can write today
From the SDK, snapshot the accessibility tree before and after an action, then call the differ. The same code runs against Slack, Notion, Excel, or a Chrome tab.
The original test, in source
The 118-pixel claim is not marketing. It is a unit test that ships with the crate and fails the build if the behavior regresses.
From the test suite
- [Group] Comment from flappy-goose (bounds: [26,472,617,367], focusable)
- [Button] Reply (bounds: [128,961,82,34], focusable)
---
- [Group] Comment from flappy-goose (bounds: [26,472,617,485], focusable)
- [Button] Reply (bounds: [128,1079,82,34], focusable)
Verify it yourself in 30 seconds
The repo is mediar-ai/terminator. Clone it and run the targeted test.
The Reply button absorbs 118 pixels of vertical drift in the unit test without producing a single diff line.
The cleaning pass uses two regular expressions: one for #id, one for bounds:.
Unit tests in ui_tree_diff.rs lock the behavior in place.
How to wire it into your test pipeline
The differ is plain Rust, exposed through the TypeScript and Python SDKs and the MCP server. Pick your harness; the cleaning step is identical.
Install the SDK or MCP server
One command, depending on your stack.
Snapshot before the action
Call locator(...).snapshotYaml() or the MCP tool capture_tree. Both return the compact YAML that ui_tree_diff.rs already knows how to clean.
Run the user action
Click, type, drag, scroll. The selector engine in selector.rs accepts boolean expressions like role:Button && name:Send and the chained form window:Slack >> role:Button.
Snapshot after the action
Same call, second time. You now have two compact-YAML strings.
Diff and assert
simple_ui_tree_diff(before, after) returns Ok(None) when only volatile fields differ. Treat None as pass and any returned string as the failure message.
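In a test harness, that rule maps onto a small assertion helper. The sketch below assumes the binding surfaces the Rust `Ok(None)` as Python's `None` and any diff as a string; the helper name `assert_no_ui_change` is hypothetical and not part of any SDK.

```python
from typing import Optional


def assert_no_ui_change(diff: Optional[str]) -> None:
    """Pass on None; otherwise fail with the diff text as the message."""
    if diff is not None:
        raise AssertionError(f"UI tree changed:\n{diff}")


# Only volatile fields differed, so the differ returned None: passes silently.
assert_no_ui_change(None)

# A real change arrives as a string and becomes the failure message.
try:
    assert_no_ui_change("- [Button] Submit (focusable)\n+ [Button] Send (focusable)")
except AssertionError as exc:
    print(exc)
```

Using the diff text as the assertion message means the test log already shows which node changed, with no separate snapshot viewer needed.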
Install
Three flavors. Same Rust core, different surface.
Frequently asked questions
Why do most UI automation tests break when a button moves a few pixels?
Pixel-based runners compare screenshots and report any positional change as a regression. DOM and accessibility-tree runners that include coordinate fields in their snapshot comparison see the same moved bounds as a diff. Terminator's ui_tree_diff.rs strips both element IDs and the bounds: [x,y,w,h] tuple before line-diffing, so a button that drifts 118 pixels down produces no output. The test passes because nothing semantically changed.
What exactly does Terminator's ui_tree_diff.rs throw away before comparing?
Two regexes do the work. The first, ` #[\w\-]+`, removes volatile element IDs like #12345 or #abc-def-123. The second, `bounds: \[[^\]]+\],?\s*`, removes any bounds rectangle in the form bounds: [10,20,100,30]. The remaining structure (role, name, focusable flag, hierarchy) is what gets diffed line by line using the similar crate, which is the Rust analogue of Python's difflib.
Is this only for Windows or does it work on macOS too?
Terminator runs against Windows UI Automation and the macOS Accessibility API. The diff layer in ui_tree_diff.rs operates on serialized accessibility-tree text, so the same bounds-stripping behavior applies on both platforms. The selector engine in selector.rs is also platform-neutral.
How is this different from Playwright trace viewer or Cypress snapshot tests?
Playwright and Cypress operate on the browser DOM. Their snapshots are HTML or screenshots and are limited to what runs inside a browser engine such as Chromium, Firefox, or WebKit. Terminator operates on the OS accessibility tree, so the same selectors and the same diff machinery cover Chrome, Excel, Slack, Photoshop, and any other native app the OS exposes via UIA or AX. There is no DOM, so there is no need to skip animations, debounce hover states, or pin the viewport.
What format does the snapshot use?
Two formats are supported, detected from the input. JSON snapshots get parsed and have the id and element_id fields removed via tree walk. Compact YAML snapshots (lines that look like '- [Button] Submit #id123 (bounds: [10,20,100,30], focusable)') get cleaned with the two regexes above. Both paths converge on the line-diff step, so you can choose the format your test pipeline already serializes.
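For the JSON path, the tree walk can be pictured as a recursive filter. This Python sketch removes only the two keys named above, `id` and `element_id`; the node shape used in the example (`role`, `name`, `children`) and the function names are assumptions for illustration, not the crate's schema or API.

```python
import json
from typing import Any

VOLATILE_KEYS = {"id", "element_id"}


def strip_volatile_fields(node: Any) -> Any:
    """Recursively drop volatile keys from dicts, walking lists as well."""
    if isinstance(node, dict):
        return {
            key: strip_volatile_fields(value)
            for key, value in node.items()
            if key not in VOLATILE_KEYS
        }
    if isinstance(node, list):
        return [strip_volatile_fields(item) for item in node]
    return node


def normalize_json_snapshot(raw: str) -> str:
    """Parse, clean, and pretty-print so a line diff sees one field per line."""
    cleaned = strip_volatile_fields(json.loads(raw))
    return json.dumps(cleaned, indent=2, sort_keys=True)


# Hypothetical snapshots: same tree, different IDs between runs.
run_1 = '{"id": "12345", "role": "Window", "children": [{"element_id": "a-1", "role": "Button", "name": "Submit"}]}'
run_2 = '{"id": "67890", "role": "Window", "children": [{"element_id": "b-2", "role": "Button", "name": "Submit"}]}'
print(normalize_json_snapshot(run_1) == normalize_json_snapshot(run_2))  # prints: True
```

Pretty-printing with sorted keys is a choice made for this sketch so that the cleaned JSON can go through the same line differ as the YAML path.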
Where can I read the actual code?
The differ lives at crates/terminator/src/ui_tree_diff.rs in mediar-ai/terminator on GitHub. The unit test test_simple_ui_tree_diff_yaml_bounds_change_no_diff at lines 202-212 specifically asserts that a Reply button moving from y:961 to y:1079 (118 pixels down) produces no diff. Run cargo test --package terminator ui_tree_diff to see it pass locally.
Does it ignore everything that changes? How will I catch real bugs?
Only volatile fields are stripped: synthetic IDs that change every run and absolute coordinate rectangles. Role, name, value, focusability, and tree shape all stay in the diff. So a button renamed from Submit to Send, a checkbox that loses its enabled state, or a missing list item all show up as added or removed lines prefixed with + or -. The test_simple_ui_tree_diff_yaml_with_changes test demonstrates this for both a window title swap and a button label swap.