Accessibility tree internals
A node with an id in the a11y tree
Browser automation taught everyone the same trick: stop clicking pixels, give every accessibility node a stable handle, and act on the handle. That handle is the "node with an id" you keep seeing in a11y-tree snapshots. The interesting part is that the trick is not specific to the browser. Below is where the id comes from, and how Terminator hands the same kind of id to an automation agent for every app on the desktop, not just the tab.
Direct answer · verified 2026-06-16
A node with an id in an accessibility tree is an element plus a stable handle so you can target it without pixel coordinates. In the browser that handle is the AXNodeId exposed by the Chrome DevTools Protocol Accessibility domain (or the ref in a Playwright snapshot). Terminator brings the same model to native desktop apps: each node's id is a 6-character string derived from a BLAKE3 hash of the element's UIA AutomationId, role, name, and class, computed in crates/terminator/src/platforms/windows/utils.rs.
Browser-side source of truth: Chrome DevTools Protocol, Accessibility domain.
Where the "node with an id" pattern comes from
Chrome builds an accessibility tree next to the DOM, the same one a screen reader consumes. The Chrome DevTools Protocol exposes it through the Accessibility domain: getFullAXTree and getPartialAXTree return AXNode objects, each carrying a nodeId, and getChildAXNodes walks the tree by that id. Every higher-level tool inherits this: Playwright, browser-use, and the various MCP accessibility bridges all hand a model a flattened snapshot where each interactive node has a short ref, then translate "act on ref e7" back into a real node.
The id is the contract. The part that decides what to do (a person, a test script, an LLM) names a node by its id. The part that does it re-finds that node in the live tree and clicks it. Neither side has to agree on where the node is on screen, only on which node it is. That decoupling is what made accessibility-tree automation more reliable than coordinate clicking, and it is the entire reason this question keeps coming up.
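The contract can be sketched in a few lines of Rust. This is a minimal illustration, not any tool's real data model: `LiveNode`, `Snapshot`, `take_snapshot`, and `resolve` are hypothetical names, and the `e1`, `e2` ref format simply mirrors the "ref e7" wording above.

```rust
use std::collections::HashMap;

/// A node as the engine sees it in the live tree (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct LiveNode {
    backend_id: u32, // engine-side identity of the node
    role: String,
    name: String,
    x: i32, // position may change between snapshot and action
    y: i32,
}

/// What the deciding side receives: short ref -> engine-side identity.
struct Snapshot {
    refs: HashMap<String, u32>,
}

/// Assign refs "e1", "e2", ... in tree order.
fn take_snapshot(tree: &[LiveNode]) -> Snapshot {
    let refs = tree
        .iter()
        .enumerate()
        .map(|(i, n)| (format!("e{}", i + 1), n.backend_id))
        .collect();
    Snapshot { refs }
}

/// Re-find the node behind a ref in the current tree, wherever it sits now.
fn resolve<'a>(snapshot: &Snapshot, live_tree: &'a [LiveNode], r: &str) -> Option<&'a LiveNode> {
    let id = snapshot.refs.get(r)?;
    live_tree.iter().find(|n| n.backend_id == *id)
}
```

If the "Save" button moves after the snapshot is taken, `resolve(&snapshot, &tree, "e2")` still returns it, because the lookup goes through identity rather than position.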
The browser's a11y tree stops at the edge of the tab
Here is the catch nobody mentions when they explain AXNodeId. The Chrome accessibility tree only contains the web page. The moment your workflow touches a native dialog, the file picker, a desktop app, or the OS chrome around the tab, the browser's tree ends and your node ids end with it. You are back to coordinates and screenshots for the part of the job that lives outside the page.
Windows and macOS publish accessibility trees too, for the exact same reason browsers do: screen readers need them. On Windows that tree is UI Automation (UIA); on macOS it is AX. Every native window, button, and field is already a node in it. The question is whether your automation engine surfaces those nodes with stable ids the way the browser does. Terminator does, with an API deliberately shaped like the browser one so the mental model carries over unchanged.
Same pattern, different scope
The id model is identical. What changes is how far the tree reaches.
| Feature | Browser a11y tree (CDP) | Terminator (OS a11y tree) |
|---|---|---|
| What is in the tree | The current web page only | Every native window and the page inside it |
| Node handle | AXNodeId / snapshot ref | 6-char id() plus a short index in snapshots |
| How the id is derived | Assigned by the engine per session | BLAKE3 hash of AutomationId + role + name + class |
| Target by app-defined id | Not exposed | nativeid: maps to the raw UIA AutomationId |
| Where it breaks | Native dialogs, file pickers, other apps | Resolves against whichever app is in focus |
If your automation never leaves the browser tab, the browser's own a11y tree is the right tool. Terminator is for when the workflow crosses into native apps.
How Terminator computes a node's id
This is the part you cannot read anywhere else, so here is the actual mechanism rather than a hand-wave. On Windows the id comes from generate_element_id() in crates/terminator/src/platforms/windows/utils.rs (the function starts at line 23). It does not assign a random handle. It builds an id from the content of the node, so the same logical element produces the same id on the next run.
generate_element_id(): the fallback chain
Collect stable properties
Read UIA AutomationId, control type (role), name, and class name. Empty values are dropped, and a Custom control type is treated as absent.
Concatenate, then BLAKE3 hash
Join the present properties into one string and hash it with BLAKE3. Take the first 8 bytes of the digest as a little-endian u64. id() returns the first 6 characters of that number's decimal form.
Fallback: bounding rectangle
If every stable property was empty, hash the element's left, top, width, and height instead. Less durable, still deterministic for a fixed layout.
Last resort: memory address
If even the bounds are missing, use the object's pointer. Unique within the session, not stable across sessions. This is the floor, not the norm.
The ordering matters. AutomationId goes into the hash input first, so a node the app developer explicitly named is anchored to the most stable property available. A node with only a role and a label still gets a content-derived id; an anonymous, unbounded node still gets something usable. The id does not fail when properties are missing, it just degrades, and which tier supplied the hash input (content, bounds, or pointer) tells you how durable the id is.
// stable inputs, in order
let mut to_hash = String::new();
if let Some(id) = automation_id { to_hash.push_str(&id); }
if let Some(role) = role { to_hash.push_str(&role.to_string()); }
if let Some(n) = name { to_hash.push_str(&n); }
if let Some(cn) = class_name { to_hash.push_str(&cn); }
// fallbacks if no stable properties existed
if to_hash.is_empty() { /* hash the bounding rectangle */ }
if to_hash.is_empty() { /* last resort: object pointer */ }
let hash = blake3::hash(to_hash.as_bytes());
Ok(hash.as_bytes()[0..8]
    .try_into()
    .map(u64::from_le_bytes)
    .unwrap() as usize)

The public id() method then runs object_id().to_string().chars().take(6).collect(), so what you see on a node is the leading 6 characters of that hashed number.
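To make the last two steps concrete, here is a small std-only sketch that starts from an already-computed 32-byte digest, so it needs no BLAKE3 dependency. The function names `object_id_from_digest` and `short_id` are hypothetical; only the byte slicing, the little-endian conversion, and the `take(6)` truncation come from the excerpt above.

```rust
/// Interpret the first 8 bytes of a 32-byte digest as a little-endian u64,
/// mirroring the `[0..8]` + `u64::from_le_bytes` step in the excerpt.
fn object_id_from_digest(digest: &[u8; 32]) -> u64 {
    let mut first8 = [0u8; 8];
    first8.copy_from_slice(&digest[0..8]);
    u64::from_le_bytes(first8)
}

/// Keep the leading 6 characters of the decimal form, as id() is described to do.
fn short_id(object_id: u64) -> String {
    object_id.to_string().chars().take(6).collect()
}
```

For example, `short_id(18429312345678901234)` yields `"184293"`. A value with fewer than six decimal digits simply produces a shorter string, since `take(6)` never pads.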
What the model actually sees: indexed nodes, by source
A 6-character hash is great for logging and for the id: selector, but it is a clumsy thing to ask a language model to type back to you. So when Terminator's MCP server hands a tree to an agent, it prints a much shorter handle next to each node: a sequential index. Nodes that have on-screen bounds (the ones you can actually click) get a clickable index; nodes without bounds get a dash. The agent says "click 4", and the server maps 4 back to the node and its bounds.
When a window mixes sources, the index gets a one-letter prefix so the handle stays unambiguous. In crates/terminator/src/tree_formatter.rs, the ElementSource enum defines five prefixes, one per source; u, d, and o appear in the sample below.
So a single "node with an id" can come from the accessibility tree (u), from the page DOM inside a browser window (d), or from a vision pass when an app exposes nothing structural at all. The agent treats them the same way, as an indexed handle to act on. That is the bridge: the browser's DOM nodes and the OS's accessibility nodes end up in one unified, indexed list.
u1 [Window] "Save As" (bounds: [220,140,640,480])
u2 [ComboBox] "File name" (bounds: [330,360,360,28])
u3 [Button] "Save" (bounds: [700,420,90,30])
- [Text] "Encoding:" (no bounds, not clickable)
d4 [link] "terms" (bounds: [380,300,52,18])
o5 [ocr] "Read only" (bounds: [340,398,70,16])

Clickable nodes get a prefixed index; the unbounded text node gets a dash. The agent acts on u3, not on a pixel.
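The indexing rule in that sample can be sketched as follows. This is an illustration under assumptions, not the code in tree_formatter.rs: `Source`, `TreeNode`, `format_tree`, and `bounds_for` are hypothetical names, only the three prefixes visible in the sample are modeled, and the counter is assumed to be shared across sources because the sample runs u1, u2, u3, d4, o5.

```rust
/// Where a node came from. Only the prefixes visible in the sample are modeled.
#[derive(Clone, Copy)]
enum Source {
    Uia,
    Dom,
    Ocr,
}

impl Source {
    fn prefix(self) -> char {
        match self {
            Source::Uia => 'u',
            Source::Dom => 'd',
            Source::Ocr => 'o',
        }
    }
}

struct TreeNode {
    source: Source,
    role: &'static str,
    name: &'static str,
    bounds: Option<[i32; 4]>, // x, y, width, height
}

/// Returns the printed lines plus a (handle, bounds) table for clickable nodes.
fn format_tree(nodes: &[TreeNode]) -> (Vec<String>, Vec<(String, [i32; 4])>) {
    let mut lines = Vec::new();
    let mut clickable = Vec::new();
    let mut next = 1; // assumed: one counter shared by all sources
    for n in nodes {
        match n.bounds {
            Some(b) => {
                let handle = format!("{}{}", n.source.prefix(), next);
                next += 1;
                lines.push(format!(
                    "{} [{}] \"{}\" (bounds: [{},{},{},{}])",
                    handle, n.role, n.name, b[0], b[1], b[2], b[3]
                ));
                clickable.push((handle, b));
            }
            // Nodes without bounds are printed but get no index.
            None => lines.push(format!(
                "- [{}] \"{}\" (no bounds, not clickable)",
                n.role, n.name
            )),
        }
    }
    (lines, clickable)
}

/// Map a handle such as "u3" back to the bounds of its node.
fn bounds_for(clickable: &[(String, [i32; 4])], handle: &str) -> Option<[i32; 4]> {
    clickable.iter().find(|entry| entry.0 == handle).map(|entry| entry.1)
}
```

When the agent answers "click u3", `bounds_for(&clickable, "u3")` recovers the rectangle to act on, while the dash line contributes nothing to the lookup table.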
Acting on a node by its id, in your own code
Outside the agent loop, you target nodes with selectors, and the id is one of the things you can select on. The API is Playwright-shaped, so if you have written browser automation the muscle memory transfers. You build a locator, the engine resolves it against the live accessibility tree, and you act.
Browser ref vs OS node id, same shape
// page accessibility tree only
const snapshot = await page.accessibility.snapshot();
// act on a node from the snapshot
await page.getByRole("button", { name: "Save" }).click();

How a node id resolves to a click
You name the node
Pass a selector: a role, a name, the # id shortcut, or nativeid: for the app's own automation id. This is the handle, not a position.
The engine walks the live tree
Terminator queries the OS accessibility API (UIA on Windows) and finds the node matching your selector right now, in the window that is in focus.
The id is recomputed and confirmed
generate_element_id() hashes the node's current properties, so the same logical element keeps the same id even if it moved on screen.
The action fires on the node
Click, type, set value, or read. No coordinates were hard-coded, so a moved or rescaled window does not break the step.
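The four steps above can be sketched as a selector parser plus a tree walk. Everything here is hypothetical and simplified: `Selector`, `UiNode`, `parse_selector`, `node_matches`, and `find_first` are illustrative names rather than Terminator's API, the tree is an in-memory struct instead of a live UIA query, and treating `#` as shorthand for `id:` is an assumption based on the "# id shortcut" wording.

```rust
#[derive(Debug, PartialEq)]
enum Selector {
    Role(String),
    Name(String),
    NativeId(String),
    Id(String),
}

/// Parse "role:Button", "name:Save", "nativeid:saveButton", "id:184293" or "#184293".
fn parse_selector(s: &str) -> Option<Selector> {
    if let Some(rest) = s.strip_prefix('#') {
        return Some(Selector::Id(rest.to_string()));
    }
    let (kind, value) = s.split_once(':')?;
    match kind {
        "role" => Some(Selector::Role(value.to_string())),
        "name" => Some(Selector::Name(value.to_string())),
        "nativeid" => Some(Selector::NativeId(value.to_string())),
        "id" => Some(Selector::Id(value.to_string())),
        _ => None,
    }
}

/// Stand-in for a node in the OS accessibility tree.
struct UiNode {
    id: String,                    // computed 6-character id
    automation_id: Option<String>, // app-defined id, if any
    role: String,
    name: String,
    bounds: (i32, i32, i32, i32),  // never consulted during matching
    children: Vec<UiNode>,
}

fn node_matches(node: &UiNode, sel: &Selector) -> bool {
    match sel {
        Selector::Role(r) => node.role == *r,
        Selector::Name(n) => node.name == *n,
        Selector::NativeId(a) => node.automation_id.as_deref() == Some(a.as_str()),
        Selector::Id(i) => node.id == *i,
    }
}

/// Depth-first walk returning the first node that matches the selector.
fn find_first<'a>(root: &'a UiNode, sel: &Selector) -> Option<&'a UiNode> {
    if node_matches(root, sel) {
        return Some(root);
    }
    root.children.iter().find_map(|child| find_first(child, sel))
}
```

Because `node_matches` never reads `bounds`, moving or resizing the window changes where the eventual click lands but not which node is found.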
When the browser's own a11y tree is still the right answer
If your entire workflow lives inside a single web page and never touches a native dialog, the browser's accessibility tree is already perfect for the job. Playwright's role locators and the Chrome DevTools AXNodeId give you stable node handles with zero extra dependencies, and you should use them. The honest tradeoff: Terminator adds value precisely when the workflow leaves the tab. Native file pickers, installer wizards, line-of-business desktop apps, the OS chrome around the browser, anything where the page's tree simply does not reach. That is the seam where a node id from the OS tree starts mattering and a browser ref runs out.
Worth saying plainly: the deep id mechanism described here is the Windows UIA path, which is where Terminator's native-id support is most complete. If you are deciding between approaches, that is the context to weigh it in.
Questions people actually ask
What does "a node with an id in the a11y tree" actually mean?
The a11y tree (accessibility tree) is the structured view of an interface that the OS or browser already publishes for screen readers. Every element in it is a node: a button, a text field, a list item, a group. A node with an id is just that node plus a stable handle you can store and reuse, so automation can say "click node 184293" instead of "click at pixel 612, 388". In the browser this handle is the Chrome DevTools Protocol AXNodeId, or the ref that appears in a Playwright or MCP accessibility snapshot. In Terminator it is a 6-character string returned by the element's id() method.
Where does the AXNodeId come from in browser automation?
Chrome builds an accessibility tree alongside the DOM. The Chrome DevTools Protocol exposes it through the Accessibility domain: getFullAXTree and getPartialAXTree return AXNode objects, each with a nodeId, and getChildAXNodes walks the tree by id. Tools like Playwright, browser-use, and the various MCP accessibility bridges sit on top of this, hand the model a flattened snapshot where each interactive node has a short ref, and then translate "act on ref e7" back into a real node. The id is the contract between the model that decides and the engine that clicks.
How does Terminator compute a node's id for a native desktop app?
On Windows it calls generate_element_id() in crates/terminator/src/platforms/windows/utils.rs. It concatenates the element's stable properties in order (the UIA AutomationId, then the control type, then the name, then the class name), hashes that string with BLAKE3, and takes the first 8 bytes as a u64. The public id() method then returns the first 6 characters of that number's decimal form. Because the input is content, not position, the same logical element produces the same id across runs as long as those properties are unchanged.
What happens to the id if a node has no stable properties?
generate_element_id() degrades on purpose. If the AutomationId, control type, name, and class are all empty, it falls back to hashing the element's bounding rectangle (left, top, width, height). If even the bounds are unavailable, it uses the object's memory address as a last resort, which is unique within a session but not stable across sessions. So a well-labelled element gets a content-derived id you can hard-code; an anonymous one still gets a usable id, just a less durable one. The fallback chain lives at lines 62 to 81 of that file.
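The three tiers can be sketched as a function that picks the hash input and records which tier supplied it. This is a simplified illustration, not the code at those lines: `HashInput`, `ElementProps`, and `choose_hash_input` are hypothetical names, and the string formats used for the bounds and the address are assumptions, since the article does not show how the real function encodes them.

```rust
/// Which tier of the fallback chain supplied the string to hash.
#[derive(Debug, PartialEq)]
enum HashInput {
    Content(String), // stable properties were present
    Bounds(String),  // fell back to left, top, width, height
    Pointer(String), // last resort: address, valid for this session only
}

struct ElementProps {
    automation_id: Option<String>,
    role: Option<String>,
    name: Option<String>,
    class_name: Option<String>,
    bounds: Option<(i32, i32, i32, i32)>,
    address: usize,
}

fn choose_hash_input(p: &ElementProps) -> HashInput {
    // A "Custom" control type is treated as absent.
    let role = p.role.as_deref().filter(|r| *r != "Custom");
    let parts = [
        p.automation_id.as_deref(),
        role,
        p.name.as_deref(),
        p.class_name.as_deref(),
    ];
    // Concatenate in order; missing and empty values contribute nothing.
    let content: String = parts.iter().flatten().copied().collect();
    if !content.is_empty() {
        return HashInput::Content(content);
    }
    if let Some((left, top, width, height)) = p.bounds {
        // Assumed encoding of the rectangle.
        return HashInput::Bounds(format!("{},{},{},{}", left, top, width, height));
    }
    // Assumed encoding of the pointer.
    HashInput::Pointer(p.address.to_string())
}
```

Logging the variant alongside the id is one way to tell whether a given id is safe to hard-code (`Content`) or only good for the current layout or session.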
How is this different from just using coordinates or a screenshot?
Coordinates break the moment a window moves, a DPI scale changes, or a layout reflows. A screenshot-plus-vision approach has to re-locate the element every single step and pays model latency for it. A node id is resolved against the live tree, so the engine re-finds the element structurally each time and the id stays meaningful even when the pixels move. That is the whole reason browser automation moved to accessibility refs, and it is the same reason desktop automation should.
What is the difference between the id, the index, and a selector in Terminator?
Three different handles for three different jobs. The id (from id()) is a content hash, good for logging and for the id: selector. The index is the short numeric handle (#1, #2, or u1, d1 when sources are mixed) that the MCP server prints next to each clickable node so an LLM can say "click u4" in one token. A selector (role:Button, name:Save, nativeid:submitBtn, or the # shortcut) is a structural query that re-resolves to a live node every time you run it. You target by selector in code; the model targets by index in a snapshot; the id ties a node back to its source.
Does Terminator expose the raw UIA AutomationId too?
Yes. The hashed id() is for stability across the tree, but if you already know the app's real automation id you can match it directly with the nativeid: selector, which maps to the Windows UIA AutomationId. So nativeid:saveButton targets the element the app developer explicitly named, while id:184293 targets a node by Terminator's computed hash (up to six decimal digits, since id() truncates the decimal form of the hashed number). Both end up resolving against the same accessibility tree.