MCP dev tools for desktop accessibility: the inspect/highlight/state loop, by tool name
Most guides on this topic describe the accessibility tree as a passive concept and stop. The interesting question is what dev tools an MCP server actually exposes against it: how an LLM in your editor can inspect a native window the way a frontend developer inspects a React app in Chrome DevTools. Terminator's MCP server ships that loop as five named tools. This is each one, with the source line and an example call.
Direct answer (verified 2026-05-07)
The Terminator MCP server (npx -y terminator-mcp-agent) exposes a Chrome-DevTools-shaped toolbox for the desktop accessibility tree:
- get_window_tree returns the accessibility tree as JSON, and with show_overlay:"ui_tree" draws labeled rectangles directly on the user's screen for every element, with seven label modes.
- highlight_element flashes a colored border (and an optional text label) around any selector match for a configurable duration.
- validate_element resolves a selector once and returns the matched element's attributes plus which selector won.
- wait_for_element polls a selector until a state condition (exists, visible, enabled, focused) holds.
- stop_highlighting removes the active overlay window and drains highlight handles.
All five live in one Rust dispatch table at crates/terminator-mcp-agent/src/server.rs near line 9953. Source for the on-screen overlay: crates/terminator/src/platforms/windows/inspect_overlay.rs.
Why a separate dev-tools loop, instead of just "the tree"
Reading the accessibility tree is the easy part. Every desktop framework has had a way to do that since UI Automation shipped in Windows Vista. The hard part is what a frontend developer takes for granted in the browser: pointing at one specific node, watching its state change, asking the OS "is this thing actually clickable right now," and then ungating the next step on the answer. That loop is what makes Chrome DevTools usable. Without it, the agent has to guess.
An MCP server that only returns the tree puts the burden of that loop on the LLM, which is the slowest, most expensive component in the stack. An MCP server that exposes the loop as named tools lets the LLM pay one tool call per state transition, and lets the developer reading the run log see exactly what was visible to the agent at each step.
The five tools, in order of use
1. Read the tree: get_window_tree returns the accessibility tree of one process as JSON.
2. Show the overlay: show_overlay:"ui_tree" draws labeled rectangles over every element on screen.
3. Highlight one match: highlight_element flashes a colored border around the resolved selector.
4. Check state: validate_element resolves the selector once and returns the match's attributes; wait_for_element polls until a condition (exists, visible, enabled, focused) holds.
5. Clear: stop_highlighting removes the overlay and drains active highlights.
1. get_window_tree, with show_overlay
The entry point. Pass process to scope to one app, optionally tree_max_depth to bound traversal cost, and show_overlay:"ui_tree" to also draw the inspector overlay on screen. The overlay branch is gated to Windows at server.rs line 1717 and routes through show_inspect_overlay() in crates/terminator/src/platforms/windows/inspect_overlay.rs, which builds a layered Win32 window with WS_EX_LAYERED | WS_EX_TRANSPARENT | WS_EX_TOPMOST so it sits over the target app without intercepting clicks, then paints rectangles and labels with Rectangle and DrawTextW.
Pick the label style with overlay_display_mode. The seven values come from the OverlayDisplayMode enum at crates/terminator/src/platforms/mod.rs lines 37 to 53.
The seven OverlayDisplayMode values
- Rectangles — boxes only, no labels (use to see the layout partition of a window)
- Index — [N], the same N you pass to nth: in a selector
- Role — [Button], [Edit], [Window], etc.
- IndexRole — [N:Role], the default for inspector workflows
- Name — the accessible name a screen reader would read (use for localization triage)
- IndexName — [N:Name]
- Full — [N:Role:Name], densest label, best for vision-model screenshots
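A minimal sketch of the call arguments described above. The parameter names (process, tree_max_depth, show_overlay, overlay_display_mode) and the value "index_role" appear in this article; the exact JSON schema spelling of the other mode strings is an assumption.

```python
# Hedged sketch: argument payload for a get_window_tree call. Parameter
# names come from the prose above; exact schema spelling is an assumption.
get_window_tree_args = {
    "process": "notepad",                  # scope the tree to one app
    "tree_max_depth": 6,                   # bound traversal cost
    "show_overlay": "ui_tree",             # draw the inspector overlay (Windows)
    "overlay_display_mode": "index_role",  # [N:Role] labels, the inspector default
}

# The seven OverlayDisplayMode values as snake_case strings; "index_role"
# appears verbatim in this article, the other spellings are assumed.
OVERLAY_MODES = [
    "rectangles", "index", "role", "index_role",
    "name", "index_name", "full",
]
assert get_window_tree_args["overlay_display_mode"] in OVERLAY_MODES
```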
2. highlight_element, after the selector resolves
Once the agent has a specific selector, highlight_element (server.rs line 5600) flashes a colored border and an optional text label around it. The color is a u32 ARGB; the duration is milliseconds and defaults to 1000. A tokio::spawn at line 5717 schedules cleanup so highlights do not leak past their duration. On Windows the label rendering is configurable through font_size, font_bold, and font_color (lines 5651 to 5658).
The reason this is separate from the inspector overlay: the overlay shows every element so the agent can pick. The highlight shows one element so the agent (and a human watching the run) can confirm.
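A sketch of the call, with the same caveat that exact key spellings are assumptions; the u32 ARGB color encoding, the 1000 ms default duration, and the Windows-only font options are from the source above.

```python
# Hedged sketch of a highlight_element call. Key names are assumptions;
# the u32 ARGB color and 1000 ms default duration are described above.
highlight_args = {
    "selector": "nth:14",        # the index the inspector overlay surfaced
    "color": 0xFFFF0000,         # opaque red, packed as u32 ARGB
    "text": "candidate match",   # optional label printed next to the border
    "duration_ms": 1500,         # override the 1000 ms default (name assumed)
    "font_size": 14,             # Windows-only label rendering options
    "font_bold": True,
}
assert 0 <= highlight_args["color"] <= 0xFFFFFFFF  # must pack into a u32
```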
3. validate_element, the one-shot state read
validate_element runs the same selector resolution machinery as a click (server.rs line 5477), but with a no-op action: the only thing that matters is whether the element was found. It returns the resolved element's role, name, AutomationId, bounds, process_id, plus selector_used (which of your alternatives won) and selectors_tried (every selector it tested). Pass include_tree_after_action:true to get the resulting subtree without re-issuing get_window_tree.
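Sketched as a call plus the shape of its result, using the fields named above (role, name, AutomationId, bounds, process_id, selector_used, selectors_tried); the values and the bounds layout are illustrative, not real output.

```python
# Hedged sketch: a validate_element call and the result fields named above.
# All values are illustrative; key spellings are assumptions.
validate_args = {
    "selector": "nth:14",
    "include_tree_after_action": True,  # attach the matched subtree
}

validate_result = {
    "role": "MenuItem",
    "name": "File",
    "AutomationId": "FileMenu",   # illustrative
    "bounds": [10, 30, 58, 22],   # x, y, width, height (layout assumed)
    "process_id": 4242,
    "selector_used": "nth:14",    # which of your alternatives won
    "selectors_tried": ["nth:14"],  # every selector it tested
}
assert validate_result["selector_used"] in validate_result["selectors_tried"]
```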
4. wait_for_element, the conditioned poll
Different from validate_element because the property being watched can change without the tree shape changing: a Submit button can become enabled in place after a network call resolves, a Save dialog can become focused. The four valid conditions, parsed at server.rs line 5990, are exists, visible, enabled, focused. The exists path short-circuits to the standard locator wait at line 5838; the other three poll the live tree until the condition holds or the timeout fires.
This is the difference between an automation that hangs after a click and one that proceeds the moment the next control becomes interactive. Use enabled as the gate before a Submit click, and you stop racing the UI.
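The enabled gate can be sketched as the ordered tool calls an agent would issue. The four condition strings and the click_element tool name are from this article; argument key names like timeout_ms are assumptions.

```python
# Gating a Submit click on "enabled", sketched as (tool, arguments) pairs.
# Condition strings are the server's four valid values; key names assumed.
VALID_CONDITIONS = {"exists", "visible", "enabled", "focused"}

steps = [
    ("wait_for_element", {
        "selector": "nth:7",     # the Submit button the overlay surfaced
        "condition": "enabled",  # poll until it turns interactable in place
        "timeout_ms": 10_000,    # give the network call time to resolve
    }),
    ("click_element", {"selector": "nth:7"}),  # safe: the gate already held
]
assert steps[0][1]["condition"] in VALID_CONDITIONS
```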
5. stop_highlighting, the cleanup
The overlay window and highlight rectangles persist by design: the InspectOverlayHandle::Drop impl at inspect_overlay.rs line 60 explicitly does not auto-close. This is because the language bindings (Python, Node) and the MCP server return a result and discard the handle; if Drop closed the window, the user would never see the labels. stop_highlighting (server.rs line 7458) is the explicit cleanup. Call it before the workflow ends, or at the start of the next inspection pass; otherwise you will be staring at the previous run's overlay.
One full pass: open Notepad, inspect, highlight, validate, clear
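Sketched as the ordered tool calls the agent issues. The tool names are from the dispatch table described above; the selector index and any argument key spellings beyond the documented ones are illustrative assumptions.

```python
# The full pass as (tool, arguments) pairs. Tool names are real; argument
# key spellings and the chosen index are assumptions for illustration.
notepad_pass = [
    ("open_application", {"name": "notepad"}),
    ("get_window_tree", {
        "process": "notepad",
        "show_overlay": "ui_tree",
        "overlay_display_mode": "index_role",
    }),
    ("highlight_element", {"selector": "nth:3", "color": 0xFF00FF00}),
    ("validate_element", {"selector": "nth:3"}),
    ("stop_highlighting", {}),  # clear the overlay before the run ends
]
assert notepad_pass[-1][0] == "stop_highlighting"  # always end with cleanup
```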
What it looks like in your editor
From a Claude Code session against a fresh install. The first command registers the MCP server; the rest are tool calls the agent makes for you when you describe the inspection in English.
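A sketch of how such a session starts. The registration command is the one this article's FAQ quotes; the chat prompt is an illustrative paraphrase.

```shell
# Register the Terminator MCP server with Claude Code.
# (Cursor and VS Code take an equivalent MCP entry in their config.)
claude mcp add terminator 'npx -y terminator-mcp-agent@latest'

# Then describe the inspection in English in a chat, e.g.:
#   "open Notepad, show the accessibility tree with index_role labels,
#    highlight the File menu, validate that it is enabled, clear the overlay"
```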
Cross-platform: Windows vs macOS today
The MCP tool surface is identical across both platforms. Calls dispatch to a desktop trait that has Windows (UI Automation) and macOS (AX) implementations. The on-screen inspector overlay (show_inspect_overlay) is currently Windows-only because it uses Win32 layered windows; the cfg gates are at server.rs line 1717. On macOS, get_window_tree returns the JSON tree (read from AXUIElementCopyAttributeValue calls underneath), but the overlay is a no-op. highlight_element on macOS uses the AX-side highlight path. validate_element and wait_for_element are at full parity.
Most of our users build for Windows first because the line-of-business apps that resist browser-side automation are Windows-native. macOS support exists for cross-platform agents and for our own development on a Mac.
Why this matters more than another "list of MCP tools"
The reason browser MCP servers (Playwright MCP, Chrome DevTools MCP) work is not that they expose ten tools. It is that the ten tools are shaped like a developer's actual workflow against the DOM: open a page, inspect, highlight a node, watch its state, act, verify. The same shape is missing from most desktop automation MCPs, which expose either "take a screenshot and click x,y" (the computer-use shape) or "here is the tree, you figure it out" (the raw-tree shape). Neither matches how a developer would debug a desktop app.
The inspect/highlight/state/wait/clear loop is the bridge. It is the smallest set of tools that lets an agent do what a human at Inspect.exe does: see the partition of the window, point at one specific node, check whether it is interactive, wait until it is, and clean up. Once you have that loop, the click and type tools that everyone has are usable. Without it, they are guesses.
Where to read the source
- MCP dispatch table: crates/terminator-mcp-agent/src/server.rs near line 9953.
- Inspect overlay implementation: crates/terminator/src/platforms/windows/inspect_overlay.rs.
- OverlayDisplayMode enum: crates/terminator/src/platforms/mod.rs lines 37 to 53.
- Repo home: github.com/mediar-ai/terminator.
Related guides on this site
- Browser MCP to desktop automation: replace the dispatch root, not extend it — what changes when the MCP's dispatch root is the OS instead of a tab.
- RPA accessibility tree selectors: the actual grammar — the selector strings the inspector overlay surfaces, with operator precedence.
- Terminator MCP: the desktop automation server that type-checks your workflow before it ever clicks a pixel — the typecheck_workflow tool that runs tsc --noEmit before execution.
- Accessibility API for desktop automation — why structural lookup beats OCR and pixel matching.
Need to wire this into your agent before next week?
Bring your specific desktop automation target. We will pair on the selector grammar, the inspector overlay loop, and the right MCP integration for your editor.
Frequently asked questions
What does "MCP dev tools for desktop accessibility" actually mean? I see those words but they could mean five things.
It means the set of tools an MCP server exposes to a coding agent (Claude, Cursor, Windsurf, VS Code) so the agent can inspect, identify, and operate on the OS-level accessibility tree the way a frontend developer uses Chrome DevTools to inspect, identify, and operate on the DOM. Concretely, that is five primitives: read the tree (get_window_tree), draw labeled rectangles over every element on the actual screen (get_window_tree with show_overlay:"ui_tree"), highlight one specific match (highlight_element), check element state (validate_element, wait_for_element), and clear the overlay (stop_highlighting). The Terminator MCP server has all five as named tools in one dispatch table at crates/terminator-mcp-agent/src/server.rs near line 9953.
Why does an LLM need an on-screen overlay if it has the tree as JSON?
Two reasons. First, when get_window_tree returns 800 elements with similar role and name attributes, an agent that has only the JSON cannot tell which one a human user means by "the Save button." Drawing rectangles labeled with [index] or [index:role:name] over the actual screen lets the agent ask the user to confirm by index, or pass an annotated screenshot back into a vision model to ground the reference. Second, when a workflow misclicks, the developer reading the run log wants to see what the agent saw at that step. The overlay is the visual artifact that closes the gap between the JSON tree and the pixels on the user's monitor. The implementation is at crates/terminator/src/platforms/windows/inspect_overlay.rs: show_inspect_overlay() at line 103 builds a layered Win32 window with WS_EX_LAYERED | WS_EX_TRANSPARENT | WS_EX_TOPMOST and paints rectangles plus labels with GDI Rectangle and DrawTextW.
What are the seven label modes for the inspect overlay, and when do I use each one?
They live in OverlayDisplayMode at crates/terminator/src/platforms/mod.rs lines 37-53. Rectangles draws the boxes only, no labels, useful when you just want to see the partition of the window. Index labels each rectangle with [N], the same N you can pass to nth: in a selector. Role labels each with the accessibility role (Button, Edit, Window, etc.). IndexRole combines the two as [N:Role] and is the default for inspector workflows. Name uses the accessible name, which is what a screen reader would read aloud; this is what to use when triaging localization issues. IndexName is [N:Name]. Full is [N:Role:Name] and is the densest label, useful when you are screenshotting the overlay to feed to a vision model. The mode is chosen by the overlay_display_mode argument on get_window_tree.
How does this differ from the Inspect.exe tool that ships with the Windows SDK?
Inspect.exe is a standalone GUI you launch with a mouse. It is fine for one-off debugging. The MCP overlay is the same idea but driven by tool calls from an LLM in your editor. You write "show me the accessibility tree of Notepad with index labels," the agent calls get_window_tree with process:notepad and show_overlay:"ui_tree" and overlay_display_mode:"index_role", and the rectangles appear on your monitor in under a second. The same agent can then pick an index, call click_element with selector:nth:14 (or with the resolved role+name), and verify the result with validate_element. Inspect.exe cannot do that loop without a human at the keyboard. The selectors that the overlay surfaces are also the exact strings the rest of the MCP tool surface accepts: there is one grammar across read, inspect, click, type, and validate.
Why is highlight_element separate from get_window_tree's overlay?
The overlay shows every element. highlight_element shows one. Implementation at server.rs line 5600. After the agent has resolved a selector to a specific match, it calls highlight_element with that selector and an optional color (passed as a u32 ARGB at server.rs line 5640) and an optional text label that prints next to the rectangle. The default duration is 1000ms; a tokio::spawn at line 5717 schedules cleanup after that window so highlights do not leak. On Windows you can also pass font_size, font_bold, and font_color to control the label rendering (server.rs lines 5651-5658). This is the same affordance Chrome DevTools gives you when you hover an element in the Elements panel and a colored box flashes around it in the viewport.
What is validate_element really doing under the hood?
It runs find_and_execute_with_retry_with_fallback against the live accessibility tree (server.rs line 5477) with a no-op action, so the only outcome that matters is whether the selector resolved. It returns the matched element's attributes (role, name, AutomationId, bounds, process_id), the selector that won (selector_used), and every selector it tried (selectors_tried), with timestamps. If you also pass include_tree_after_action:true, it attaches the resulting subtree under the matched element, which is the cheapest way to walk into a control without re-issuing get_window_tree. This is the equivalent of opening the Elements panel and clicking on a node to see its computed properties, but as a single tool call.
How is wait_for_element different from validate_element?
validate_element resolves the selector once. wait_for_element polls the selector until a named state condition holds, with a timeout. The valid conditions are exists, visible, enabled, focused (server.rs line 5990). exists short-circuits to the standard locator wait at line 5838. The other three poll, because the property they are watching can change without the tree shape changing (a button can become enabled in place after a network call resolves). Use exists when you are blocked on a window appearing, visible when you are blocked on a popup that is not yet on screen, enabled when you are blocked on a Submit button that just turned interactable, and focused when you are scripting a text field that has to be the focused control before keystrokes will land in it.
Where does stop_highlighting fit in the loop?
It is the cleanup tool. After get_window_tree drew the overlay or highlight_element flashed a box, the visual artifacts persist by design (the InspectOverlayHandle::Drop impl at inspect_overlay.rs line 60 explicitly does not auto-close). stop_highlighting at server.rs line 7458 removes the active overlay window and drains the active highlight handles. Call it before the workflow ends, or at the start of the next inspection pass; otherwise the user is staring at the previous run's labels.
Are these dev tools Windows-only? What works on macOS?
The MCP tool surface is identical on both platforms: get_window_tree, validate_element, wait_for_element, highlight_element, stop_highlighting all dispatch to a desktop trait that has Windows and macOS implementations. The on-screen overlay (show_inspect_overlay) is currently Windows-only because it uses Win32 layered windows; the cfg gates are at server.rs line 1717 (#[cfg(target_os = "windows")]). On macOS get_window_tree still returns the JSON tree (read from AXUIElementCopyAttributeValue calls underneath), but the overlay is a no-op. highlight_element on macOS uses the AX-side highlight path. The non-visual subset (validate_element, wait_for_element) is at full parity.
How do I install this and try the inspector overlay myself in five minutes?
Two commands. claude mcp add terminator 'npx -y terminator-mcp-agent@latest' to register the server with Claude Code (Cursor and VS Code take an equivalent MCP entry in their config). Then in a chat ask: "open Notepad, run get_window_tree with show_overlay:'ui_tree' and overlay_display_mode:'index_role', then highlight the File menu, validate that it is enabled, and clear the overlay." The agent will issue open_application, get_window_tree, highlight_element, validate_element, stop_highlighting in that order, and you will see labeled rectangles appear over the Notepad window then disappear. The MCP server logs are written under ~/.terminator/logs by default; tail one to see the actual JSON each tool returned.
Can I use these tools from a script instead of from an LLM?
Yes. The same tool dispatch is reachable from the Rust crate (terminator-rs on crates.io) and the Node binding (@mediar-ai/terminator on npm) without going through MCP. The MCP server is a thin wrapper that exposes the dispatch_tool match block (server.rs line 9844) over JSON-RPC for LLMs; the underlying functions like Desktop::get_window_tree and Element::highlight are exported from the crate for direct use. So a CI script that wants to take an annotated screenshot of an app's accessibility tree before and after a change can call show_inspect_overlay directly, no LLM in the loop.
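Beyond the bindings, the server process itself can be driven from a plain script: MCP speaks JSON-RPC over stdio, so a minimal client is mostly message framing. A sketch follows; the tools/call method and params shape follow the Model Context Protocol, while the newline-delimited framing and everything else here are assumptions, nothing Terminator-specific beyond the tool name.

```python
import json

def tools_call(request_id: int, tool: str, arguments: dict) -> bytes:
    """Frame an MCP tools/call request as a JSON-RPC 2.0 message.

    Sketch only: the method name and params shape follow the MCP spec;
    transport framing (newline-delimited here) varies by client.
    """
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return (json.dumps(msg) + "\n").encode()

# e.g. write this to the stdin of `npx -y terminator-mcp-agent`:
payload = tools_call(1, "get_window_tree", {"process": "notepad"})
```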
How does this compare with browser MCP servers like Playwright MCP and Chrome DevTools MCP?
Playwright MCP and Chrome DevTools MCP expose the same shape of tools (get tree, highlight, click, wait) but bound to a Page or browser context. Their inspector overlay is the browser's own inspect mode. The desktop accessibility equivalent of Playwright MCP is a server that binds tools to the OS accessibility tree and exposes the same primitives over the OS surface. Terminator is that server. The boundary matters when the workflow leaves the tab (a Save dialog, a desktop authenticator, an Excel paste, a native menu): a browser MCP cannot reach those, an OS-rooted MCP can, and the inspector overlay tool gives the agent the same visual grounding for native windows that Chrome DevTools gives it for the DOM. Source: github.com/mediar-ai/terminator.