macOS AX vs Windows UIA agent: what one trait can hide, and where it leaks

AXUIElement on macOS and IUIAutomationElement on Windows are two OS accessibility APIs that expose the live UI tree and let you act on it. A Playwright-shaped Rust trait can flatten most of the surface, but three things leak through: role names, action invocation, and focused-element semantics. Terminator's repo declares the trait at crates/terminator/src/platforms/mod.rs:86 and gates the build to Windows at lines 319 to 320. This page is the inside of that trait.

AXUIElement · IUIAutomation · kAXPressAction · InvokePattern · v0.24.32

Direct answer (verified 2026-05-04)

AX and UIA are different OS APIs with similar shapes. UIA on Windows is COM-based and exposes typed control patterns. AX on macOS is Mach-based and uses a string-keyed attribute and action model. A single Rust trait (Terminator's pub trait AccessibilityEngine at crates/terminator/src/platforms/mod.rs:86) can hide most of the surface, but role names, action invocation, and focused-element semantics leak through. On Terminator's main branch only the Windows half compiles; lines 319 to 320 of the same file emit compile_error!("Terminator only supports Windows...").

Pick the platform your users are on, ship one engine per OS, and treat the cross-platform trait as a developer-experience bet, not a runtime portability promise. Authoritative sources: Microsoft UI Automation and Apple AXUIElementCopyAttributeValue.

The trait that pretends both are the same

Terminator's core crate declares one async trait and asks every platform to implement it. Read the method list and try to spot the OS. You cannot. The signatures are platform-neutral: they take a Selector, an optional UIElement root, an optional Duration timeout, and return a Result<UIElement, AutomationError>. Nothing in the trait says COM, nothing says Mach, nothing says AXPress or InvokePattern. That is the point of the abstraction; that is also where the trouble starts.
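To make the "platform-neutral signature" point concrete, here is a minimal sketch of a trait with that shape. Everything in it is a simplified stand-in: the `Selector`, `UIElement`, and `AutomationError` types are hypothetical reductions of the names used above, the method is synchronous for brevity even though the real trait is described as async, and `MockEngine` is an in-memory placeholder for a per-OS backend.

```rust
use std::time::Duration;

// Hypothetical, simplified stand-ins for the types the trait mentions.
#[derive(Debug, Clone, PartialEq)]
pub struct Selector(pub String);

#[derive(Debug, Clone, PartialEq)]
pub struct UIElement {
    pub role: String,
    pub name: String,
}

#[derive(Debug, PartialEq)]
pub enum AutomationError {
    ElementNotFound(String),
}

// Synchronous for brevity; the trait described in the text is async.
pub trait AccessibilityEngine {
    fn find_element(
        &self,
        selector: &Selector,
        root: Option<&UIElement>,
        timeout: Option<Duration>,
    ) -> Result<UIElement, AutomationError>;
}

// An in-memory engine standing in for a per-OS backend.
pub struct MockEngine {
    pub elements: Vec<UIElement>,
}

impl AccessibilityEngine for MockEngine {
    fn find_element(
        &self,
        selector: &Selector,
        _root: Option<&UIElement>,
        _timeout: Option<Duration>,
    ) -> Result<UIElement, AutomationError> {
        // Match on name only; a real resolver would compile the selector.
        self.elements
            .iter()
            .find(|e| e.name == selector.0)
            .cloned()
            .ok_or_else(|| AutomationError::ElementNotFound(selector.0.clone()))
    }
}

// Caller code depends only on the trait, never on the backend.
pub fn find_by_name(
    engine: &dyn AccessibilityEngine,
    name: &str,
) -> Result<UIElement, AutomationError> {
    engine.find_element(
        &Selector(name.to_string()),
        None,
        Some(Duration::from_secs(1)),
    )
}
```

Nothing in `find_by_name` can tell which OS sits behind the `&dyn AccessibilityEngine`, which is the property the abstraction is buying.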

What the trait promises to hide

  • find_element(selector, root, timeout) returns a UIElement on either OS; the resolver underneath is COM on Windows, Mach on macOS.
  • get_focused_element() returns the focused UIElement; UIA gives it instantly, AX requires AXUIElementCreateSystemWide and a permission gate.
  • click_at_coordinates(x, y) has the same shape on both: one synthetic-input call (SendInput on Windows, CGEventCreateMouseEvent on macOS), with platform-specific coordinate math (DPI scaling on Windows, logical points on macOS).
  • get_window_tree(pid, title, config) returns a UINode of the same shape; the per-element walk uses IUIAutomationTreeWalker on Windows, AXUIElementCopyAttributeValue with kAXChildrenAttribute on macOS.
  • open_application(name), activate_application(name), open_url(url) all wrap OS-specific shell calls (ShellExecute on Windows, open(1) and NSWorkspace on macOS) but expose the same trait method.
  • Selector grammar (role:Button && name:Save, process:notepad, window:Calculator) compiles down to IUIAutomationCondition trees on Windows and AX attribute predicates on macOS.

The trait holds at the signature level. The leaks live in what each method has to actually do underneath to return the same UIElement.
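The selector grammar in the list above is the one piece that can stay fully platform-neutral. As an illustration only, the sketch below parses strings such as `role:Button && name:Save` into a list of clauses that a per-OS resolver could then translate. The `Clause` enum and `parse_selector` function are hypothetical names, only the `&&` operator and the four keys shown in the examples are handled, and this is not Terminator's actual parser.

```rust
/// Hypothetical clause type covering the keys shown in the examples.
#[derive(Debug, PartialEq)]
pub enum Clause {
    Role(String),
    Name(String),
    Process(String),
    Window(String),
}

/// Parse "role:Button && name:Save" into clauses. Handles only `&&`.
pub fn parse_selector(input: &str) -> Result<Vec<Clause>, String> {
    input
        .split("&&")
        .map(|part| {
            let part = part.trim();
            // Each clause must look like key:value.
            let (key, value) = part
                .split_once(':')
                .ok_or_else(|| format!("missing ':' in clause '{part}'"))?;
            let value = value.trim().to_string();
            match key.trim() {
                "role" => Ok(Clause::Role(value)),
                "name" => Ok(Clause::Name(value)),
                "process" => Ok(Clause::Process(value)),
                "window" => Ok(Clause::Window(value)),
                other => Err(format!("unknown selector key '{other}'")),
            }
        })
        .collect()
}
```

A Windows resolver would turn each clause into a UIA condition, while a macOS resolver would turn it into an attribute comparison during its tree walk; the parsed form is the same for both.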

Three places the abstraction leaks

Most cross-platform desktop libraries advertise feature parity and mean it at the method-name level. The leaks below are not bugs in those libraries; they are properties of the underlying APIs that no abstraction can fully hide. Plan for them at the agent layer.

| What leaks | Windows UIA | macOS AX |
| --- | --- | --- |
| Role names | Bare control-type strings: Button, Edit, Window. | AX-prefixed: AXButton, AXTextField, AXWindow. The leak is encoded in element.rs around line 1885, where the role check accepts both "axwindow" and "window". |
| Action invocation | Typed pattern: GetCurrentPattern(InvokePatternId) then Invoke(). | String action name: AXUIElementPerformAction(elem, kAXPressAction). |
| Focused element | GetFocusedElement() on the IUIAutomation root. Synchronous, no permission gate. | Walk from AXUIElementCreateSystemWide(), then kAXFocusedUIElementAttribute. Returns kAXErrorAPIDisabled until the user grants permission. |
| Stable ID for selectors | AutomationId set by app developers; survives localization and theme. | kAXIdentifierAttribute, populated sporadically by SwiftUI; agents fall back to (role, title, position) tuples. |
| Synthetic input fallback | SendInput Win32 API; DPI-aware once you call SetProcessDpiAwarenessContext. | CGEventCreateMouseEvent in Quartz; uses logical points, no DPI math needed. |
| Browser AX coverage | Chrome and Edge expose a usable UIA tree for most DOM; AutomationId maps to ARIA where set. | Chrome and Safari often no-op AXPress on web views; production agents carry a hardcoded bypass list and route to vision. |
| Permission surface | None. UIA calls just work from any process. | System Settings > Privacy & Security > Accessibility. First-call failures until the user toggles on. |
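The role-name leak is the easiest of these to handle at the agent layer. A minimal sketch, assuming roles arrive as plain strings: lowercase the role and strip a leading `ax`, so that `AXWindow` and `Window` compare equal. The function names `normalize_role` and `is_window` are hypothetical and are not taken from the Terminator source.

```rust
/// Lowercase a role and drop a leading "ax" so AXButton and Button compare equal.
pub fn normalize_role(role: &str) -> String {
    let lower = role.trim().to_ascii_lowercase();
    match lower.strip_prefix("ax") {
        // Keep the original if stripping would leave nothing.
        Some(rest) if !rest.is_empty() => rest.to_string(),
        _ => lower,
    }
}

/// True for both the macOS and the Windows spelling of a window role.
pub fn is_window(role: &str) -> bool {
    normalize_role(role) == "window"
}
```

Prefix stripping only fixes spelling. It does not reconcile vocabulary: `AXTextField` normalizes to `textfield`, which still does not match the UIA `Edit` control type, so a real engine would also need a role mapping table.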

The row that defines the engineering effort is focused element. UIA returns it synchronously from any process; AX requires a system-wide handle, an attribute query, and a runtime permission check. The trait method get_focused_element() -> Result<UIElement, AutomationError> looks identical on both; the recovery story does not.

What the engine implementation looks like on each side

Same trait method, find_element, two implementations. The Windows side runs today and is shown below; the macOS side is what would have to exist below the trait for the abstraction to hold.

find_element on Windows UIA vs macOS AX

// Inside the Windows implementation of AccessibilityEngine.
// crates/terminator/src/platforms/windows/engine.rs (paraphrased)

impl AccessibilityEngine for WindowsEngine {
    fn find_element(
        &self,
        selector: &Selector,
        root: Option<&UIElement>,
        timeout: Option<Duration>,
    ) -> Result<UIElement, AutomationError> {
        // 1. Build IUIAutomationCondition tree from the Selector
        let condition: IUIAutomationCondition =
            self.compile_selector(selector)?;

        // 2. FindFirst on the IUIAutomationElement scope
        let raw: IUIAutomationElement = match root {
            Some(r) => r.as_uia()?.FindFirst(TreeScope_Subtree, &condition)?,
            None    => self.uia.GetRootElement()?
                            .FindFirst(TreeScope_Subtree, &condition)?,
        };

        // 3. Wrap in a platform-neutral UIElement
        Ok(UIElement::from_uia(raw))
    }
}

// click() then asks for the typed pattern:
//   GetCurrentPattern(UIA_InvokePatternId)
//   .cast::<IUIAutomationInvokePattern>()?.Invoke()
// Falls back to SendInput at element.BoundingRectangle on miss.

The Windows side leans on IUIAutomationElement.FindFirst with a compiled IUIAutomationCondition tree, so UIA does the matching and returns one element. AX has no FindFirst equivalent: you walk the tree yourself, ask every node for its attributes, and short-circuit when the role and title match. That walk is a real implementation cost the trait does not expose.
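The manual walk that an AX-backed resolver would need can be sketched without any macOS bindings. The example below runs a depth-first search over a mock tree and stops at the first node whose role and title match. The `Node` struct is a hypothetical in-memory stand-in for an AXUIElement and its children, and the `visited` counter is there only to show the cost: on a real system each visited node would mean at least one attribute round trip to the target process.

```rust
/// Hypothetical in-memory stand-in for an AX element and its children.
#[derive(Debug)]
pub struct Node {
    pub role: String,
    pub title: String,
    pub children: Vec<Node>,
}

/// Depth-first, pre-order search that stops at the first match.
/// `visited` counts the nodes touched before returning.
pub fn find_first<'a>(
    root: &'a Node,
    role: &str,
    title: &str,
    visited: &mut usize,
) -> Option<&'a Node> {
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        *visited += 1;
        if node.role == role && node.title == title {
            return Some(node); // short-circuit on the first match
        }
        // Push in reverse so children are visited left to right.
        stack.extend(node.children.iter().rev());
    }
    None
}
```

A miss is the expensive case: the walk has to touch every node before it can report that nothing matched, which is why real resolvers usually bound the search by depth or by timeout.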

1 of 3 platforms

“Terminator only supports Windows. Linux and macOS are not supported.” (crates/terminator/src/platforms/mod.rs, lines 319 to 320)

What ships today, in plain terms

The Terminator workspace at version 0.24.32 carries scaffolding for macOS in several files (element.rs:1883 for process_name, lib.rs:1567 for the role-name match, health.rs:145 for a stub macOS health checker that returns “Accessibility API health checks not yet implemented”), but the workspace does not build on a non-Windows host. The published artefacts on crates.io (terminator-rs), npm (terminator-mcp-agent), and pip (terminator-py) are Windows binaries.

The repo's own llms.txt says the same thing in one line: “The Node.js, Python, and MCP packages currently ship Windows binaries only.” The trait shape is the cross-platform bet; the trait implementations are not there yet on macOS. If you need a Windows agent today, this is the production path. If you need a macOS agent today, this codebase is a useful reference for the trait shape and the selector grammar, but you will be wiring the AX engine yourself.

How to pick today

The decision is rarely AX vs UIA in the abstract. It is which users you are shipping to and what surfaces you have to reach. Four common cases below.

Pick by where the agent runs

  1. Windows users only

    Use a UIA-backed engine: terminator-rs, FlaUI, or pywinauto. AutomationId selectors are stable across language and theme.

  2. macOS users only

    Use an AX-backed engine: atomacos or AXSwift. Plan for the Accessibility permission prompt on first run.

  3. Both, with one codebase

    Write a Playwright-shaped Locator layer over two engines. Ship one binary per OS. The trait shape is portable; the underlying library is not.

  4. Both, no per-OS engine

    Use a screenshot-and-vision agent like Anthropic Computer Use or OpenAI Operator. Trade speed and determinism for portability.


Frequently asked

Are macOS AX and Windows UIA the same kind of API?

Same job, different shapes. Both expose the live UI tree of running apps and let you act on elements without screenshots. Windows UI Automation is a COM-based API rooted at IUIAutomationElement, with strongly typed control patterns (InvokePattern, ValuePattern, TogglePattern, ExpandCollapsePattern) you query and call. macOS Accessibility uses an opaque AXUIElement reference and a string-keyed attribute model: AXUIElementCopyAttributeValue with kAXTitleAttribute, kAXValueAttribute, kAXRoleAttribute, and so on, plus AXUIElementPerformAction with kAXPressAction for clicks. UIA gives you a typed dispatch table; AX gives you a string-keyed bag of attributes and actions. The shapes converge once you wrap them in a higher-level trait, but the call sites underneath look nothing alike.

Can a single Rust trait actually abstract both APIs?

At the signature level, yes. Terminator's pub trait AccessibilityEngine at crates/terminator/src/platforms/mod.rs line 86 declares find_element, find_elements, get_focused_element, get_window_tree, click_at_coordinates, and a dozen more, all with platform-neutral types (UIElement, Selector, AutomationError). On Windows the implementation walks the IUIAutomation tree via the uiautomation Rust crate; on macOS it would walk the AX tree via accessibility-sys or a similar shim. The trait says nothing about COM or Mach. Below the trait, the implementations diverge: the Windows engine resolves IUIAutomationCondition trees and calls IUIAutomationInvokePattern.Invoke, while the macOS engine would issue AXUIElementCopyAttributeValue queries (kAXChildrenAttribute, kAXRoleAttribute, kAXTitleAttribute) and call AXUIElementPerformAction. The leaks are not in the trait shape; they are in what each method has to do underneath to return the same UIElement.

Where does the abstraction actually leak?

Three places, in increasing order of pain. First, role names. macOS prefixes most roles with AX (AXButton, AXTextField, AXWindow); Windows uses bare control-type strings (Button, Edit, Window). Terminator's repo has the leak written into element.rs around line 1885: the macOS-only branch matches role == "axwindow" || role == "window", the Windows branch matches role == "window". Second, action invocation. UIA exposes a typed pattern: GetCurrentPattern(UIA_InvokePatternId), then Invoke(). AX uses a string action name: AXUIElementPerformAction(element, kAXPressAction). The trait method click() papers over both, but the fallback story when the action is not supported differs on each. Third, focused element. UIA's GetFocusedElement returns the focused element synchronously, with no permission gate. AX requires you to start from the system-wide AXUIElement (AXUIElementCreateSystemWide) and ask for kAXFocusedUIElementAttribute, with permission gates that can fail at runtime. Same trait method, same return type, very different failure modes.

What does Terminator ship for macOS today?

Nothing, on the main branch. The trait is shaped to be cross-platform, the locator grammar in selector.rs is platform-neutral, and several files (element.rs around line 1883, lib.rs around line 1567, health.rs around line 145) carry #[cfg(target_os = "macos")] code paths. But crates/terminator/src/platforms/mod.rs ends at lines 319 and 320 with: #[cfg(not(target_os = "windows"))] compile_error!("Terminator only supports Windows. Linux and macOS are not supported."). On any non-Windows host the workspace will not build. The published binaries on npm (terminator-mcp-agent), pip (terminator-py), and crates.io (terminator-rs) are Windows binaries only. The repo's llms.txt states this directly: “The Node.js, Python, and MCP packages currently ship Windows binaries only.” macOS support is a trait shape and some scaffolding, not a working engine.

If I need to drive both macOS and Windows, what are the real options?

Three honest paths. One: use a separate engine per OS. atomacos for macOS Python, FlaUI or pywinauto for Windows Python; or AXSwift for native macOS and the C# UIAutomationCore for Windows. You write two driver modules and merge them at a higher layer. Two: pick a screenshot-and-vision engine like Anthropic Computer Use or OpenAI Operator that does not depend on OS accessibility at all; you trade speed and determinism for portability. Three: use a Playwright-shaped trait library (Terminator on Windows is one) and assume one binary per OS, with the trait keeping your higher-level code identical. None of these gives you 'one binary, two platforms'. The trait can promise that signature; the underlying COM and Mach worlds will not let you ship a single dlopen-able shared library that talks to both.
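The "thin layer that picks an engine per OS" can be sketched in a few lines. This version selects at runtime from the OS name so that it stays testable on any host; a shipped build would more likely select at compile time with `#[cfg(target_os = "...")]`, in line with the one-binary-per-OS conclusion above. `EngineKind`, `engine_for`, and `engine_for_host` are hypothetical names.

```rust
/// Hypothetical tag for which backend a build would wire in.
#[derive(Debug, PartialEq)]
pub enum EngineKind {
    Uia, // Windows UI Automation
    Ax,  // macOS Accessibility
}

/// Map an OS name (as reported by std::env::consts::OS) to an engine.
pub fn engine_for(os: &str) -> Result<EngineKind, String> {
    match os {
        "windows" => Ok(EngineKind::Uia),
        "macos" => Ok(EngineKind::Ax),
        other => Err(format!("no accessibility engine for '{other}'")),
    }
}

/// Convenience wrapper for the host the program is running on.
pub fn engine_for_host() -> Result<EngineKind, String> {
    engine_for(std::env::consts::OS)
}
```

The higher-level agent code would call `engine_for_host()` once at startup and then talk only to the shared trait; the error branch is where a screenshot-and-vision fallback could be plugged in for hosts with no engine.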

Why is one trait method enough for a click on both platforms?

Because the trait method is doing the heavy lifting of picking the right action. Inside element.click(), the Windows implementation tries patterns in order: InvokePattern.Invoke for buttons, TogglePattern.Toggle for checkboxes, SelectionItemPattern.Select for radios, then falls back to a synthetic mouse event at the element's bounding box if no pattern matches. The macOS implementation would try AXUIElementPerformAction with kAXPressAction first, then kAXShowMenuAction or kAXIncrementAction depending on the role, then fall back to a CGEventCreateMouseEvent at the element's AXFrame. The trait promise is 'click does the right thing for this element'. The implementation per platform owns the dispatch table for what 'right' means. The reason this works is that both APIs expose enough role information to pick a sensible default action; the leak is only that the default actions are named differently and have different fallbacks.
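The Windows dispatch order described above can be written down as a small ordered lookup. This is a sketch under stated assumptions: `Element` is a hypothetical model that simply lists the patterns an element reports as supported, and `ClickStrategy` names the four outcomes from the text. It makes no UIA calls and is not the actual element.click() implementation.

```rust
/// The four outcomes named in the text, in the order they are tried.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ClickStrategy {
    Invoke,
    Toggle,
    SelectionItem,
    SyntheticMouse,
}

/// Hypothetical element model: the patterns it reports as supported.
pub struct Element {
    pub supported: Vec<ClickStrategy>,
}

/// Try typed patterns in a fixed order; fall back to synthetic input.
pub fn pick_click_strategy(element: &Element) -> ClickStrategy {
    const ORDER: [ClickStrategy; 3] = [
        ClickStrategy::Invoke,
        ClickStrategy::Toggle,
        ClickStrategy::SelectionItem,
    ];
    ORDER
        .iter()
        .copied()
        .find(|strategy| element.supported.contains(strategy))
        .unwrap_or(ClickStrategy::SyntheticMouse)
}
```

A macOS engine would keep the same structure and swap the table contents for AX action names, which is the sense in which each platform owns its own dispatch table.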

Does macOS AX have anything like UIA's AutomationId?

Not exactly, and this is one of the largest practical differences. UIA defines an AutomationId attribute that an app developer can set and that survives localization, theme changes, and most refactors. WinAppSDK apps (Calculator, Notepad, Settings) populate it for nearly every interactive element, and a selector like id:NumberPadFiveButton works on every Windows machine in any language. macOS AX has no direct equivalent. The closest thing is kAXIdentifierAttribute, which AppKit sets sporadically, mostly for SwiftUI-built controls that opted in. In practice, agents on macOS rely heavily on (role, title, parent) tuples and positional walks (the third button in the toolbar of the focused window), which are inherently more localization-sensitive than AutomationId-based selectors. A trait abstraction can hide this behind a single Selector grammar, but the macOS resolver will silently fall back to title and position more often, which is something you observe in production but cannot fix in the trait.

What about permissions? UIA does not ask, AX does.

Right, and this is a runtime leak the trait cannot fully hide. On Windows, UIA calls just work; the COM API is not gated. On macOS, AXUIElementCopyAttributeValue against another process returns kAXErrorAPIDisabled or kAXErrorNotImplemented unless your binary has been added to System Settings > Privacy & Security > Accessibility and the user has toggled it on. The first call from a fresh install fails with one of those error codes, and an agent that does not check for them has no signal to recover. Production macOS agents prompt with AXIsProcessTrustedWithOptions during init, surface a permission setup screen, and re-check on every cold start. None of this maps to anything UIA-side. The trait method get_root_element() succeeds on Windows from any process; on macOS it fails until the user clicks a checkbox in System Settings. The trait can return the same Result type, but the failure modes need different runbooks.
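One way to give the agent a recovery signal is to classify the AX status before it reaches trait callers. The sketch below is illustrative only: `AxStatus` is a hypothetical mirror of the status codes named above rather than the real AXError type, and `EngineError` is a hypothetical error enum whose variants may not match Terminator's AutomationError.

```rust
/// Hypothetical mirror of the AX status codes the text names.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AxStatus {
    Success,
    ApiDisabled,
    NotImplemented,
    Other(i32),
}

/// Hypothetical error type; the real AutomationError variants may differ.
#[derive(Debug, PartialEq)]
pub enum EngineError {
    PermissionDenied(&'static str),
    Platform(String),
}

/// Turn a raw status into something the agent can act on.
pub fn check_status(status: AxStatus) -> Result<(), EngineError> {
    match status {
        AxStatus::Success => Ok(()),
        // Both codes are treated as "permission not granted yet".
        AxStatus::ApiDisabled | AxStatus::NotImplemented => {
            Err(EngineError::PermissionDenied(
                "enable this app under System Settings > Privacy & Security > Accessibility",
            ))
        }
        AxStatus::Other(code) => {
            Err(EngineError::Platform(format!("AX call failed with code {code}")))
        }
    }
}
```

With a dedicated permission variant, the agent layer can show a setup screen and retry instead of treating the failure as a missing element. A Windows engine would simply never produce that variant.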

Is the screenshot fallback useful for either platform?

Yes for both, for different reasons. On macOS, AX silently no-ops on AXPress for many web views in Chrome and Safari, and Electron apps inconsistently expose role and title. The honest production path on macOS uses AX where it works and falls back to OCR or vision (OmniParser, Gemini) on Electron and web-rendered surfaces. On Windows, UIA covers WinUI, WPF, WinForms, MFC, and most major IDEs natively, but games, custom-painted line-of-business apps, and anything rendered via DirectX or canvas show up as one opaque element. The fallback is the same: OCR or vision for grounding, then synthetic input for the click. Terminator's MCP server exposes this as vision_type with five values (UiTree, Ocr, Omniparser, Gemini, Dom), defaulting to UiTree. The same router would apply on macOS, with AX as the default and screenshot+vision as the labelled fallback for AX-empty surfaces.
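The router described above can be sketched as an enum with a default plus one decision function. The five variant names follow the values listed in the text. The `choose_grounding` function and its "more than one node" heuristic are assumptions made for illustration, based on the remark that opaque surfaces show up as a single element, and are not taken from the MCP server.

```rust
/// The five grounding modes named in the text; UiTree is the default.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum VisionType {
    #[default]
    UiTree,
    Ocr,
    Omniparser,
    Gemini,
    Dom,
}

/// Hypothetical heuristic: use the accessibility tree when it has more
/// than one node, otherwise route to the caller's chosen fallback.
pub fn choose_grounding(tree_node_count: usize, fallback: VisionType) -> VisionType {
    if tree_node_count > 1 {
        VisionType::UiTree
    } else {
        fallback
    }
}
```

The same function would serve a macOS engine unchanged, since it only looks at how much the tree exposed and not at which API produced it.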

What should I actually pick today?

Match the platform your users are on, and assume one engine per OS. If your agent runs on Windows desktops, a UIA-backed engine like Terminator (terminator-rs on crates.io, terminator-mcp-agent on npm) or a UIA wrapper like FlaUI is the production path; UIA is mature, AutomationId-based selectors are stable, and synthetic input is the fallback for UIA-empty surfaces. If your agent runs on macOS desktops, atomacos or a pyobjc shim around AXUIElement is the production path today, paired with a vision fallback for Electron and Chromium. If you need both, write a thin layer that picks the engine per OS and exposes a Playwright-shaped Locator API to your agent code; do not expect any single library to ship one binary that works on both. The Playwright shape is the right abstraction, but each leg of it is a separate ship.

terminator: Desktop automation SDK
© 2026 terminator. All rights reserved.