Accessibility tree automation vs PyAutoGUI: the two clicks are not the same operation
Most comparisons of these two approaches read like checklists: one uses image recognition, the other uses UIA, both can drive mouse and keyboard. That framing buries the actual difference. The two libraries hit different primitives in the operating system. PyAutoGUI's click(x, y) always lowers to SendInput because PyAutoGUI does not know what is at (x, y). Accessibility tree automation calls UIInvokePattern.invoke() on the element's COM proxy and never generates an HID event. Twenty-two lines of Rust at element.rs:838-859 versus the eighty-line SendInput path at input.rs:38-117 explain the entire difference.
Direct answer (verified 2026-05-01)
Pick the accessibility tree when the target exposes one. That is every native Win32, WinUI, UWP, AppKit, Electron, and Chromium-based app, plus most modern web through the browser's AX bridge. It runs in the background, does not fight your cursor, survives DPI and theme changes, and is roughly 100x faster on agent loops because it bypasses HID input entirely.
Pick PyAutoGUI when the target does not expose a tree: fullscreen DirectX or OpenGL games, canvas-rendered design tools (Figma's drawing surface, Excalidraw, Miro), and sandboxed remote desktop or VM viewers where the AX bridge does not cross the host boundary. Authoritative sources: Microsoft UIA Invoke pattern and PyAutoGUI documentation.
Two libraries, two different things they do to the OS
Most articles framing this comparison stop at “PyAutoGUI is image-based, the others are accessibility-based”. That description is true and useless. It hides the fact that the two libraries call into the operating system at different layers, and once you see the layers the choice becomes mechanical instead of philosophical.
PyAutoGUI is a thin Python wrapper around the OS's human input emulation API. On Windows that is user32!SendInput; on macOS CGEventPost; on X11 XTest. These functions all do the same thing: inject a synthetic mouse or keyboard event at a screen coordinate as if a real human had moved their hand there. The OS does not care that the event came from a script.
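To make the layer concrete, here is a minimal sketch of that Windows path in Python via ctypes. It is illustrative, not PyAutoGUI's actual source: the struct layout follows the Win32 MOUSEINPUT definition, and `to_absolute` is one common way to normalize into SendInput's 0..65535 absolute coordinate space (PyAutoGUI's exact arithmetic differs slightly but targets the same space).

```python
import sys

def to_absolute(x, y, screen_w, screen_h):
    # SendInput's absolute mouse space runs 0..65535 on each axis
    return (x * 65535 // (screen_w - 1),
            y * 65535 // (screen_h - 1))

if sys.platform == "win32":
    import ctypes
    from ctypes import wintypes

    MOUSEEVENTF_MOVE = 0x0001
    MOUSEEVENTF_LEFTDOWN = 0x0002
    MOUSEEVENTF_LEFTUP = 0x0004
    MOUSEEVENTF_ABSOLUTE = 0x8000

    class MOUSEINPUT(ctypes.Structure):
        _fields_ = [("dx", wintypes.LONG), ("dy", wintypes.LONG),
                    ("mouseData", wintypes.DWORD), ("dwFlags", wintypes.DWORD),
                    ("time", wintypes.DWORD),
                    ("dwExtraInfo", ctypes.POINTER(ctypes.c_ulong))]

    class INPUT(ctypes.Structure):
        # simplified: the real INPUT is a tagged union over mouse,
        # keyboard, and hardware input
        _fields_ = [("type", wintypes.DWORD), ("mi", MOUSEINPUT)]

    def click(x, y):
        user32 = ctypes.windll.user32
        ax, ay = to_absolute(x, y, user32.GetSystemMetrics(0),
                             user32.GetSystemMetrics(1))
        # three injected HID events: move, button down, button up
        for flags in (MOUSEEVENTF_MOVE | MOUSEEVENTF_ABSOLUTE,
                      MOUSEEVENTF_LEFTDOWN, MOUSEEVENTF_LEFTUP):
            evt = INPUT(type=0, mi=MOUSEINPUT(dx=ax, dy=ay, dwFlags=flags))
            user32.SendInput(1, ctypes.byref(evt), ctypes.sizeof(INPUT))
```

The OS dispatches these three events exactly as it would a physical mouse, which is why everything that interferes with a human's mouse interferes with this path too.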
An accessibility tree library calls a different family of functions. On Windows the call is IUIAutomationInvokePattern::Invoke on a COM proxy that lives inside the target application's address space. The proxy receives the call cross-process, runs whatever the application has wired up as the element's default action, and returns. No cursor motion. No virtual key press. No way for a human at the keyboard to interfere.
What happens when you click a Save button, by library
The top half is PyAutoGUI. Three SendInput calls, an OS dispatch step, and a hope that the WM_LBUTTONDOWN lands on the save button instead of a notification toast that just popped up. The bottom half is the tree. One method call, cross-process, returns when the action is done.
The same task, written both ways
Save a Notepad document, the canonical hello-world of desktop automation. Compare the two implementations below. The PyAutoGUI version is shorter to write. The tree version is shorter to maintain.
Same task, two ways to send the click
Save the document in Notepad. With PyAutoGUI you find the save button by where it is on the screen, take a screenshot of it, save the image to disk, then in your script open the image, locate it on the live screen via template matching, compute its center in pixels, move the virtual cursor there, and synthesize a left mouse down and a left mouse up. Every step has a way to fail. The reference image goes stale when the OS theme changes. The match misses when the user is on a different DPI scale. The cursor motion conflicts with whatever the human is doing right now. The save dialog opens at a different position than your reference shot of the editor.
- reference image must be re-captured on every theme, DPI, or resolution change
- click moves the real cursor; conflicts with the user's live input
- no error if the wrong element is at (x, y); the click just lands on something else
- scripts get longer every time a new desktop config breaks an old reference
Reading the actual code
The PyAutoGUI script and the Terminator script are not actually doing the same thing under the hood. Reading them side by side, with an OS-level comment noting which syscall each one resolves to, is the whole point. The PyAutoGUI listing below is what most tutorials show; the trailing comments spell out which OS primitive it actually resolves to.
What each library does to the OS
```python
# pyautogui: everything is coordinates
import pyautogui

# move the cursor and click at a screen point
pyautogui.click(412, 300)

# or, find the button by template matching against a screenshot
button = pyautogui.locateOnScreen('save_button.png', confidence=0.9)
if button is not None:
    pyautogui.click(pyautogui.center(button))

# typing is virtual key presses (your real keyboard is the target)
pyautogui.typewrite('hello world', interval=0.05)

# windows path: this resolves to user32!SendInput with
# MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_MOVE | MOUSEEVENTF_LEFTDOWN
# | MOUSEEVENTF_LEFTUP for the click, and SendInput with
# KEYEVENTF_UNICODE for each character of typewrite.
#
# this is the same primitive a real human uses to drive the OS,
# which means everything that fights real human input fights this:
# foreground window changes, focus stealing protection, IME hooks,
# screen recording overlays, accidental cursor motion, monitor DPI.
```

Feature by feature, the actual differences
The rows that matter most are the first and last. The first tells you what each library calls. The last tells you how many lines of code the click implementation is. Everything in between is a consequence of those two facts.
| Feature | PyAutoGUI | Accessibility tree (Terminator) |
|---|---|---|
| what the click resolves to in the OS | SendInput with MOUSEEVENTF_ABSOLUTE \| MOUSEEVENTF_MOVE \| MOUSEEVENTF_LEFTDOWN \| MOUSEEVENTF_LEFTUP | IUIAutomationInvokePattern::Invoke, a COM call into the target process |
| how the target is addressed | absolute screen coordinates, normalized to 0 to 65535 by screen size | selector grammar resolved to an IUIAutomationElement: role, name, AutomationId, process scope |
| cursor and keyboard while running | physically moves cursor, presses real virtual keys, fights live user input | no cursor motion, no keyboard focus change, runs while you keep working |
| behavior at different DPI or resolution | coordinates and reference images go stale; rerecord on every monitor | selectors carry semantic identity; survive DPI, theme, resolution changes |
| failure when target moves or hides | click lands on whatever is now at (x, y), often the wrong thing, silently | ElementNotFoundError, ElementNotVisible, or ElementNotEnabled, all typed |
| speed per action | limited by SendInput timing, debounce, and any added safety pause | limited by COM round trip into the target process, hundreds of actions per second |
| works on canvas, OpenGL, DirectX surfaces | yes, the only option for these targets | no, the tree is empty or single node |
| works on Win32, WinUI, UWP, AppKit, Electron, web in browser | yes via pixels, but coordinates and template images are brittle | yes via the tree the OS exposes for screen readers |
| concurrency across multiple apps | single threaded; serializes through a virtual cursor | providers run per process; asyncio.gather across apps actually overlaps |
| the rough size of the click implementation | the full SendInput dance (about 80 lines on input.rs:38 to 117) | twenty two lines on element.rs:838 to 859 calling invoke_pat.invoke() |
Where each one wins, and where each one quietly loses
Neither approach is universally right. The accessibility tree is the better default by a wide margin, but it has a failure mode (browser web views silently no-oping AX actions on macOS) that PyAutoGUI does not share. PyAutoGUI has a failure mode (silently clicking the wrong thing when the layout changes) that the tree does not share. Knowing both sets of failures is the whole skill.
Where PyAutoGUI is the only option
Fullscreen DirectX or OpenGL games with one frame buffer and no AX provider. Canvas drawing surfaces in browsers (Figma, Excalidraw, Miro) where every tool lives inside a single canvas element. Sandboxed remote desktop and VM viewers where the AX bridge does not cross the host boundary. In these cases the tree is empty or single node, and pixel coordinates are the only addressable surface.
Where PyAutoGUI silently does the wrong thing
Multi monitor with mixed DPI scaling. PyAutoGUI's own docs note multi-monitor behavior is unreliable depending on OS version. The click lands on whatever is now at (x, y), and 'whatever' might be a different window than the one your reference image came from.
Where the tree silently does the wrong thing
macOS Chrome, Safari, Arc, Firefox web views. AXPress and AXClick return kAXErrorSuccess and the page never changes. Production AX engines maintain a hardcoded browser bypass list (8 names in the deleted Terminator macos.rs at lines 415 to 424) and fall back to synthetic input on those apps. The tree is the default; the synthetic path is the planned fallback.
Where the tree is unambiguously better
Native Win32, WinUI, UWP, AppKit, every Electron app, every Office app, the system shell. These all expose rich AX or UIA trees with stable AutomationIds and named controls. A selector lasts across releases of the target app; a pixel reference does not.
Where they cooperate
Many production agents stack them. Tree first for speed and determinism; OCR or template matching for elements with no AX provider; PyAutoGUI synthetic input for anything that requires real cursor motion (drag and drop in apps that hit-test from MouseMove, animation triggers, drawing). Terminator's own architecture is described in llms.txt as 'accessibility tree + DOM + OCR + vision AI for maximum reliability'.
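That stacking can be sketched as a fallback chain. The backend stubs and exception below are illustrative stand-ins, not any library's real API; the point is the ordering — tree first, OCR second, synthetic input last:

```python
class ElementNotFoundError(Exception):
    """Stand-in for a typed 'selector resolved to nothing' error."""

def tree_invoke(target):
    # succeeds only when the target exposes an AX/UIA node
    if not target.get("has_tree_node"):
        raise ElementNotFoundError(target["name"])
    return True

def ocr_click(target):
    # template/OCR match against the screen; may simply find nothing
    return target.get("ocr_visible", False)

def synthetic_click(target):
    # pixels are always addressable, rightly or wrongly
    return True

def act(target):
    """Try each layer in order; return the name of the one that fired."""
    for backend in (tree_invoke, ocr_click, synthetic_click):
        try:
            if backend(target):
                return backend.__name__
        except ElementNotFoundError:
            continue
    raise RuntimeError("no backend handled " + target["name"])
```

A native button resolves at the first layer; a canvas tool falls through to OCR; a game surface bottoms out at synthetic input.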
The decision rule, written out
Run through the list below in order. The first answer that matches is your answer. Most desktop automation problems terminate at the first or second item; the rest exists because real production scripts mix layers.
When to pick which
- Does the target expose a real AX or UIA tree with named controls? Use the tree.
- Is your script going to run while a human uses the same machine? Use the tree (no cursor takeover).
- Will it run in CI on a headless or remote VM where DPI may differ from your dev box? Use the tree (selectors survive DPI).
- Will it run hundreds of actions in an agent loop? Use the tree (CPU speed, not HID-event speed).
- Is the target a fullscreen game, canvas drawing surface, or DirectX renderer? PyAutoGUI is the only option.
- Does the target's tree return success on the action but the UI does not change? Fall back to synthetic input (PyAutoGUI's primitive, but ideally through your tree library's own click() method that falls back to SendInput).
- Are you writing an LLM agent that picks targets visually? Stack them: tree for the actions the LLM resolves to a selector, vision plus PyAutoGUI for the targets it can only describe by appearance.
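Written as code, the rule above compresses to three branches. The flag names are illustrative, not any library's API:

```python
def pick_layer(has_ax_tree, opaque_surface, tree_action_noops):
    """Design-time choice of automation layer for one target."""
    if opaque_surface or not has_ax_tree:
        return "pyautogui"           # game, canvas, VM viewer: pixels only
    if tree_action_noops:
        return "synthetic-fallback"  # tree reports success, UI unchanged
    return "tree"                    # everything else: CI, agents, humans
```

The shared-machine, headless-CI, and agent-loop cases all land in the final branch, which is why the tree is the default rather than one option among equals.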
The 100x claim, where it comes from
Terminator's own documentation puts the speed delta at roughly two orders of magnitude. The number comes from a specific comparison: agents that screenshot the screen, send the image to an LLM, ask the LLM to output click coordinates, then PyAutoGUI those coordinates. That loop spends most of its budget on LLM inference. A tree-based agent that resolves a selector and calls a UIA pattern spends most of its budget on a COM round trip. The first is bounded by inference latency (often hundreds of milliseconds per action). The second is bounded by IPC latency (often single digit milliseconds per action). On an agent loop with hundreds of steps, the difference compounds to roughly the published 100x.
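The compounding is plain arithmetic. Using illustrative per-action latencies inside the ranges quoted above (250 ms of LLM inference per vision-loop action, 3 ms per COM round trip):

```python
steps = 300                   # actions in one agent run

vision_loop = steps * 0.250   # screenshot -> LLM -> (x, y) -> SendInput
tree_loop = steps * 0.003     # selector -> UIA pattern invoke

speedup = vision_loop / tree_loop
print(f"{vision_loop:.0f} s vs {tree_loop:.1f} s -> {speedup:.0f}x")
# prints "75 s vs 0.9 s -> 83x"
```

Nudge the latencies within their quoted ranges and the ratio moves between roughly 50x and a few hundred x, which is where the published "roughly 100x" sits.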
“Runs 100x faster than ChatGPT Agents, Claude, Perplexity Comet, BrowserBase, BrowserUse (deterministic, CPU speed, with AI recovery). >95% success rate unlike most computer use overhyped products.”
Terminator README, 'Why Terminator > For Developers'
What developers actually say after switching
Two of the most repeated observations from production scripts: the cursor stops fighting you, and the script stops failing on your colleague's laptop. Both are consequences of leaving the HID layer and moving into the tree.
“Pattern invocation does not touch the cursor. We market this as 'does not take over your cursor or keyboard', and that statement is literally true for every action that resolves to a UIA pattern: invoke, toggle, expand_collapse, set_selected, and type_text against a value pattern edit control.”
Migrating a PyAutoGUI script to the tree
The mechanical translation is straightforward and the order matters. Replace coordinate based actions with selectors first, then replace screenshot diff checks with tree property reads, then keep PyAutoGUI calls only for the residual targets that genuinely have no tree node. Below is the standard order of operations.
- Run the target app and inspect its tree. Accessibility Insights for Windows and the older `inspect.exe` from the Windows SDK both expose UIA nodes. On macOS, Apple's Accessibility Inspector does the same. Look for `AutomationId`, `Name`, and `ControlType` on each control you currently click by coordinate.
- Replace each `pyautogui.click(x, y)` with a selector. The grammar is `process:foo >> role:Button && name:Save`. If the control has an AutomationId, prefer that over role and name (it is more stable across releases).
- Replace `locateOnScreen('icon.png')` with `desktop.locator(...).all(timeout)`. The tree returns every match in one call, with bounds and properties, so you do not need to template match.
- Replace `pyautogui.typewrite()` with `element.typeText()`. Where the element is a real edit control, this resolves to ValuePattern.SetValue, no virtual key presses. Where it is not, fall back to synthetic key events.
- Keep PyAutoGUI for the residual: drag-and-drop with hit-testing on MouseMove (some games), drawing on a canvas, anything that physically requires cursor motion. These are now isolated to specific call sites instead of being your default mode.
- Move from `click()` to `invoke()` wherever the target supports InvokePattern. This is the one line that buys you background execution and the 100x claim simultaneously.
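The selector strings in the migration steps have a mechanical shape, so they are easy to generate. Below is a small helper built around the grammar quoted above; the helper itself is illustrative, not part of any library:

```python
def selector(process, role=None, name=None, automation_id=None):
    """Build a selector string; prefer AutomationId when present."""
    if automation_id:
        pred = f"automationid:{automation_id}"  # most stable across releases
    else:
        pred = f"role:{role} && name:{name}"
    return f"process:{process} >> {pred}"

selector("notepad", role="Button", name="Save")
# -> "process:notepad >> role:Button && name:Save"
```

Centralizing selector construction like this also gives you one place to switch a control from role-and-name addressing to AutomationId addressing when the target app adds one.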
The libraries you will actually pick from
Below is the working set of Python and cross-language libraries that show up when you choose a path. PyAutoGUI is at the top because it is the canonical pixel library; the rest are tree libraries with different tradeoffs.
PyAutoGUI is the reference pixel library; pywinauto is the historical Python tree library on Windows; uiautomation is the alternate UIA wrapper; atomacos covers macOS AX from Python; and terminator-py is the binding to the Rust UIA core that drives the MCP server. If you are picking today and you want concurrency (asyncio.gather across multiple apps actually overlapping), terminator-py is the only one that releases the GIL on every UI call.
Questions developers ask after switching
What is the literal difference between an accessibility tree click and a PyAutoGUI click?
Accessibility tree automation calls a method on a UI element. PyAutoGUI sends a mouse event at coordinates. On Windows, the tree path resolves to IUIAutomationInvokePattern::Invoke, a COM call that runs inside the target process and returns a binary success or failure. The PyAutoGUI path resolves to SendInput with MOUSEEVENTF_ABSOLUTE plus MOUSEEVENTF_MOVE plus MOUSEEVENTF_LEFTDOWN plus MOUSEEVENTF_LEFTUP, a kernel call that injects HID events at normalized screen coordinates. The two operations have different types. One asks the element to perform its default action; the other moves a virtual mouse and presses a virtual button at a screen location and hopes the right element is under it.
When should I pick PyAutoGUI over an accessibility tree library?
When the target does not expose a tree. That is the only honest case. Three real examples: a fullscreen game rendered through DirectX or OpenGL where the only thing on the screen is a frame buffer; a canvas-based design tool like Figma's drawing surface where each tool's hit region lives inside a single canvas element with no children; a sandboxed remote desktop or virtual machine viewer where the accessibility bridge does not cross the host boundary. In all three the AX or UIA tree is empty or single-node, and pixel coordinates are the only addressable thing. Everywhere else, picking PyAutoGUI is choosing the OS's worst-case input path on purpose.
Is the accessibility tree actually faster, or is that just a marketing claim?
It is faster, and the cause is not mysterious. SendInput on Windows posts events into the message queue, which the OS schedules with timing that depends on cursor smoothing settings, foreground priority, and any IME or low-level keyboard hook installed by other software. A typical click is at minimum the time to move plus debounce plus down plus up, and most automation tools add a 10 to 50 ms safety pause around each event so the target can react. UIInvokePattern.invoke runs as a cross-process COM call and returns when the target process acknowledges. There is no virtual cursor motion, no debounce window, no foreground gating. On Terminator's own claim of 100x faster than screenshot- or pixel-based agents, the source is the llms.txt at line 243: 'CPU speed, not LLM inference', which describes pattern invocation specifically.
Why does PyAutoGUI break when the user resizes the window or changes DPI?
Because (x, y) is a coordinate in screen space, and screen space changes. A button whose physical pixel position is (412, 300) on a 1920x1080 monitor at 100% scale sits near (824, 600) when the same window is dragged to a 4K monitor at 200% scale, because every logical pixel now maps to two physical ones. PyAutoGUI's locateOnScreen function (the higher level alternative) takes a reference image and template matches against the current screenshot pixel by pixel; PyAutoGUI's own docs note that 'if a single pixel is a different color, locateOnScreen will not find the image'. The accessibility tree carries semantic identity (role, automation id, name) that survives DPI changes, theme changes, resolution changes, and most layout changes. The only thing that breaks the tree path is the developer renaming the control or removing the AutomationId, and that shows up as a typed ElementNotFoundError instead of a click on the wrong place.
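The breakage is just multiplication. A point recorded in physical pixels at one scale factor sits somewhere else at another; the sketch below ignores per-monitor origins, which make the real arithmetic worse:

```python
def logical_to_physical(x, y, scale):
    """Map a DPI-independent logical point to physical pixels."""
    return round(x * scale), round(y * scale)

recorded = logical_to_physical(412, 300, 1.0)  # 100% scale: (412, 300)
replayed = logical_to_physical(412, 300, 2.0)  # 200% scale: (824, 600)
# a raw coordinate click replayed on the scaled monitor lands 412 px
# right of and 300 px below where the button actually is
```

A selector carries no coordinates at all, which is why it is immune to this entire class of failure.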
Does the accessibility tree solve everything? What does it miss?
Two things, both worth knowing. First, custom controls that draw their own pixels and do not implement the AX or UIA provider interface show up in the tree as a single opaque element. Some games, some terminal emulators, some 3D modellers, and some legacy ActiveX widgets fall in this bucket. Second, write actions through the AX bridge sometimes silently no-op on browser web views: AXPress and AXClick on macOS Chrome and Safari return success and do nothing, which is why production AX engines maintain hardcoded browser bypass lists and fall back to synthetic input on those apps. The tree is the right default; the synthetic input path is the right fallback; PyAutoGUI is the right tool when there is nothing else.
What does Terminator actually do under the hood, line by line?
On Windows, Terminator's invoke() at crates/terminator/src/platforms/windows/element.rs lines 838 to 859 is twenty two lines of Rust. It calls get_pattern::<patterns::UIInvokePattern>() on the wrapped IUIAutomationElement, classifies the error into UnsupportedOperation or PlatformError, and calls invoke_pat.invoke(). No SendInput, no GetCursorPos, no screen metrics math. The fallback click() path lives in crates/terminator/src/platforms/windows/input.rs at function send_mouse_click starting line 38, and it is the canonical SendInput dance: GetCursorPos to save state, GetSystemMetrics for SM_CXSCREEN and SM_CYSCREEN, multiply by 65535 and divide by screen size to normalize, build three INPUT structs (move, down, up), call SendInput on each. Roughly eighty lines for the path that PyAutoGUI's whole click is conceptually a port of.
Can I run an accessibility tree tool while still using my mouse for normal work?
Yes, and that is the largest practical difference. Pattern invocation does not move the cursor or change keyboard focus. You can run an automation against a background Excel workbook while continuing to write code in another window. PyAutoGUI cannot do this; every click physically moves your cursor and every typewrite physically presses your keys. If your cursor was hovering a save button when the script fires, the script's click and your accidental human click queue together, and which one wins is undefined. Terminator markets this as 'does not take over your cursor or keyboard', and that statement is literally true for every action that resolves to a UIA pattern: invoke, toggle, expand_collapse, set_selected, and type_text against a value pattern edit control.
Is there a Python library that does what Terminator does, but for those who already use PyAutoGUI?
Several, with tradeoffs. pywinauto is the closest historical match on Windows: it walks UIA, exposes a Python-shaped API, and has been around for a decade. Its weakness is concurrency; it is fully synchronous and holds the GIL through every UIA call, so asyncio.gather cannot parallelize it. Terminator's Python binding (pip install terminator-py, import terminator) wraps the same Rust UIA core that drives the MCP server, and every awaitable releases the GIL onto a Tokio reactor, which means asyncio.gather across multiple apps actually overlaps. atomacos and pyobjc-framework-ApplicationServices cover the macOS AX side. None of them ship the synthetic input fallback list out of the box; you usually wire that in yourself when a target lies about supporting AXPress.
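The concurrency difference is easy to demonstrate in shape, if not with the real libraries here: in the sketch below, asyncio.sleep stands in for an awaitable UI call that releases the GIL (as terminator-py is described as doing). Three 0.2-second automations complete in roughly 0.2 seconds total, not 0.6:

```python
import asyncio
import time

async def automate_app(name, seconds):
    # stand-in for an awaitable UI action against one app
    await asyncio.sleep(seconds)
    return name

async def main():
    t0 = time.perf_counter()
    results = await asyncio.gather(
        automate_app("excel", 0.2),
        automate_app("notepad", 0.2),
        automate_app("chrome", 0.2),
    )
    return results, time.perf_counter() - t0

results, elapsed = asyncio.run(main())
# overlapped: elapsed is ~0.2 s; a sync, GIL-holding library takes ~0.6 s
```

Substitute a synchronous, GIL-holding call for the sleep and the three tasks serialize, which is exactly the pywinauto limitation described above.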
Is PyAutoGUI dead? Why does it keep coming up in tutorials?
PyAutoGUI is not dead, and it is the right answer for a narrow set of problems: scripts that operate over genuinely opaque graphical targets (games, canvas apps, presentations, retro emulators), educational examples where a beginner needs to see a cursor move, automation across an OS or app combination where no AX bridge exists. The tutorial overrepresentation comes from PyAutoGUI being the easiest possible thing to demo in five lines of Python. The cost of that simplicity shows up in production: every script that reaches a thousand actions ends up either (a) growing a screenshot diff harness to detect when a pixel changed, or (b) being rewritten against an accessibility library because the screenshot harness still misses things on a different DPI.
If I am writing an AI agent, which one should the agent use to interact with the desktop?
The accessibility tree, with PyAutoGUI as the bottom of the fallback stack. An agent that takes screenshots and asks an LLM to output (x, y) coordinates is paying the LLM inference latency on every action, which is the path ChatGPT Agents and Claude computer use take. Terminator's own claim is roughly 100x faster on this loop because invoke() is a COM call that does not require a model in the loop at all; the model picks a selector, the framework resolves it, and the action fires. PyAutoGUI underneath catches the cases where the tree returns nothing, and OCR plus vision AI catches the cases where even pixel coordinates are not stable. The tree is not the only layer, but it is the layer that turns an agent loop from inference-bound to CPU-bound.
Read these next if you are still on the pixel side
Adjacent reading
Accessibility API desktop automation: fire Control Patterns, skip the mouse
Companion deep dive on UIA Control Patterns. invoke() at element.rs:838-859 calls UIInvokePattern.invoke() directly. Toggle, ExpandCollapse, Value, SelectionItem rounded out.
Python desktop automation that actually runs concurrently with asyncio.gather
PyAutoGUI and pywinauto are sync and hold the GIL. terminator-py wraps every awaitable in pyo3_tokio::future_into_py_with_locals so asyncio.gather across apps overlaps for real.
macOS accessibility UI tree automation: the write path nobody warns you about
AXPress and AXClick lie on Chrome, Safari, Arc, Firefox web views. The 8-browser bypass list, the 3-tier click fallback, the manual Send + Sync wrapper. From a 4,368-line macos.rs.