Python automation on Windows, built for agents

Every other guide on this topic teaches you pywinauto loops that click a button, type a string, and move on. That shape of script does not compose with an LLM, and it gets slow the moment the tree is non-trivial. Terminator's Python binding gives you one call, desktop.get_window_tree(pid), that returns the entire window's accessibility tree as JSON. This page is about that call, the Rust path it takes, and why it is the primitive a Claude agent actually wants.

Matthew Diakonov, Written with AI

Published April 23, 20268 min read

4.9from dozens of design partners

One line: tree = desktop.get_window_tree(pid)

str(tree) is pretty JSON, ready for any LLM prompt

~200 ms for a 245-element window, verified in tree_builder.rs line 386

Python automation on Windows, minus the loop

One call returns every UI element in a window as JSON.

desktop = terminator.Desktop()

pid = desktop.open_application('notepad').process_id()

tree = desktop.get_window_tree(pid)

print(tree) # pretty JSON, LLM-ready

200 ms for 245 elements, one COM round trip

0:00 / 0:05

What every other guide on this topic skips

Search the existing playbooks on Python automation on Windows and you will find the same four ideas, repeated in different orders: drive the mouse with pyautogui, walk controls with pywinauto, talk to Excel through pywin32, and fire global hotkeys with keyboard. All of them are Python loops that issue one operation at a time. They work, they have worked for a decade, and none of them were designed for an LLM to participate in.

An agent does not want to iterate a tree. It wants the tree. It wants a JSON blob with roles, names, bounds, and stable identifiers, so it can pick a selector, call one action, and move on. The missing primitive in every existing Python automation library for Windows is the serialization step. You end up writing it yourself, badly, every time.

Terminator ships that primitive. The Python binding is a PyO3 extension, so the Rust core holds the hot path. One method on the Desktop object returns a UINode. The UINode prints as pretty JSON. You hand the JSON to Claude. Claude gives you back a selector. You click.

The shape, in numbers

Three measurements that matter when you are piping a window tree into a model on every turn of an agent loop.

0call from Python to get the whole tree

0UIA properties per node, pre-fetched

0elements in a mid-size Windows window

0 mscached walk, verified in tree_builder.rs

Install, one terminal session

The wheel name has a hyphen, the import does not. If your Python is 3.10 or later on Windows, this is all you need.

powershell

Python automation on Windows in four steps

1
pip install terminator-py
One wheel per Python version, no compiler needed. The PyPI name is terminator-py, the import name is terminator.
2
Open an app, grab its PID
desktop.open_application('notepad') returns a UIElement. Call element.process_id() to get an int.
3
get_window_tree(pid)
Returns a UINode. The subtree is already populated in Rust via one find_first_build_cache call. 7 UIA properties per element.
4
Hand str(tree) to a model or parse it
The __str__ is serde_json::to_string_pretty. Save it, prompt an LLM with it, or walk UINode.children in Python and keep going.

The one-liner the rest of this page is about

Every capability below, every comparison, every selector trick, is built on these ten lines. This is a real Python script you can paste into a file called dump_tree.py and run on a Windows box with Notepad installed.

dump_tree.py

30-50x

“Performance improvement: ~30-50x faster for large trees (e.g., 6.5s -> 200ms for 245 elements)”

Comment above build_tree_with_cache at crates/terminator/src/platforms/windows/tree_builder.rs, line 386

Where the data actually comes from

When you call desktop.get_window_tree(pid) from Python, the PyO3 binding at packages/terminator-python/src/desktop.rs line 384 forwards straight into the Rust core. Rust builds one UIA CacheRequest, adds seven properties, sets the tree scope to Subtree, and issues a single find_first_build_cache. Everything below the hub on this diagram is already paid for by the time Python sees the result.

desktop.get_window_tree(pid) end to end

pywinauto versus Terminator, same job

The two scripts below do the same thing: dump the Notepad window as a list of controls. One is written in Python and walks COM. The other is written in Python and lets Rust walk COM.

Two scripts, same goal, two orders of magnitude

# pywinauto, the traditional path.
from pywinauto import Application

app = Application(backend="uia").start("notepad.exe")
dlg = app.window(title_re=".*Notepad")

# Every property read is a COM call from Python.
for child in dlg.descendants():
    print(child.element_info.control_type,
          child.element_info.name,
          child.element_info.automation_id,
          child.rectangle())

# 245 elements x ~15 COM calls = ~3,675 round trips.
# Measured on the same shape of tree: ~6.5 seconds.

13% fewer lines

What a Python run looks like, before vs after

You open the app. You walk descendants(). You print each control. Then you write a second script that loops over the dump and picks controls by substring. The agent layer is a text parser you wrote yourself.

One COM round trip per property per element
A 245-element dialog takes ~6.5 seconds to walk
Every project writes its own ad hoc JSON serializer
Switching Windows versions breaks the parser

Selectors you can hand to an LLM

Every UINode carries an AutomationId where the app publishes one. That is gold: selectors anchored on AutomationId outlive layout and label changes. These are the shapes Terminator understands, in the exact string form a model should return.

role:buttonname:Savenativeid:CalculatorResultsrole:edit|name:Addressrole:menuitem|name:Filerole:tab|name:Inboxautomationid:StartButton

Chain with the pipe character: role:edit|name:Address. Prefix a selector with nativeid: to pin to an AutomationId. The same strings work from Python, TypeScript, and the MCP server.

The agent loop, in one file

Enough theory. Here is a working Python script that asks Claude to pick a selector from the window tree and then clicks it. Under 30 lines. No framework, no runner, just the Anthropic SDK plus terminator-py.

click_whatever.py

Watch the call in three frames

From Python to pretty JSON

01 / 03

Frame 1: the entry point

Python calls desktop.get_window_tree(pid). The PyO3 wrapper at packages/terminator-python/src/desktop.rs:384 releases the GIL and hands the call to Rust.

Three numbers worth memorizing

Python call to retrieve an entire Windows UI tree, regardless of how many elements live inside.

UIA properties pre-fetched per node: ControlType, Name, BoundingRectangle, IsEnabled, IsKeyboardFocusable, HasKeyboardFocus, AutomationId.

0 ms

Cached walk of a 245-element window. Same tree without caching: ~6.5 seconds.

Terminator's Python path versus pywinauto

Feature	pywinauto / pyautogui	terminator-py
One call returns the full window tree	No, you iterate descendants() in Python	desktop.get_window_tree(pid) -> UINode
Tree serializes to JSON	Write your own recursion	str(tree) is pretty JSON, LLM-ready
Walk happens in Rust	Every read is a Python -> COM hop	PyO3 binding drops into native tree_builder
Wall-clock for a 245-element window	~6.5 seconds (documented shape)	~200 ms, verified in tree_builder.rs line 386
Same script on macOS and Linux	Windows only (pywinauto, pygetwindow)	Yes, Desktop/Locator API is cross-platform
Selector syntax	Backend-specific (uia vs win32)	role:, name:, nativeid:, chainable with \|
Made for AI coding assistants	No, the UIA tree stays in your process	Same primitive is exposed as MCP get_window_tree

Want a Python script that hands your Windows desktop to Claude?

Book 20 minutes and we will wire terminator-py into your workflow on a real Windows app of your choice.

Frequently asked questions

What package do I install for Python automation on Windows with Terminator?

pip install terminator-py. The package name on PyPI is terminator-py (with a hyphen), but the import is `import terminator`. Wheels are published for Windows on Python 3.10, 3.11, and 3.12. The project metadata lives at packages/terminator-python/pyproject.toml in the Terminator repo. The binding itself is a PyO3 extension module, so every call from Python drops into Rust instead of running a Python loop against the Windows COM API.

What does desktop.get_window_tree(pid) actually return?

A UINode, Terminator's Python class for a node in the UI Automation tree. Each UINode has an id, an UIElementAttributes block (role, name, label, value, description, properties, is_keyboard_focusable, bounds), and a list of child UINodes. The class is defined in packages/terminator-python/src/types.rs. Its __str__ method calls serde_json::to_string_pretty on the whole subtree, so print(tree) emits valid JSON you can send to any language model or save to disk.

Why is this faster than walking the tree yourself with pywinauto?

pywinauto walks Windows UI Automation from Python, one property at a time. Every control.control_type, control.texts(), control.rectangle() is a separate COM round trip from your Python process to the target app's process. On a 245-element dialog, that is around 3,675 cross-process calls. Terminator does the walk inside Rust with a single IUIAutomationCacheRequest that sets TreeScope::Subtree and pre-fetches 7 properties in one call. The function is build_tree_with_cache at crates/terminator/src/platforms/windows/tree_builder.rs line 388. The comment above it reads: Performance improvement: ~30-50x faster for large trees (e.g., 6.5s to 200ms for 245 elements).

Which UIA properties come back on every node?

Seven, in order: ControlType, Name, BoundingRectangle, IsEnabled, IsKeyboardFocusable, HasKeyboardFocus, AutomationId. The list is hardcoded in the cache_request block at tree_builder.rs around line 402. Every UINode you get back in Python exposes these via the attributes field, and the whole subtree is built in one find_first_build_cache call instead of one call per property per element.

Can I use the same Python script on macOS and Linux?

Mostly. The Desktop, Locator, and UIElement classes have the same shape on every platform, so desktop.open_application, locator('role:button').first(), click(), and type_text() port without changes. What does not port is platform-specific selectors (nativeid:CalculatorResults is a Windows AutomationId, the macOS equivalent reads AXIdentifier) and the get_window_tree PID path (macOS AX is PID-based too but the tree shape differs). If you write scripts around role and name selectors, they run unchanged. If you hard-code AutomationId, you are writing Python automation on Windows specifically, and that is fine.

How do I find the PID to pass to get_window_tree?

Two paths. If you already have a UIElement (for example, from desktop.open_application or desktop.application('Notepad')), call element.process_id() to get its PID as an int. If you want to start fresh, desktop.applications() returns a list of UIElement, one per running app, and each knows its PID. Once you have the PID, tree = desktop.get_window_tree(pid) returns the full subtree rooted at the main window. Pass an optional title argument to disambiguate when an app has multiple windows.

Does the JSON that comes out of str(tree) plug into an LLM directly?

Yes. The serialization is stable and schema-like. Each node has id, attributes (role, name, bounds, etc.), and children (recursive). Typical usage is tree_json = str(desktop.get_window_tree(pid)), then feed that into a system prompt asking the model to return a selector for the element you want to click. Because every node carries AutomationId where available, the model can write selectors like nativeid:SaveButton that survive layout changes. That is the pattern Terminator's MCP server already uses under the hood, exposed through the get_window_tree tool so Claude Code and Cursor get the same primitive without writing Python at all.

What does Fast vs Complete vs Smart do on TreeBuildConfig?

The PropertyLoadingMode on TreeBuildConfig trades completeness for speed. Fast (the default) runs the cached walk with the seven baseline UIA properties. Complete additionally pulls heavier fields on demand. Smart adapts based on the element type. Defined in packages/terminator-python/src/types.rs at the PropertyLoadingMode impl around line 609. For agent use cases, Fast is almost always the right call: the seven baseline properties are enough for an LLM to generate selectors.