Robotic desktop automation software, graded on the selector engine instead of the Studio
Every guide to robotic desktop automation software grades the same shortlist on the same axis: drag-and-drop builder, AI Flow Generation, recorder UX, cross-platform, license. None of those graders open the source and look at what the bot actually uses to find a button on the screen. This page is about the part of a desktop automation framework that decides whether your workflow still works in week three: the selector engine. Terminator's lives in a single file, has 24 typed variants, composes through five string operators, and is exposed to AI coding assistants over MCP.
Why every roundup of robotic desktop automation software stops at the Studio screenshot
Open any list of robotic desktop automation software and you get the same grid: UiPath, Power Automate Desktop, Automation Anywhere, NICE, Blue Prism, Fortra Automate Desktop, Robomotion, AutoHotkey, Robocorp. Each row gets graded on the same five things, in roughly this order: visual builder, AI assist, image recognition, cross-platform, license model. The screenshot at the top of each comparison shows a flowchart canvas with rectangles labeled “Click” and “Type Into”. The reader nods, picks one, installs it.
Two weeks later the workflow fails. Nobody changed the application. Nobody edited the bot. The button is still there, the form is still there. But the recorded selector embedded a string that has changed since the recording: a window title, a relative timestamp, a generated AutomationId. The Studio doesn't care, because at evaluation time the Studio only had to look pretty. The grading rubric never asked about the part of the bot that does the actual finding.
That part is the selector engine. It's the small piece of code that takes a string like role:Button && name:Save and returns the actual UI element on screen. It's where bot durability is decided. And it's the part that vendor sites publish least, because it has no screenshot. Terminator's is in one file you can read in eight minutes.
The anchor: a 24-variant Rust enum at crates/terminator/src/selector.rs
Lines 4 through 56. One file. One enum. Twenty-four active variants (the twenty-fifth, Invalid, holds a parse error string and is never resolved). The variants split into five groups: by-attribute, positional, boolean, tree-navigation, and special-case. The engine resolves them against the Windows UI Automation tree. The same enum is constructed three different ways: humans write strings, programs call factory methods, AI coding assistants generate strings through MCP. All three paths converge on the same enum.
Why an enum? Because each variant carries different data. Role carries a struct of role and optional name. RightOf carries a boxed inner Selector to use as the spatial anchor. And carries a vector of selectors. The Rust enum is the schema; the string DSL is just one serialization. A naive RPA Studio that stores selectors as strings is lossy by definition; Terminator stores them as the typed enum and renders strings on demand.
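The shape of that schema is easy to picture in TypeScript. The sketch below is an illustration, not the project's API (the real definition is the Rust enum in selector.rs); it models a five-variant subset under assumed names and shows that the string DSL is just one rendering of the typed value:

```typescript
// Illustration only: a TypeScript picture of a five-variant subset of
// the 24-variant Rust enum. Names and shapes here are assumptions.
type Selector =
  | { kind: "role"; role: string }          // Role
  | { kind: "name"; value: string }         // Name
  | { kind: "rightOf"; anchor: Selector }   // RightOf(Box<Selector>)
  | { kind: "and"; parts: Selector[] }      // And(Vec<Selector>)
  | { kind: "not"; inner: Selector };       // Not(Box<Selector>)

// The string DSL is just one serialization of the typed value.
function render(sel: Selector): string {
  switch (sel.kind) {
    case "role":    return `role:${sel.role}`;
    case "name":    return `name:${sel.value}`;
    case "rightOf": return `rightof:(${render(sel.anchor)})`;
    case "and":     return sel.parts.map(render).join(" && ");
    case "not":     return `!${render(sel.inner)}`;
  }
}

// role:Button && !name:Cancel, built as data rather than text.
const saveButton: Selector = {
  kind: "and",
  parts: [
    { kind: "role", role: "Button" },
    { kind: "not", inner: { kind: "name", value: "Cancel" } },
  ],
};

// rightof:(role:Text && name:'Customer ID'), a boxed anchor selector.
const fieldBesideLabel: Selector = {
  kind: "rightOf",
  anchor: {
    kind: "and",
    parts: [
      { kind: "role", role: "Text" },
      { kind: "name", value: "'Customer ID'" },
    ],
  },
};
```

The point of the toy: a string-only store has no schema until runtime, while the typed value carries its schema with it and can render a string whenever one is needed.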
What the 24 variants actually do
The grouping below mirrors the source order. Every variant has a one-line doc comment in selector.rs; the descriptions here paraphrase those comments and show how the variant gets used in real automation code.
By attribute
Six variants: Role, Id, Name, Text, NativeId, ClassName. The bread and butter. Role+Name handles 70% of clicks. NativeId is the AutomationId on Windows. ClassName falls back to the WinForms or WPF class when accessibility metadata is sparse. All six are written with the same prefix syntax in selector strings: role:Button, name:Save, id:submit_btn.
Positional, anchor-relative
Five variants: RightOf, LeftOf, Above, Below, Near. Each takes a Box<Selector> as its anchor. The engine resolves the anchor first, then filters candidates by their bounding rectangles relative to it. This is what makes label-and-input forms automatable without committing to AutomationId values that change between releases.
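The resolve-the-anchor-then-filter step is plain rectangle math, which a toy can show. Field names below are assumptions; the real engine reads UI Automation bounding rects:

```typescript
// Toy positional filter: keep candidates whose box sits to the right of
// an already-resolved anchor and overlaps it vertically. Rect fields
// are assumptions, not the engine's actual types.
interface Rect { x: number; y: number; w: number; h: number }

function centerY(r: Rect): number {
  return r.y + r.h / 2;
}

function isRightOf(candidate: Rect, anchor: Rect): boolean {
  const pastRightEdge = candidate.x >= anchor.x + anchor.w;
  const sameRow =
    centerY(candidate) >= anchor.y && centerY(candidate) <= anchor.y + anchor.h;
  return pastRightEdge && sameRow;
}

// A label at (10,100) and two unnamed inputs: one beside it, one a row down.
const customerIdLabel: Rect = { x: 10, y: 100, w: 80, h: 20 };
const inputBeside: Rect = { x: 120, y: 102, w: 200, h: 20 };
const inputRowBelow: Rect = { x: 120, y: 140, w: 200, h: 20 };
```

Because the test is geometric, no AutomationId has to survive a release for the match to keep working.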
Boolean composition
Three variants: And, Or, Not. Written in strings as &&, ||, !. Both And and Or take Vec<Selector>. Not takes a single boxed selector. The expression role:Button && !name:Cancel matches buttons that are not Cancel. The expression name:Save || name:Submit matches either label. The recursive enum lets you nest the same operators arbitrarily.
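The evaluation side of those three variants fits in a few lines. This is a toy with an assumed element shape, not the engine's UIA types; it shows the recursion bottoming out at attribute checks:

```typescript
// Toy semantics for And / Or / Not: recursion over the selector value,
// bottoming out at attribute comparisons. The element shape is assumed;
// the real engine evaluates against Windows UI Automation elements.
interface El { role: string; name: string }

type Sel =
  | { op: "role"; v: string }
  | { op: "name"; v: string }
  | { op: "and"; parts: Sel[] }  // &&: all must match
  | { op: "or"; parts: Sel[] }   // ||: any may match
  | { op: "not"; inner: Sel };   // !: inverts the inner match

function matches(el: El, s: Sel): boolean {
  switch (s.op) {
    case "role": return el.role === s.v;
    case "name": return el.name === s.v;
    case "and":  return s.parts.every((p) => matches(el, p));
    case "or":   return s.parts.some((p) => matches(el, p));
    case "not":  return !matches(el, s.inner);
  }
}

// role:Button && !name:Cancel
const notCancel: Sel = {
  op: "and",
  parts: [
    { op: "role", v: "Button" },
    { op: "not", inner: { op: "name", v: "Cancel" } },
  ],
};
```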
Tree navigation
Three variants: Has, Parent, Chain. Has is Playwright-style :has(): match elements with at least one descendant matching the inner selector. Parent walks up one level (the .. operator). Chain joins selectors left-to-right with the descendant relationship (the >> operator). Together they let you write expressions like role:DataItem >> has:(role:Button && name:Delete) >> .. which scope to rows that contain a Delete button and then step up one level.
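How >> narrows scope step by step can be sketched on a toy tree. The node shape is an assumption; the real walk is over the depth-limited Windows UI Automation tree:

```typescript
// Toy model of >> (Chain): each step narrows the scope to descendants
// of the previous matches.
interface UiNode { role: string; name?: string; children: UiNode[] }

function descendants(n: UiNode): UiNode[] {
  return n.children.flatMap((c) => [c, ...descendants(c)]);
}

// Resolve a chain of role predicates left to right, like a >> b.
function chain(root: UiNode, roles: string[]): UiNode[] {
  let scope: UiNode[] = [root];
  for (const role of roles) {
    scope = scope.flatMap(descendants).filter((n) => n.role === role);
  }
  return scope;
}

const deleteButton: UiNode = { role: "Button", name: "Delete", children: [] };
const row: UiNode = { role: "DataItem", children: [deleteButton] };
const root: UiNode = { role: "Window", children: [row] };
```

In the same picture, Parent is one pointer hop back up and Has is an existence check over descendants instead of a scope change.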
Scope and special cases
Seven variants: Path, Attributes, Filter, Visible, LocalizedRole, Process, Nth. Process pins the search to a single application's window tree, which is required for any production automation. Visible:true filters out off-screen elements. Nth picks the n-th match. LocalizedRole handles non-English Windows installs where roles arrive translated. Filter holds a closure ID for predicates the string DSL can't express.
The five string operators that compose them

Selectors compile from strings: && for And, || for Or, ! for Not, >> for descendant chaining (Chain), .. for Parent. Anything else is a leaf, parsed by prefix. The strings are not just leaves, either: the DSL has a real parser that tokenizes the five operators with respect to parenthesization, then folds the result into the typed enum. The grammar is permissive about whitespace and fully recursive. Here is the kind of expression an AI coding assistant emits after reading the live accessibility tree of a SAP form, mixing process scoping, descendant chaining, positional anchoring, and boolean negation in one valid string:

process:saplogon >> (role:Edit && (rightof:(role:Text && name:'Customer ID') || rightof:(role:Text && name:'Account #'))) && !classname:ReadOnlyEdit
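The first job of any parser for this DSL is splitting a string on a top-level operator while respecting parentheses. The real parser is in the Rust crate; this TypeScript toy only illustrates that tokenization step:

```typescript
// Split a selector string on a top-level operator, leaving anything
// inside parentheses intact. Illustration only; the real parser lives
// in the Rust crate.
function splitTopLevel(input: string, op: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let current = "";
  for (let i = 0; i < input.length; i++) {
    if (input[i] === "(") depth++;
    if (input[i] === ")") depth--;
    if (depth === 0 && input.startsWith(op, i)) {
      parts.push(current.trim());
      current = "";
      i += op.length - 1; // skip the rest of the operator token
      continue;
    }
    current += input[i];
  }
  parts.push(current.trim());
  return parts;
}
```

Splitting on >> first, then on || and && inside each part, then peeling ! prefixes, gives exactly the precedence the expressions above rely on.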
How the engine pipes a string into a click
Every selector resolution flows through the same hub. The string DSL or programmatic constructor produces a Selector enum. The locator engine walks the Windows UI Automation tree, scoped by Process, depth-limited and yield-aware via the TreeBuildConfig at platforms/mod.rs. A matched UIElement comes back, the action runs through the OS accessibility interface, and the cursor never moves. The MCP server sits in front of the whole pipeline so an AI coding assistant can drive it.
Selector resolution, end to end
The same job, written two ways
Below: a sketch of how a screenshot-and-AutomationId-driven RPA tool persists a single click on a SAP form versus what Terminator persists for the same click. The left side embeds two PNGs, one image confidence value, one offset in pixels, one generated AutomationId, and one fragile parent window title. The right side is one chained selector string and one .typeText() call.
Same SAP click, two persisted forms
# UiPath / Power Automate / Automation Anywhere style.
# What gets persisted is roughly this.
<Activity x:Name="ClickCustomerID">
<Target>
<ScreenshotPath>screens/sap_form_47.png</ScreenshotPath>
<Anchor>
<Image>screens/customer_id_label.png</Image>
<Confidence>0.85</Confidence>
</Anchor>
<OffsetFromAnchor x="120" y="2" />
<ClickType>Single</ClickType>
</Target>
<Selector>
aaname='Edit5'
automationid='ctl00_ctl01_ctl47_TextBox'
parent_aaname='Sales Order Create'
</Selector>
</Activity>
<!-- Brittle: image hashes change at every theme tweak.
The AutomationId is a generated string that flips
every release. The selector is unreadable. -->

What an AI coding assistant sees on the wire
The MCP server at crates/terminator-mcp-agent/src/server.rs registers 35 tools through the rmcp tool_router macro. From the assistant's perspective the desktop is a JSON API. It calls get_window_tree to discover the live accessibility tree, generates a selector string against that tree, optionally calls typecheck_workflow to run tsc --noEmit against the workflow directory before any UI thread is touched, then drives the click. The whole loop is a few hundred milliseconds.
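Concretely, a tool invocation on the wire is a JSON-RPC tools/call request in the standard MCP shape. The argument key below (process_name) is invented for illustration; the real get_window_tree schema is defined in server.rs:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "get_window_tree",
    "arguments": { "process_name": "notepad" }
  }
}
```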
MCP request flow: AI assistant to Windows UIA
“Pre-trained deterministic workflows with AI recovery only when needed. The framework uses structured accessibility APIs, not pixel matching. This makes it 100x faster (CPU speed, not LLM inference), more reliable (>95% success rate), and deterministic.”
terminator/llms.txt, the project's official agent guide
How it compares row by row to the Studio category
The comparison below is structural, not a feature checklist. The categories don't target the same buyer. The Studio category sells to RPA centers of excellence; Terminator targets a developer or an AI coding assistant already living in a code editor. The rows are the things that cease to be opinions and start to be load-bearing once you put the bot in production.
Terminator vs. Studio-driven robotic desktop automation software
| Feature | Studio-driven RPA | Terminator |
|---|---|---|
| Authoring surface | Visual Studio app (drag-and-drop flowcharts) | Rust crate, npm package, pip package, MCP server |
| Selector primitives | Mostly AutomationId + recorded coordinates | 24-variant typed enum: attribute, positional, boolean, structural |
| Composition operators | Sequencing inside the flowchart | &&, \|\|, !, >>, .. as string operators |
| Source of truth | Proprietary artifact in the Studio | YAML or TypeScript file you diff in git |
| License model | Per-bot or per-developer commercial | MIT, source on GitHub |
| AI agent integration | Bot designer copilot, no MCP | 35 MCP tools incl. typecheck_workflow |
| Static checks before run | Designer-side validation only | tsc --noEmit on the workflow directory via MCP |
| Runs in the background | Often takes over cursor and keyboard | Driven through accessibility API, cursor stays free |
Setup: from npx to a typechecked automation in six steps
One Windows host, one MCP-aware editor, six commands. Steps 4 and 5 are the ones the rest of the category genuinely doesn't have an equivalent of: programmatic selector composition against the live tree, plus a static typecheck of the generated code before the robot runs.
From install to typechecked replay
Install the MCP agent
Run npx -y terminator-mcp-agent@latest --version on a Windows host. The same npx command, without --version, is the entry point your AI coding assistant calls.
Wire it into the AI assistant
claude mcp add terminator "npx -y terminator-mcp-agent@latest" for Claude Code. For Cursor, VS Code, and Windsurf the equivalent JSON block goes in the IDE's MCP servers config: { command: "npx", args: ["-y", "terminator-mcp-agent@latest"] }.
Read the live accessibility tree
Ask the assistant to call get_window_tree on whatever app you want to automate. The result is a JSON tree of UINode objects with role, name, AutomationId, and bounds for each element. This is the surface the selector engine resolves against.
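A sketch of what one node of that tree could look like. The field casing here is an assumption; the guarantee from the tool is only that role, name, AutomationId, and bounds appear in some form:

```json
{
  "role": "Window",
  "name": "Untitled - Notepad",
  "children": [
    {
      "role": "Edit",
      "name": "Text Editor",
      "automation_id": "15",
      "bounds": { "x": 0, "y": 60, "width": 1200, "height": 700 },
      "children": []
    }
  ]
}
```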
Generate selectors against that tree
From the tree the assistant writes Terminator-shaped selector strings. Because the strings are recursive and prefix-based, the assistant can compose them by concatenation without a parser; the Rust side validates and rejects malformed expressions with the typed Invalid variant.
Typecheck before running
Call typecheck_workflow with the path to the generated workflow directory. The MCP tool runs tsc --noEmit, parses the output with a regex into TypeError { file, line, column, code, message }, and returns the structured list. Errors get fixed before the robot ever touches the desktop.
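The parsing job is small enough to sketch. The regex and field names below are illustrative, matching tsc's default file(line,col): error TScode: message diagnostic format rather than the agent's exact implementation:

```typescript
// Sketch of parsing tsc --noEmit diagnostics into structured objects,
// the same shape of job typecheck_workflow performs. The regex and the
// TypeError field names here are assumptions for illustration.
interface TypeError {
  file: string;
  line: number;
  column: number;
  code: string;
  message: string;
}

// Matches lines like: src/flow.ts(12,5): error TS2345: Argument of type ...
const TSC_LINE = /^(.+)\((\d+),(\d+)\): error (TS\d+): (.+)$/;

function parseTscOutput(output: string): TypeError[] {
  const errors: TypeError[] = [];
  for (const line of output.split("\n")) {
    const m = TSC_LINE.exec(line.trim());
    if (m) {
      errors.push({
        file: m[1],
        line: Number(m[2]),
        column: Number(m[3]),
        code: m[4],
        message: m[5],
      });
    }
  }
  return errors;
}

const sample =
  "workflow/click.ts(12,5): error TS2345: Argument of type 'number' is not assignable to parameter of type 'string'.";
```

Structured errors mean the assistant gets a file, line, and column to jump to, not a wall of compiler text to re-parse.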
Replay deterministically
terminator mcp run workflow.yml replays the saved workflow. Flags --dry-run, --start-from, and --end-at let you isolate a single step. Because the source is YAML or TypeScript, you bisect failures with normal git tools, not by clicking around inside a Studio.
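To make the git-diff point concrete, a workflow file can be pictured like this. The step layout and key names are assumptions for illustration, not the project's actual schema; the point is that the artifact is plain text:

```yaml
# Hypothetical workflow sketch: key names are invented for illustration.
# What matters is the medium: plain text, diffable, bisectable.
steps:
  - tool: open_application
    arguments:
      app: saplogon
  - tool: click_element
    arguments:
      selector: "process:saplogon >> role:Edit && rightof:(role:Text && name:'Customer ID')"
```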
One real constraint to know about
Terminator only ships Windows binaries. The platforms module at crates/terminator/src/platforms/mod.rs ends with a hard compile_error! at lines 318-320 declaring the framework Windows-only. Earlier branches contained a macOS Accessibility API adapter and a Linux AT-SPI2 adapter; both are currently disabled at compile time so the maintained surface stays small. If your robotic desktop automation software needs to drive a Mac or a GNOME desktop, this is not the framework. If your automation lives in Windows-shaped enterprise stacks (which is most of the RPA market) this constraint never bites.
Walk through your bot's selector engine with us
Bring a workflow that breaks every two weeks. We will rewrite it on Terminator's selector engine and show you why.
Frequently asked questions
What is robotic desktop automation software?
Robotic desktop automation software is anything that drives the applications on a single user's desktop the way a human would: clicking buttons in Excel, copying values out of an SAP form, opening Outlook, navigating a custom WPF tool, alt-tabbing between windows. It descends from enterprise RPA, where each user runs a personal 'robot' on their own machine instead of a shared server-side bot. The robot needs three things: a way to enumerate UI elements, a way to identify the right one, and a way to act on it. The classical RPA suites (UiPath, Power Automate Desktop, Automation Anywhere, Fortra Automate Desktop) ship a visual Studio where you assemble those three things into a flowchart. Terminator ships them as a Rust crate, an npm package, a pip package, and an MCP server. Same job, different shape.
How is Terminator different from UiPath, Power Automate Desktop, and Automation Anywhere?
Different category. UiPath, Power Automate Desktop, Automation Anywhere, NICE, Blue Prism, and Fortra Automate Desktop are designer-driven RPA platforms. The robot is a process you build inside a visual Studio, save as a proprietary artifact, and ship to a runtime. The intended user is a citizen developer or RPA developer, not a software engineer. Terminator inverts every one of those defaults. It is a developer framework, MIT licensed, distributed as cargo add terminator-rs, npm install @mediar-ai/terminator, pip install terminator-py, or npx -y terminator-mcp-agent@latest. There is no Studio. The robot is a library your code, or your AI coding assistant's code, imports and calls. Workflows are YAML or TypeScript files you can diff in git.
What does the selector engine actually look like?
It is a Rust enum with 24 active variants, defined at crates/terminator/src/selector.rs lines 4-56. Six identify by attribute (Role, Id, Name, Text, NativeId, ClassName). Five locate by spatial relationship to an anchor element (RightOf, LeftOf, Above, Below, Near). Three compose with boolean logic (And, Or, Not). Three navigate the tree (Has, Parent, Chain). Seven handle special cases (Path, Attributes, Filter, Visible, LocalizedRole, Process, Nth). At runtime the engine resolves them against the Windows UI Automation accessibility tree returned by IUIAutomationElement queries. The same enum can be written by humans as a string (process:chrome >> role:Button && name:Save), built programmatically from the Selector::role / Selector::name factory methods, or generated by an AI coding assistant from natural language through the MCP server.
Why does positional selection (rightof, leftof, above, below, near) matter on the desktop?
Forms are the dominant UI pattern in the apps that need automation: SAP, Salesforce desktop clients, ERP systems, hospital information systems, accounting software. They have rows of labels next to text inputs with no accessible name on the input itself. The accessibility tree shows you a Label 'Customer ID' and an unnamed Edit. Without positional selectors you fall back to AutomationId or to walking by index. Both are brittle. With Terminator you write role:Edit && rightof:(role:Text && name:'Customer ID') and the engine finds the input regardless of how the form lays out, regardless of whether the row is the first or the seventeenth, regardless of whether the underlying control gets a new AutomationId in the next release. This is the same primitive Playwright introduced for the DOM in 2021 and that desktop automation tools have been slow to adopt.
What are the 35 MCP tools the agent exposes?
The MCP agent at crates/terminator-mcp-agent/src/server.rs registers 35 tools via the rmcp tool_router macro. The element-driving group includes click_element, activate_element, validate_element, navigate_browser, open_application, get_window_tree, get_applications_and_windows_list, and execute_sequence. The file-system group includes read_file, write_file, edit_file, copy_content, glob_files, grep_files. The standout is typecheck_workflow: it accepts a workflow_path, runs tsc --noEmit against the directory, parses the tsc output with a regex into TypeError objects with file, line, column, code, and message fields, and returns them. Your AI coding assistant can typecheck the automation script before any UI thread is touched.
Can Terminator run on macOS or Linux?
Not in the current release. The platforms module at crates/terminator/src/platforms/mod.rs ends with cfg(not(target_os = "windows")) compile_error!("Terminator only supports Windows. Linux and macOS are not supported.") at lines 318-320. Earlier versions had partial macOS Accessibility API and Linux AT-SPI2 support; that code is currently disabled at compile time so the maintained surface stays small. If you need a robot for the Mac or for GNOME, this is not the framework. If your robotic desktop automation software requirements are Windows-shaped, which is most enterprise RPA, that is the trade.
Does Terminator use pixel matching or computer vision?
Not by default. The default detection path is the Windows UI Automation accessibility tree. Pixels and OCR are available for fallback when the accessibility tree is incomplete: an OCR API on the Desktop class, a vision module in crates/terminator/src/computer_use/mod.rs, and a screenshot pipeline. The llms.txt at the root of the repo pitches the gain from accessibility-first as roughly two orders of magnitude: 100x faster than screenshot-based approaches and a >95% success rate, with vision used only for error recovery, not every action. The deeper reason is determinism: the accessibility tree gives you a stable identifier, whereas a model that classifies pixels gives you a fresh probability distribution every run.
How do I install it and what is the smallest possible automation?
On Windows: npx -y terminator-mcp-agent@latest to launch the MCP server, or npm install @mediar-ai/terminator to use the library directly from Node. The shortest meaningful program opens Notepad, locates its edit area, and types: const desktop = new Desktop(); desktop.openApplication('notepad'); const edit = await desktop.locator('process:notepad >> role:Edit').first(5000); await edit.typeText('Hello'). That is one selector chain (process:notepad >> role:Edit), one timeout in milliseconds (the locator API requires it explicitly, no defaults), and one action. From there the same shape scales: chain more selectors, add a positional anchor, compose with && or ||, walk to the parent with .. when the leaf you want is not the leaf you can identify.
What does the AI coding assistant integration look like in practice?
You install the MCP agent in Claude Code, Cursor, VS Code, or Windsurf with claude mcp add terminator "npx -y terminator-mcp-agent@latest" or the equivalent JSON config. The assistant can then call get_window_tree to read the live accessibility tree of any open app, click_element to drive a button, execute_sequence to run a chain of steps, and typecheck_workflow to verify a generated TypeScript automation before it runs. The assistant doesn't generate pseudo-code that hopes a robot exists; it talks to a real robot over MCP and gets structured responses. The robot does not take over the cursor or keyboard, so you can keep using the machine while the assistant drives a background app.