Matthew Diakonov
13 min read

Open source desktop automation projects, April 2026: which ones still ship, which ones speak MCP, and the only one with a real selector grammar

Most guides on this topic re-list the same six projects from 2010 and call it a day. That is not an accurate map of what is shippable in April 2026. The interesting question is no longer “what scripts the keyboard”, it is “what hands an AI coding assistant a typed contract for the entire OS”. This guide walks through the open source projects that are still active, groups them by what they actually expose to an agent, and ends on the one project that ships a real selector grammar plus a Model Context Protocol server. The grammar lives in a single file, 753 lines, MIT licensed.

86 versioned releases, 35 MCP tools, 26 Selector variants, all MIT
Hand-written Shunting Yard parser, crates/terminator/src/selector.rs
Spatial selectors: rightof, leftof, above, below, near
MCP server installs with one npx command

The projects in active circulation

Before grouping them, here is the cast. Every name below is an open source project that still has users and at least an occasional release in 2025 or 2026. Some are decades old. One is two years old. They are all real.

AutoHotkey, AutoIt, SikuliX, PyAutoGUI, pywinauto, Robot Framework, RPA Framework, dogtail, xdotool, AT-SPI2, WinAppDriver, Appium Windows, Terminator

Not exhaustive. RPA studios (Power Automate Desktop, UiPath, Blue Prism) are excluded because they are commercial. Selenium and Playwright are excluded because they automate browsers, not desktops.

Four eras of open source desktop automation

The cleanest way to make sense of this list is to sort by what each project hands to the caller. There are four eras. They overlap. Most existing guides flatten all four into a single bullet list, which is why those guides feel interchangeable.

Era 1: scripting languages (1999 onward)

AutoHotkey, AutoIt, sometimes Sikuli's older Jython runtime. You write a hotkey or a Send command and the OS replays it. Fast, ubiquitous, and they will probably outlive everything below them. They give an AI agent almost no structured information about the screen.

Era 2: pixel and OCR (2010 onward)

SikuliX, PyAutoGUI, image-template bots. Match a saved image, click its center. Lovely until a theme changes. No selector grammar to speak of.

Era 3: accessibility wrappers (2015 onward)

pywinauto, dogtail, Linux AT-SPI2 wrappers, the old WinAppDriver. They expose the OS accessibility tree through a builder API. No parser, no boolean operators, often no spatial selectors.

Era 4: MCP-native, parser-backed

Terminator. Accessibility tree plus a real selector grammar plus a Model Context Protocol server. You hand an AI assistant a string and it walks the OS. By April 2026 this category is shippable, MIT licensed, and no longer a research demo.

RPA studios are not on this list

Power Automate Desktop, UiPath Studio, Blue Prism, and Automation Anywhere are not open source desktop automation projects, no matter how the marketing reads. They are commercial flowchart builders. Useful for human-driven processes, not the right shape for an AI coding assistant in a loop.

Robot Framework deserves a separate note

Robot Framework is open source and active, but it is a keyword-driven test runner, not a desktop automation framework. Pair it with the RPA Framework libraries (rpaframework.org) and you get something useful, with most of the same screenshot caveats as SikuliX once you leave the browser.

How we got here

The shape of the field follows the shape of what was fast on commodity hardware. Each era replaces the previous one not because the older tools stopped working, but because a new primitive became cheap enough to wrap.

1. 1999 to 2010: keyboard-and-mouse scripting

AutoIt (1999) and AutoHotkey (2003) define the genre. You bind a key, you send a string, you script Windows like it is a typewriter. Reliable for a single workflow, opaque to anything that needs to read the screen.

2. 2010 to 2020: pixels and image templates

Sikuli, then SikuliX. PyAutoGUI in 2014. Macro recorders inside RPA studios. They work because GPUs got fast and screenshots got cheap. They break the moment Windows changes a theme.

3. 2018 to 2024: accessibility wrappers go mainstream

pywinauto picks up users. dogtail and AT-SPI2 stabilize on Linux. WinAppDriver tries to be Selenium for Windows. The accessibility tree is finally first-class on Windows and macOS, but every project exposes it through a different builder API.

4. 2024 onward: MCP and parser-backed selectors

Anthropic ships Model Context Protocol. AI assistants suddenly need a stable, typed contract for desktop tools. Terminator publishes an MCP server with 35 tools and a 753-line selector parser. By April 2026 you can install this category with one command.

753 LOC

The selector parser is one file. 753 lines. Six tokens, three precedence levels, 26 Selector variants. That is the spine of an AI-driven desktop automation workflow in 2026.

crates/terminator/src/selector.rs

The anchor: a real selector grammar for the desktop

Across every project on the list above, exactly one ships a parser for selector expressions. AutoHotkey has a hotkey DSL, but it does not query the accessibility tree. AutoIt has window titles and control IDs, but no boolean composition. SikuliX matches images, not symbolic locators. PyAutoGUI matches images. pywinauto exposes a Python builder (chained method calls), which is fine for Python but cannot be serialized to a string and handed to an AI assistant over a wire protocol. Robot Framework is keyword-driven; the keywords delegate to libraries that, again, do not parse a grammar.

Terminator's grammar is parsed by a single 753-line Rust file at crates/terminator/src/selector.rs. The pipeline is the textbook one for boolean expressions: tokenize the input, run Shunting Yard with operator precedence, fold operators into AST nodes. The interesting part is how few moving parts it has.


Three precedence levels (NOT=3, AND=2, OR=1) and 26 Selector variants are enough to describe almost every locator a UI agent needs to express. The grammar cooperates with the chain operator >> and the parent operator .., both borrowed from Playwright's vocabulary. Spatial selectors are unique to desktop work, because the screen is two-dimensional in a way the DOM is not.
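The Shunting Yard step can be sketched in a few dozen lines of Rust. This is a minimal illustration that assumes the input is already tokenized and folds into a four-variant `Expr` type. `Token`, `Expr`, `precedence`, `apply`, and `parse` are hypothetical names loosely modeled on the description above. They are not Terminator's code, which has 26 variants and also handles `>>`, `..`, and spatial selectors.

```rust
// Simplified token and AST types for illustration; not Terminator's definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Selector(String), // an atomic selector such as "role:Button"
    And,
    Or,
    Not,
    LParen,
    RParen,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Atom(String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

// Precedence levels quoted in the article: NOT=3, AND=2, OR=1.
fn precedence(tok: &Token) -> u8 {
    match tok {
        Token::Not => 3,
        Token::And => 2,
        Token::Or => 1,
        _ => 0,
    }
}

// Pop the operand(s) an operator needs and push the folded AST node back.
fn apply(op: Token, out: &mut Vec<Expr>) -> Result<(), String> {
    let node = match op {
        Token::Not => Expr::Not(Box::new(out.pop().ok_or("NOT needs an operand")?)),
        Token::And | Token::Or => {
            let rhs = Box::new(out.pop().ok_or("missing right operand")?);
            let lhs = Box::new(out.pop().ok_or("missing left operand")?);
            if op == Token::And { Expr::And(lhs, rhs) } else { Expr::Or(lhs, rhs) }
        }
        _ => return Err("not an operator".into()),
    };
    out.push(node);
    Ok(())
}

// Shunting Yard: operands go straight to `out`; operators wait on `ops`
// until a lower-precedence operator or a closing paren forces them out.
pub fn parse(tokens: &[Token]) -> Result<Expr, String> {
    let mut out: Vec<Expr> = Vec::new();
    let mut ops: Vec<Token> = Vec::new();
    for tok in tokens {
        match tok {
            Token::Selector(s) => out.push(Expr::Atom(s.clone())),
            // NOT is a prefix operator and '(' opens a group: both just wait.
            Token::Not | Token::LParen => ops.push(tok.clone()),
            Token::And | Token::Or => {
                // Binary operators are left-associative: flush anything that
                // binds at least as tightly before pushing this one.
                while let Some(top) = ops.last() {
                    if *top == Token::LParen || precedence(top) < precedence(tok) {
                        break;
                    }
                    let top = ops.pop().unwrap();
                    apply(top, &mut out)?;
                }
                ops.push(tok.clone());
            }
            Token::RParen => loop {
                match ops.pop() {
                    Some(Token::LParen) => break,
                    Some(op) => apply(op, &mut out)?,
                    None => return Err("unbalanced ')'".into()),
                }
            },
        }
    }
    while let Some(op) = ops.pop() {
        if op == Token::LParen {
            return Err("unbalanced '('".into());
        }
        apply(op, &mut out)?;
    }
    let root = out.pop().ok_or("empty expression")?;
    if out.is_empty() { Ok(root) } else { Err("operands without an operator".into()) }
}
```

With these precedences, `a || b && c` folds as `a || (b && c)`, and a leading `!` binds only to the atom right after it.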

What you can actually write

The grammar earns its keep when you see how compact a real locator becomes. A single selector string that parses with selector.rs can pin down an element on a Windows desktop. Most of the older projects would need a paragraph of code to do the same.

The expression I think pays for the whole grammar is `process:chrome >> role:Button && !name:Cancel`. `process:chrome` scopes the lookup to a single browser process. `>>` walks into its descendants. `role:Button && !name:Cancel` matches every button that is not the Cancel button. Try writing that in image-template form: you can't. Try writing it with pywinauto's builder API: you can, but it takes three lines, and your agent has to know the API.
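To make the semantics of `process:chrome >> role:Button && !name:Cancel` concrete, here is a toy evaluator over an in-memory tree. It is a sketch under stated assumptions. `Element` and `Pred` are hypothetical types with only role, name, and process fields, the AST is built by hand instead of parsed, and `process:` is simplified to match top-level windows only. Terminator's real matcher runs against live UI Automation elements.

```rust
// Toy stand-in for an accessibility-tree node (hypothetical fields).
#[derive(Debug)]
pub struct Element {
    pub role: &'static str,
    pub name: &'static str,
    pub process: &'static str,
    pub children: Vec<Element>,
}

// Just enough of a predicate AST for this one expression (hypothetical).
pub enum Pred {
    Role(&'static str),
    Name(&'static str),
    Process(&'static str),
    And(Box<Pred>, Box<Pred>),
    Not(Box<Pred>),
}

impl Pred {
    pub fn matches(&self, e: &Element) -> bool {
        match self {
            Pred::Role(r) => e.role == *r,
            Pred::Name(n) => e.name == *n,
            Pred::Process(p) => e.process == *p,
            Pred::And(a, b) => a.matches(e) && b.matches(e),
            Pred::Not(p) => !p.matches(e),
        }
    }
}

// Depth-first walk collecting every strict descendant of `root` that matches `pred`.
fn collect<'a>(root: &'a Element, pred: &Pred, out: &mut Vec<&'a Element>) {
    for child in &root.children {
        if pred.matches(child) {
            out.push(child);
        }
        collect(child, pred, out);
    }
}

// `scope >> target`: keep top-level windows matching `scope`, then search their descendants.
pub fn chain<'a>(roots: &'a [Element], scope: &Pred, target: &Pred) -> Vec<&'a Element> {
    let mut hits = Vec::new();
    for root in roots.iter().filter(|r| scope.matches(r)) {
        collect(root, target, &mut hits);
    }
    hits
}
```

In a Chrome window containing OK and Cancel buttons plus a nested toolbar with a Reload button, the chain returns OK and Reload. Buttons in other processes are never visited.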

753 lines in selector.rs (parser + types)
26 Selector enum variants
35 tools in the MCP server
86 versioned releases by 2026-01-13

Side by side: the only one of these projects an AI agent can drive over MCP

The comparison below narrows the field to the questions that matter for an agent in 2026: how the project sees the screen, what its selector grammar looks like, whether it ships a Model Context Protocol server, and how active it is this year.

Feature: typical open source desktop automation project vs. Terminator

How it sees the screen: pixels, OCR, or coordinates (SikuliX, PyAutoGUI) vs. the native accessibility tree via Windows UI Automation.
Selector grammar: none, or a Python builder API (pywinauto) vs. a hand-written Shunting Yard parser with 26 Selector variants.
Boolean operators in selectors: not supported vs. `&&`, `||`, `!`, and parentheses, with NOT=3, AND=2, OR=1 precedence.
Spatial selectors: not supported vs. `rightof:`, `leftof:`, `above:`, `below:`, `near:`.
Model Context Protocol server: none of the older projects ship one vs. `terminator-mcp-agent` with 35 tools, installable from npm.
Engineering activity in 2026: mostly maintenance releases or dormant vs. four releases in the first 13 days of January 2026.
License: mixed (GPL, MIT, BSD; pywinauto is BSD, AutoHotkey is GPLv2) vs. MIT (mediar-ai/terminator on GitHub).
Surface for AI coding assistants: wrap it manually in your own MCP shim vs. native MCP with a one-line install in Cursor, Claude Code, or VS Code.

Verify it yourself before you install anything

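The selector parser, release cadence, and MCP tool count can all be checked in under a minute against the public repo. Clone mediar-ai/terminator and run three commands: `wc -l crates/terminator/src/selector.rs` for the parser's line count, `grep -c "^## \[" CHANGELOG.md` for the number of release headings, and `grep -c "#\[tool(" crates/terminator-mcp-agent/src/server.rs` for the MCP tool count. Confirm the numbers before you trust the prose.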

If the numbers differ on the latest main, the project has moved faster than this guide, and the conclusion still holds. The file path is the first piece of evidence in this guide; the release cadence is the second.
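If you would rather script the check than read shell output, the three counts map onto simple line filters. Below is a sketch in Rust that assumes a local clone at a path you pass in. The helper names are hypothetical. Each function mirrors the corresponding `wc -l` or `grep -c` command and does not correspond to any tool in the repo.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Mirrors `wc -l`: counts newline characters, not "lines of text".
pub fn count_newlines(text: &str) -> usize {
    text.bytes().filter(|&b| b == b'\n').count()
}

// Mirrors `grep -c '#\[tool('`: number of lines containing the literal `#[tool(`.
pub fn count_tool_attrs(text: &str) -> usize {
    text.lines().filter(|line| line.contains("#[tool(")).count()
}

// Mirrors `grep -c '^## \['`: number of lines starting with `## [`.
pub fn count_release_headings(text: &str) -> usize {
    text.lines().filter(|line| line.starts_with("## [")).count()
}

// Reads the three files from a local clone and returns
// (selector.rs line count, MCP tool attributes, CHANGELOG release headings).
pub fn check_repo(repo: &Path) -> io::Result<(usize, usize, usize)> {
    let selector = fs::read_to_string(repo.join("crates/terminator/src/selector.rs"))?;
    let server = fs::read_to_string(repo.join("crates/terminator-mcp-agent/src/server.rs"))?;
    let changelog = fs::read_to_string(repo.join("CHANGELOG.md"))?;
    Ok((
        count_newlines(&selector),
        count_tool_attrs(&server),
        count_release_headings(&changelog),
    ))
}
```

Call `check_repo` from a `main` and print the tuple. On the same checkout, it should agree with `wc -l` and `grep -c`.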

What the older projects are still good for

None of this is a takedown of AutoHotkey or SikuliX. Those projects are excellent at the jobs they were designed for, and they predate Model Context Protocol by more than 20 and about 15 years respectively. AutoHotkey is the right answer when you want a global hotkey and a fast Send command. SikuliX is the right answer when the target software is a legacy Win32 control with no accessibility surface and a stable visual layout. PyAutoGUI is the right answer when you want a 200-line Python script that nudges a mouse cursor.

The shift in 2026 is that the typical job has changed. The interesting unit of automation is no longer “a one-off script a human runs”, it is “an AI assistant that drives the desktop in a loop”. For that job, the grammar matters more than any single primitive. A Send command is not a contract. A typed selector string is.

The takeaway

If you are picking an open source desktop automation project in April 2026 because a human will run scripts, AutoHotkey, AutoIt, or PyAutoGUI is fine. If you are picking because an AI coding assistant has to drive the desktop in a loop, you want three things the older projects do not give you: a structured view of the screen via the accessibility tree, a typed selector grammar your assistant can produce as a string, and a Model Context Protocol server it can call directly. Those three things meet in exactly one project on this list. The grammar is in one file, the file is 753 lines, and it is MIT licensed at mediar-ai/terminator.


Frequently asked questions

Which open source desktop automation projects are still actively maintained in April 2026?

AutoHotkey is still maintained as a scripting language for Windows. AutoIt is still around but its community has shrunk. SikuliX has had infrequent releases. PyAutoGUI receives occasional fixes. Robot Framework and the RPA Framework ecosystem keep shipping. pywinauto is alive but slow-moving. The clearest spike of recent engineering activity is Terminator, whose CHANGELOG.md shows four releases in the first thirteen days of January 2026 (0.24.16, 0.24.18, 0.24.19, 0.24.20) and 86 total versioned releases by then. The activity gap matters because Model Context Protocol shifted desktop automation between 2024 and 2026, and the older projects have not adapted.

What does Terminator do that the older open source projects do not?

Three things. First, it uses the native accessibility tree (Windows UI Automation) rather than pixels or screenshots, so locators stay stable across themes and DPI changes. Second, it ships a Model Context Protocol server with 35 tools (counted by `grep -c '#\[tool('` in crates/terminator-mcp-agent/src/server.rs at the 2026-01-13 release), so Claude Code, Cursor, VS Code, and Windsurf can drive your desktop the same way they drive a browser today. Third, it ships a real selector grammar. AutoHotkey and AutoIt do not have one. PyAutoGUI does not have one. pywinauto has a Python builder but no parser. Terminator's parser is at crates/terminator/src/selector.rs: 753 lines, a hand-written Shunting Yard with operator precedence and parentheses.

What does that selector grammar actually look like?

It is the closest thing to Playwright for the desktop. You can write `process:notepad >> role:Edit` to scope an Edit control to the Notepad process. You can write `role:Button && name:Save && !classname:Disabled` to combine attributes with logical operators. You can navigate spatially with `rightof:(role:Tab && name:Settings)` or `below:role:Toolbar`. You can use `||` for OR, `&&` for AND, `!` for NOT, `>>` for descendant chains, and `..` to walk back to the parent. The Selector enum at crates/terminator/src/selector.rs has 26 variants, including Role, Id, Name, Text, Path, NativeId, Attributes, Filter, Chain, ClassName, Visible, LocalizedRole, Process, RightOf, LeftOf, Above, Below, Near, Nth, Has, Parent, And, Or, Not, and Invalid.
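This guide does not spell out how Terminator decides that one element is `rightof:` another. The sketch below assumes one plausible geometric reading over bounding rectangles. A candidate is right of an anchor if it starts at or past the anchor's right edge and overlaps it vertically. `below:`, `above:`, and `leftof:` follow the same pattern, and `near:` means the two centers fall within a distance threshold. `Rect`, these function names, and these definitions are hypothetical. They illustrate why spatial selectors need screen coordinates and are not Terminator's semantics.

```rust
// Hypothetical bounding box in screen pixels; not a Terminator type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: f64,
    pub y: f64,
    pub w: f64,
    pub h: f64,
}

impl Rect {
    fn right(&self) -> f64 { self.x + self.w }
    fn bottom(&self) -> f64 { self.y + self.h }
    fn center(&self) -> (f64, f64) { (self.x + self.w / 2.0, self.y + self.h / 2.0) }
}

// True when the intervals [a0, a1) and [b0, b1) share some span.
fn overlaps(a0: f64, a1: f64, b0: f64, b1: f64) -> bool {
    a0 < b1 && b0 < a1
}

// Assumed semantics: the candidate lies entirely past the anchor's edge
// and shares some span with the anchor on the other axis.
pub fn right_of(c: &Rect, anchor: &Rect) -> bool {
    c.x >= anchor.right() && overlaps(c.y, c.bottom(), anchor.y, anchor.bottom())
}

pub fn left_of(c: &Rect, anchor: &Rect) -> bool {
    c.right() <= anchor.x && overlaps(c.y, c.bottom(), anchor.y, anchor.bottom())
}

pub fn above(c: &Rect, anchor: &Rect) -> bool {
    c.bottom() <= anchor.y && overlaps(c.x, c.right(), anchor.x, anchor.right())
}

pub fn below(c: &Rect, anchor: &Rect) -> bool {
    c.y >= anchor.bottom() && overlaps(c.x, c.right(), anchor.x, anchor.right())
}

// Assumed semantics for `near:`: centers within `max_dist` pixels of each other.
pub fn near(c: &Rect, anchor: &Rect, max_dist: f64) -> bool {
    let (cx, cy) = c.center();
    let (ax, ay) = anchor.center();
    ((cx - ax).powi(2) + (cy - ay).powi(2)).sqrt() <= max_dist
}
```

A full engine would first resolve the anchor inside `rightof:(role:Tab && name:Settings)` and then filter candidate elements with a predicate like `right_of`.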

Where exactly does the parser live and how does it work?

It lives in one file: crates/terminator/src/selector.rs. The pipeline is: tokenize() walks the input character by character producing a Vec of Token (Selector, And, Or, Not, LParen, RParen), then parse_boolean_expression() runs a Shunting Yard algorithm with operator_precedence() returning NOT=3, AND=2, OR=1. apply_operator() pops operands and folds them into Selector::And, Selector::Or, or Selector::Not nodes. text: selectors get special handling so they can contain colons and parentheses without breaking the tokenizer. Spatial selectors like rightof:, leftof:, above:, below:, and near: are parsed in parse_atomic_selector() around line 419.
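The tokenizer stage can be sketched the same way. The function name `tokenize` mirrors the description above, but the body is a simplified assumption and not the code in selector.rs. It treats whitespace as a separator and every parenthesis as structural. Unlike the real file, it cannot handle `text:` values containing colons or parentheses, names containing spaces, or parenthesized arguments such as `rightof:(...)`.

```rust
// Simplified token type for illustration; not Terminator's definition.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Selector(String),
    And,
    Or,
    Not,
    LParen,
    RParen,
}

// Move any buffered characters into a Selector token.
fn flush(buf: &mut String, tokens: &mut Vec<Token>) {
    if !buf.is_empty() {
        tokens.push(Token::Selector(std::mem::take(buf)));
    }
}

// Walk the input one character at a time, emitting operator tokens and
// accumulating everything else into atomic selector strings.
pub fn tokenize(input: &str) -> Vec<Token> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut buf = String::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let next = chars.get(i + 1).copied();
        match (c, next) {
            ('&', Some('&')) | ('|', Some('|')) => {
                flush(&mut buf, &mut tokens);
                tokens.push(if c == '&' { Token::And } else { Token::Or });
                i += 2;
                continue;
            }
            // '!' counts as NOT only at the start of an atom, so "name:Hi!" stays intact.
            ('!', _) if buf.is_empty() => tokens.push(Token::Not),
            ('(', _) => {
                flush(&mut buf, &mut tokens);
                tokens.push(Token::LParen);
            }
            (')', _) => {
                flush(&mut buf, &mut tokens);
                tokens.push(Token::RParen);
            }
            (ch, _) if ch.is_whitespace() => flush(&mut buf, &mut tokens),
            (ch, _) => buf.push(ch),
        }
        i += 1;
    }
    flush(&mut buf, &mut tokens);
    tokens
}
```

The token stream this produces is the input a Shunting Yard pass consumes. Keeping the two stages separate means the precedence logic never has to look at raw characters.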

How do screenshot-based projects like SikuliX or PyAutoGUI compare on stability?

They do real work and they have a place. The cost is that pixel matching breaks on theme changes, font changes, DPI scaling, and any animation that runs while you grab the screen. If your automation has to survive a Windows update or a user toggling dark mode, an accessibility-tree-based tool will keep working without re-recording image templates. The trade-off is that the accessibility-tree projects depend on the target app exposing UIA correctly. Most modern Windows apps do. Older Win32 software sometimes does not, and that is where SikuliX or PyAutoGUI still win.

Why does Model Context Protocol matter for desktop automation specifically?

MCP standardizes how AI coding assistants call external tools. In 2024 every assistant invented its own bridge to its own browser harness. By 2026, MCP is the lingua franca: Cursor, Claude Code, VS Code, and Windsurf all speak it. A desktop automation project that ships an MCP server gives every one of those assistants the ability to drive Windows in a loop, the same way they drive Playwright. Terminator's MCP server is published as the npm package `terminator-mcp-agent` and you wire it up with one command: `claude mcp add terminator "npx -y terminator-mcp-agent@latest"`. None of the older open source projects ship an MCP server.

Is Terminator really MIT licensed and is everything I have described in the public repo?

Yes. The repo is mediar-ai/terminator on GitHub. Selector parser, MCP agent, Rust core, Node.js bindings, Python bindings, workflow recorder, and CLI are all MIT. To verify the selector parser yourself, clone the repo, then `wc -l crates/terminator/src/selector.rs` will show 753 lines, `grep -c "#\[tool(" crates/terminator-mcp-agent/src/server.rs` will show 35, and `grep -c "^## \[" CHANGELOG.md` will show 86 release headings as of the 0.24.20 release on 2026-01-13.

What about Linux and macOS support?

Open source desktop automation on Linux is dominated by AT-SPI2 wrappers and dogtail, which is unmaintained but still functional. xdotool is the universal coordinate-based fallback and is maintained. On macOS, the cross-platform projects (PyAutoGUI, SikuliX, Robot Framework) work with the usual screenshot caveats. Terminator's core Rust crate has macOS Accessibility API support at the platform layer, but its npm and pip distributions ship Windows binaries today; the README is explicit about that. Pick your tool based on which OS your automation actually has to run on, not which OS the project page says it supports.

Do I need to learn the selector grammar or can I record workflows instead?

You can record. Terminator includes the terminator-workflow-recorder crate which captures human workflows as deterministic YAML you can replay. Many users record once, then edit the YAML by hand later when something needs to change. The selector grammar is what the recorded workflow ends up using under the covers, so if you ever want to debug or tighten a recorded step, knowing the grammar pays off. AutoHotkey and AutoIt have no recorder; you write scripts. SikuliX has a record-and-replay UI but it is image-based, not tree-based.

If I am building an AI agent that controls a desktop in 2026, where should I start?

Start with the smallest scope that proves the loop works. Pick one app and one workflow you can describe in three sentences. If your agent already speaks MCP, install Terminator's MCP agent (one command). Wire the agent to call get_window_tree, then click_element with a selector built from `role:` and `name:`. When that round-trip works against one app, generalize. The selector grammar will start to repay itself the moment you need a workflow that spans two windows or that has to find an element by negation ("the only enabled Save button that is not in the toolbar"). That is also the moment screenshot-based tools fall over.
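Under the hood, an MCP client turns a `click_element` call into a JSON-RPC 2.0 `tools/call` request with a `name` and an `arguments` object. The sketch below builds one by hand to show what the agent actually sends. The tool name comes from this guide. The `selector` argument key and the helper names are assumptions, so check the tool's input schema (returned by `tools/list`) before relying on them. In practice, Claude Code, Cursor, or VS Code builds this envelope for you.

```rust
// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

// Hypothetical helper: the `role:` + `name:` selector suggested as a starting point.
pub fn role_name_selector(role: &str, name: &str) -> String {
    format!("role:{} && name:{}", role, name)
}

// MCP `tools/call` request (JSON-RPC 2.0). The `selector` argument key is an assumption.
pub fn tools_call(id: u64, tool: &str, selector: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{{"selector":"{}"}}}}}}"#,
        id,
        json_escape(tool),
        json_escape(selector)
    )
}
```

For example, `tools_call(1, "click_element", &role_name_selector("Button", "Save"))` yields a single-line JSON request whose `arguments.selector` is `role:Button && name:Save`.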
