
Microsoft Power Automation Desktop, but the workflow is plain YAML your AI assistant can write.

If you searched "Microsoft Power Automation Desktop" and every result is a Microsoft Learn page about installation, the designer, and the difference between a cloud flow and a desktop flow, this page is the other perspective. Terminator is a developer framework for the same job: automate Windows apps through the accessibility tree. The difference is where the workflow lives. PAD keeps it inside a proprietary designer. Terminator keeps it in a YAML file with readable selectors like role:Button && name:Save, an MCP server Claude Code or Cursor drives directly, and a TERMINATOR_HEADLESS=true flag that runs the same file unattended on a Windows VM.

Matthew Diakonov, Written with AI

Published April 19, 2026 · 12 min read


YAML workflow, typeable selectors

32-tool MCP server for AI assistants

Direct IUIAutomation via uiautomation crate

TERMINATOR_HEADLESS=true, MIT license




The one thing every Power Automate Desktop article skips

Every result on the first page of Google for "microsoft power automation desktop" is a Microsoft Learn page: Install, Introduction to desktop flows, Automate desktop applications, Prerequisites and limitations, Run unattended desktop flows. They are competent reference docs for the Microsoft product. They describe the designer, the Microsoft Store install versus the MSI, the Process plan license, and the steps to connect an on-prem data gateway for unattended bots.

None of them answer the first question a developer asks. What does the workflow file actually look like, and can I keep it in a git repository like the rest of my code? The answer with PAD is, in practice, no. The flow lives inside a proprietary action DSL bound to an object repository that only the designer can open. You can export it, but the export is not something you review in a pull request. That gap is the whole reason this page exists.

1 YAML file

“Every flow I ship, I can diff in git and run on a Windows VM without a bot license”

The tradeoff Terminator optimizes for

The workflow file, side by side

Same task: open QuickBooks, read an invoice PDF, type the amount, pick the account, save. The PAD version is the documented shape of a Power Automate Desktop action sequence. The Terminator version is a real YAML file that the MCP server executes with one execute_sequence call.

PAD action DSL vs Terminator YAML

<?xml version="1.0" encoding="utf-8"?>
<!-- What PAD actually writes to disk. The file extension is .txt
     but the content is a proprietary action DSL that only the
     Power Automate Desktop designer can open, diff, or edit.
     Selectors are captured as numeric element IDs bound to an
     opaque object repository; you cannot read them. -->

UIAutomation.LaunchApplication Application: $'''qbw.exe''' \
  ProcessId=> AppProcessId WindowTitle=> AppTitle

UIAutomation.Click.Click Element: $'''appmask["Window 'QuickBooks'"]["Button 'Save'"]''' \
  ClickType: UIAutomation.ClickType.LeftClick MouseMoveTime: 500

# The "appmask" string above is an opaque handle into an object
# repository that only exists inside the .pad file. You cannot
# review it in a pull request. You cannot author it by hand.
# You cannot generate it from an LLM.


The selector grammar, in one screen

The whole Terminator selector language fits on one screen. This is the contents of docs/SELECTORS_CHEATSHEET.md distilled to its grammar rules. Every string here is something you can type into a YAML file, into an MCP tool argument, or into a Claude Code prompt.

docs/SELECTORS_CHEATSHEET.md (grammar)

PAD represents the same information as a captured UI element in an object repository, referenced by numeric ID from the action DSL. You see that numeric ID in an export, but the human-readable mapping lives inside the designer. A Terminator selector, in contrast, is the entire contract: a reviewer reads it, an LLM writes it, a linter checks it.
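To make the "a linter checks it" point concrete, here is a minimal sketch of a selector check in Python. It is not Terminator's parser: the prefix set is only what this page mentions, the function name parse_selector is hypothetical, and the real grammar in docs/SELECTORS_CHEATSHEET.md may accept forms this sketch rejects.

```python
import re

# Prefixes mentioned on this page; an assumption, not the full grammar.
KNOWN_PREFIXES = {
    "role", "name", "id", "nativeid", "classname", "text", "pos", "visible",
    "window", "rightof", "leftof", "above", "below", "near", "nth",
}


def parse_selector(selector: str) -> list[list[tuple[str, str]]]:
    """Split a selector into chain segments (>>) of AND-ed terms (&&)."""
    chain = []
    for segment in selector.split(">>"):
        terms = []
        for raw in segment.split("&&"):
            term = raw.strip()
            if term == "..":  # parent navigation
                terms.append(("parent", ""))
                continue
            if re.fullmatch(r"nth-\d+", term):  # nth-1 style index from the end
                terms.append(("nth", term[3:]))
                continue
            # Only the first colon separates prefix from value, so a nested
            # locator such as above:name:OK keeps "name:OK" as its value.
            prefix, sep, value = term.partition(":")
            if not sep or not value:
                raise ValueError(f"malformed term: {term!r}")
            if prefix not in KNOWN_PREFIXES:
                raise ValueError(f"unknown prefix: {prefix!r}")
            terms.append((prefix, value))
        chain.append(terms)
    return chain
```

A CI job could call parse_selector on every selector string in a workflow file and fail the build on the first ValueError, which would catch a typo like rol:Button before the workflow runs.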

Your AI assistant, not a designer

The Terminator MCP server publishes 32 tools through one dispatch arm in crates/terminator-mcp-agent/src/server.rs. Any MCP-speaking assistant (Claude Code, Cursor, VS Code, Windsurf) can call them. A single authoring loop has three steps: the assistant reads the accessibility tree, drafts a YAML, and runs the sequence once to validate.

One MCP server, every major AI assistant

Power Automate Desktop compared, field by field

Feature	Power Automate Desktop	Terminator
Workflow file	Proprietary PAD action DSL inside a designer-only file	Plain YAML you can diff in git, hand-edit, and generate from an LLM
Selector syntax	Numeric element IDs bound to an opaque object repository	role:Button && name:Save, visible in the YAML, typeable, copy-pasteable
Authoring mode	Drag and drop inside Power Automate for desktop	Ask Claude Code or Cursor; it emits YAML through the MCP server
Primary audience	Citizen developers and IT admins inside a Microsoft tenant	Developers who already write Playwright, Rust, TypeScript, or Python
Accessibility tree access	Abstracted behind the PAD recorder and object repository	Direct IUIAutomation via the uiautomation Rust crate, exposed as get_window_tree
AI-assistant integration	Copilot add-on inside the designer; no MCP surface	32-tool MCP server over stdio; Claude, Cursor, VS Code, Windsurf all drive it
Unattended execution	Needs Process plan license, on-prem gateway, Power Automate machine	TERMINATOR_HEADLESS=true and terminator mcp run workflow.yml; MIT licensed
Version control story	Flow export is a binary-ish blob; pull request review is not practical	Text diff in any repo; code review is normal YAML review
Extensibility	Custom actions require a separate SDK and signed modules	run_command with engine: javascript or engine: python embedded inline
Licensing	Per-user / per-bot Microsoft 365 or Power Automate Premium	MIT on GitHub; fork it, ship it, no lock-in

The numbers that tell the story

A few concrete facts about the surface area you are comparing.

32 MCP tools in one dispatch arm

1 YAML file per workflow

Speed multiple over vision-based agents (figure in README.md)

Deterministic success rate (figure in README.md)

Speed and success-rate numbers are the README.md claims for deterministic YAML execution versus vision-based computer-use agents; the 32-tool count is what claude mcp list prints after installation.

Unattended execution, without a bot license

PAD's unattended flow story is a Microsoft 365 or Power Automate Premium plan, an on-premises data gateway, a Power Automate machine group, and a cloud trigger. Terminator's is a single environment variable and a single CLI command.

Headless VM replay

For comparison, here is what the PAD flow file looks like outside its designer.

Trying to run a PAD flow from a shell

What changes when the workflow becomes a text file

The artifact is a text file

Pull requests on automation changes work the same way they work for application code. A reviewer reads a diff. A lint rule checks a selector. A CI job runs terminator mcp run workflow.yml --dry-run. None of that is available when the flow lives inside a designer.

The selectors are strings you can read

role:Button && name:Save is self-describing. When a button label changes from Save to Save & Close, you see it in the diff. When an id is missing, you fall back to compound role+name or to positional locators like above:name:OK. No object repository to chase.
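The fallback order described above (id first, then compound role+name, then a positional locator) can be sketched as a small Python helper. The element dictionary and its keys are hypothetical and exist only for illustration; this is not how Terminator itself picks selectors.

```python
def pick_selector(element: dict) -> str:
    """Prefer id, then role && name, then a positional locator.

    `element` is a hypothetical description of a UI element, e.g.
    {"id": "submit"} or {"role": "Button", "name": "Save"}.
    The "above" key holds the name of the element directly below the target.
    """
    if element.get("id"):
        return f"id:{element['id']}"
    if element.get("role") and element.get("name"):
        return f"role:{element['role']} && name:{element['name']}"
    if element.get("above"):
        return f"above:name:{element['above']}"
    raise ValueError("not enough attributes to build a selector")
```

Whichever branch fires, the result is a string that shows up in the YAML diff, so a reviewer can see when a workflow moved from an id to a weaker positional locator.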

The authoring loop includes an LLM

You type a prompt into Claude Code or Cursor. The agent calls get_window_tree to see the UI, drafts a YAML with execute_sequence, runs it once to validate, and commits the result. A non-trivial workflow is authored in one loop, not fifty designer clicks.

The runtime is a single Rust binary

terminator-mcp-agent is 32 tools, one process, and a stdio transport. No Power Automate machine, no on-prem data gateway, no Azure dependency. MIT license means a fork is a git clone away.

The execution context survives the session

Per-workflow state.json under %LOCALAPPDATA%/mediar/workflows/<folder>/state.json lets the next run resume exactly where the last one stopped. Move the file to a VM, keep the state, replay unattended.
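As a rough illustration of where that file sits, the Python sketch below resolves the state.json path from the location given above and loads it if present. The helper names are hypothetical, and the contents of state.json are treated as an opaque JSON object because its schema is not described here.

```python
import json
import os
from pathlib import Path
from typing import Optional


def state_path(folder: str, local_appdata: Optional[str] = None) -> Path:
    """Build <LOCALAPPDATA>/mediar/workflows/<folder>/state.json."""
    # On Windows, LOCALAPPDATA is set by the OS; the override makes the
    # helper usable on other machines.
    base = local_appdata or os.environ["LOCALAPPDATA"]
    return Path(base) / "mediar" / "workflows" / folder / "state.json"


def load_state(folder: str, local_appdata: Optional[str] = None) -> dict:
    """Return the saved state, or an empty dict when no run has happened yet."""
    path = state_path(folder, local_appdata)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

Copying the workflow folder to a VM together with its state.json is then an ordinary file copy; nothing in this sketch depends on a designer or a cloud service.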

What the MCP server actually ships

A quick map of what you get when you run npx -y terminator-mcp-agent@latest. Every card here is a concrete feature, not a marketing promise.

One selector grammar, everywhere

role:Button && name:Save works in the YAML, in the MCP tool arguments, in Claude Code prompts, and in the Rust SDK. The same string routes through the accessibility tree the same way every time.

32 MCP tools, one dispatch arm

open_application, click_element, type_into_element, press_key_global, wait_for_element, validate_element, navigate_browser, execute_browser_script, run_command, execute_sequence, and more, all routed from one match in crates/terminator-mcp-agent/src/server.rs.

Direct UIA, no abstraction

The Windows adapter binds to IUIAutomation through the uiautomation Rust crate. No PAD recorder in the middle, no object repository, no signed-custom-action SDK.

Headless replay on a VM

TERMINATOR_HEADLESS=true initializes a virtual display context. Windows UIA still reads the tree. The same YAML authored in Claude Code runs unattended without an interactive session.

Recorder that writes diffable JSON

terminator-workflow-recorder captures mouse, keyboard, clipboard, and UI automation events into a plain JSON stream you can replay or convert to YAML. No proprietary .pad archive.

AI recovery when the tree is wrong

fallback_id on a step jumps to a recovery path; a gemini_computer_use arm is available when the accessibility tree is missing or a pixel-only surface is in the way.

Install and first real task

The whole install is one command. After that, every step here is something Claude Code or Cursor does for you, not a designer click.

install

From zero to a committed YAML

Install the Terminator MCP server

Run claude mcp add terminator "npx -y terminator-mcp-agent@latest" -s user. User scope means every Claude Code session on the machine sees the 32 tools. Cursor and VS Code get the same binary via mcp.json.
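For editors that read an mcp.json file, the entry might look like the output of this Python sketch. The mcpServers layout is an assumption based on the shape commonly used by Cursor; VS Code may expect a different top-level key, so treat crates/terminator-mcp-agent/README.md as the authority for the exact block.

```python
import json


def mcp_config(server_name: str = "terminator") -> str:
    """Render an mcp.json body that launches the agent through npx."""
    config = {
        "mcpServers": {  # assumed key; check your editor's documentation
            server_name: {
                "command": "npx",
                "args": ["-y", "terminator-mcp-agent@latest"],
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(mcp_config())
```

The command and arguments are the same ones passed to claude mcp add, split into the command/args form that JSON-based MCP configs generally use.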

Confirm the dispatch is live

claude mcp list shows terminator stdio 32 tools. The list is generated from the dispatch match in crates/terminator-mcp-agent/src/server.rs at build time, so if a handler exists the LLM sees it.

Ask for a real task, not a hello world

"Post invoice INV-4412.pdf to QuickBooks under Expense:Software and save." Claude Code calls get_window_tree, drafts an execute_sequence YAML, runs it once for validation, and commits the file to .mediar/workflows/.

Check the YAML into git

The workflow is plain text. git add workflows/post-invoice.yml and review it in a pull request. Your reviewer reads the steps and the selectors without opening a designer.

Replay unattended

Copy the file to the VM with scp workflows/post-invoice.yml ops@win-vm-01:C:/flows/, set TERMINATOR_HEADLESS=true, and run terminator mcp run C:/flows/post-invoice.yml. The same YAML, the same selectors, no interactive session, no per-bot license.
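If the replay is triggered from a scheduler or a wrapper script, the step above could be wrapped like this in Python. The sketch assumes the terminator CLI is on PATH on the VM; the function names are hypothetical, and the meaning of the exit code is not documented on this page, so it is returned as-is.

```python
import os
import subprocess


def headless_command(workflow_path: str) -> tuple[list[str], dict[str, str]]:
    """Build the argv and environment for an unattended replay."""
    env = dict(os.environ)               # copy so the parent env is untouched
    env["TERMINATOR_HEADLESS"] = "true"  # flag named in this guide
    return ["terminator", "mcp", "run", workflow_path], env


def replay(workflow_path: str) -> int:
    """Run the workflow and return the CLI's exit code unchanged."""
    argv, env = headless_command(workflow_path)
    return subprocess.run(argv, env=env, check=False).returncode
```

Keeping command construction separate from execution makes the wrapper easy to test without a Windows VM.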

A checklist for choosing a developer-grade desktop automator

If you are leaving Power Automate Desktop (or arriving at the problem from a pure developer angle), these are the properties to hold your next tool to.

Developer RPA shortlist

The workflow artifact is a text file you can diff in a pull request
Selectors are readable strings (role, name, id, classname), not opaque handles
An AI assistant can author the workflow by calling a documented MCP tool
The same file runs unattended without a per-bot license
The runtime binds directly to the OS accessibility API (UIA on Windows)
Failure handling is encoded inside the workflow (fallback_id, jumps, stop_on_error)
The project is MIT-licensed and forkable

Walk through your PAD flow with the Terminator team

Bring one flow you would like to express as YAML. We will draft the selectors, wire up the MCP server, and run it against your target app live on the call.

FAQ

Is Terminator a drop-in replacement for Microsoft Power Automate Desktop?

No, and it is not trying to be. PAD is a citizen-developer RPA tool inside the Power Platform with a designer, an object repository, a Copilot add-on, and per-bot licensing. Terminator is a developer framework: a Rust SDK, a TypeScript SDK (@mediar-ai/terminator), an MCP server (terminator-mcp-agent), and a workflow recorder. They overlap on the same underlying Windows API (UI Automation), but the authoring and deployment models are different. If you want drag-and-drop inside a Microsoft tenant, use PAD. If you want your workflow in git and your AI assistant writing it, use Terminator.

What does Terminator's YAML workflow actually look like compared to a PAD flow?

A Terminator workflow is a plain YAML file with four top-level blocks: variables, selectors, steps, and stop_on_error. Each step names a tool (open_application, click_element, type_into_element, wait_for_element, select_option, run_command, validate_element, and so on) and passes arguments. Selectors are strings: role:Button && name:Save, window:Calculator >> role:Button >> name:Seven, id:submit. A real example is at crates/terminator/examples/cron_example.yml in the repo. A PAD flow, by contrast, lives inside the designer's proprietary action DSL with numeric element IDs pointing into an object repository that you cannot review in a pull request.
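The four-block shape can be sketched from Python as follows. Only the top-level keys, the tool names, and the selector string come from this page; the per-step keys (tool_name, arguments) and every argument name and value are assumptions for illustration, so compare against crates/terminator/examples/cron_example.yml before relying on them. The sketch emits JSON, which a YAML 1.2 parser also accepts, to stay within the standard library.

```python
import json


def build_workflow() -> dict:
    """Return a workflow skeleton with the four top-level blocks."""
    return {
        "variables": {"amount": "0.00"},  # placeholder value
        "selectors": {"save": "role:Button && name:Save"},
        "steps": [
            # Step and argument key names below are hypothetical.
            {"tool_name": "open_application",
             "arguments": {"app_name": "QuickBooks"}},
            {"tool_name": "wait_for_element",
             "arguments": {"selector": "role:Button && name:Save"}},
            {"tool_name": "click_element",
             "arguments": {"selector": "role:Button && name:Save"}},
        ],
        "stop_on_error": True,
    }


def to_text(workflow: dict) -> str:
    """Serialize to JSON text, which is also readable as YAML 1.2."""
    return json.dumps(workflow, indent=2)


if __name__ == "__main__":
    print(to_text(build_workflow()))
```

Because the result is plain text, the same file can be diffed, reviewed, and regenerated by an assistant.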

How do I get Claude Code or Cursor to author the workflow for me?

Install the MCP server with claude mcp add terminator "npx -y terminator-mcp-agent@latest" -s user (Claude Code) or the equivalent mcp.json block for Cursor, VS Code, or Windsurf. The server exposes 32 tools. Ask for a task, and the assistant calls get_window_tree to read the current UI, drafts an execute_sequence YAML, and validates it. execute_sequence wraps a whole workflow in a single MCP call so the context window does not explode on a 20-step task. Power Automate Desktop has a Copilot add-on inside the designer, but there is no MCP interface an external agent can call.

Can Terminator run the same workflow unattended on a Windows VM like PAD does?

Yes, without the Process plan license, the on-prem data gateway, or the Power Automate machine. Set TERMINATOR_HEADLESS=true on the VM and run terminator mcp run workflow.yml. The agent detects the missing display session and spins up a virtual display context that Windows UI Automation can still read. The workflow runs the same way an attended session runs: selectors resolve through IUIAutomation, clicks dispatch through invoke patterns, and the per-workflow state.json records progress so a failed run can resume.

How is the selector syntax different from PAD's UI element capture?

PAD records UI elements into an opaque object repository and references them by numeric ID in the flow. You cannot read the selector in a pull request; you open the designer to inspect it. Terminator uses plain strings documented in docs/SELECTORS_CHEATSHEET.md: prefixes like role:, name:, id:, nativeid:, classname:, text:, pos:, and visible:; positional filters rightof:, leftof:, above:, below:, and near:; indexing with nth:0 and nth-1; parent navigation with ..; compounding with &&; and chaining with >>. Example: window:Calculator >> role:Button >> name:Seven is a complete, readable locator for the Seven button in the Windows Calculator.

What about licensing? PAD ships with Windows; is Terminator free?

Power Automate for desktop is free to install, but unattended execution, premium connectors, and Power Automate hosted machines require Microsoft 365 or Power Automate Premium licensing with per-user or per-bot costs. Terminator is MIT-licensed on GitHub (github.com/mediar-ai/terminator). You can fork it, embed the Rust crate (terminator-rs) or the TypeScript SDK (@mediar-ai/terminator) in your own application, and ship it without a license key. The npm-distributed MCP server is the same MIT code.

What is the actual runtime stack under Terminator on Windows?

The core is the terminator Rust crate. On Windows it binds to the uiautomation crate, which wraps Microsoft's IUIAutomation COM interface (the same API PAD eventually calls). Selector resolution and element enumeration happen through direct UIA calls. Browser automation uses a Chrome extension on a local WebSocket (ws://127.0.0.1:17373) that accepts {action: 'eval', code} messages. The MCP agent (terminator-mcp-agent) is a separate Rust binary that imports terminator and speaks MCP over stdio or HTTP. macOS support exists for the Rust SDK via AX APIs; the MCP agent is Windows-first.

How does error recovery work when a selector goes stale?

Three mechanisms. First, fallback_id on a step lets the sequence engine jump to a named troubleshooting step instead of halting; that is inside execute_sequence, not in the LLM's re-planning loop. Second, continue_on_error: true on a step lets the workflow proceed past a non-fatal failure. Third, when the accessibility tree is wrong or the surface is pixel-only, a gemini_computer_use fallback arm is available that takes a screenshot and asks Gemini for coordinates. PAD's recovery model is a designer-level Error Handling block plus manual flow revision; Terminator encodes recovery inside the YAML so Claude Code does not have to re-plan after every transient failure.
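The first two mechanisms can be modeled with a toy sequence runner in Python. This is a sketch of the control flow as described above, not the execute_sequence implementation. The step id field, the run_step callback, and the rule that each step's fallback is taken at most once are all assumptions made to keep the example small and loop-free.

```python
from typing import Callable


def run_sequence(steps: list[dict], run_step: Callable[[dict], bool]) -> list[str]:
    """Run steps in order and return the ids of the steps attempted."""
    index_by_id = {step["id"]: i for i, step in enumerate(steps)}
    used_fallbacks: set[str] = set()  # guard against fallback loops
    attempted: list[str] = []
    i = 0
    while i < len(steps):
        step = steps[i]
        attempted.append(step["id"])
        if run_step(step):
            i += 1  # success: move to the next step
        elif "fallback_id" in step and step["id"] not in used_fallbacks:
            used_fallbacks.add(step["id"])
            i = index_by_id[step["fallback_id"]]  # jump to the recovery step
        elif step.get("continue_on_error"):
            i += 1  # non-fatal failure: keep going
        else:
            break   # no recovery configured: halt
    return attempted
```

The recovery decision is made inside the runner from data in the workflow itself, which is why the assistant does not need to re-plan after a transient failure.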

Can I record my screen and get a Terminator workflow out the other side?

Yes. terminator-workflow-recorder is a Rust crate that hooks the Windows input stack and UI Automation event stream, producing a JSON workflow with every mouse click, keyboard event, clipboard operation, and focus change, each annotated with the UI element it hit. You can replay the JSON directly or translate it to YAML for execute_sequence. Documentation is in crates/terminator-workflow-recorder/README.md. PAD has a desktop recorder too, but its output is the proprietary flow format, not a plain text file.

Where in the Terminator repo do I look to verify what this page claims?

docs/SELECTORS_CHEATSHEET.md for the complete selector grammar. crates/terminator-mcp-agent/src/server.rs for the 32-tool dispatch (the match block near line 9953). crates/terminator-mcp-agent/README.md for the install commands, the MCP config JSON, and the TERMINATOR_HEADLESS notes. crates/terminator/examples/cron_example.yml for a runnable YAML workflow. crates/terminator-workflow-recorder/README.md for the recorder output format. crates/terminator/Cargo.toml for the uiautomation dependency that proves the Windows stack is direct IUIAutomation.

Other guides comparing Terminator to the tools people land on first

Keep reading

Comparison

Claude computer use, grounded in the accessibility tree

Vision-based agents click pixels; Terminator clicks role:Button && name:Save. A comparison and how the two compose.

Read

Comparison

Playwright MCP server, for more than just the browser

Same MCP shape as playwright-mcp, scope that does not stop at the browser tab. Includes the Chrome extension bridge.

Read

Guide

Claude Code MCP server that treats context as a budget

execute_sequence collapses N desktop steps into one MCP call, state.json survives the session, TERMINATOR_HEADLESS=true replays on a VM.

Read
