Matthew Diakonov
12 min read

Free desktop automation tools, audited at the flag level

Every list of free desktop automation tools is a checkbox exercise: MIT column, Windows column, download link. What they skip is where the cost cliff actually lives. UiPath Community caps at two users. Power Automate Desktop charges for every unattended run. Ui.Vision is local-free but cloud-paid. Terminator is the rare case where the cost surface is small enough to read in a single file, and three environment variables move the remaining remote calls onto infrastructure you control.

MIT core, opt-in perception tiers, telemetry off with one env var

  • Four include_* flags expose the cost surface of each perception tier
  • Three env vars move every remote call off Mediar infrastructure
  • No license server, no seat cap, no cloud executor

The cliff every free-tools roundup leaves off the table

A roundup page will tell you AutoHotkey is free, UiPath is free, Power Automate Desktop is free. That is half a sentence. Each one is free until you hit the part of the product that makes it interesting to use. UiPath Community is free at two users; the third user breaks the license. Power Automate Desktop is free for attended flows; the moment you schedule one, a Premium connector kicks in at 15 USD per user per month. Ui.Vision is free on your laptop; the cloud runs are billed per minute. AutomationAnywhere Community expires on a clock. TestComplete shows up on free-tools lists because of its trial.

What 'free' means on a roundup page vs. at the flag level

A table with a single checkmark in a Free column. No mention of seat caps, unattended-run paywalls, vendor-hosted vision backends, or telemetry that cannot be turned off.

  • MIT column: yes or no
  • Windows column: yes or no
  • Pricing: 'contact sales' or '$0'
  • No per-feature cost analysis

The anchor: three env vars, four opt-in flags, one MCP tool

Terminator's cost surface lives in four files. The four include_* booleans on get_window_tree are defined in crates/terminator-mcp-agent/src/utils.rs. The remote call for OmniParser lives in crates/terminator-mcp-agent/src/omniparser.rs. The telemetry opt-out is in crates/terminator-mcp-agent/src/posthog.rs. The error tracking opt-out is in crates/terminator-mcp-agent/src/sentry.rs.

Every default that could send a packet outside your network reads an environment variable first. Set POSTHOG_DISABLED=true, SENTRY_DISABLED=true, and point OMNIPARSER_BACKEND_URL at a server you run, and the MCP agent emits zero outbound requests unless your workflow makes them. That is the anchor claim of this guide.
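
One way to check that claim before launching the agent is a small preflight audit of the environment. The sketch below is illustrative and not part of Terminator: the function name, the messages, and the list of hosts treated as local are assumptions, and it treats only the literal value "true" as an opt-out.

```python
import os
from urllib.parse import urlparse

# Hosts treated as "on this machine" (an assumption for this sketch).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def _is_true(env, name):
    """True when the variable is set to 'true' (case-insensitive here)."""
    return env.get(name, "").strip().lower() == "true"


def remaining_egress(env=os.environ) -> list:
    """List the defaults that could still reach a vendor with this environment."""
    findings = []
    if not (_is_true(env, "POSTHOG_DISABLED")
            or _is_true(env, "TERMINATOR_ANALYTICS_DISABLED")):
        findings.append("PostHog telemetry is enabled")
    if not _is_true(env, "SENTRY_DISABLED"):
        findings.append("Sentry error tracking is enabled")
    # Only matters when include_omniparser is switched on for a call.
    host = urlparse(env.get("OMNIPARSER_BACKEND_URL", "")).hostname
    if host is None:
        findings.append("OmniParser falls back to the hosted endpoint")
    elif host not in LOCAL_HOSTS:
        findings.append(f"OmniParser posts to {host}; confirm you run it")
    return findings
```

Calling remaining_egress() with no arguments inspects the current process environment; an empty list means all three variables are in place.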

The five cost tiers of get_window_tree

One MCP tool returns the tree the agent reads. The base tree is UIA and always on. Four optional perception layers stack on top, each behind a boolean that defaults to false. The tiers are sorted roughly by cost.

Tier 0: UIA tree

On by default. Queries Windows UI Automation via one batched IUIAutomationCacheRequest. Local, synchronous, zero network, zero license. This is the tree the agent reads on every iteration.

Tier 1: include_ocr

Opt-in flag. Runs Tesseract on a window screenshot via the uni-ocr crate. Local and free; the only cost is the CPU spike while the recognizer runs. Nothing leaves the machine.

Tier 2: include_browser_dom

Opt-in flag. Talks to the Terminator Chrome extension over a local bridge. Returns tag names, identifiers, and viewport-aligned bounds for elements the DOM sees that UIA does not. No external network.

Tier 3: include_omniparser

Opt-in flag. Posts the window screenshot to OMNIPARSER_BACKEND_URL. Default URL is Mediar's hosted endpoint; swap it for one you run and no pixels leave your network. The model is OmniParser V2; you supply your own host or we supply ours.

Tier 4: include_gemini_vision

Opt-in flag. Calls Google's Gemini API with your own GEMINI_API_KEY. This is the only tier that can actually bill your credit card, and it is off by default. The tool description explicitly asks the agent to leave it off unless the others are not enough.
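
The tier list above can be condensed into a few lines of code. This is a rough model, not the agent's actual request type: the class and function names are hypothetical, and only the four include_* flag names and their false defaults come from the tool.

```python
from dataclasses import dataclass


@dataclass
class WindowTreeArgs:
    """Hypothetical stand-in for the arguments passed to get_window_tree."""
    include_ocr: bool = False
    include_browser_dom: bool = False
    include_omniparser: bool = False
    include_gemini_vision: bool = False


# Where each opt-in tier does its work, per the descriptions above.
TIER_DESTINATION = {
    "include_ocr": "local",
    "include_browser_dom": "local",
    "include_omniparser": "host behind OMNIPARSER_BACKEND_URL",
    "include_gemini_vision": "Google Gemini API",
}


def remote_tiers(args: WindowTreeArgs) -> dict:
    """Return the enabled tiers that call an HTTP backend."""
    return {
        flag: dest
        for flag, dest in TIER_DESTINATION.items()
        if getattr(args, flag) and dest != "local"
    }
```

With the defaults, remote_tiers(WindowTreeArgs()) is empty, which matches the point that the base UIA tree never needs a network call.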

utils.rs (lines 487-509)

The one call that leaves your network by default, and the env var that stops it

Of the five tiers, only include_omniparser has a default that talks to a third party, and that third party is Mediar. The Rust function that makes the call reads OMNIPARSER_BACKEND_URL first and only falls back to the Mediar endpoint if the env var is unset. The backend contract is public: POST a base64 PNG plus an imgsz integer, receive normalized 0 to 1 bounding boxes. Build your own server that matches that contract, or skip the tier entirely; nothing else depends on it.
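
Seen from the client side, that contract might look like the sketch below. It is written in Python for illustration and is not the Rust function itself. The field names image and imgsz are the ones this guide gives for the contract; the default imgsz of 640, the (x1, y1, x2, y2) box order, and treating an empty variable as unset are assumptions.

```python
import base64
import json
import os

# Hosted fallback named in this guide; used only when the env var is unset.
DEFAULT_BACKEND_URL = "https://app.mediar.ai/api/omniparser/parse"


def resolve_backend_url(env=os.environ) -> str:
    """Prefer OMNIPARSER_BACKEND_URL, fall back to the hosted default.

    Treating an empty value as unset is an assumption of this sketch.
    """
    return env.get("OMNIPARSER_BACKEND_URL") or DEFAULT_BACKEND_URL


def build_request_body(png_bytes: bytes, imgsz: int = 640) -> str:
    """Encode a screenshot as the JSON body the contract describes."""
    return json.dumps({
        "image": base64.b64encode(png_bytes).decode("ascii"),
        "imgsz": imgsz,
    })


def to_pixels(box, width: int, height: int):
    """Scale a normalized 0-1 box to pixels. Assumes (x1, y1, x2, y2) order."""
    x1, y1, x2, y2 = box
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))
```

Because boxes come back normalized, the caller multiplies by the screenshot's width and height to get click coordinates, which is what to_pixels does.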

omniparser.rs (parse_image_with_backend)

Telemetry works the same way. The PostHog module checks an env var before every capture call, and the distinct ID is a hash of the hostname rather than a user identifier. If the var is set, the capture function exits without building an HTTP client.
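
The early-exit pattern described here is simple enough to sketch. The version below is a Python illustration of the idea, not a port of posthog.rs: the SHA-256 hash, the case-insensitive match on "true", and the send hook are assumptions made for the example.

```python
import hashlib
import os
import socket

# Either variable opts out; the second is the alias this guide mentions.
DISABLE_VARS = ("POSTHOG_DISABLED", "TERMINATOR_ANALYTICS_DISABLED")


def is_disabled(env=os.environ) -> bool:
    """True when either opt-out variable is set to 'true'."""
    return any(env.get(name, "").strip().lower() == "true" for name in DISABLE_VARS)


def distinct_id(hostname: str) -> str:
    """Hash the hostname so no user identifier is sent (SHA-256 is assumed)."""
    return hashlib.sha256(hostname.encode("utf-8")).hexdigest()


def capture(event: str, env=os.environ, send=None):
    """Return the event payload, or None if telemetry is off.

    The opt-out check runs first, so no payload is built and the
    transport hook is never called when telemetry is disabled.
    """
    if is_disabled(env):
        return None
    payload = {"event": event, "distinct_id": distinct_id(socket.gethostname())}
    if send is not None:
        send(payload)  # hypothetical transport hook
    return payload
```

The ordering is the design point: checking the variable before anything else means a disabled agent never constructs the pieces that could open a connection.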

posthog.rs (capture path)

Three env vars, three outbound calls turned off

If you want an MCP agent that runs on your laptop, speaks to your AI assistant locally, and sends nothing anywhere else unless your own workflow does, this is the full list.

Where the three env vars intercept each remote call

POSTHOG_DISABLED intercepts the PostHog capture call, SENTRY_DISABLED intercepts Sentry error reporting, and OMNIPARSER_BACKEND_URL redirects the OmniParser call.

1

POSTHOG_DISABLED=true

Turns off the PostHog capture path that fires on startup and on every tool execution. The is_disabled() check returns early before any HTTP client is constructed. TERMINATOR_ANALYTICS_DISABLED=true is accepted as an alias.

2

SENTRY_DISABLED=true

Stops the default Sentry DSN from being initialized. The init_sentry function checks this env var before calling sentry::init, so no transport is created and no spans leave the process.

3

OMNIPARSER_BACKEND_URL=<your URL>

Overrides the hardcoded fallback of https://app.mediar.ai/api/omniparser/parse. The backend contract is a JSON POST with fields image (base64 PNG) and imgsz (640-1920); the response is an elements array with normalized 0-1 bounding boxes. You can host the OmniParser V2 model yourself and hand its URL to this variable.
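
For the third variable, a self-hosted endpoint only has to honor the request and response shapes described above. The stub below is a minimal sketch using Python's standard library. It does not run the OmniParser V2 model; it returns one placeholder element so the wiring can be tested. The "bbox" key name, the top-level "elements" wrapper, the port, and the function names are assumptions.

```python
import base64
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_request(body: bytes) -> dict:
    """Validate the request and return a response in the documented shape."""
    payload = json.loads(body)
    png = base64.b64decode(payload["image"], validate=True)
    imgsz = int(payload["imgsz"])
    if not 640 <= imgsz <= 1920:
        raise ValueError("imgsz out of range")
    # A real server would run OmniParser V2 on `png` here.
    # This placeholder returns one fixed element; the "bbox" key is assumed.
    return {"elements": [{
        "element_type": "text",
        "content": f"stub ({len(png)} bytes, imgsz={imgsz})",
        "bbox": [0.0, 0.0, 1.0, 1.0],
    }]}


class ParseHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            status, result = 200, parse_request(self.rfile.read(length))
        except (KeyError, TypeError, ValueError) as err:
            status, result = 400, {"error": str(err)}
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000):
    """Blocks forever; bind to loopback so the stub is not exposed."""
    HTTPServer(("127.0.0.1", port), ParseHandler).serve_forever()
```

Run serve() in its own process, then set OMNIPARSER_BACKEND_URL to http://127.0.0.1:8000/ so the agent posts screenshots to the stub instead of the hosted endpoint.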

0 packets

Set POSTHOG_DISABLED and SENTRY_DISABLED, point OMNIPARSER_BACKEND_URL at localhost, and you can watch the Wireshark output go silent.

Implementation note, posthog.rs / sentry.rs / omniparser.rs

The MCP config that locks this in

One block in your assistant's MCP config pins all three variables. The agent restarts, the flags are in effect, and the agent's behavior is identical to the default minus the outbound traffic.
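
As a rough illustration, the block could be generated rather than typed by hand. The sketch below builds an mcpServers entry in the layout Claude Desktop reads (command, args, env). The server name "terminator", the loopback URL, and the helper names are assumptions; adjust them to your setup.

```python
import json


def terminator_mcp_config(omniparser_url: str) -> dict:
    """Build an mcpServers entry with all three variables pinned."""
    return {
        "mcpServers": {
            "terminator": {
                "command": "npx",
                "args": ["-y", "terminator-mcp-agent@latest"],
                "env": {
                    "POSTHOG_DISABLED": "true",
                    "SENTRY_DISABLED": "true",
                    "OMNIPARSER_BACKEND_URL": omniparser_url,
                },
            }
        }
    }


def render(omniparser_url: str = "http://127.0.0.1:8000/") -> str:
    """Serialize the entry as indented JSON, ready to merge into the config."""
    return json.dumps(terminator_mcp_config(omniparser_url), indent=2)
```

Print render() and merge the result into your existing config file rather than overwriting it, since other MCP servers may already be registered there.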

claude_desktop_config.json
What a zero-egress run looks like

The usual shortlist, with the cliff printed next to it

These are the twelve products most guides on this topic repeat. What every list leaves out is the specific condition under which the free label stops applying. The chip next to each name is the first line in that story.

  • AutoHotkey: Truly free, scripting only
  • SikuliX: Truly free, image matching only
  • Robot Framework: Truly free, no agent interface
  • Pywinauto: Truly free, Python only
  • Winium: Truly free, Selenium protocol
  • WinAppDriver: Microsoft, Appium protocol
  • Ui.Vision RPA: Free local; cloud is paid
  • Power Automate Desktop: Unattended runs are Premium
  • UiPath Community: Two-user cap, non-production
  • AutomationAnywhere Community: No API scheduling
  • TestComplete: Trial only, not free
  • AskUI: Paid tiers for API use

The comparison that belongs on every one of those lists

A free desktop automation tool worth picking is one where the features that matter to your agent do not disappear the moment you turn them on. The question is not "free or paid"; it is "what is the unit that bills me once I scale this."

Feature | Typical free RPA or testing tool | Terminator
License | Mixed: community, proprietary, seat-capped | MIT, unrestricted, no per-seat rules
Unattended runs on the free tier | Usually blocked (PAD Premium, UiPath Production) | Same code path as attended; no distinction
Cloud backend required | Often for vision or scheduling | Only OmniParser; self-host with one env var
Telemetry opt-out | Enterprise feature or buried settings | Single env var (POSTHOG_DISABLED=true)
Error tracking opt-out | Not documented | Single env var (SENTRY_DISABLED=true)
Cost surface visible to the agent | Opaque, spread across docs and billing pages | Four boolean include_* flags on one MCP tool
Per-user cost at scale | Grows with seats (UiPath, AutomationAnywhere) | Flat; agent drives one npx install
What bills your credit card | Seats, unattended runs, cloud minutes | Gemini API (only if include_gemini_vision is true)

The counts that actually matter

A handful of small numbers define the entire cost surface: 4 opt-in flags, 3 env vars, and 35 MCP tools the agent can call once the tree is in hand. Everything else is the same open-source Rust core that ships on crates.io as terminator-rs.

Why this matters for an AI coding assistant

The reason free desktop tools tend to get expensive the moment an assistant drives them is that those tools were designed for a human operator or a test runner in CI. An assistant calling the same tool in a loop triggers the commercial surface: scheduled triggers, unattended bot licenses, cloud minutes, seat counts. Terminator flips this. The assistant is the operator, the MCP agent is a single npx install, and every tier that could bill you is a boolean the model toggles.

Install once with claude mcp add terminator "npx -y terminator-mcp-agent@latest", set your three env vars, and the cost of adding desktop automation to Claude Code, Cursor, Codex, or Windsurf is the cost you already pay for the assistant plus nothing. That is the whole pitch.

Audit the cost surface on your own setup

Bring the tool you were planning to use. We will walk through each flag and each env var, show you which one turns on the first outbound packet, and leave you with a config that runs silent.

Frequently asked questions

Which desktop automation tools are actually free end to end, and which have a cost cliff later?

AutoHotkey, SikuliX, Robot Framework, Pywinauto, Winium, and WinAppDriver are genuinely free and open source with no runtime vendor dependency. Ui.Vision is free for local use but its cloud execution is paid. Power Automate Desktop is bundled with Windows 11 but unattended runs require a per-bot Premium license. UiPath Community is free with a two-user limit and a community-only license that blocks production use. AutomationAnywhere Community Edition expires on a rolling basis and does not include API scheduling. Blue Prism and TestComplete are not free at all; they show up on free-tools roundups because they offer trials. Terminator is MIT licensed and ships the full feature set on npm, PyPI, and crates.io. The only paid surface it touches is the Gemini vision API, and only if you opt in with a flag. Every other remote dependency has an environment variable that points it at your own infrastructure or turns it off.

What exactly are the three environment variables that move Terminator off vendor infrastructure?

OMNIPARSER_BACKEND_URL swaps the default https://app.mediar.ai/api/omniparser/parse endpoint with any URL you control; the expected contract is a POST that takes a base64 image plus an imgsz integer and returns normalized 0 to 1 bounding boxes with element_type and content fields. POSTHOG_DISABLED=true (or TERMINATOR_ANALYTICS_DISABLED=true) turns off the PostHog EU capture that the MCP agent fires on startup and on each tool execution. SENTRY_DISABLED=true turns off the default Sentry DSN for error tracking. Each one is a single line in your MCP server env block. After those three are set, the MCP agent has no outbound network calls unless your own workflow makes them.

What are the include_* flags on get_window_tree and what does each one actually cost?

Four of them, all defaulting to false. include_ocr runs Tesseract locally on the captured window; zero dollars, but it spins up CPU and adds tens to hundreds of milliseconds depending on the window size. include_omniparser posts a screenshot to the OmniParser backend at https://app.mediar.ai/api/omniparser/parse by default; zero dollars on Mediar's tier, but you can self-host by setting OMNIPARSER_BACKEND_URL. include_gemini_vision calls Google's Gemini API with your own key; this is the only tier that can actually bill your credit card. include_browser_dom talks to the Terminator Chrome extension over a local socket; free, but requires the extension to be installed. The tool description in server.rs explicitly tells the agent to turn these on only when the default UIA tree is missing what it needs.

Does Terminator require me to agree to telemetry to use it?

No. The PostHog module in crates/terminator-mcp-agent/src/posthog.rs checks POSTHOG_DISABLED and TERMINATOR_ANALYTICS_DISABLED before every capture. The Sentry module in crates/terminator-mcp-agent/src/sentry.rs checks SENTRY_DISABLED before initializing the client. If the relevant variable is set to true, the corresponding code path exits early. The distinct ID for PostHog is a hash of the hostname, not a user identifier, and the error classifier sends only error categories like 'element_not_found' or 'timeout' rather than raw messages. If you want zero outbound packets from the agent itself, set POSTHOG_DISABLED=true and SENTRY_DISABLED=true in your MCP config and the agent stays silent.
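
To make the category-only reporting concrete, here is a hedged sketch of what such a classifier could look like. The keyword rules and the "other" fallback are invented for illustration; only the two category names come from the answer above, and the agent's real classifier may work differently.

```python
# Hypothetical keyword rules; the agent's real classifier may differ.
CATEGORY_KEYWORDS = {
    "element_not_found": ("not found", "no element"),
    "timeout": ("timed out", "timeout"),
}


def classify_error(message: str) -> str:
    """Map a raw error message to a coarse category that is safe to report."""
    lowered = message.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return category
    return "other"  # assumed fallback bucket
```

The point of the design is that file paths, window titles, and user names embedded in a raw message never leave the process; only the bucket name does.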

How does this compare to Power Automate Desktop's free tier?

Power Automate Desktop is free for attended, interactive use on Windows 10 and 11. Every trigger, schedule, and unattended run is a Premium license at 15 USD per user per month minimum, and cloud flows are gated behind a separate Microsoft 365 plan. If you have an AI coding assistant that needs to kick off an automation on a schedule or in the background, the free tier does not cover that. Terminator's MCP agent is a single npx command; the assistant drives it directly, there is no scheduler service, no bot registration, no premium connector layer. If you want to host it on a VM, that is your cloud bill, not a license.

Can an AI coding assistant like Claude Code, Cursor, or Codex drive these free tools without extra cost?

SikuliX, Robot Framework, and AutoHotkey can all be shelled out from an AI assistant, but they have no structured interface; the assistant has to write and debug shell calls against a CLI that was not designed for an agent to read. WinAppDriver exposes an Appium protocol over HTTP but does not return a unified tree format. Ui.Vision RPA and AskUI have cloud or enterprise tiers that bill per API call. Terminator was designed for this use case: terminator-mcp-agent@latest is a one-line npx install, registers 35 MCP tools, and returns a YAML tree the model can read directly. The cost of the assistant itself is what you already pay for Claude Code or Cursor; the agent adds zero on top of that unless you opt into the Gemini vision tier.

Is there any feature gated behind a paid Terminator tier?

No. The Rust core, the Node and Python bindings, the CLI, the MCP agent, the workflow recorder, and the Chrome extension are all MIT and publicly versioned on crates.io, npm, and PyPI. Mediar sells managed hosting and a workflow builder product (mediar.ai) that uses Terminator underneath, but the library itself is not gated. You can cargo add terminator-rs, npm install @mediar-ai/terminator, or npx -y terminator-mcp-agent@latest today and get the same feature set the managed product uses.

Why does 'free' usually break once you add an AI coding assistant into the loop?

Because free desktop tools were designed for a human operator or a test runner. An assistant calling an API in a loop hits the parts those products did not build out as free: schedule triggers, unattended runs, multi-machine licenses, telemetry opt-outs, or cloud executors for the image-matching step. Terminator inverts this. The assistant is the operator. Every feature that costs money in a traditional RPA tool is either local (UIA, OCR), optional and self-hostable (OmniParser), or bring-your-own-key (Gemini). The one-user, one-license model does not apply because there is no license server to talk to.

terminator: Desktop automation SDK
© 2026 terminator. All rights reserved.