
Terminator computer: the real tool behind the name

Type “terminator computer” into a search box and you get Skynet: the fictional defense computer that goes self-aware in the films. Fair enough. But there is also a real piece of software named Terminator, and it does something the movies only imagined: it lets an AI control your actual computer.

Matthew Diakonov, Written with AI

Published June 22, 2026 · 8 min read

Direct answer · verified Jun 22, 2026

Terminator is an open-source framework from Mediar AI that gives AI assistants (Claude, Cursor, VS Code, and others) computer-use control over every app on your desktop. It works through native accessibility APIs instead of screenshots, so it is fast and deterministic. It is not the movie’s Skynet; it is a developer tool you install in one line:

claude mcp add terminator "npx -y terminator-mcp-agent@latest"

Source and full docs: github.com/mediar-ai/terminator (MIT licensed).

Two things named Terminator; one of them is software you can run

The fictional one

Skynet / the Terminator computer

The artificial intelligence from the films, built by the fictional Cyberdyne Systems, that becomes self-aware and turns on humanity. A story. Nothing to install.

The real one

Terminator by Mediar AI

An open-source desktop automation framework. It is shaped like Playwright, the browser-testing tool, but it targets your whole operating system rather than a single web page. Real code, real downloads, real AI control of real apps.

The rest of this page is about the second one, because that is the one you can actually use. If you landed here looking for the movie lore, the short version is above and you can stop reading. If you are a developer who keeps seeing the name attached to AI agents and wants to know what it does, keep going.

What “computer use” means here

Computer use is the idea of letting an AI model operate a computer the way a person does: not by calling an API, but by clicking buttons, typing into fields, reading what is on screen, and stringing those actions together to finish a task. It is how an assistant can check logs in a dashboard that has no API, fill in a legacy desktop form, or test your own app by actually using it.

The hard part is the bridge between “the model decided to click Save” and “the Save button got clicked.” There are two ways to build that bridge. You can show the model a screenshot and have it point at a pixel. Or you can give the model the structured tree of every control in the window and have it pick one by name. Terminator is built around the second, and falls back to the first only when it has to. That single decision is what makes it different from most things sold as “computer use.”
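To make the second bridge concrete, here is a minimal sketch of picking a control by role and name from a structured tree. The `Element` class, the `find` helper, and the sample window are all hypothetical; they illustrate the idea only and are not Terminator's types or API.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Element:
    """One node of a toy accessibility tree (hypothetical shape, not Terminator's)."""
    role: str
    name: str
    bounds: tuple[int, int, int, int]  # x, y, width, height in screen pixels
    children: list["Element"] = field(default_factory=list)


def walk(node: Element) -> Iterator[Element]:
    """Yield the node and every descendant, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: Element, role: str, name: str) -> Optional[Element]:
    """Return the first element matching both role and name, or None."""
    return next((e for e in walk(root) if e.role == role and e.name == name), None)


def center(element: Element) -> tuple[int, int]:
    """Click target: the middle of the element's bounding box."""
    x, y, w, h = element.bounds
    return (x + w // 2, y + h // 2)


# A made-up window with a toolbar and a text area.
window = Element("Window", "Untitled - Editor", (100, 100, 800, 600), [
    Element("ToolBar", "Main", (100, 130, 800, 40), [
        Element("Button", "Open", (110, 135, 60, 30)),
        Element("Button", "Save", (180, 135, 60, 30)),
    ]),
    Element("Edit", "Document", (100, 170, 800, 530)),
])

save = find(window, "Button", "Save")
print(center(save))  # (210, 150)
```

The model only has to answer "Button named Save"; the pixel comes from the bounds the tree already carries, so no image inference is involved.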

How the computer-use loop actually runs

You do not have to take the architecture on faith; it is in the source. The repository ships a crate at crates/terminator-computer-use, whose Cargo.toml describes it as “Gemini Computer Use - AI-powered autonomous desktop automation.” Here is the cycle it runs, step by step, with the exact functions that do the work.

Capture the screen and the tree

The agent takes a screenshot of the target window and, on the default path, the live accessibility tree of every element in it. The tree is the structured part: each control already carries its role, name, and bounds from the OS.

Ask the model what to do next

The current state goes to the model. In the bundled Gemini loop, the model replies with a single action, such as click_at, type_text_at, or scroll_document. Those are the literal action names documented at crates/terminator-computer-use/src/lib.rs, line 20.

Translate the action into a real click

Vision models emit a point in a normalized 0-999 grid, not screen pixels. convert_normalized_to_screen at lib.rs:336 walks that point back through the resize scale, the DPI scale, and the window offset to land on the exact pixel. On the accessibility path there is no math: the element already knows where it is.

Execute, then report a status

The action runs against the OS. Each step returns one of four statuses, spelled out at lib.rs:86: success, failed, needs_confirmation, or max_steps_reached. The needs_confirmation branch is the safety gate: the agent pauses instead of clicking something destructive on its own.

Feed the result back and repeat

The new screenshot and the outcome of the last action go back to the model for the next step. The loop continues until the model declares the task complete or hits its step ceiling. That is the whole computer-use cycle.
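The five steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the crate's code: the four status strings and the action names click_at and type_text_at come from the text, while `run_loop`, the `done` action, the `DESTRUCTIVE` list, and the stop-on-failure policy are hypothetical choices made for the example.

```python
from enum import Enum


class Status(Enum):
    # The four status names quoted from lib.rs:86 in the text above.
    SUCCESS = "success"
    FAILED = "failed"
    NEEDS_CONFIRMATION = "needs_confirmation"
    MAX_STEPS_REACHED = "max_steps_reached"


# Hypothetical safety list; how the real agent decides is not shown here.
DESTRUCTIVE = {"delete", "format", "uninstall"}


def run_loop(capture, decide, execute, max_steps=10):
    """Capture -> decide -> execute -> feed back, until done or out of steps."""
    history = []
    for _ in range(max_steps):
        state = capture()                       # screenshot and tree
        action = decide(state, history)         # model picks one action
        if action["name"] == "done":            # hypothetical completion signal
            return Status.SUCCESS, history
        if action.get("target", "").lower() in DESTRUCTIVE:
            return Status.NEEDS_CONFIRMATION, history  # pause for a human
        ok = execute(action)                    # run against the OS
        history.append({"action": action, "ok": ok})   # outcome goes back to the model
        if not ok:
            return Status.FAILED, history       # assumption: stop on first failure
    return Status.MAX_STEPS_REACHED, history


# Scripted stand-in for the model so the sketch runs without any API.
script = iter([
    {"name": "click_at", "target": "File"},
    {"name": "type_text_at", "target": "Name", "text": "report.txt"},
    {"name": "done"},
])
status, history = run_loop(
    capture=lambda: {"screenshot": b"", "tree": {}},
    decide=lambda state, past: next(script),
    execute=lambda action: True,
)
print(status.value, len(history))  # success 2
```

Passing `capture`, `decide`, and `execute` in as callables keeps the loop testable: the same control flow runs against a scripted model here and could run against a real one.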

The coordinate translation in step three is worth dwelling on, because it is the cost the vision path pays on every click. The model speaks in a normalized 0-999 grid, and convert_normalized_to_screen has to undo a resize scale, a DPI scale, and a window offset to find the real pixel. On the accessibility path that whole function is unnecessary, because the OS reports each element's bounds directly. You can read both paths yourself in the crate source.
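As a rough illustration of that arithmetic, here is one way the three corrections could compose. This is not the crate's convert_normalized_to_screen; the function name, parameter names, the order of operations, and the convention that the screenshot is in physical pixels while the window origin is in logical pixels are all assumptions made for the sketch.

```python
def normalized_to_screen(nx, ny, *, image_size, resize_scale, dpi_scale, window_origin):
    """Map a point on a 0-999 grid to a screen coordinate (illustrative only).

    Assumptions, not taken from the crate:
      image_size    -- (width, height) of the resized screenshot the model saw
      resize_scale  -- resized size / captured size (0.5 means it was halved)
      dpi_scale     -- physical pixels per logical pixel (1.5 at 150% scaling)
      window_origin -- logical screen position of the window's top-left corner
    """
    img_w, img_h = image_size
    origin_x, origin_y = window_origin

    # 1. Grid position -> pixel inside the resized screenshot.
    px = nx / 1000 * img_w
    py = ny / 1000 * img_h

    # 2. Undo the resize -> pixel inside the original capture.
    px /= resize_scale
    py /= resize_scale

    # 3. Undo DPI scaling -> logical pixels relative to the window.
    px /= dpi_scale
    py /= dpi_scale

    # 4. Add the window offset -> logical screen coordinate.
    return (round(px + origin_x), round(py + origin_y))


point = normalized_to_screen(
    500, 500,
    image_size=(960, 540),   # a 1920x1080 capture, halved
    resize_scale=0.5,
    dpi_scale=1.5,
    window_origin=(200, 100),
)
print(point)  # (840, 460)
```

Every one of the four inputs can change between the screenshot and the click, which is why a point computed this way is only as fresh as the capture it came from.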

Accessibility tree vs. looking at the screen

The headline products you have heard of for AI computer use mostly read screenshots and infer where to click. Terminator can do that too, but it prefers to ask the operating system what is on screen. The difference shows up everywhere that matters in production.

Feature	Screenshot-only computer use	Terminator (accessibility-first)
How it finds a button to click	Reads a screenshot, infers pixel coordinates of the button	Looks up the button in the accessibility tree by role and name
What happens when the window moves or DPI changes	Coordinates drift, the click misses	The element reference still resolves, the OS tracks its position
Speed per action	Gated by a vision model round-trip every single step	Structural lookups run at CPU speed, the model is called only on recovery
Does it seize your mouse and keyboard	Usually drives the real cursor, you cannot touch the machine	Runs in the background through the accessibility interface, you keep working
Where vision still helps	It is the only input, so it is used everywhere	Reserved for canvases and custom-drawn UI the tree cannot describe

Vision is not the enemy here. Terminator keeps it on hand for canvases, custom-drawn controls, and anything the accessibility tree cannot describe. The point is that it is the exception, not the default.
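The window-move row of the comparison is easy to reproduce in a toy model. In the sketch below, the `Window` class, its button layout, and the 70-pixel drag are invented for illustration; nothing here calls Terminator or a real OS API.

```python
from dataclasses import dataclass
from typing import Optional

BUTTON_W, BUTTON_H = 60, 30  # every toy button has the same size


@dataclass
class Window:
    x: int
    y: int
    buttons: dict[str, tuple[int, int]]  # name -> top-left offset inside the window

    def locate(self, name: str) -> tuple[int, int]:
        """Resolve a button by name to its current center on screen."""
        dx, dy = self.buttons[name]
        return (self.x + dx + BUTTON_W // 2, self.y + dy + BUTTON_H // 2)

    def hit_test(self, point: tuple[int, int]) -> Optional[str]:
        """Return the name of the button under a screen point, if any."""
        px, py = point
        for name, (dx, dy) in self.buttons.items():
            left, top = self.x + dx, self.y + dy
            if left <= px < left + BUTTON_W and top <= py < top + BUTTON_H:
                return name
        return None


window = Window(x=100, y=100, buttons={"Save": (80, 35), "Delete": (150, 35)})

# Screenshot-style: remember a pixel taken from an earlier frame.
cached_pixel = window.locate("Save")

# The user drags the window 70 pixels to the left before the click lands.
window.x -= 70

print(window.hit_test(cached_pixel))            # Delete
print(window.hit_test(window.locate("Save")))   # Save
```

The stale pixel does not merely miss; in this layout it lands on a different button. Resolving by name at click time avoids that because the position is looked up fresh.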

If you want the long version of this argument, with the latency and internationalization tradeoffs spelled out, see why accessibility APIs beat OCR and pixel matching, and the deeper dive on the seven grounding modes a real agent falls through.

What you can hand the computer to do

Once Terminator is wired into an assistant that speaks MCP, the assistant gains a set of desktop powers it did not have when it could only write code. These are the things it can now do on the machine itself, with you free to keep working alongside it.

With the MCP server connected, the assistant can

Open and switch between any application, not just the browser
Click buttons and type into fields by name, across native and legacy apps
Read the structure of what is on screen instead of guessing from pixels
Record a human workflow once and replay it deterministically
Reuse your existing browser session, so you stay logged in and keep your cookies
Run in the background without grabbing your mouse or keyboard

The maintainers list concrete examples in the repo: spin up a new instance on a cloud provider and connect to it from the CLI, dig through logs in a hosting dashboard to find the most common errors, or test new features of your own app based on recent commits. None of those require an API for the target tool; the assistant just uses the app.

Getting started in one line

The fastest path is the MCP server. If you use Claude Code, add it with a single command and the assistant can drive your desktop from its next message:

claude mcp add terminator "npx -y terminator-mcp-agent@latest"

For Cursor, VS Code, or Windsurf, drop the same npx -y terminator-mcp-agent@latest command into your MCP config file under a server entry. If you would rather call it from your own code, install the Rust crate terminator-rs or the Python package terminator-py. Windows is the platform with full support today.
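For the editor route, a server entry might look like the structure this short script prints. The package name and the npx arguments come from the command above; the `mcpServers` top-level key and the `terminator` entry name are assumptions based on a common MCP config layout, and some editors name the top-level key differently, so check your editor's MCP documentation before pasting.

```python
import json

# Assumed layout: a top-level "mcpServers" map with one named server entry.
server_entry = {
    "mcpServers": {
        "terminator": {
            "command": "npx",
            "args": ["-y", "terminator-mcp-agent@latest"],
        }
    }
}

# Print the JSON to paste into the editor's MCP config file.
print(json.dumps(server_entry, indent=2))
```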

Building an agent that needs to drive real desktop apps?

Talk through your automation with the team behind Terminator and find out whether the accessibility-first approach fits your use case.

Frequently asked questions

Is 'Terminator computer' the same thing as Skynet from the films?

No. Skynet is the fictional artificial intelligence from the Terminator movies, a defense computer that becomes self-aware. This page is about a different thing that shares the name: Terminator, a real open-source software framework from Mediar AI that lets AI assistants control your actual computer. It is a developer tool you install, not a movie plot. The name is a nod to the franchise; the product is a desktop automation framework.

What does the Terminator tool actually do?

It gives an AI assistant the ability to drive every application on your desktop the way a person would: open apps, click buttons, type into fields, read what is on screen, and chain those actions into a task. It does this through native accessibility APIs (Windows UI Automation, macOS Accessibility), the same interfaces screen readers use, so it understands the structure of the UI rather than guessing from pixels. Think of it as Playwright, the browser automation tool, but pointed at your whole operating system instead of just a web page.

How do I install it?

For an AI assistant that speaks MCP, it is one line. In Claude Code: claude mcp add terminator "npx -y terminator-mcp-agent@latest". For Cursor, VS Code, or Windsurf you add the same npx command to your MCP config file. If you want to call it from code directly, there is a Rust crate (terminator-rs) and Python bindings (terminator-py). The full setup is in the repo at github.com/mediar-ai/terminator.

Does it take over my mouse and keyboard while it runs?

No, and this is one of its deliberate design choices. Most screenshot-driven computer-use agents move your real cursor, so you have to sit on your hands while they work. Terminator drives applications through the accessibility interface in the background, which means it can click and type inside apps without hijacking your physical mouse or keyboard. You can keep using the machine for something else while it works.

What is the terminator-computer-use crate I see in the repo?

It is a self-contained autonomous agent that uses Google's Gemini Computer Use model to drive the desktop end to end. Its Cargo.toml describes it as 'Gemini Computer Use - AI-powered autonomous desktop automation'. This is the pure-vision path: the model looks at screenshots and emits actions like click_at and type_text_at. It is the fallback for when you want a hands-off agent. The recommended path for production work is the accessibility tree, with vision used only where the tree falls short. Both share the same underlying click implementation.

Why use accessibility APIs instead of just letting the model look at the screen?

Three reasons: latency, stability, and reliability. A vision-only loop calls a model on every single step, which is slow and expensive. Accessibility lookups run at CPU speed and only invoke the model when something needs recovery. Vision-derived coordinates also break when a window moves or the display scaling changes, because nothing on the screen is anchored to a raw pixel; an accessibility element reference survives those changes because the OS tracks where the element is. The result is deterministic automation, which the project reports runs with a high success rate, with AI reserved for the moments it is genuinely needed.

Which operating systems does it support?

Windows is the primary, fully supported platform, with element location, clicking and typing, application and window management, browser automation, workflow recording, and screen capture all stable. macOS support exists at the core Rust level. Linux uses the AT-SPI2 accessibility layer. The Node.js, Python, and MCP packages currently ship Windows binaries, so if you are wiring an AI assistant to your desktop today, Windows is the path with the fewest sharp edges.

Is it free and open source?

Yes. Terminator is MIT licensed, so you can read the source, fork it, and ship it inside your own product with no lock-in. The code lives at github.com/mediar-ai/terminator. There is also a hosted product (the Mediar workflow builder) for teams who want recording, mapping, and managed execution without running their own infrastructure, but the framework itself is open.

Who is this for?

Developers building desktop automation, AI agents with computer-use capabilities, or MCP tools that need to drive real applications beyond the browser. It is a strong fit if you have hit reliability limits with PyAutoGUI, AutoHotkey, raw UI Automation, or screenshot-based approaches. It is not the right tool if you only need web-browser automation (Playwright already does that well) or if you want a no-code consumer app.

The name borrows from the movies. The software is the part you can run today: an open-source framework that finally gives an AI the hands to operate your whole computer, built on the boring, dependable plumbing of accessibility APIs rather than the sci-fi of a self-aware mainframe. Read the source at github.com/mediar-ai/terminator or join the build conversation on Discord.
