
Claude desktop automation, compiled into one MCP call instead of one click at a time

Every guide on this topic tells the same story. Claude takes a screenshot, Claude decides where to click, Claude clicks, repeat. That is what Anthropic's computer tool does. It is also the most expensive way to drive a desktop from an LLM, because the model sits inside the inner loop of every action. Terminator ships a different shape: a single tool called execute_sequence that takes a whole typed workflow as a parameter. Claude writes the plan once. The Rust engine runs it locally through the accessibility tree with retries, conditional jumps, fallback branches, and a JavaScript output parser. Then Claude wakes back up with structured JSON. The model is at the bookends, not between every click.

Terminator · 13 min read · Open-source, MIT
  • One MCP call per workflow, not per click
  • ExecuteSequenceArgs struct at utils.rs:1506
  • Typed variables, named selectors, conditional jumps, fallback_id
  • State persisted to .mediar/workflows/<id>/state.json

Any MCP client works. Same server binary, same 32 tools.

Claude Code · Cursor · VS Code · Windsurf · Zed · Continue.dev · Cline · Goose

The question most guides skip

Given an MCP server full of desktop tools, why does Claude need to emit a separate tool call for every click? The default answer is that MCP works turn by turn, so of course it does. The useful answer is that turn-by-turn is a choice of the tool set, not a law of the protocol. A tool can accept a whole plan as its argument, run that plan, and return the result. That is exactly what execute_sequence is.

The effect is not subtle. A 40-step workflow that would cost you 40 model inferences, 40 tool-result round-trips, and whatever screenshots the host decided to ship along the way collapses into two inferences: the one where Claude writes the plan, and the one where Claude reads the parsed output. Between them, the engine is pure Rust talking to Windows UI Automation or macOS Accessibility at CPU speed.

The rest of this page is a tour of the fields on that tool, the file in the repo that defines them, and the lifecycle of one call from the moment Claude emits it to the moment the parsed JSON comes back. All of it is verifiable. Line numbers are given.

Two tool-call shapes, drawn as pseudo-JSON

Same user intent ("open the file, enter the number, press equals, read the result"). Two very different things Claude has to emit.

Per-click: Claude emits one tool_use per atomic action. After every call, the MCP host returns, Claude thinks again, and Claude emits the next call. For N steps you pay N inferences plus whatever the host decides to attach to each turn (often a screenshot).

  • One inference per step
  • Retries are the model's problem
  • Branching lives in the model's head, not in code
  • Resume after a crash means replaying from turn one

Compiled: Claude emits a single execute_sequence call that carries the whole plan. The Rust engine runs every step locally and returns parsed JSON.

  • Two inferences total: write the plan, read the result
  • Retries are a per-step field the engine honours
  • Branching lives in the plan: r#if, jumps, fallback_id
  • Resume after a crash means start_from_step plus state.json

What Claude actually types, in both shapes

These are abbreviated but faithful. On the left: three successive tool calls Claude would emit in the per-click shape. On the right: a single tool call that carries the entire plan.

per-click.jsonc
execute_sequence.jsonc
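The original code tabs are not reproduced here, so what follows is a hedged reconstruction: the execute_sequence field names come from the ExecuteSequenceArgs struct, but the tool names, selectors, and values are illustrative, not copied from the repo.

```jsonc
// Shape 1, per-click: each of these is a separate tool_use, one model inference apiece.
// (Tool names and selectors are illustrative.)
{ "name": "click_element", "arguments": { "selector": "role:Button && name:Seven" } }
{ "name": "click_element", "arguments": { "selector": "role:Button && name:Equals" } }
{ "name": "validate_element", "arguments": { "selector": "role:Text && name:Result" } }

// Shape 2, compiled: one tool_use carrying the entire plan.
{
  "name": "execute_sequence",
  "arguments": {
    "selectors": { "btn_equals": "role:Button && name:Equals" },
    "steps": [
      { "tool_name": "click_element", "arguments": { "selector": "role:Button && name:Seven" }, "id": "press_seven" },
      { "tool_name": "click_element", "arguments": { "selector": "btn_equals" }, "id": "press_equals", "retries": 2 }
    ],
    "output_parser": { "javascript_code": "/* read the result field from the final UI tree */" }
  }
}
```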

The shape of a single compiled call

Every input flows into the same execute_sequence_impl entry point. The engine fans out to the OS accessibility tree, the embedded scripting engines, and back into the tool dispatcher for step tools. Only the parsed output escapes back up to the model.

Claude emits once. Engine runs the fan-out. Parsed JSON returns.

[Diagram] variables, inputs, selectors, steps, and troubleshooting feed execute_sequence_impl, which fans out to Windows UIA, macOS AX, and the Node / Bun / Python script engines; only the output_parser result returns to the model.

ExecuteSequenceArgs: the anchor fact

This is the struct that defines the tool. Open crates/terminator-mcp-agent/src/utils.rs at line 1506. Every field below corresponds to a schemars annotation that becomes part of the MCP tool schema Claude sees. There is no framework magic between your YAML and this type.

crates/terminator-mcp-agent/src/utils.rs

What each of those fields is for

Nine of the 19 fields carry most of the weight. The rest are observability and execution-mode switches. Scan this grid once and the YAML stops feeling mysterious.

variables

Typed schema for every input the workflow accepts. Each entry declares type (string, number, enum, array, object), label, default, regex, and options. The same schema powers form UIs in front of the workflow.

inputs

Per-run values that satisfy the variables schema. This is what changes between runs. Everything else stays the same.

selectors

Named shortcuts for UI elements. btn_save instead of role:Button && name:Save pasted in five places. DRY for accessibility selectors.

steps

The workflow itself. Each SequenceStep is a tool call with optional id, retries, r#if expression, jumps array, and fallback_id.

troubleshooting

A separate list of steps that only run when a normal step's fallback_id points at them. Keeps recovery paths out of the happy-path flow.

output_parser

JavaScript (or declarative DSL) that runs against the final UI tree and returns structured JSON back to Claude. The whole reason this scales.

start_from_step / end_at_step

Resume from a named id or stop after one. With state persistence in .mediar/workflows/<id>/state.json, you can replay a single step in isolation.

stop_on_error / continue

Switch between strict and best-effort execution. Some flows want to die on first failure; others want to finish and report what happened.

trace_id / execution_id

OpenTelemetry correlation ids that thread through executor and agent logs. Wire it to your observability stack once.
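Composed into one payload, the core fields look roughly like this. A hedged sketch: the field names match the grid above, while the variable names, regex, and selector strings are invented for illustration.

```jsonc
{
  "variables": {
    "invoice_id": { "type": "string", "label": "Invoice number", "regex": "^INV-[0-9]+$" }
  },
  "inputs":    { "invoice_id": "INV-1042" },               // per-run values, validated against `variables`
  "selectors": { "btn_save": "role:Button && name:Save" }, // named once, referenced everywhere
  "stop_on_error": true,                                   // strict mode: die on first unhandled failure
  "steps":           [ /* SequenceStep entries */ ],
  "troubleshooting": [ /* only reachable via fallback_id */ ],
  "trace_id": "run-invoice-01"                             // threads through executor and agent logs
}
```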

SequenceStep: where branching actually lives

Each entry in the steps array is a SequenceStep, and the shape of that type is what lifts execute_sequence from a macro recorder into a workflow language. An id turns a step into a variable. A retries count turns it into a bounded loop. An r#if expression turns it into a guarded branch. A jumps array turns it into a switch. A fallback_id turns it into a recovery entry point.

crates/terminator-mcp-agent/src/utils.rs
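A sketch of one step exercising those affordances. The field names are the ones listed on SequenceStep; the tool name, selector, and expression syntax are illustrative. (Rust's r#if serializes as plain if.)

```jsonc
{
  "tool_name": "click_element",
  "arguments": { "selector": "role:Button && name:Submit" },
  "id": "submit_form",                // later steps can read submit_form_result / submit_form_status
  "retries": 3,                       // bounded loop, run by the engine without a model inference
  "if": "login_status == 'success'",  // guarded branch, checked before the step runs
  "jumps": [                          // switch, checked after success, first match wins
    { "if": "login_status == 'mfa_required'", "to_id": "handle_mfa" }
  ],
  "fallback_id": "recover_submit"     // on ultimate failure, route into the troubleshooting list
}
```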

The numbers, checkable against the repo

Every one of these is a count or a line number you can reproduce. Clone the repo at mediar-ai/terminator and grep for yourself.

  • 19 typed fields on ExecuteSequenceArgs (utils.rs line 1506)
  • SequenceStep defined at utils.rs line 1453
  • execute_sequence entry point at server.rs line 7537
  • 32 MCP tools exposed by the agent
  • 2 model inferences for an N-step workflow, regardless of N
  • state.json path computed at server_sequence.rs lines 189–207

2 inferences / N steps

The model writes the plan, the engine runs it. For an N-step workflow, Claude is invoked twice regardless of N: once to compile, once to read the parsed output.

Terminator MCP agent, execute_sequence contract

Why start_from_step actually works

Resumption is only useful if the engine remembers what came before. Terminator writes the environment to disk after each step that mutates it, keyed on the workflow id or file URL. The next run reads that state before it starts. This is what makes it safe to kill a run mid-flow and pick up at a named id without replaying from the top.

crates/terminator-mcp-agent/src/server_sequence.rs
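The page does not print a state file, but given that later steps see {id}_result and {id}_status restored from the previous run, the persisted environment plausibly looks like this. Everything below is hypothetical content, not copied from a real state.json.

```jsonc
// Hypothetical .mediar/workflows/<id>/state.json after two completed steps.
{
  "open_invoice_status": "success",
  "open_invoice_result": "window: Invoice INV-1042",
  "enter_amount_status": "success"
}
```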

Per-click MCP agents vs a compiled workflow, feature by feature

The comparison that matters is not Terminator against other MCP servers. It is two designs of what Claude should hand over when it wants a computer to do something.

| Feature | Per-click MCP tools | execute_sequence |
| --- | --- | --- |
| Model inferences per workflow of N steps | N (one per tool call) | 2: compile the plan, read the parsed output (constant, not a function of N) |
| Who decides the order of steps | Claude, re-deciding each turn based on the latest tool result | The YAML. The model wrote the plan once; the engine executes |
| Retries | Claude must notice the failure and emit a new tool call | Per-step retries: u32 field on SequenceStep; the engine loops internally |
| Conditional branching | Relies on Claude reading the previous result and picking the next action | r#if per step, plus a jumps array with first-match-wins expressions |
| Recovery from a bad UI state | Claude retries blindly until it hits a token budget | fallback_id routes to a named step in the troubleshooting list |
| Resume after a crash | Start from turn one; the model has no memory of where it was | start_from_step + state.json on disk; picks up from the last id that ran |
| Structured output | Claude summarises the run in prose; parsing is your problem | output_parser with JavaScript code returns typed JSON to Claude |
| Observability | Whatever the MCP host happens to log | OpenTelemetry trace_id + execution_id fields on every call |
| When the model is actually needed | Every single step, including trivial ones | At the start (compile the plan) and the end (read the parsed result) |

One execute_sequence call, traced end to end

Read along with the source files. This is what happens between Claude emitting the JSON and the MCP host returning a CallToolResult.

1. Claude emits a single tool_use for execute_sequence

The arguments object carries variables, inputs, selectors, steps, optional troubleshooting, and an output parser. Everything the workflow needs to run is in one payload.

2. MCP host frames it as JSON-RPC and ships it to the agent

Claude Code, Cursor, Windsurf, whichever MCP client you are using pipes the request over stdio to the terminator-mcp-agent child process.

3. dispatch_tool routes into execute_sequence_impl

server.rs line 10234 handles nested calls via Box::pin. Top-level calls hit server.rs line 7537. Both paths end in execute_sequence_impl at server_sequence.rs line 344.

4. Inputs are validated against the variables schema

Types, regex patterns, enum options, required flags. A malformed call fails here with a typed error, not half way through clicking.

5. State is loaded from .mediar/workflows/<id>/state.json

If start_from_step is set, the engine restores env vars from the last run so later steps see {id}_result and {id}_status from before.

6. Each step runs through the same dispatch_tool

No model inference. Templated args get ${{...}} substitution, the selector resolves against the OS accessibility tree, the action fires, the step id captures the result, jumps are evaluated, fallback_id is honoured on ultimate failure.

7. output_parser runs against the final tree

JavaScript or declarative DSL. Walks the tree, returns a small JSON object (total, list of rows, whatever the workflow was actually for).

8. CallToolResult returns up the stdio pipe

Claude wakes up once, reads the parsed object, and decides the next user-facing action. For a workflow of 40 steps that is two model inferences total.
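The last hop is an ordinary MCP CallToolResult. A sketch assuming the parser's JSON is serialized into a text content block, which is the usual MCP shape; the payload values are invented:

```jsonc
{
  "content": [
    { "type": "text", "text": "{\"total\": 1042.50, \"rows\": 3}" }  // output_parser's JSON, stringified
  ],
  "isError": false
}
```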

Install in Claude Code, one command

The MCP agent ships as a single npm package. Claude Code picks it up at the user scope and exposes all 32 tools including execute_sequence.

terminal

Compile your next desktop workflow into one MCP call

15 minutes with the Terminator team. Bring a real flow you want Claude to run, leave with a typed execute_sequence draft.

Questions readers actually ask

Is Claude desktop automation the same thing as Claude computer use?

They overlap but they are not the same product. Anthropic's computer use is a tool type (computer_20251022) exposed in the API where Claude sees a screenshot and returns pixel coordinates. Claude desktop automation as a goal (getting Claude to reliably drive apps on your OS) can be built on top of that, or on top of the accessibility-tree path that Terminator exposes over MCP. Most articles pick the first interpretation because it is the newest. This page is about the second, which is more deterministic and much cheaper for long workflows.

What does execute_sequence actually accept as input?

A single JSON object whose schema lives in crates/terminator-mcp-agent/src/utils.rs at line 1506 as the ExecuteSequenceArgs struct. It has 19 typed fields: steps, troubleshooting, variables, inputs, selectors, stop_on_error, include_detailed_results, output_parser, output, continue, verbosity, start_from_step, end_at_step, follow_fallback, execute_jumps_at_end, scripts_base_path, workflow_id, skip_preflight_check, trace_id, execution_id, plus a flattened WindowManagementOptions block. You can also pass a url pointing at a local file or HTTP endpoint that contains the same shape in YAML, so Claude does not have to reprint a huge workflow every time.
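The url form in practice, sketched. The path is illustrative, and passing inputs inline alongside url is an assumption about how per-run values are supplied:

```jsonc
{
  "name": "execute_sequence",
  "arguments": {
    "url": "file://C:/workflows/invoice.yml",  // same ExecuteSequenceArgs shape, stored as YAML
    "inputs": { "invoice_id": "INV-1042" }     // assumed: per-run values can still be passed inline
  }
}
```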

How is each step more than just a tool call?

SequenceStep is defined at utils.rs line 1453. On top of tool_name and arguments, every step can carry an id (which exposes {id}_result and {id}_status to later steps), a retries count, an r#if expression evaluated before the step runs, a jumps array evaluated after success, a fallback_id that routes into the troubleshooting list on ultimate failure, a group_name to bundle nested steps, continue_on_error and skippable flags, a delay_ms or human-readable delay, and an expected_ui_changes hint for drift detection during replay.

Where does the model actually run in this design?

At the bookends. Claude (or Cursor, or Windsurf, it is just an MCP client) compiles your intent into the execute_sequence payload once. The Rust engine runs the steps locally, resolves each selector against the Windows UI Automation tree or the macOS Accessibility tree, applies retries and jumps without another inference, and passes the final UI tree into the output_parser. The parser returns structured JSON to the MCP host, which is where the model wakes up again and decides what to do with the result. For an N-step workflow, the model is invoked twice, not N times.

How do conditional jumps work?

Each step can carry a jumps field that is an array of JumpCondition entries. After the step completes successfully, the engine walks the array in order and evaluates the if expression against the environment. The first match wins; the engine jumps to the step with that to_id. Expressions can reference any {id}_status or {id}_result variable set earlier in the run. See crates/terminator-mcp-agent/tests/workflows/test_jump_if.yml for the canonical test of this behavior, including first-match-wins, complex &&/|| expressions, and jump conditions that deliberately do not trigger.
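The first-match-wins walk is simple to model. Below is a minimal stand-in in plain JavaScript, not the engine's Rust: real JumpCondition entries hold expression strings, so the predicate functions here are a substitute for the engine's expression evaluator.

```javascript
// Stand-in for the engine's jump evaluation: walk the array in order,
// evaluate each `if` predicate against the env, return the first matching to_id.
function evaluateJumps(jumps, env) {
  for (const jump of jumps) {
    if (jump.if(env)) return jump.to_id; // first match wins
  }
  return null; // no match: fall through to the next step
}

// Hypothetical conditions referencing earlier {id}_status style variables.
const jumps = [
  { if: (env) => env.check_login_status === "error", to_id: "relogin" },
  { if: (env) => env.retry_count >= 3, to_id: "abort_run" },
];

console.log(evaluateJumps(jumps, { check_login_status: "error", retry_count: 5 })); // "relogin"
console.log(evaluateJumps(jumps, { check_login_status: "ok", retry_count: 1 }));    // null
```

Note that even when both conditions hold, the first entry wins, which is the ordering guarantee the test workflow exercises.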

Can a workflow resume after a crash or a partial run?

Yes. When the call carries a file:// url or an explicit workflow_id, the engine writes its environment to disk after each step that has an id or that modifies env via set_env. On macOS the state lives at ~/Library/Application Support/mediar/workflows/<folder>/state.json; on Windows at %LOCALAPPDATA%\mediar\workflows\<id>\state.json; on Linux at ~/.local/share/mediar/workflows/<folder>/state.json. Pass start_from_step to resume from a named id. That is what server_sequence.rs lines 189 to 207 implement.
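A resume call, sketched with an illustrative path and step id:

```jsonc
{
  "name": "execute_sequence",
  "arguments": {
    "url": "file://C:/workflows/invoice.yml",
    "start_from_step": "submit_form"  // engine restores earlier {id}_result / {id}_status from state.json first
  }
}
```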

Does the output_parser have to be JavaScript?

No. There is a declarative DSL form (output_parser field) that is JSON and handles most extraction without any code. The simplified output field takes JavaScript as javascript_code, which runs inside the agent process with access to the final UI tree. Use the DSL for simple field extraction and the JS form when you need to walk the tree or format numbers. Either way, what Claude sees back is parsed JSON rather than a raw tree dump.
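The JS form, sketched. The output field and its javascript_code key are described above; the binding name tree and the node properties are assumptions about what the agent exposes to the script.

```jsonc
{
  "output": {
    "javascript_code": "const rows = tree.children.filter(n => n.role === 'DataItem'); return { count: rows.length };"
  }
}
```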

How does this coexist with Claude's native computer use tool?

They are complementary. Anthropic's computer tool is useful for UIs where the accessibility tree is empty or lies (games, canvas-heavy apps, some Electron shells). Terminator exposes a capture_screenshot tool and a gemini_computer_use fallback, so a workflow can mix selector-based steps with an occasional vision-based step. The important thing is that the selector path, and execute_sequence in particular, is the default, not the exception. Pixel vision is reserved for the cases where the OS refuses to tell you where the button is.

How do I install the MCP server so Claude Code can use it?

One command at the user scope: claude mcp add terminator "npx -y terminator-mcp-agent@latest" -s user. Terminator registers as a stdio MCP server under Claude Code's supervision and exposes all 32 tools including execute_sequence. The same binary runs in Cursor, VS Code, Windsurf, Zed, and anything else that speaks the Model Context Protocol. Setup details and per-client instructions live in crates/terminator-mcp-agent/README.md in the repo.

terminator · Desktop automation SDK
© 2026 terminator. All rights reserved.