Windows automation scripting / inverted
You do not write the script. The runtime writes it while the agent works.
Every other guide on this topic hands you a PowerShell or AutoHotkey template and hopes the UI stays still. Terminator flips the scripting loop. An AI coding agent drives the desktop through our MCP server, and execution_logger.rs transcribes every tool call into a paired .json log and a replayable .ts snippet, with before and after PNGs, saved under %LOCALAPPDATA%\mediar\executions\ for 7 days.
The shape of the idea
In the conventional Windows scripting model, a human opens an editor, reads a reference page, and types a macro. The macro encodes keystrokes and coordinates. The macro breaks the moment the UI redesigns.
Terminator inverts both halves. The macro is not typed by hand. It is emitted by the runtime on your behalf, one line at a time, while an AI coding agent drives a real Windows UI Automation tree. The agent does not need to know how to write TypeScript. The runtime already knows.
The agent's job is to finish the task. The runtime's job is to record what the agent did in a replayable shape. You check the resulting .ts file into version control and treat it as your automation. It already uses role-plus-name selectors, already wraps itself in a retry loop when appropriate, and already carries a verify_element_exists postcondition.
The data flow
Watch one run land on disk
Install the MCP agent once, ask your agent to do a job, then walk over to the executions directory. You will find a paired file set for every tool call the agent issued.
One file set per tool call
Three artifacts per action: a machine-readable request log, a regenerated replay snippet, and (for any tool that returns a screenshot field) a pair of PNGs that show the UI before and after the action fired.
The moment the replay snippet appears
Here is what the runtime wrote back when the agent clicked the Save button in Word and asked for a postcondition check. The verify_element_exists block under the click comes from generate_verification_code at line 851 of execution_logger.rs.
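The actual generated file is not reproduced here, but a sketch of its shape looks like this. The stub `desktop` object stands in for the real SDK so the sketch runs standalone; the selector strings and the `exists()` probe are illustrative, not the exact emitted code.

```typescript
// Stub standing in for the real SDK's desktop object (assumption: the
// generated snippet calls desktop.locator(sel).first(timeoutMs)).
type Element = { click(): void; exists(): boolean };
const desktop = {
  clicked: [] as string[],
  locator(sel: string) {
    return {
      first(_timeoutMs: number): Element {
        return {
          click: () => desktop.clicked.push(sel),
          // The stub reports "exists" once any click has landed.
          exists: () => desktop.clicked.length > 0,
        };
      },
    };
  },
};

// --- regenerated click action ---
desktop
  .locator("process:WINWORD.EXE >> role:Button && name:Save")
  .first(5000)
  .click();

// --- postcondition in the style of generate_verification_code ---
const verified = desktop
  .locator("role:StatusBar && name:Saved")
  .first(2000)
  .exists();
if (!verified) throw new Error("verify_element_exists failed: Saved");
```

The real emitted block polls for the element with a configurable `verify_timeout_ms`; the stub collapses that into a single check.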
And here is the snippet for the type_into_element call that came next. The original tool call set retries: 3, so line 751 of the generator wrapped the body in a retry loop. You did not write that loop. You also did not need to remember to write it.
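A self-contained sketch of what that retry-wrapped body looks like. The `typeText` stub fails twice to exercise the retry path; the real emitted loop is async and sleeps 500 ms between attempts, which the stub only notes in a comment.

```typescript
// Stub field that fails twice before accepting input (assumption: the
// generated snippet calls element.typeText(text, clearFirst)).
let failuresLeft = 2;
let typedText = "";
const field = {
  typeText(text: string, clearFirst: boolean) {
    if (failuresLeft > 0) {
      failuresLeft--;
      throw new Error("element not ready");
    }
    typedText = clearFirst ? text : typedText + text;
  },
};

// --- retry wrapper in the style of the line-751 generator ---
const retries = 3;
let lastError: unknown;
for (let attempt = 0; attempt <= retries; attempt++) {
  try {
    field.typeText("Q1 invoice ingest", /* clear_before_typing */ true);
    lastError = undefined;
    break;
  } catch (err) {
    lastError = err; // real emitted code sleeps 500 ms here
  }
}
if (lastError) throw lastError; // rethrow only after the final attempt
```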
“A single match block inside generate_typescript_snippet routes every MCP tool name to its own TypeScript generator. Anything without a generator falls through to a commented-out JSON block so the file still parses.”
crates/terminator-mcp-agent/src/execution_logger.rs
What the 24 generators cover
The match block routes every tool name to a purpose-built emitter. Every entry is a Rust function that reads the original tool arguments, wraps them in SDK-shaped TypeScript, and returns a string. The result is that the replay file reads like code a human would have written, not like a JSON dump.
click_element
generate_click_snippet (line 1042). Branches on coordinate mode, index mode (from get_window_tree), or selector mode. Emits desktop.click(x,y) or desktop.locator(...).first(5000).click().
type_into_element
generate_type_snippet (line 1187). Reads text_to_type, clear_before_typing, timeout_ms. Formats the text through format_text_for_typescript so embedded quotes survive.
wait_for_element
generate_wait_for_element_snippet (line 1969). Preserves the wait condition so the replay does not race past a modal that had not appeared yet on the original run.
run_command
generate_run_command_snippet preserves engine (node | python | bun | ts) and shell (powershell | cmd | bash). The polyglot bridge replays identically in the regenerated TypeScript.
navigate_browser / open_application
Dedicated generators plus verification code if verify_element_exists is set. The regenerated script waits for the expected element before moving on.
select_option / set_value / set_selected
Form controls get their own generators. set_selected is used for radio and checkbox states that do not fire correctly on a raw click.
invoke_element / activate_element / close_element
UIA pattern invocations. invoke() is more reliable than click for controls with zero-size bounds or offscreen layouts.
execute_browser_script
Regenerates the exact JavaScript that ran inside the page, wrapped in desktop.executeBrowserScript so the replay retains the same DOM access.
capture_screenshot / highlight_element / validate_element / stop_*
The rest of the 24 generators cover diagnostics and flow control. Anything without a generator falls through to a commented-out JSON block so the file still parses.
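The dispatch idea can be sketched in a few lines. This is a pure-logic mirror of the routing behavior described above, not the actual Rust; `toSnippet` and the emitted strings are illustrative.

```typescript
// Known tool names get a purpose-built emitter; everything else becomes a
// commented-out JSON block so the regenerated .ts file still parses.
function toSnippet(toolName: string, args: Record<string, unknown>): string {
  switch (toolName) {
    case "click_element":
      return `desktop.locator(${JSON.stringify(args.selector)}).first(5000).click();`;
    case "type_into_element":
      return `desktop.locator(${JSON.stringify(args.selector)}).first(5000).typeText(${JSON.stringify(args.text_to_type)});`;
    default:
      // Fallthrough: still-valid TypeScript, just inert.
      return `// Unsupported tool '${toolName}':\n// ${JSON.stringify(args)}`;
  }
}
```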
The same job, two scripting models
Side by side, a hand-rolled PowerShell automation and the regenerated Terminator replay for the same Word-save task. One encodes a process name and a keystroke sequence. The other encodes accessibility-tree selectors and verification postconditions. The second one is the file that got written to disk while the AI agent was finishing the job.
Hand-typed automation vs. runtime-transcribed automation
# windows_automation.ps1 — a hand-written automation
# this is what every other article tells you to type.
# it encodes coordinates, keystrokes, and process handles.
#
# hope the window is in the same place next week.
Add-Type -AssemblyName System.Windows.Forms
$winword = Start-Process "winword.exe" -PassThru
Start-Sleep -Seconds 2
# SendWait is the usual answer for "type into an app".
# it races the application's input handler and sometimes
# arrives in the wrong field.
[System.Windows.Forms.SendKeys]::SendWait("Q1 invoice ingest")
[System.Windows.Forms.SendKeys]::SendWait("^s")
Start-Sleep -Seconds 2
# and now we are hoping the save dialog has focus.
# and we are hoping the file-name field is the first focusable.
# and we are hoping the "Save" button is still named "Save".
[System.Windows.Forms.SendKeys]::SendWait("q1_invoices{ENTER}")
The five-step adoption path
Install the MCP agent in your editor
One line for Claude Code, VS Code, Cursor, or Windsurf. The MCP server spins up a Windows UIA engine on stdio. execution_logger.rs::init() (line 106) creates %LOCALAPPDATA%\mediar\executions\ and sweeps anything older than 7 days.
claude mcp add terminator "npx -y terminator-mcp-agent@latest"
Ask the agent to finish a job
Tell Claude Code or Cursor what you want done on the desktop. The agent calls MCP tools against the running Windows UIA tree. You watch it work. You are not writing a script yet.
Let log_request + log_response run on every tool call
log_request (line 211) captures the tool name, arguments, workflow context, and a timestamp. log_response (line 242) extracts screenshots, strips the base64 payload from the result, writes the .json log, calls generate_typescript_snippet, and writes the .ts replay file.
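The resulting .json log might look roughly like this. Field names are illustrative; the source only guarantees that the tool name, arguments, timestamp, duration in milliseconds, and console logs are captured, with base64 screenshot payloads stripped and saved as separate PNGs.

```json
{
  "tool_name": "click_element",
  "arguments": { "selector": "role:Button && name:Save", "retries": 3 },
  "timestamp": "2026-04-18T14:22:07.931Z",
  "duration_ms": 412,
  "console_logs": [],
  "result": { "status": "success", "screenshot": "<stripped, saved as .png>" }
}
```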
Open %LOCALAPPDATA%\mediar\executions\
You now have one .json, one .ts, and one or two .png files for every single tool call the agent made, timestamped to the millisecond. Diff them against the PowerShell macro you would have hand-written. The .ts file is already runnable through the SDK.
Stitch the .ts files into a reusable workflow
Concatenate the generated snippets (or import them into @mediar-ai/workflow). The retries wrapper and verify_element_exists guard are already in place. The selectors are already bound to role + name, not to pixels. The replay is a real script, handed to you by the runtime.
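A minimal stitcher, assuming only what this page states: one .ts file per tool call, with millisecond-timestamped prefixes that sort lexically. The function names and output path are your own choice.

```typescript
import { readdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Prefixes start with YYYY-MM-DD_HH-MM-SS-ms, so a plain sort() is run order.
export function runOrder(files: string[]): string[] {
  return files.filter((f) => f.endsWith(".ts")).sort();
}

// Concatenate every replay snippet in execution order into one workflow file.
export function stitch(executionsDir: string, outFile: string): number {
  const snippets = runOrder(readdirSync(executionsDir)).map(
    (f) => `// --- ${f} ---\n${readFileSync(join(executionsDir, f), "utf8")}`,
  );
  writeFileSync(outFile, snippets.join("\n\n"));
  return snippets.length;
}
```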
A concrete guarantee, expressed as numbers
Everything above is deterministic. It is not a marketing promise; it is what the source file does for every MCP tool call.
24
per-tool snippet generators
3
files written per tool call
7 days
default retention window
12
action tools that auto-append verification code
Tool names routed through the match block at execution_logger.rs:684
Why the replay survives a UI redesign
The regenerated .ts file is not a coordinate dump. Each locator string is assembled by build_locator_string (line 799) out of the process, window_selector, and selector fields from the original tool call, joined with the Terminator >> descendant combinator.
For fallback selectors, line 824 has a sibling function, build_fallback_locator_string, that converts the pipe-separated MCP shorthand (role:Button|name:Log On) into the SDK's && AND operator. The replay file always has at least one UIA-grounded path to the element, and often a backup.
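The two transformations are simple enough to mirror in a few lines of pure logic. The real implementations live in Rust; these TypeScript equivalents only illustrate the joining rules described above.

```typescript
// Join the process, window_selector, and selector fields with Terminator's
// >> descendant combinator, skipping any field the tool call left empty.
export function buildLocatorString(
  process?: string,
  windowSelector?: string,
  selector?: string,
): string {
  return [process, windowSelector, selector]
    .filter((part): part is string => Boolean(part))
    .join(" >> ");
}

// Convert pipe-separated MCP shorthand into the SDK's && AND operator:
// "role:Button|name:Log On" -> "role:Button && name:Log On"
export function buildFallbackLocatorString(shorthand: string): string {
  return shorthand.split("|").join(" && ");
}
```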
A coordinate-based macro breaks when the dialog moves. A role-plus-name locator does not care. That property is baked into every regenerated snippet because it was baked into the tool call the agent made when it was solving the problem live.
A quick note on the sibling module
Terminator has a second script-capture module, the workflow recorder in crates/terminator-workflow-recorder. It is not the same thing. That module watches a human perform a task on a live Windows desktop and emits 14 high-level semantic events, one per action. The execution logger covered on this page watches an AI agent perform a task through MCP tools and emits replayable TypeScript. They compose. You can record an expert doing the job once, replay it, and let execution logging capture every subsequent tweak as a fresh set of .ts snippets on disk.
Want the replay files landing on your own desktop?
Spin up the MCP agent with your team, run a real workflow, and walk away with a folder of regenerated TypeScript you can check into your repo.
Frequently asked questions
Why does Terminator write scripts for me instead of letting me write them in PowerShell?
Because scripting a Windows UI by hand bottoms out on the same problem every time. Your PowerShell or AutoHotkey script encodes a coordinate, a keystroke, or a window handle that was true the day you wrote it. Then the UI reflows or the dialog moves and the script breaks. Terminator inverts the loop. You ask an AI coding agent to finish a job, the agent talks to the MCP server at crates/terminator-mcp-agent, and every tool call (click_element, type_into_element, wait_for_element, run_command, and 20 more) gets transcribed by execution_logger.rs into a paired JSON log and a replayable TypeScript snippet. The snippet is shaped against the accessibility tree, not against pixel coordinates, so the replay survives a UI redesign the same way the live agent did.
Where does the transcribed script actually land on disk?
Under %LOCALAPPDATA%\mediar\executions\ for standalone tool calls, or %LOCALAPPDATA%\mediar\workflows\{workflow_id}\executions\ when the call is part of a named workflow. The helper functions are get_executions_dir() and get_workflow_executions_dir() at lines 79 and 88 of execution_logger.rs. Each execution produces three files with a shared prefix built from {date_time}_{workflow_id}_{step}_{tool_name}: a .json file with the full request, the response, the duration in milliseconds, and any captured console logs; a .ts file with the regenerated TypeScript snippet; and one or more PNG files pulled out of the result payload by extract_and_save_screenshots() (line 449). Retention is 7 days, controlled by RETENTION_DAYS = 7 at line 19, with automatic cleanup.
How does the runtime know how to turn a click tool call into TypeScript?
It has 24 per-tool snippet generators routed from a single match block at line 684 of execution_logger.rs. click_element goes through generate_click_snippet (line 1042), which branches on whether the tool was called with coordinates, an index from get_window_tree, or a selector. type_into_element goes through generate_type_snippet (line 1187), which reads text_to_type, clear_before_typing, and timeout_ms, then formats the text for safe TypeScript embedding. wait_for_element has generate_wait_for_element_snippet (line 1969). run_command has its own generator that preserves the engine and shell arguments so the same polyglot step replays identically. Tools without a generator fall through to a commented-out JSON block so the file still parses.
Does the regenerated script retry on failure or verify the result?
Both, when the original call asked for it. If the MCP tool was called with retries > 0, the generator at line 751 wraps the snippet in a retry loop that catches, sleeps 500ms, and rethrows on the last attempt. If the call was an action tool (click_element, type_into_element, press_key, scroll_element, select_option, set_value, set_selected, invoke_element, activate_element, navigate_browser, open_application, press_key_global) and had a verify_element_exists or verify_element_not_exists argument, generate_verification_code at line 851 appends a polling verification block with configurable verify_timeout_ms (default 2000). A successful run gets a // Status: SUCCESS comment. An error gets // Status: ERROR followed by the error message, which makes the .ts file usable as both a replay script and a historical trace.
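The polling verification can be sketched as a loop that re-checks until the element appears or the timeout elapses. This is a synchronous illustration of the semantics; the real emitted code sleeps between checks, and the `elementExists` probe and `pollMs` knob are illustrative names.

```typescript
// Poll for the element until it appears or verify_timeout_ms (default 2000)
// is exhausted. Returns false instead of throwing so the caller can decide
// whether a failed postcondition is fatal.
export function verifyElementExists(
  elementExists: () => boolean,
  verifyTimeoutMs = 2000,
  pollMs = 100,
): boolean {
  for (let waited = 0; waited <= verifyTimeoutMs; waited += pollMs) {
    if (elementExists()) return true;
    // real emitted code sleeps pollMs here before re-checking
  }
  return false;
}
```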
What does a real file name look like so I can find one on my machine?
Look in %LOCALAPPDATA%\mediar\executions\ after running any MCP tool call. The prefix is built by generate_file_prefix() (line 145): {YYYY-MM-DD_HH-MM-SS-ms}_{workflow_id}_{step_id}_{tool_name_without_mcp_prefix}. A real example from a standalone click run is 2026-04-18_14-22-07-931_standalone_full_click_element.json next to 2026-04-18_14-22-07-931_standalone_full_click_element.ts and the same prefix ending in _before.png and _after.png. The mcp__terminator-mcp-agent__ prefix is stripped by strip_prefix() at line 199, so the final file name is readable.
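The prefix-stripping step is trivial to mirror. This is an illustrative TypeScript equivalent of the behavior described above, not the Rust source.

```typescript
// Drop the MCP namespace so the file name stays readable.
const MCP_PREFIX = "mcp__terminator-mcp-agent__";

export function stripPrefix(toolName: string): string {
  return toolName.startsWith(MCP_PREFIX)
    ? toolName.slice(MCP_PREFIX.length)
    : toolName;
}
```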
Can I disable logging for noisy runs or on a locked-down machine?
Yes. The LOGGING_ENABLED static at line 16 defaults to true, and is_enabled() reads it. The log_request and log_response functions check is_enabled() before writing anything, so toggling it off via configuration or env var is a single branch. The init() function at line 106 also runs the retention sweep on startup, so turning logging off stops both new writes and the 7-day cleanup for that process.
Is this different from the workflow recorder that ships with Terminator?
Yes, different module, different purpose. The workflow recorder (crates/terminator-workflow-recorder) watches a human perform a task and emits 14 high-level semantic events (ClickEvent, TextInputCompletedEvent, FileOpenedEvent, ApplicationSwitchEvent, and so on). The execution logger (crates/terminator-mcp-agent/src/execution_logger.rs) is the other direction: it watches an AI agent perform a task through MCP tools and emits replayable TypeScript snippets plus before/after screenshots. Use the recorder to capture an expert doing the job once. Use the execution logger when you want the AI to do the job and hand you the script afterward.