You don't write a Windows automation script. You record one.

Every other guide on this topic opens an editor and starts typing PowerShell, AutoHotkey, or AutoIt. Terminator inverts the order. You perform the workflow once on your real desktop. The recorder serializes fourteen high-level event types, each one tied to an accessibility-tree element rather than a pixel coordinate. The JSON that comes back is the script. Replay runs against UI Automation, not against screen geometry, so the recording survives a UI redesign that would shred a key-and-pixel macro.

Matthew Diakonov
10 min read
4.9 from design partners running recorded workflows in production
14 WorkflowEvent variants in events.rs:475
3,581 lines of Windows recorder in windows/mod.rs
JSON replays through the same MCP agent that ships with Terminator

The thing every other write-up skips

Most writing on this topic teaches you a syntax. PowerShell for services, AutoHotkey for hotkeys, AutoIt for window control. They are all useful for what they are. None of them addresses the part of the job that gets expensive in production: the script that worked yesterday breaks today because a button moved. A recorded macro that knows the Save button as "the click at coordinate (612, 78)" has nothing to fall back on.

Terminator's workflow recorder solves that by recording at a higher altitude. The unit of capture is not a key event. It is a tagged variant of an enum named WorkflowEvent, defined as fourteen discrete event types in one file. The most useful seven of those are high-level semantic events, and four of them show the pattern. ClickEvent carries an interaction_type that splits into Click, Toggle, DropdownToggle, Submit, and Cancel. TextInputCompletedEvent aggregates a whole typing session into a single text_value with keystroke_count and typing_duration_ms. FileOpenedEvent watches window titles and resolves the file path on disk with a confidence score. ApplicationSwitchEvent carries a switch_method enum so replay can pick a different method if Alt+Tab is unavailable. The replay engine consumes those structs and walks the accessibility tree, not the screen.

The same task, recorded two different ways

    ; autohotkey_v2.ahk
    ; recorded with the AHK macro recorder
    ; or hand-written by a sysadmin in 2008.
    ;
    Run "EXCEL.EXE"
    Sleep 1500
    WinActivate "Excel"
    Click 612, 78 ; toolbar Save button
    Sleep 200
    Send "{Tab 3}" ; navigate to file name field
    Send "Q1 invoice ingest"
    Sleep 100
    Click 1054, 619 ; Save button on the Save dialog
    ;
    ; brittle. layout shifts ten pixels?
    ; pinned tab? high DPI? broken.

  • Hard-coded screen coordinates (612, 78) and (1054, 619)
  • Replay assumes the dialog never moves
  • Localization breaks the script (different text)
  • High-DPI changes shift the entire layout

The fourteen variants, and what is in each

Pulled from the WorkflowEvent enum at crates/terminator-workflow-recorder/src/events.rs line 475. Read the file directly if you want the field-level definitions. Each event carries an EventMetadata block with the originating UI element and a timestamp. The high-level events live above the low-level ones in the same enum, which is what makes the JSON replayable instead of just watchable.

Click

ClickEvent at events.rs:345. element_text, element_role, was_enabled, child_text_content, plus an interaction_type that splits into 5 variants.

TextInputCompleted

Aggregates a typing session. text_value, keystroke_count, typing_duration_ms, input_method (Typed | Pasted | AutoFilled | Suggestion | Mixed).

FileOpened

Watches window titles, resolves the file path. primary_path, candidate_paths sorted by LastAccessTime, confidence, search_time_ms.

ApplicationSwitch

switch_method covers AltTab, TaskbarClick, WindowsKeyShortcut, StartMenu, WindowClick, Other. dwell_time_ms records how long you stayed in the previous app.

BrowserTabNavigation

from_url, to_url, from_title, to_title, page_dwell_time_ms. Distinguishes KeyboardShortcut, TabClick, NewTabButton, AddressBar, LinkNewTab.

Hotkey

Records the chord (Ctrl+S, Win+L) plus the action that resulted, when the system can name it.

Clipboard

Cut, copy, paste, with content preview. The replay can branch on what was on the clipboard at record time.

TextSelection

Length, content preview, selection_method. Useful for replays that need to reselect a region before acting on it.

DragDrop

start_position, end_position, plus the source UI element under the cursor when the drag began.

Mouse / Keyboard / PendingAction

Low-level events kept available for diagnostics, but high-level events take precedence when both fire on the same input.
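
The precedence rule can be pictured as a filter over the captured stream. What follows is a minimal TypeScript sketch under stated assumptions: the `RecordedEvent` shape, the `inputId` correlation field, and the `dropShadowedLowLevel` helper are all hypothetical names made up for illustration, and the recorder's real logic in windows/mod.rs may work quite differently.

```typescript
// Hypothetical shapes for illustration; these are not the recorder's real types.
type EventLevel = "high" | "low";

interface RecordedEvent {
  kind: string;     // e.g. "Click", "Mouse", "Keyboard"
  level: EventLevel;
  inputId: number;  // assumed id linking events caused by the same physical input
}

// Keep every high-level event. Keep a low-level event only when no
// high-level event was produced for the same input.
function dropShadowedLowLevel(events: RecordedEvent[]): RecordedEvent[] {
  return events.filter(
    (e) =>
      e.level === "high" ||
      !events.some((other) => other.level === "high" && other.inputId === e.inputId),
  );
}

const captured: RecordedEvent[] = [
  { kind: "Mouse", level: "low", inputId: 1 },
  { kind: "Click", level: "high", inputId: 1 },
  { kind: "Mouse", level: "low", inputId: 2 }, // no semantic counterpart, so it survives
];

console.log(dropShadowedLowLevel(captured).map((e) => e.kind)); // [ 'Click', 'Mouse' ]
```

The raw Mouse event for input 1 is dropped because a Click already describes it; the one for input 2 stays available for diagnostics.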

What you see when you start the recorder

The example binary in the workflow-recorder crate prints every event to stdout as it lands. This is the live output during a 20-second capture of a Word-then-Excel workflow. Notice how each line is a named event with structured fields, not a raw key code or pixel pair.

cargo run --example record_workflow

The pipeline, end to end

Your hands move. The OS fires UI Automation events. The recorder normalizes them into one of fourteen WorkflowEvent variants and attaches the surrounding UI element metadata. The events stream into a SerializableWorkflowEvent JSON file. From there, three replay surfaces share the same selector resolver: the CLI, the typed workflow SDK, and the MCP agent. The MCP agent is the one Claude Code calls when it needs to patch a stale selector mid-replay.

From hand to JSON to replay

Your hands on the keyboard → Windows UI Automation events → Recorder normalization → 14 WorkflowEvent variants → recorded.json → @mediar-ai/workflow, terminator-mcp-agent, Claude Code recovery

What replay actually does for one click


1. The user clicks Save in Excel

UI Automation fires an Invoke event on a Button element with name="Save" and role="Button" inside process EXCEL.EXE.

The numbers that hold this up

Every claim on this page lands in a specific source file. The numbers below are line counts, variant counts, and timing constants pulled from the Terminator repo as of this writing.

14 WorkflowEvent variants
5 ButtonInteractionType variants
6 ApplicationSwitchMethod variants
3,581 lines in windows/mod.rs

The single source line everything else is built on

This is the enum that defines what a recorded Windows automation script can be. Open the file at line 475 to see the same thing on disk. The high-level variants (Click, TextInputCompleted, FileOpened, ApplicationSwitch, BrowserTabNavigation) are the ones that make replay robust. The low-level variants stay around because some workflows really do need raw mouse coordinates.

crates/terminator-workflow-recorder/src/events.rs
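
The Rust source is the authority and is not reproduced here. As a rough companion, this TypeScript sketch mirrors only the fourteen variant names as this article lists them in the FAQ. The constant name, the derived type, and the `isWorkflowEventKind` guard are hypothetical helpers for anyone consuming the JSON from TypeScript; they are not part of Terminator.

```typescript
// The fourteen variant names as this article lists them. The Rust enum in
// events.rs is the authority; this TypeScript mirror is an illustration.
const WORKFLOW_EVENT_KINDS = [
  "Mouse", "Keyboard", "Clipboard", "TextSelection", "DragDrop", "Hotkey",
  "TextInputCompleted", "ApplicationSwitch", "BrowserTabNavigation", "Click",
  "BrowserClick", "BrowserTextInput", "FileOpened", "PendingAction",
] as const;

type WorkflowEventKind = (typeof WORKFLOW_EVENT_KINDS)[number];

// Narrow an arbitrary tag read from JSON to one of the known variant names.
function isWorkflowEventKind(tag: string): tag is WorkflowEventKind {
  return (WORKFLOW_EVENT_KINDS as readonly string[]).indexOf(tag) !== -1;
}

console.log(WORKFLOW_EVENT_KINDS.length);        // 14
console.log(isWorkflowEventKind("FileOpened"));  // true
console.log(isWorkflowEventKind("PixelClick"));  // false
```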

One event at a glance

Here is what a single TextInputCompleted event looks like in the recorded JSON. Compare the bottom of the file: a coordinate-based recorder for the same action would have nothing to say about the field name, the typing duration, or the input method.

recorded.json
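
The original listing is not reproduced here, so the following is an illustrative stand-in rather than real recorder output. The field names come from this article; the values, the `event` tag, and the nesting of `metadata` are assumptions about the schema, and several fields (focus_method among them) are left out.

```typescript
// Illustrative only. Field names are taken from the article; the values, the
// "event" tag, and the metadata layout are assumptions, not recorder output.
const sampleTextInput = {
  event: "TextInputCompleted",
  text_value: "hello",
  keystroke_count: 5,
  typing_duration_ms: 900,
  input_method: "Typed",
  field_name: "Search",
  field_type: "SearchBox",
  process_name: "example.exe",
  metadata: {
    ui_element: { role: "Edit", name: "Search", application_name: "Example" },
  },
};

// What a coordinate-based recorder would hold for the same action:
// five key codes and a pixel pair, with no field name or input method.
const samplePixelMacro = { keys: ["h", "e", "l", "l", "o"], click: { x: 612, y: 78 } };

console.log(JSON.stringify(sampleTextInput, null, 2));
console.log(samplePixelMacro.keys.length); // 5

// The aggregated fields make simple derived facts available without replaying keys.
const msPerKeystroke = sampleTextInput.typing_duration_ms / sampleTextInput.keystroke_count;
console.log(msPerKeystroke); // 180
```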

The five steps from hand to script

1

Run the recorder

cargo run --example record_workflow inside the Terminator repo. The recorder paints a green outline around every UI element it captures, so you can see in real time what was tied to each event.

2

Perform the workflow once

Click the buttons, type the values, drag the files, switch the apps. Every action lands as a high-level event with the surrounding UI metadata. Recording stops after 20 seconds in the example binary; programmatic use lets you stop on a hotkey or a custom condition.

3

Inspect the JSON

Each event is a tagged variant with a full metadata block. The metadata.ui_element field contains the role, name, AutomationId, application name, and process id of the element that received the action. That is the part the replay engine matches on, not pixel coordinates.
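
To make the matching idea concrete, here is a small TypeScript sketch of turning that metadata into something a replay step could match on. Everything in it is an assumption for illustration: the `UiElementMeta` and `ReplaySelector` shapes, the snake_case field spellings, and the choice to prefer AutomationId over the accessible name. Terminator's actual selector syntax and precedence may differ.

```typescript
// Hypothetical mirror of the metadata.ui_element fields named above.
interface UiElementMeta {
  role: string;
  name: string;
  automation_id?: string;
  application_name: string;
  process_id: number;
}

// Hypothetical selector shape; Terminator's real selector syntax may differ.
interface ReplaySelector {
  application: string;
  role: string;
  name?: string;
  automationId?: string;
}

// Prefer the AutomationId when present (assumed here to be the most stable key),
// otherwise fall back to the accessible name. The process id is deliberately
// left out because it changes on every launch.
function toSelector(el: UiElementMeta): ReplaySelector {
  const base = { application: el.application_name, role: el.role };
  return el.automation_id
    ? { ...base, automationId: el.automation_id }
    : { ...base, name: el.name };
}

console.log(
  toSelector({ role: "Button", name: "Save", application_name: "Excel", process_id: 4242 }),
);
```

Nothing in the resulting selector refers to a screen position, which is the property the replay engine relies on.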

4

Hand it to the replay engine

Either run it directly through @mediar-ai/cli (terminator mcp run recorded.yml) or load it into @mediar-ai/workflow as a typed sequence. Both paths share the same selector resolver and both can call into the MCP agent for recovery when an element has moved.

5

Patch with the LLM, do not re-record

When a replay step fails because a button was renamed, the MCP loop dumps the fresh window tree to Claude Code and asks for a new selector. The patched step runs, the rest of the script continues. This is the loop that lets a recorded workflow survive a UI redesign without being recaptured from scratch.

Replaying the JSON

The replay surface is a typed runner. You can override how each event type resolves to a selector or a low-level action. The recovery hook (onMissingElement: "patch_with_llm") is the loop that lets a stale selector be repaired by Claude Code at runtime, one event at a time, instead of forcing you to record the workflow again from scratch.

replay.ts
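
The listing itself is not reproduced here. As a stand-in, this is a self-contained sketch of the control flow the paragraph describes, not the @mediar-ai/workflow API. `ReplayStep`, `ReplayHooks`, `replaySteps`, and the selector strings are hypothetical, and the LLM call is replaced by a plain callback so the example runs offline.

```typescript
// Hypothetical runner for illustration; this is not the @mediar-ai/workflow API.
interface ReplayStep { label: string; selector: string; }

type MissingElementPolicy = "fail" | "patch_with_llm";

interface ReplayHooks {
  exists: (selector: string) => boolean;                // stand-in for a UI lookup
  proposeSelector: (step: ReplayStep) => string | null; // stand-in for the LLM call
}

function replaySteps(
  steps: ReplayStep[],
  policy: MissingElementPolicy,
  hooks: ReplayHooks,
): string[] {
  const log: string[] = [];
  for (const step of steps) {
    let selector = step.selector;
    if (!hooks.exists(selector)) {
      // Only this one step is repaired; earlier and later steps are untouched.
      const patched = policy === "patch_with_llm" ? hooks.proposeSelector(step) : null;
      if (patched === null || !hooks.exists(patched)) {
        throw new Error("step failed: " + step.label);
      }
      selector = patched;
      log.push("patched " + step.label + " -> " + selector);
    }
    log.push("ran " + step.label + " via " + selector);
  }
  return log;
}

// Simulated desktop where the "Save" button was renamed to "Save As".
const liveSelectors = ["role:Button|name:Save As", "role:Edit|name:File name"];
const replayLog = replaySteps(
  [
    { label: "click save", selector: "role:Button|name:Save" },
    { label: "type file name", selector: "role:Edit|name:File name" },
  ],
  "patch_with_llm",
  {
    exists: (s) => liveSelectors.indexOf(s) !== -1,
    proposeSelector: () => "role:Button|name:Save As",
  },
);
console.log(replayLog.join("\n"));
```

With the policy set to "fail" the same run would throw on the first step, which is the behavior a recorder without a recovery hook is stuck with.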

Type names you will run into

A short tour of the type vocabulary. All declared in terminator-workflow-recorder/src/events.rs. The fourteen-variant outer enum is the spine; the inner enums (interaction type, switch method, tab navigation method, text input method) are how the recorder classifies what the user actually meant.

WorkflowEvent (14 variants)
ButtonInteractionType (5)
ApplicationSwitchMethod (6)
TabNavigationMethod (6)
TextInputMethod (5)
FilePathConfidence
FieldFocusMethod
SerializableWorkflowEvent
PendingActionType (3)
MouseEventType
EventMetadata
UIElement (role, name, AutomationId)

events.rs ships 14 WorkflowEvent variants. windows/mod.rs is 3,581 lines. Both numbers are checkable in the public Terminator repo.

crates/terminator-workflow-recorder/src/

Why this matters for AI coding assistants

A recorded script with semantic events is cheap for a model to repair. The model only needs to see one event's context plus the current accessibility tree to propose a new selector. A keystroke-and-pixel macro is expensive to repair: the model has to re-derive the entire intent from a sequence of low-level signals. That is the practical reason Terminator's recorder lifts the capture into 14 high-level event types in the first place. It is not for human readability. It is so the recovery loop has something to work with.

The same MCP agent that records the workflow can replay it, patch it, and run it on a different machine. One npx install, then the full loop fits inside Claude Code, Cursor, VS Code, or Windsurf.

Want to see a recorded workflow replay against your own desktop?

Record one of your team's workflows on a call. We replay it through the MCP agent and walk through the recovery loop live.

Frequently asked questions

Why does Terminator emit 14 event types when AutoHotkey only records keystrokes and mouse moves?

Because keystrokes and mouse moves are the wrong abstraction for replay. They tell you what the user did with their hands. They do not tell you what the user did with the application. Terminator's recorder captures both as part of 14 WorkflowEvent variants (events.rs line 475): Mouse, Keyboard, Clipboard, TextSelection, DragDrop, Hotkey, TextInputCompleted, ApplicationSwitch, BrowserTabNavigation, Click, BrowserClick, BrowserTextInput, FileOpened, PendingAction. The interesting ones are the high-level kind. A 'Click' is not a coordinate, it is a ClickEvent struct (events.rs line 345) with element_text, element_role, was_enabled, an interaction_type discriminator (Click, Toggle, DropdownToggle, Submit, Cancel), and a metadata pointer to the actual UIA element. Replay does not need to know where the Save button was on screen yesterday. It only needs to find a button whose accessible name is Save in the same window.
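
Finding "a button whose accessible name is Save" is, at its simplest, a tree search. The sketch below assumes a toy `TreeNode` shape and a hypothetical `findByRoleAndName` helper; real UI Automation elements carry many more properties, and Terminator's resolver is certainly more involved than a plain depth-first walk.

```typescript
// Toy accessibility-tree node; real UI Automation elements carry far more.
interface TreeNode {
  role: string;
  name: string;
  children: TreeNode[];
}

// Depth-first search for the first node matching role and accessible name.
function findByRoleAndName(root: TreeNode, role: string, name: string): TreeNode | undefined {
  if (root.role === role && root.name === name) return root;
  for (const child of root.children) {
    const hit = findByRoleAndName(child, role, name);
    if (hit) return hit;
  }
  return undefined;
}

const excelWindow: TreeNode = {
  role: "Window",
  name: "Book1 - Excel",
  children: [
    {
      role: "ToolBar",
      name: "Quick Access",
      children: [
        { role: "Button", name: "Undo", children: [] },
        { role: "Button", name: "Save", children: [] },
      ],
    },
  ],
};

const saveButton = findByRoleAndName(excelWindow, "Button", "Save");
console.log(saveButton !== undefined); // true, wherever the toolbar sits on screen
```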

What does TextInputCompleted give me that recording every keystroke does not?

Aggregation. Recording one keystroke at a time produces a script that types K-E-Y-D-O-W-N K-E-Y-D-O-W-N K-E-Y-D-O-W-N when what actually happened semantically was 'the user typed hello into the search box.' TextInputCompletedEvent (events.rs line 977) collapses a typing session into a single event with text_value (the final string), keystroke_count (how many physical keys were pressed), typing_duration_ms (how long the typing took), input_method (Typed, Pasted, AutoFilled, Suggestion, Mixed), focus_method (how the field got focus before input started), field_name, field_type (TextBox, PasswordBox, SearchBox), and a process_name. Replay sets the text directly through UI Automation. The replay does not retype every key, which means it does not race the application's input handler.
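
A rough picture of that aggregation, in TypeScript. This is a sketch under assumptions: `KeyDownSample` and `summarizeTyping` are hypothetical, and the real recorder may well read the final string from the UI element instead of reconstructing it from keys. Only the three output field names come from the article.

```typescript
interface KeyDownSample { key: string; timestampMs: number; }

interface TextInputSummary {
  text_value: string;
  keystroke_count: number;
  typing_duration_ms: number;
}

// Fold raw key presses into one summary. Backspace removes the last character
// but still counts as a physical keystroke.
function summarizeTyping(samples: KeyDownSample[]): TextInputSummary {
  let text = "";
  let firstMs: number | null = null;
  let lastMs = 0;
  for (const s of samples) {
    if (firstMs === null) firstMs = s.timestampMs;
    lastMs = s.timestampMs;
    if (s.key === "Backspace") text = text.slice(0, -1);
    else if (s.key.length === 1) text += s.key;
  }
  return {
    text_value: text,
    keystroke_count: samples.length,
    typing_duration_ms: firstMs === null ? 0 : lastMs - firstMs,
  };
}

// "help", one Backspace, then "lo": seven key presses, final text "hello".
const typedKeys = ["h", "e", "l", "p", "Backspace", "l", "o"];
const typingSummary = summarizeTyping(
  typedKeys.map((key, i) => ({ key, timestampMs: i * 100 })),
);
console.log(typingSummary);
// { text_value: 'hello', keystroke_count: 7, typing_duration_ms: 600 }
```

Seven key events collapse into one record whose text_value is what the user meant, typo correction included.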

How does the recorder know a file was opened?

It watches window titles and searches the file system. FileOpenedEvent (events.rs line 1155) carries the filename extracted from the window title, a primary_path (the most likely file location), candidate_paths sorted by NTFS LastAccessTime, a confidence enum (FilePathConfidence), the application_name, the process_id, the file_extension, the search_time_ms (how long the lookup took, in milliseconds), and the full window_title the filename was extracted from. So opening todolist-backup.txt in Notepad becomes a FileOpened event with the resolved path, not a Win+R run dialog macro that has to recreate the navigation by hand.
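
The ranking step might look something like the sketch below. It assumes that "sorted by LastAccessTime" means most recent first and that the first candidate becomes primary_path; neither detail is confirmed by the text above. `PathCandidate` and `rankCandidates` are hypothetical names, and the example paths are made up.

```typescript
interface PathCandidate { path: string; lastAccessMs: number; }

interface RankedPaths { primary_path: string | null; candidate_paths: string[]; }

// Most recently accessed first (assumed order); the first entry is treated
// as primary_path (also an assumption).
function rankCandidates(found: PathCandidate[]): RankedPaths {
  const sorted = found.slice().sort((a, b) => b.lastAccessMs - a.lastAccessMs);
  const candidate_paths = sorted.map((c) => c.path);
  return {
    primary_path: candidate_paths.length > 0 ? candidate_paths[0] : null,
    candidate_paths,
  };
}

const rankedPaths = rankCandidates([
  { path: "C:\\Backups\\todolist-backup.txt", lastAccessMs: 1000 },
  { path: "C:\\Users\\me\\Desktop\\todolist-backup.txt", lastAccessMs: 2000 },
]);
console.log(rankedPaths.primary_path); // C:\Users\me\Desktop\todolist-backup.txt
```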

How is an ApplicationSwitch event different from just recording Alt+Tab?

Six switch methods, not one. ApplicationSwitchMethod (events.rs line 1003) discriminates AltTab, TaskbarClick, WindowsKeyShortcut, StartMenu, WindowClick, and Other. ApplicationSwitchEvent (line 1020) records from_window_and_application_name, to_window_and_application_name, from_process_name, to_process_name, from_process_id, to_process_id, switch_method, dwell_time_ms (how long you stayed in the previous app), and switch_count for rapid Alt+Tab cycling. On replay you can switch via the same method, or pick a different one, because the script knows what was being switched away from and what was being switched to as named entities. Recording 'Alt key down, Tab key down, Tab key up, Alt key up' tells the replay engine nothing about which window you wanted to land in.
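
Because the event names its target, choosing a method at replay time reduces to a small decision. A hedged sketch: the six method names are the ones listed above, while `pickSwitchMethod` and the idea of an "available methods" list are hypothetical and not taken from Terminator.

```typescript
type SwitchMethod =
  | "AltTab" | "TaskbarClick" | "WindowsKeyShortcut"
  | "StartMenu" | "WindowClick" | "Other";

// Reuse the recorded method when the environment supports it; otherwise fall
// back to the first method that is available, or null when none is.
function pickSwitchMethod(recorded: SwitchMethod, available: SwitchMethod[]): SwitchMethod | null {
  if (available.indexOf(recorded) !== -1) return recorded;
  return available.length > 0 ? available[0] : null;
}

console.log(pickSwitchMethod("AltTab", ["AltTab", "TaskbarClick"]));      // AltTab
console.log(pickSwitchMethod("AltTab", ["TaskbarClick", "WindowClick"])); // TaskbarClick
```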

How do I run the recorder?

It ships as an example binary in the workflow-recorder crate. From the Terminator repo: `cargo run --example record_workflow`. The example sets enable_highlighting=true, highlight_color=0x00FF00 (green BGR), highlight_duration_ms=800, and prints each event to stdout as it lands. Recording runs for 20 seconds by default. The default `WorkflowRecorderConfig` records mouse, keyboard, clipboard, hotkeys, text-input completion, application switches, browser navigation, file opens, and UI focus/property changes. You can flip individual capture flags off if a workflow does not need them. The Windows recorder source itself lives at crates/terminator-workflow-recorder/src/recorder/windows/mod.rs and is currently 3,581 lines.

What format is the saved script in, and how do I replay it?

JSON. Each event has a Serializable* counterpart that strips runtime types and writes a flat schema (SerializableMouseEvent, SerializableKeyboardEvent, SerializableTextInputCompletedEvent, etc., declared at events.rs lines 1360 through 1690). The full stream is a SerializableWorkflowEvent enum (line 1677). To replay, hand the JSON to a workflow runner: `terminator mcp run recorded.yml` from @mediar-ai/cli, or load the events into the @mediar-ai/workflow SDK and convert each event into a step. Selectors are derived from the metadata.ui_element field on each event (role, name, AutomationId, application name) so the replay does not depend on screen geometry.

What happens when the UI changes between recording and replay?

The script tries the recorded selector. If the element is gone or the name changed, the MCP loop calls get_window_tree on the current window, hands the fresh accessibility-tree JSON to the LLM, and asks for a replacement selector. The retry runs the patched step, then continues the rest of the script. This is why the script being a bag of semantic events matters: the LLM only has to repair one event at a time, not rewrite a coordinate-based macro from scratch. A keystroke-and-pixel recorder cannot offer that recovery loop because there is no semantic name to match against.

Can the recorder distinguish typed input from pasted input from autofill?

Yes. TextInputMethod (events.rs line 948) is a five-variant enum: Typed (each character came from a key event), Pasted (clipboard paste), AutoFilled (the field arrived populated by the platform), Suggestion (the user accepted an inline suggestion), Mixed (some combination). FieldFocusMethod is recorded separately. The replay engine uses the input_method to pick a strategy: Typed reproduces the keystrokes, Pasted writes to the clipboard and triggers Ctrl+V, AutoFilled skips the input entirely if the application restores it on next load. This level of intent classification is what makes replay survive small UI changes that would tear a key-by-key macro apart.
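
The strategy choice described above maps naturally onto an exhaustive switch. In this sketch the five `TextInputMethod` names come from the article; the `ReplayStrategy` labels and the handling of Suggestion and Mixed are assumptions, since the text does not say how those two replay.

```typescript
type TextInputMethod = "Typed" | "Pasted" | "AutoFilled" | "Suggestion" | "Mixed";

// Hypothetical strategy labels; not identifiers from Terminator.
type ReplayStrategy = "send_keystrokes" | "clipboard_paste" | "skip_if_restored" | "set_value";

function strategyFor(method: TextInputMethod): ReplayStrategy {
  switch (method) {
    case "Typed":
      return "send_keystrokes";   // reproduce the keystrokes
    case "Pasted":
      return "clipboard_paste";   // write the clipboard, then Ctrl+V
    case "AutoFilled":
      return "skip_if_restored";  // skip when the app repopulates the field
    case "Suggestion":
    case "Mixed":
      return "set_value";         // assumption: the article does not cover these
  }
}

console.log(strategyFor("Pasted")); // clipboard_paste
```

Because the switch is exhaustive over the union, adding a sixth input method to the type would make the compiler flag this function until a strategy is chosen for it.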

Does this only work on Windows?

The recorder ships full coverage on Windows because it sits on top of Microsoft UI Automation, which is the most complete accessibility stack of the three platforms Terminator supports. The Windows-specific recorder is `crates/terminator-workflow-recorder/src/recorder/windows/mod.rs` (3,581 lines). The supporting structs are in `windows/structs.rs` (672 lines). macOS and Linux replay paths exist for the action side of Terminator (the click, type, locate primitives), but the recorder is Windows-first today. If you need cross-platform recording, the recommended path is to author workflows in TypeScript with @mediar-ai/workflow and run them through the MCP agent, which is what most teams do anyway.
