The UI testing automation recorder most listicles never describe
Every roundup of UI testing automation tools says the same thing about recorders: "It captures your actions and plays them back." Nobody shows the format. Nobody describes the data shape. The recorder output is treated as an opaque artifact, which is why replay is brittle and why tests built from recordings break when the UI shifts a pixel. Terminator's recorder emits a typed, open, 14-variant event stream defined in crates/terminator-workflow-recorder/src/events.rs. This page is a walkthrough of that enum and why the variant choices matter.
Recorders are where UI testing automation tools quietly differ
Open any 2026 listicle. You will see the same phrase applied to a dozen products: "records your actions and replays them." The sentence is correct. It is also useless. Every recorder records something. What separates the brittle ones from the durable ones is the shape of what ends up on disk. If a recorder saves a list of mouse coordinates, your test fails when a window opens at a slightly different position. If it saves a stream of keydown events, your test cannot tell a typed value from a pasted one. If it saves only what is visible on screen, it misses the fact that the user switched applications with Alt+Tab versus with a taskbar click.
Terminator's recorder writes out something different: a stream of semantic events. An Alt+Tab is an ApplicationSwitch with switch_method=AltTab. A pasted email address is a TextInputCompleted with input_method=Pasted and the full text value in one field. A double-click on an Excel file is a FileOpened with a ranked candidate path list.
The 14-variant WorkflowEvent enum, verbatim
This is the actual enum. The first six are low-level raw events (disabled by default in most recording configs, kept for edge cases). The last eight are high-level semantic events. Every one of them is a struct with explicit fields, not a blob of bytes.
Fourteen variants, one per class of intent
You never produce these yourself. The recorder emits them. But the set is the vocabulary your test code will read when you consume a recording, so it helps to see all fourteen at once.
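For quick reference, the fourteen variant names can be mirrored as a name-only enum. This is a sketch, not the verbatim source: the real variants carry payload structs, and `WorkflowEventKind`, `ALL`, and `is_high_level` are hypothetical names introduced here for illustration. The raw versus semantic split follows the grouping described on this page.

```rust
/// Name-only mirror of the fourteen `WorkflowEvent` variants listed on this page.
/// Hypothetical helper: the real enum carries one payload struct per variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkflowEventKind {
    // Low-level raw events (typically disabled in recording configs).
    Mouse,
    Keyboard,
    Clipboard,
    TextSelection,
    DragDrop,
    Hotkey,
    // High-level semantic events.
    TextInputCompleted,
    ApplicationSwitch,
    BrowserTabNavigation,
    Click,
    BrowserClick,
    BrowserTextInput,
    FileOpened,
    PendingAction,
}

impl WorkflowEventKind {
    /// Every variant, in the order given above.
    pub const ALL: [WorkflowEventKind; 14] = [
        Self::Mouse,
        Self::Keyboard,
        Self::Clipboard,
        Self::TextSelection,
        Self::DragDrop,
        Self::Hotkey,
        Self::TextInputCompleted,
        Self::ApplicationSwitch,
        Self::BrowserTabNavigation,
        Self::Click,
        Self::BrowserClick,
        Self::BrowserTextInput,
        Self::FileOpened,
        Self::PendingAction,
    ];

    /// True for the eight semantic variants, false for the six raw ones.
    pub fn is_high_level(self) -> bool {
        !matches!(
            self,
            Self::Mouse
                | Self::Keyboard
                | Self::Clipboard
                | Self::TextSelection
                | Self::DragDrop
                | Self::Hotkey
        )
    }
}
```

A consumer that only cares about semantic events could filter a recording with `is_high_level` before mapping anything to test steps.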
What each semantic variant actually tells you
Fourteen variants collapse into a small number of test-authoring decisions. Each card below is one variant (or group) and the field shape it gives you.
TextInputCompleted
Fires once per field after the user stops typing for ~500ms. Stores text_value, field_name, field_type, typing_duration_ms, keystroke_count, and the input_method from {Typed, Pasted, AutoFilled, Suggestion, Mixed}. This is the single most useful event in the enum for test authoring.
ApplicationSwitch
Records a focus change between two processes. Includes from_process_name, to_process_name, switch_method ({AltTab, TaskbarClick, WindowsKeyShortcut, StartMenu, WindowClick, Other}), and dwell_time_ms in the previous app. Six methods, not one.
FileOpened
Fires when a new window title appears to reference a file. The recorder searches recent-access paths, ranks by LastAccessTime, and emits a FilePathConfidence of High, Medium, or Low. Your test gets a real path, not a title fragment.
BrowserTabNavigation
Chrome-extension-bridged event with to_url, from_url, to_title, from_title, browser, tab_index, total_tabs, is_back_forward. Action is one of {Created, Switched, Closed, Moved, Duplicated, Pinned, Refreshed}. Method tracks how the navigation happened.
Click and BrowserClick
Two click variants. Click uses accessibility role plus name. BrowserClick additionally carries DomElementInfo with the CSS path and up to 5 ranked SelectorCandidate entries, so the downstream replay can pick the selector that actually survived the last page update.
BrowserTextInput
DOM-aware text input. Emitted via the extension bridge for inputs inside a browser tab. Carries the field's DomElementInfo so replay does not need to walk the accessibility tree, which is lossy for web forms.
Hotkey, Clipboard, TextSelection, DragDrop
The four semantic layer-helpers. Hotkey pattern-matches against a small known-shortcut list (save, copy, close tab, etc.). Clipboard stores a hash plus action. TextSelection records the selected substring. DragDrop captures the source, target, and payload.
Mouse, Keyboard, PendingAction
Low-level leftovers. Mouse and Keyboard are the raw event streams (disabled by default in the config). PendingAction is an internal bookkeeping event emitted right before a capture completes, so consumers can block on UI-tree refresh before reading the next event.
The single most useful event: TextInputCompleted
Most of the interesting information in a UI workflow is: what did the user put in which field, and how did it get there? The event below captures that in one struct per field session. The input_method is the distinguishing field. Five values. No other mainstream recorder I know of exposes this.
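As a rough sketch of that shape: the field names and the five method values below come from this page, while the Rust types, the `Option` on `field_name`, and the `"Edit"` field type are assumptions made for illustration. The example instance reuses the pasted-email values quoted in the FAQ.

```rust
/// The five input methods named on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TextInputMethod {
    Typed,
    Pasted,
    AutoFilled,
    Suggestion,
    Mixed,
}

/// Sketch of one completed field session.
/// Field names follow this page; the concrete types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct TextInputCompletedEvent {
    pub text_value: String,
    pub field_name: Option<String>,
    pub field_type: String,
    pub input_method: TextInputMethod,
    pub typing_duration_ms: u64,
    pub keystroke_count: u32,
}

/// The pasted-email example from the FAQ, built by hand.
pub fn pasted_email_example() -> TextInputCompletedEvent {
    TextInputCompletedEvent {
        text_value: "finance@acme.com".to_string(),
        field_name: Some("To".to_string()),
        field_type: "Edit".to_string(), // placeholder; the page does not give this value
        input_method: TextInputMethod::Pasted,
        typing_duration_ms: 720,
        keystroke_count: 0,
    }
}

/// One-line summary a test author might log while reading a recording.
pub fn summarize(event: &TextInputCompletedEvent) -> String {
    let field = event.field_name.as_deref().unwrap_or("<unnamed field>");
    format!(
        "{:?} '{}' into '{}' in {} ms ({} keystrokes)",
        event.input_method,
        event.text_value,
        field,
        event.typing_duration_ms,
        event.keystroke_count
    )
}
```

The point of the single struct is that a test can branch on `input_method` without reconstructing anything from keystrokes.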
How the recorder tells Typed from Pasted
The detection is timing and keystroke arithmetic, not a heuristic based on field type. Walk through what the recorder does when a user pastes.
Classifying a 'finance@acme.com' entry into TextInputMethod
Stage 1: Field focus
FileOpened: window title to ranked path list
When a user opens a document, the window title usually shows the filename, not the full path. Every recorder I have seen before Terminator just saves the title verbatim. This one tries harder. The struct below is what you get.
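A hedged sketch of that struct and of the ranking step: `primary_path`, `candidate_paths`, and the three confidence levels are named on this page, while the `filename` field, the `resolve` function, and the `clear_gap_ms` threshold are assumptions used to make the High/Medium/Low rule concrete.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilePathConfidence {
    High,
    Medium,
    Low,
}

/// Sketch of the event. `filename` is an assumed field.
#[derive(Debug, Clone, PartialEq)]
pub struct FileOpenedEvent {
    pub filename: String,
    pub primary_path: Option<PathBuf>,
    pub candidate_paths: Vec<PathBuf>,
    pub confidence: FilePathConfidence,
}

/// Ranks `(path, last_access_ms)` matches, newest first, and assigns a confidence.
/// `clear_gap_ms` is a hypothetical threshold for "clearly the most recent".
pub fn resolve(
    filename: &str,
    mut matches: Vec<(PathBuf, u64)>,
    clear_gap_ms: u64,
) -> FileOpenedEvent {
    // Newest access time first.
    matches.sort_by(|a, b| b.1.cmp(&a.1));
    let confidence = match matches.as_slice() {
        // Exactly one match: a clear winner.
        [_single] => FilePathConfidence::High,
        // Several matches, but the newest leads by a wide margin.
        [newest, runner_up, ..] if newest.1 - runner_up.1 >= clear_gap_ms => {
            FilePathConfidence::Medium
        }
        // No matches, or access times too close to call.
        _ => FilePathConfidence::Low,
    };
    let candidate_paths: Vec<PathBuf> = matches.into_iter().map(|(path, _)| path).collect();
    FileOpenedEvent {
        filename: filename.to_string(),
        primary_path: candidate_paths.first().cloned(),
        candidate_paths,
        confidence,
    }
}
```

Keeping the full ranked list alongside `primary_path` lets a replay fall back to the next candidate when the first path no longer exists.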
From a recorded event to a replayable MCP tool step
Each variant of WorkflowEvent translates to an McpToolStep with a tool name, arguments, and an optional expected-change diff. That is how a recording becomes a replayable test, and how an LLM can read and rewrite the same file.
recorded event to replay step
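The translation can be pictured as one `match` over the event. In this sketch, `RecordedEvent` is a reduced stand-in for three WorkflowEvent variants, the argument map is simplified to strings, and the tool names are deliberately fake placeholders: they are not the names the Terminator MCP server exposes.

```rust
use std::collections::BTreeMap;

/// Reduced stand-in for three `WorkflowEvent` variants (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub enum RecordedEvent {
    TextInputCompleted { field_name: String, text_value: String },
    ApplicationSwitch { to_process_name: String },
    FileOpened { primary_path: String },
}

/// Field names follow this page; the types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct McpToolStep {
    pub tool_name: String,
    pub arguments: BTreeMap<String, String>,
    pub description: String,
    pub expected_ui_changes: Option<String>,
}

fn step(tool_name: &str, args: &[(&str, &str)], description: String) -> McpToolStep {
    McpToolStep {
        tool_name: tool_name.to_string(),
        arguments: args.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
        description,
        expected_ui_changes: None, // filled in from the recorded tree diff, when present
    }
}

/// Maps one recorded event to one replay step.
/// All tool names here are HYPOTHETICAL placeholders.
pub fn to_step(event: &RecordedEvent) -> McpToolStep {
    match event {
        RecordedEvent::TextInputCompleted { field_name, text_value } => step(
            "hypothetical_type_text",
            &[("field", field_name.as_str()), ("text", text_value.as_str())],
            format!("Enter text into the '{}' field", field_name),
        ),
        RecordedEvent::ApplicationSwitch { to_process_name } => step(
            "hypothetical_activate_application",
            &[("process", to_process_name.as_str())],
            format!("Bring {} to the foreground", to_process_name),
        ),
        RecordedEvent::FileOpened { primary_path } => step(
            "hypothetical_open_file",
            &[("path", primary_path.as_str())],
            format!("Open {}", primary_path),
        ),
    }
}
```

Because each step is plain data with a human-readable `description`, the same file can be edited by hand or rewritten by an LLM before it is replayed.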
Six stages from a user click to a JSON event
The recorder is ~4,000 lines of Windows-specific plumbing in crates/terminator-workflow-recorder/src/recorder/windows. Each stage below corresponds to real code in that directory.
Event capture in a dedicated thread
The Windows recorder runs a UI Automation event subscription on a background thread. Every focus change, click, and text-change notification lands in a bounded channel. A second thread owns clipboard polling. A third owns the browser-extension bridge for Chrome-specific signals.
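The threading layout can be approximated with the standard library's bounded channel. This is only a sketch of the pattern: `RawSignal`, `spawn_producer`, and `capture_demo` are hypothetical names, the signals are canned strings rather than real UI Automation notifications, and the actual recorder may use a different channel implementation.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical envelope for the three signal sources described above.
#[derive(Debug, Clone, PartialEq)]
pub enum RawSignal {
    UiAutomation(String),
    Clipboard(String),
    BrowserBridge(String),
}

/// Spawns one producer thread that pushes its signals into the shared channel.
fn spawn_producer(tx: SyncSender<RawSignal>, signals: Vec<RawSignal>) -> JoinHandle<()> {
    thread::spawn(move || {
        for signal in signals {
            // `send` blocks while the channel is full: that is the back-pressure
            // a bounded channel provides. Stop if the receiver has gone away.
            if tx.send(signal).is_err() {
                break;
            }
        }
    })
}

/// Runs three producers against one bounded channel and returns what arrived.
pub fn capture_demo(capacity: usize) -> Vec<RawSignal> {
    let (tx, rx) = sync_channel(capacity);
    let producers = vec![
        spawn_producer(
            tx.clone(),
            vec![
                RawSignal::UiAutomation("focus".to_string()),
                RawSignal::UiAutomation("click".to_string()),
            ],
        ),
        spawn_producer(tx.clone(), vec![RawSignal::Clipboard("copy".to_string())]),
        spawn_producer(tx, vec![RawSignal::BrowserBridge("tab_switched".to_string())]),
    ];
    // `rx.iter()` ends once every sender has been dropped, i.e. all producers are done.
    let received: Vec<RawSignal> = rx.iter().collect();
    for producer in producers {
        producer.join().expect("producer thread panicked");
    }
    received
}
```

Arrival order across threads is not deterministic, which is one reason the later aggregation stage works from timestamps rather than from channel order.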
Semantic aggregation, not stream dumping
Instead of saving each event raw, the aggregator maintains small state machines. For text input, it holds an InputTextAccumulator per focused element that tracks keystroke_count, start_time, and whether the user has been idle long enough to emit. For application switching, it holds an ApplicationState with a start-time stamp so dwell_time_ms comes out correct.
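A minimal sketch of those two state holders, with time passed in explicitly so the logic is easy to follow. `keystroke_count`, the start time, and the type names come from this page; the other fields, the method names, and the exact 500 ms idle constant are assumptions.

```rust
/// Assumed idle threshold, taken from the "~500ms" figure on this page.
pub const IDLE_EMIT_MS: u64 = 500;

/// Sketch of a per-field accumulator.
#[derive(Debug)]
pub struct InputTextAccumulator {
    pub keystroke_count: u32,
    pub start_time_ms: u64,
    last_activity_ms: u64,
    text: String,
}

impl InputTextAccumulator {
    pub fn new(now_ms: u64) -> Self {
        Self {
            keystroke_count: 0,
            start_time_ms: now_ms,
            last_activity_ms: now_ms,
            text: String::new(),
        }
    }

    /// Records one key press that appended `ch` to the field.
    pub fn on_key(&mut self, ch: char, now_ms: u64) {
        self.keystroke_count += 1;
        self.text.push(ch);
        self.last_activity_ms = now_ms;
    }

    /// Returns `(text, keystroke_count, typing_duration_ms)` once the user has
    /// been idle long enough, otherwise `None`.
    pub fn poll(&self, now_ms: u64) -> Option<(String, u32, u64)> {
        let idle_ms = now_ms.saturating_sub(self.last_activity_ms);
        if self.text.is_empty() || idle_ms < IDLE_EMIT_MS {
            return None;
        }
        let typing_duration_ms = self.last_activity_ms.saturating_sub(self.start_time_ms);
        Some((self.text.clone(), self.keystroke_count, typing_duration_ms))
    }
}

/// Sketch of the per-application state used to compute dwell time.
#[derive(Debug)]
pub struct ApplicationState {
    pub process_name: String,
    pub start_time_ms: u64,
}

impl ApplicationState {
    /// Time spent in this application as of `now_ms`.
    pub fn dwell_time_ms(&self, now_ms: u64) -> u64 {
        now_ms.saturating_sub(self.start_time_ms)
    }
}
```

Holding the state per focused element is what lets a burst of key events collapse into a single completion record.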
Method detection via timing and modifiers
Paste detection is timing-based: if the text length jumps by more than N characters in under M milliseconds without matching keystroke count, the event is classified as Pasted. Suggestion is detected by an autocomplete-list click interacting with the focused field. AutoFilled is inferred from text appearing without any keystroke burst at all.
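The rule can be sketched as a per-change classifier plus a per-session combiner. Everything structural here is an assumption: `TextChange`, `Thresholds`, and both function names are hypothetical, and the threshold fields stand in for the unspecified N and M. Only the five method names and the general rule come from this page.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TextInputMethod {
    Typed,
    Pasted,
    AutoFilled,
    Suggestion,
    Mixed,
}

/// One observed change to the field's text (hypothetical shape).
#[derive(Debug, Clone, Copy)]
pub struct TextChange {
    pub chars_added: usize,
    pub elapsed_ms: u64,
    pub keystrokes: u32,
    pub from_suggestion_click: bool,
}

/// Placeholders for the N (characters) and M (milliseconds) thresholds.
pub struct Thresholds {
    pub min_jump_chars: usize,
    pub max_jump_ms: u64,
}

/// Classifies a single text change.
pub fn classify_change(c: &TextChange, t: &Thresholds) -> TextInputMethod {
    let fast_jump = c.chars_added > t.min_jump_chars && c.elapsed_ms < t.max_jump_ms;
    let keystrokes_explain_it = (c.keystrokes as usize) >= c.chars_added;
    if c.from_suggestion_click {
        TextInputMethod::Suggestion
    } else if fast_jump && !keystrokes_explain_it {
        TextInputMethod::Pasted
    } else if c.chars_added > 0 && c.keystrokes == 0 {
        // Text appeared with no keystrokes and without paste timing.
        TextInputMethod::AutoFilled
    } else {
        TextInputMethod::Typed
    }
}

/// Combines the changes of one field session: Typed is the default, a single
/// non-typed path wins, and two different non-typed paths yield Mixed.
pub fn classify_session(changes: &[TextChange], t: &Thresholds) -> TextInputMethod {
    let mut seen: Option<TextInputMethod> = None;
    for c in changes {
        let method = classify_change(c, t);
        if method == TextInputMethod::Typed {
            continue;
        }
        match seen {
            None => seen = Some(method),
            Some(prev) if prev != method => return TextInputMethod::Mixed,
            Some(_) => {}
        }
    }
    seen.unwrap_or(TextInputMethod::Typed)
}
```

With thresholds of, say, 3 characters and 50 ms, a 16-character jump in 5 ms with zero keystrokes classifies as Pasted, while a run of one-character, one-keystroke changes stays Typed.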
File-path resolution against the OS
When a new window title is detected, the recorder parses a filename candidate out of it (handles 'file.txt - App', 'file.txt * - App', 'App - file.txt', and similar). Then it walks the filesystem's recent-access index, collects matches, and ranks them by LastAccessTime to assign FilePathConfidence.
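The title-parsing half of this stage might look like the following. It is a simplified heuristic written for illustration, not the recorder's parser: `filename_from_title` and `looks_like_filename` are hypothetical names, and the extension-length limit is an assumption.

```rust
/// Returns the first " - "-separated title segment that looks like `name.ext`,
/// ignoring a trailing `*` unsaved-changes marker.
pub fn filename_from_title(title: &str) -> Option<String> {
    title
        .split(" - ")
        .map(|part| part.trim().trim_end_matches('*').trim())
        .find(|part| looks_like_filename(part))
        .map(str::to_string)
}

/// Crude check: a non-empty stem, a dot, and a short alphanumeric extension.
fn looks_like_filename(candidate: &str) -> bool {
    match candidate.rsplit_once('.') {
        Some((stem, ext)) => {
            !stem.is_empty()
                && (1..=5).contains(&ext.len())
                && ext.chars().all(|c| c.is_ascii_alphanumeric())
        }
        None => false,
    }
}
```

A title with no filename-like segment, such as a settings window, yields `None`, in which case there is nothing to look up on disk.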
Write out as SerializableWorkflowEvent
At recording end, each live event is converted to its Serializable counterpart (UIElement becomes SerializableUIElement, timestamps stay as u64 millis, enums become string tags). The whole workflow serializes through serde_json::to_string_pretty. The resulting .json file is the recording.
Replay as MCP tool calls, with oracles
On replay, each event maps to an McpToolStep with a tool_name, arguments, and optional expected_ui_changes / expected_dom_changes fields. A test runner can call each step, diff the UI after each action, and fail with a structured reason instead of a silent 'element not found'.
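A replay loop with that oracle can be sketched in a few lines. Here the UI diff is reduced to a string comparison and the tool call is an injected closure; `StepFailure`, `replay`, and `run_tool` are hypothetical names, and a real runner would compare tree diffs more tolerantly.

```rust
/// Reduced step: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct McpToolStep {
    pub tool_name: String,
    pub expected_ui_changes: Option<String>,
}

/// Structured failure reasons (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum StepFailure {
    ToolError { step: usize, tool_name: String, message: String },
    UnexpectedUiChange { step: usize, expected: String, observed: String },
}

/// Runs each step through `run_tool`, which returns the observed UI diff,
/// and checks it against the recorded expectation when one exists.
/// Returns the number of steps replayed.
pub fn replay<F>(steps: &[McpToolStep], mut run_tool: F) -> Result<usize, StepFailure>
where
    F: FnMut(&McpToolStep) -> Result<String, String>,
{
    for (index, step) in steps.iter().enumerate() {
        let observed = run_tool(step).map_err(|message| StepFailure::ToolError {
            step: index,
            tool_name: step.tool_name.clone(),
            message,
        })?;
        if let Some(expected) = &step.expected_ui_changes {
            if *expected != observed {
                return Err(StepFailure::UnexpectedUiChange {
                    step: index,
                    expected: expected.clone(),
                    observed,
                });
            }
        }
    }
    Ok(steps.len())
}
```

The failure value names the step and carries both diffs, so a report can show what was expected instead of a bare "element not found".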
What a recording actually looks like on disk
A real workflow JSON. Six events cover what in a keystroke-based recorder would be a few hundred. Every field here is one line in the enum definition you saw above.
Running the recorder, live
The CLI prints each semantic event as it is committed to the log. Read this output and the matching JSON above side by side. One line of terminal, one object in the file.
The method tags, all in one place
Each tag below is a string value that shows up in a recording. If your replay logic needs to handle an "autofilled email" path differently from a "typed email" path, the branch key is event.input_method.
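Branching on a tag is easiest after parsing it into an enum. The sketch below assumes the tags are spelled exactly like the variant names shown on this page (for example `Pasted`, `AltTab`); the serialized casing in a real recording may differ, and `replay_hint` is a made-up policy included only to show the branch.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TextInputMethod {
    Typed,
    Pasted,
    AutoFilled,
    Suggestion,
    Mixed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApplicationSwitchMethod {
    AltTab,
    TaskbarClick,
    WindowsKeyShortcut,
    StartMenu,
    WindowClick,
    Other,
}

impl FromStr for TextInputMethod {
    type Err = String;
    fn from_str(tag: &str) -> Result<Self, Self::Err> {
        match tag {
            "Typed" => Ok(Self::Typed),
            "Pasted" => Ok(Self::Pasted),
            "AutoFilled" => Ok(Self::AutoFilled),
            "Suggestion" => Ok(Self::Suggestion),
            "Mixed" => Ok(Self::Mixed),
            other => Err(format!("unknown input_method tag: {}", other)),
        }
    }
}

impl FromStr for ApplicationSwitchMethod {
    type Err = String;
    fn from_str(tag: &str) -> Result<Self, Self::Err> {
        match tag {
            "AltTab" => Ok(Self::AltTab),
            "TaskbarClick" => Ok(Self::TaskbarClick),
            "WindowsKeyShortcut" => Ok(Self::WindowsKeyShortcut),
            "StartMenu" => Ok(Self::StartMenu),
            "WindowClick" => Ok(Self::WindowClick),
            "Other" => Ok(Self::Other),
            other => Err(format!("unknown switch_method tag: {}", other)),
        }
    }
}

/// Hypothetical replay policy keyed on the recorded input method.
pub fn replay_hint(method: TextInputMethod) -> &'static str {
    match method {
        TextInputMethod::Typed => "type the value key by key",
        TextInputMethod::Pasted => "set the clipboard, then paste",
        TextInputMethod::AutoFilled | TextInputMethod::Suggestion | TextInputMethod::Mixed => {
            "set the field value directly"
        }
    }
}
```

Rejecting unknown tags with an error, rather than defaulting to `Typed`, makes a schema change in the recording show up as a test failure instead of a silent behavior change.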
Terminator’s recorder vs a keystroke-dump recorder
Ten differences. The "Keystroke-dump recorder" column is the shape of the recording produced by a typical UI testing automation recorder (Selenium IDE, vendor-specific RPA tools, most browser codegen). The "Terminator" column is what Terminator's workflow recorder produces.
| Feature | Keystroke-dump recorder | Terminator |
|---|---|---|
| Stores 'user pasted john@example.com' as ONE event | Stores it as ~20 keydown/keyup pairs plus a clipboard event. No semantic link between them. | One TextInputCompleted event. input_method=Pasted. text_value is the full string. |
| Distinguishes typed input from paste from autofill | No. A paste and a fast type look identical in a keystroke log. | Yes. TextInputMethod has 5 variants: Typed, Pasted, AutoFilled, Suggestion, Mixed. |
| Captures how the user switched applications | Usually not recorded at all. A focus change is inferred from the next click location. | ApplicationSwitchMethod records AltTab, TaskbarClick, WindowsKeyShortcut, StartMenu, WindowClick, Other. |
| Detects a browser tab switch (not a page load) | No. Browser tab state is invisible to OS-level recorders. | Chrome extension bridge emits BrowserTabNavigation with to_url, from_url, method, is_back_forward. |
| Resolves the actual file path of an opened document | No. Window title is saved as-is. If the title is 'Q2-invoices.xlsx - Excel', the full path is lost. | FileOpenedEvent searches recent-access paths, emits primary_path plus ranked candidate_paths. |
| Records how long the user spent in a field | Derivable from keydown timestamps, not stored as a single value. | TextInputCompletedEvent.typing_duration_ms is one field per completion. |
| Records time spent in previous application (dwell) | No. | ApplicationSwitchEvent.dwell_time_ms is one field per switch. |
| Output is replayable as typed MCP tool calls | Replay requires the same OS, same resolution, often the same screen layout. | Each recorded event maps to an McpToolStep (tool_name, arguments, description) that any MCP client can run. |
| Expected UI change stored alongside the action | No. | McpToolStep.expected_ui_changes is a tree diff snapshot, used as a validation oracle on replay. |
| Output format is a typed Rust/TypeScript schema | Usually a proprietary binary or a screenshot reel. | SerializableWorkflowEvent is a serde enum. Full JSON schema is derivable from the source. |
14 WorkflowEvent variants in events.rs lines 475-517
Eight are high-level semantic events. Six are low-level raw events. Together they are meant to cover the intents a user can express on a running desktop: typing, clicking, switching applications, navigating tabs, and opening files.
“Every enum variant, method name, and struct field on this page is grep-able in a fresh clone of mediar-ai/terminator. The 14 count is not marketing. It is the number of arms in pub enum WorkflowEvent in events.rs lines 475-517.”
github.com/mediar-ai/terminator
Why the recording format decides everything
Tests built from recordings fail for one of three reasons: the UI shifted, the input method changed, or the application context changed. A recording format that only stores mouse coordinates loses to all three. A format that stores raw keystrokes loses to input-method and context changes. A semantic format captures enough invariants at record time that the replay can adapt.
Terminator's recorder is not a replacement for your test runner. It is a way to generate the first draft of a test from a real user flow, in a format that reads well enough to hand-edit and that replays across machines with non-identical screen geometry. You install it with cargo install terminator-workflow-recorder or drive it from the MCP server the same repo ships.
If you are evaluating UI testing automation tools for an app that is not purely web, ask the vendor for a sample recording file. If the answer is "it is binary" or "it is a screenshot reel," you are about to buy a brittle recorder. Terminator's answer is a readable, typed JSON with fourteen variants defined in one open-source file. That is the spec.
Frequently asked questions
What does Terminator record that mainstream UI testing automation recorders do not?
A semantic event stream instead of a raw input stream. When a user pastes an email address into a To: field, Selenium IDE or a low-code browser recorder saves a clipboard paste plus a focus change plus a change event. Terminator saves one TextInputCompleted event with text_value='finance@acme.com', input_method=Pasted, keystroke_count=0, typing_duration_ms=720, field_name='To'. The rest of the state machine is in the recorder, not the log file. This is how the recording stays readable when the workflow is thirty actions long.
Where is the 14-variant WorkflowEvent enum?
In the open-source Terminator repo at crates/terminator-workflow-recorder/src/events.rs, lines 475 to 517. Clone github.com/mediar-ai/terminator and grep for 'pub enum WorkflowEvent'. The fourteen variants are: Mouse, Keyboard, Clipboard, TextSelection, DragDrop, Hotkey, TextInputCompleted, ApplicationSwitch, BrowserTabNavigation, Click, BrowserClick, BrowserTextInput, FileOpened, PendingAction. Six are low-level (raw input, typically disabled in production configs). Eight are high-level semantic events. This is the surface area you build tests against.
How does the recorder tell Typed from Pasted from AutoFilled?
Timing and keystroke arithmetic. The Windows recorder keeps an InputTextAccumulator per focused element. Every key press increments keystroke_count. Every change-event on the element updates the observed text. If the text length jumps by many characters in a window where almost no keystrokes fired, the event is classified as Pasted. If text appears with zero keystrokes and no paste timing, it is AutoFilled. If the user clicks an autocomplete dropdown item that commits a value into the field, it is Suggestion. If more than one of those paths triggered inside the same field session, it is Mixed. Typed is the default. You can see the completion logic in crates/terminator-workflow-recorder/src/recorder/windows/structs.rs around line 200.
What is the FileOpened event for? Is it a hook into the filesystem?
It is not a filesystem hook. It is a window-title heuristic followed by a filesystem lookup. When a new window becomes foreground and the title looks like it contains a filename (patterns like 'name.ext - AppName' or 'AppName - name.ext'), the recorder searches the OS recent-access index for files with that name. The results are ranked by LastAccessTime and returned as candidate_paths. If one clear winner emerges, confidence is High. If multiple files compete but one is clearly the most recent, it is Medium. If the access times are ambiguous, it is Low. Your downstream tooling sees a typed confidence level, not a raw name string.
Can I replay a recording as a browser-only test?
Only if the recording was browser-only. If the recording crosses into a native app (the user opens Excel, the user hits Alt+Tab to Outlook), the replay has to also cross. That is why Terminator's runtime is desktop-native at the bottom and uses a Chrome extension bridge for DOM access at the top, in the same process. A recording that starts in Chrome, opens Excel, pastes a value, switches back, and clicks Send replays as one test file with one Desktop() instance.
How does the replay work on a machine where the UI has shifted slightly since the recording?
Two mechanisms. First, the recorded event has a UIElement with selector-relevant attributes (role, name, native id, class). The runtime re-resolves a selector against the live accessibility tree every time, so small coordinate shifts do not matter. Second, McpToolStep stores expected_ui_changes as a tree-diff snapshot, so after each action the runtime can verify the UI changed the way it did during recording. If the diff does not match, the step fails with a structured reason instead of a silent mismatch downstream.
Does this work on macOS and Linux, or only Windows?
The recorder is first-class on Windows today. The Windows implementation lives in crates/terminator-workflow-recorder/src/recorder/windows and is 3,500+ lines of UIA event plumbing. macOS support is in progress. Linux AT-SPI2 is experimental. If your target is cross-platform UI testing automation that includes native Windows apps, this is the right tool right now. If your team is macOS-first, the recorder side is less mature today, but the selector engine and locator API under crates/terminator already work on both Windows and macOS.
Is the recorder privacy-safe? Does it capture passwords?
The recorder respects field_type. When a field is classified as PasswordBox (or equivalent on macOS), the TextInputCompletedEvent is emitted with the keystroke_count and typing_duration_ms populated but text_value elided. Clipboard events can be length-capped via config (max_clipboard_content_length). Screenshot capture is optional, off by default, and has a configurable blur-on-sensitive-field mode. Recordings are local files by default; no telemetry leaves the machine unless you opt in.
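In code, the elision and the clipboard cap amount to two small transformations. This sketch makes several assumptions: an elided `text_value` is modeled as `None`, `field_type` is compared as a plain string, and the cap counts characters rather than bytes. `redact_if_sensitive` and `cap_clipboard` are hypothetical helper names; only `PasswordBox` and `max_clipboard_content_length` come from this page.

```rust
/// Reduced event: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct TextInputCompletedEvent {
    pub field_type: String,
    pub text_value: Option<String>,
    pub keystroke_count: u32,
    pub typing_duration_ms: u64,
}

/// Drops the text of password fields while keeping the timing metadata.
pub fn redact_if_sensitive(mut event: TextInputCompletedEvent) -> TextInputCompletedEvent {
    if event.field_type == "PasswordBox" {
        event.text_value = None;
    }
    event
}

/// Truncates clipboard content to at most `max_clipboard_content_length`
/// characters (assumption: the real limit may be measured in bytes).
pub fn cap_clipboard(content: &str, max_clipboard_content_length: usize) -> String {
    content.chars().take(max_clipboard_content_length).collect()
}
```

Truncating by `chars()` rather than slicing bytes avoids cutting a multi-byte character in half.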
How does this compare to Playwright's codegen?
Playwright's codegen is a browser-only recorder that emits Playwright-API code. It works well when every action is inside a Chromium or WebKit tab. Terminator's recorder is OS-wide. It records across native apps and browser tabs in a single session and emits events that replay as MCP tool calls, not just Playwright function calls. For testing a pure web app, Playwright's codegen is excellent and simpler. For testing a workflow that leaves the browser, Terminator is the answer, and the semantic event format keeps the recording readable at scale.
Can AI coding assistants consume a recording directly?
Yes. A Terminator recording is a JSON file of SerializableWorkflowEvent values. Because the recorder emits semantic events (not raw keystrokes), the file reads like an annotated transcript. Claude Code, Cursor, or any MCP-capable agent can ingest that file, map each event to the matching MCP tool, and execute the replay through the crates/terminator-mcp-agent server. This is why the recorder's output format matters: the event names and fields are the prompt surface that the LLM sees.