UI automation shaped like Selenium, recorded as semantic events instead of keystrokes

Selenium IDE listens to DOM events inside one browser tab. Terminator's workflow recorder sits on top of Windows UI Automation and emits typed, high-level events across every window on the OS. One "typed hello world in 1.2 seconds over 11 keystrokes" event replaces twenty-two keydown / keyup entries. Source: crates/terminator-workflow-recorder/src/events.rs.

Matthew Diakonov, Written with AI

Published April 22, 2026 · 11 min read


record_text_input_completion: true by default in recorder.rs line 203

TextInputCompletedEvent carries 8 typed fields in events.rs line 977

ApplicationSwitchMethod has 6 variants including AltTab and TaskbarClick

BrowserTabNavigationEvent covers 7 tab actions, 8 navigation methods

Selenium IDE, stretched past the tab

A recorder that emits semantic events, not keystrokes

Records every window on the OS, not one browser tab

TextInputCompletedEvent collapses 22 keydown events into 1

ApplicationSwitchEvent attributes each switch to AltTab, TaskbarClick, or a direct window click

Every event carries the role, name, id, and bounds of the UI element it targeted

Open source in crates/terminator-workflow-recorder


Every other guide on this topic stops at click() and send_keys()

If you have read three articles about Selenium this week, you know the shape. A driver, a By.id locator, a click, a send_keys. Maybe a Selenium Grid diagram if the writer was feeling generous. Maybe a paragraph about Selenium IDE, the Chrome extension that records clicks and plays them back.

The guides stop there because the API does. WebDriver speaks a protocol that browser engines implement. Inside a rendered DOM, it is precise and boring, the way testing infrastructure should be. Outside a rendered DOM, it is silent. A file upload dialog, a native menu bar, a taskbar, a Slack desktop app, an Excel cell: invisible.

This page is about the recorder most of those guides never cover: Terminator's terminator-workflow-recorder crate, which records UI events the same way Selenium IDE does, but one level deeper, on top of the OS accessibility API instead of the DOM. The events it emits are not keystrokes. They are typed, semantic, and carry the context an LLM or a replay engine needs to do something useful with them.

The anchor fact other guides miss

The default recorder config turns on three semantic event streams simultaneously. It is a one-line claim you can check against the source.

crates/terminator-workflow-recorder/src/recorder.rs

Three flags do the heavy lifting. record_text_input_completion debounces typing into one event per field per session. record_application_switches emits one event every time the foreground window changes, tagged with the method the user used. record_browser_tab_navigation watches every Chromium, Firefox, and Edge process for tab lifecycle changes and emits them as first-class events with URLs.
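A minimal sketch of what that default amounts to, assuming the config can be modelled as a plain struct of booleans. The flag names are the ones this page lists as on by default. The struct name `RecorderConfigSketch` and the `high_level_streams` helper are hypothetical, and the real config carries more fields (performance mode, throttles, filters).

```rust
/// Hypothetical, simplified stand-in for the recorder's config.
/// Only the flags named on this page are modelled; the real struct has more.
#[derive(Debug, Clone)]
pub struct RecorderConfigSketch {
    // Low-level streams
    pub record_mouse: bool,
    pub record_keyboard: bool,
    pub record_clipboard: bool,
    pub record_hotkeys: bool,
    // High-level, semantic streams
    pub record_text_input_completion: bool,
    pub record_application_switches: bool,
    pub record_browser_tab_navigation: bool,
    // Attach role / name / id / bounds of the target element to each event
    pub capture_ui_elements: bool,
}

impl Default for RecorderConfigSketch {
    fn default() -> Self {
        // Everything this page describes as on by default is `true`.
        Self {
            record_mouse: true,
            record_keyboard: true,
            record_clipboard: true,
            record_hotkeys: true,
            record_text_input_completion: true,
            record_application_switches: true,
            record_browser_tab_navigation: true,
            capture_ui_elements: true,
        }
    }
}

impl RecorderConfigSketch {
    /// Number of high-level (semantic) event streams currently enabled.
    pub fn high_level_streams(&self) -> usize {
        [
            self.record_text_input_completion,
            self.record_application_switches,
            self.record_browser_tab_navigation,
        ]
        .iter()
        .filter(|&&on| on)
        .count()
    }
}
```

Turning off a single stream is ordinary struct-update syntax, for example `RecorderConfigSketch { record_application_switches: false, ..Default::default() }`, which leaves the other two semantic streams running.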

3 high-level event streams on by default

WorkflowEvent variants, all defined in events.rs

1,830 lines in the event definition file

6 ApplicationSwitchMethod variants

1 event / session

“Emitted after typing activity debounces, not per keystroke. Captures typing, pasting, and autofill as three different input methods.”

Tool contract on TextInputCompletedEvent in crates/terminator-workflow-recorder/src/events.rs at line 977

One event per typing session, not one per keystroke

Here is the full struct. Eight typed fields, each with a specific job. Read the comments and you can see the questions a replay engine can now answer without walking a raw event log.

crates/terminator-workflow-recorder/src/events.rs
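A simplified sketch of that shape, with illustrative field names drawn from the descriptions on this page. The TextInputMethod enum and its three variants are named in the source; `FocusMethod`, `TextInputCompletedSketch`, and `summary()` are hypothetical. The real struct also attaches EventMetadata for the target element, which is left out here.

```rust
/// How the text got into the field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TextInputMethod {
    Typed,
    Pasted,
    Autofilled,
}

/// How focus reached the field (hypothetical name; variants follow this page).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FocusMethod {
    Click,
    Tab,
    ShiftTab,
}

/// Simplified sketch of one completed typing session.
#[derive(Debug, Clone)]
pub struct TextInputCompletedSketch {
    pub text_value: String,
    pub field_name: Option<String>,
    pub field_type: String,
    pub input_method: TextInputMethod,
    pub focus_method: FocusMethod,
    pub typing_duration_ms: u64,
    pub keystroke_count: u32,
    pub process_name: Option<String>,
}

impl TextInputCompletedSketch {
    /// One-line human-readable summary, e.g. for a replay log.
    pub fn summary(&self) -> String {
        format!(
            "{:?} {:?} into {} in {:.1}s over {} keystrokes",
            self.input_method,
            self.text_value,
            self.field_name.as_deref().unwrap_or("<unnamed field>"),
            self.typing_duration_ms as f64 / 1000.0,
            self.keystroke_count
        )
    }
}
```

For a session like the "hello world" example at the top of the page, `summary()` produces `Typed "hello world" into Message in 1.2s over 11 keystrokes`.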

Why this matters

A replay step that says "type hello world into Send" survives when an LLM later generates "hello Alice" instead. A replay step that says "keydown H, keydown E, keydown L..." breaks.

Why input_method matters

Passwords are commonly autofilled and URLs are commonly pasted. A recorder that conflates the three methods cannot tell a password manager's fill from a user typing, and cannot replay the right one.
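A sketch of why input method and focus method matter on replay, assuming a hypothetical `ReplayStep` vocabulary and a hypothetical `plan_text_input` helper; neither is part of the crate. The typed value is a parameter, so "hello Alice" slots in where "hello world" was recorded. Waiting for autofill instead of re-typing the secret is one design choice among several.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TextInputMethod { Typed, Pasted, Autofilled }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FocusMethod { Click, Tab, ShiftTab }

/// Hypothetical replay-step vocabulary, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ReplayStep {
    ClickField(String),
    PressKey(&'static str),
    TypeText(String),
    PasteText(String),
    // Leave the field alone and let the password manager fill it on replay.
    WaitForAutofill(String),
}

/// Turn one recorded text-input event into replay steps that reach the field
/// and fill it the same way the user did.
pub fn plan_text_input(
    field: &str,
    value: &str,
    focus: FocusMethod,
    input: TextInputMethod,
) -> Vec<ReplayStep> {
    // Step 1: reach the field by the same route the user took.
    let mut steps = vec![match focus {
        FocusMethod::Click => ReplayStep::ClickField(field.to_string()),
        FocusMethod::Tab => ReplayStep::PressKey("Tab"),
        FocusMethod::ShiftTab => ReplayStep::PressKey("Shift+Tab"),
    }];
    // Step 2: fill it by the same method.
    steps.push(match input {
        TextInputMethod::Typed => ReplayStep::TypeText(value.to_string()),
        TextInputMethod::Pasted => ReplayStep::PasteText(value.to_string()),
        // Never replay the secret itself; wait for the autofill source instead.
        TextInputMethod::Autofilled => ReplayStep::WaitForAutofill(field.to_string()),
    });
    steps
}
```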

An app switch knows whether the user pressed Alt+Tab

Every time the foreground window changes, the recorder classifies how. That classification is what makes a replay faithful. Raising the target window with SetForegroundWindow looks fine in a screenshot, but it skips the path the user actually took.

crates/terminator-workflow-recorder/src/events.rs
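A sketch of how the six switch methods could drive replay. The enum name and its variants (AltTab, TaskbarClick, WindowsKeyShortcut, StartMenu, WindowClick, Other) are the ones this page lists; `AppSwitchSketch`, its field names, and `replay_hint()` are hypothetical illustrations.

```rust
/// The six switch methods named on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApplicationSwitchMethod {
    AltTab,
    TaskbarClick,
    WindowsKeyShortcut,
    StartMenu,
    WindowClick,
    Other,
}

/// Simplified switch record; field names only loosely follow the real event.
#[derive(Debug, Clone)]
pub struct AppSwitchSketch {
    pub from_app: String,
    pub to_app: String,
    pub switch_method: ApplicationSwitchMethod,
    /// Time spent in the previous app before switching.
    pub dwell_time_ms: u64,
    /// Alt+Tab presses during one cycle (1 = a single press).
    pub switch_count: u32,
}

impl AppSwitchSketch {
    /// Describe how a replay could reproduce the switch the way the user did it.
    pub fn replay_hint(&self) -> String {
        match self.switch_method {
            ApplicationSwitchMethod::AltTab => format!(
                "hold Alt, press Tab {} time(s), release on {}",
                self.switch_count.max(1),
                self.to_app
            ),
            ApplicationSwitchMethod::TaskbarClick => {
                format!("click the {} button on the taskbar", self.to_app)
            }
            ApplicationSwitchMethod::WindowsKeyShortcut => {
                format!("use the Windows-key shortcut that targets {}", self.to_app)
            }
            ApplicationSwitchMethod::StartMenu => format!("open Start and launch {}", self.to_app),
            ApplicationSwitchMethod::WindowClick => {
                format!("click a visible part of the {} window", self.to_app)
            }
            ApplicationSwitchMethod::Other => format!("bring {} to the foreground", self.to_app),
        }
    }
}
```

Replaying an Alt+Tab cycle by press count assumes the window order on the replay machine matches the recording; a real replay engine would likely confirm the target window before releasing Alt.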

A thirty-seven event session, recorded live

Here is what a short session looks like when the recorder streams its output. A user switched to Slack via Alt+Tab, clicked the new-message button, typed a short message, submitted with Ctrl+Enter, switched to Chrome via the taskbar, and navigated to a dashboard URL. Eight human actions, nine events.

terminator record

How a raw user action becomes a typed event

How the pipeline actually works

Five passes happen on every event before it leaves the recorder. Keeping them separate is what makes the final event stream worth reading.

Low-level input arrives

The recorder subscribes to Windows UIA's global event streams and the raw mouse and keyboard hooks. Every click, keydown, and focus change is received.

UI element context is attached

For every event, the recorder walks the UIA tree once to capture the target element's role, name, id, bounds, and process. That data is stapled to the event as EventMetadata.

Semantic debouncing kicks in

A typing session on a text box gets buffered. When the activity debounce fires, the recorder emits one TextInputCompletedEvent carrying the final value, keystroke count, and typing duration, instead of a keydown and a keyup event for every character.
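A minimal single-field sketch of that debounce, assuming keystrokes arrive with millisecond timestamps and a caller that polls. The type names and the 500 ms window in the usage note are illustrative; this page does not state the recorder's actual debounce interval, and the real recorder also has to handle focus changes, paste, and autofill.

```rust
/// Keys the hypothetical debouncer understands.
#[derive(Debug, Clone, Copy)]
pub enum Key {
    Char(char),
    Backspace,
}

/// One completed typing session.
#[derive(Debug, PartialEq, Eq)]
pub struct Completed {
    pub text: String,
    pub keystroke_count: u32,
    pub typing_duration_ms: u64,
}

/// Buffers keystrokes for one field and emits a single event once
/// no key has arrived for `debounce_ms`.
pub struct TypingDebouncer {
    debounce_ms: u64,
    buffer: String,
    keystrokes: u32,
    first_ms: Option<u64>,
    last_ms: u64,
}

impl TypingDebouncer {
    pub fn new(debounce_ms: u64) -> Self {
        Self { debounce_ms, buffer: String::new(), keystrokes: 0, first_ms: None, last_ms: 0 }
    }

    /// Record one key at time `t_ms` (milliseconds on any monotonic clock).
    pub fn on_key(&mut self, key: Key, t_ms: u64) {
        if self.first_ms.is_none() {
            self.first_ms = Some(t_ms);
        }
        self.last_ms = t_ms;
        self.keystrokes += 1; // every key counts, including corrections
        match key {
            Key::Char(c) => self.buffer.push(c),
            Key::Backspace => {
                self.buffer.pop();
            }
        }
    }

    /// Call periodically; returns the finished session once typing has gone quiet.
    pub fn poll(&mut self, now_ms: u64) -> Option<Completed> {
        let first = self.first_ms?; // nothing buffered yet
        if now_ms.saturating_sub(self.last_ms) < self.debounce_ms {
            return None; // still typing
        }
        let done = Completed {
            text: std::mem::take(&mut self.buffer),
            keystroke_count: self.keystrokes,
            typing_duration_ms: self.last_ms - first,
        };
        self.keystrokes = 0;
        self.first_ms = None;
        Some(done)
    }
}
```

Feeding "hello world" at 100 ms per key with a 500 ms window yields nothing at 1200 ms and exactly one `Completed` event at 1600 ms: 11 keystrokes over 1000 ms.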

High-level detectors run in parallel

A separate watcher tracks the foreground window and emits ApplicationSwitchEvent with the attribution method. A browser-context watcher emits BrowserTabNavigationEvent when the active tab URL or title changes.

Events stream over a broadcast channel

All events, low-level and high-level, are multiplexed onto a tokio broadcast::Sender. A consumer (CLI, test runner, or live dashboard) subscribes and writes JSON, or feeds events into a replay engine or an LLM.
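The fan-out can be sketched with the standard library alone. This is an imitation, not the recorder's code: the page says it uses tokio's `broadcast::Sender`, while this `Broadcaster` (a hypothetical name) keeps one `std::sync::mpsc` sender per subscriber and clones each event to all of them.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Std-only imitation of a broadcast channel: every subscriber receives
/// a clone of every event sent after it subscribed.
pub struct Broadcaster<T: Clone> {
    subscribers: Vec<Sender<T>>,
}

impl<T: Clone> Broadcaster<T> {
    pub fn new() -> Self {
        Self { subscribers: Vec::new() }
    }

    /// Register a new consumer (CLI writer, test runner, dashboard, ...).
    pub fn subscribe(&mut self) -> Receiver<T> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Deliver `event` to every live subscriber and drop the ones whose
    /// receiver has gone away. Returns how many subscribers received it.
    pub fn send(&mut self, event: T) -> usize {
        self.subscribers.retain(|tx| tx.send(event.clone()).is_ok());
        self.subscribers.len()
    }
}
```

One real difference: tokio's broadcast channel is bounded, so a slow receiver lags and loses the oldest events, whereas these unbounded mpsc queues buffer without limit. For a recorder that must never block the input hooks, the bounded, lossy behaviour is usually the safer trade.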

Side by side with Selenium IDE

Ten rows. Each row is a specific capability, not a feature bullet.

Feature	Selenium IDE	Terminator recorder
Records clicks inside a browser	Yes	Yes
Records clicks in a native app	No, the extension is scoped to the page	Yes, via Windows UIA
Records typing as one semantic event	No, one event per keystroke	Yes, TextInputCompletedEvent with duration and count
Knows typed vs pasted vs autofilled	No	Yes, TextInputMethod enum on every input event
Records app switches	No, the app context is fixed to the browser	Yes, ApplicationSwitchEvent with method attribution
Attributes Alt+Tab vs taskbar click vs Start menu	No	Yes, ApplicationSwitchMethod with six variants
Records browser tab open, close, switch, move, refresh	Partial, only current tab	Yes, BrowserTabNavigationEvent with 7 action types
Captures the UI element under each event	For DOM elements only	For every OS control, with role, name, id, bounds
Replayable as a selector-driven script	Yes, for DOM locators	Yes, with Terminator's role: / name: / id: / >> selectors
Performance modes for weak hardware	No	Normal / Balanced / LowEnergy (5 events/s, 500ms throttle)

Selenium IDE is excellent at what it does, inside a browser. This table is about what happens when the recording has to leave the tab.

The event family, in one glance

The WorkflowEvent enum in events.rs has eleven variants. Three are the high-level semantic events the rest of this page has been about. The others fill in the low-level detail when you need it.

TextInputCompletedEvent

Field name, field type, input method (Typed / Pasted / Autofilled), focus method (Click / Tab / ShiftTab), typing duration in ms, keystroke count, process name, and the UI element metadata.

ApplicationSwitchEvent

From and to window names, process names, process IDs, one of six switch methods, dwell time in the previous app, and a rolling switch_count during Alt+Tab cycling.

BrowserTabNavigationEvent

Tab action (Created, Switched, Closed, Moved, Duplicated, Pinned, Refreshed), navigation method, and from and to URLs. Handles seven action types and eight methods.

ClickEvent (semantic)

Element text, element role, interaction type (Click / Toggle / DropdownToggle / Submit / Cancel), whether the element was enabled, click position, and child text content walked to unlimited depth.

HotkeyEvent

Key combination string, detected action, whether the shortcut is global or app-specific, and the process executable name. Useful for distinguishing Ctrl+C in Slack from Ctrl+C in a terminal.

ClipboardEvent and DragDropEvent

Clipboard Copy / Cut / Paste / Clear with content and size. DragDrop with start and end positions, source and target elements, and success state. Both carry full element metadata.

Every variant the recorder can emit

MouseEvent

KeyboardEvent

ClickEvent

TextInputCompletedEvent

TextSelectionEvent

ClipboardEvent

HotkeyEvent

DragDropEvent

ApplicationSwitchEvent

BrowserTabNavigationEvent

BrowserClickEvent

BrowserTextInputEvent

ButtonClickEvent

PendingActionEvent

The math on one short session

A user types a four-word Slack message (twenty characters, twenty-two keydown/keyup pairs), switches to Chrome, opens a new tab, navigates to an internal dashboard, and copies a value. Count the events two ways.

A keystroke-level recorder

events: 44 keydown/keyup, 6 mouse events, 4 focus changes, 2 tab lifecycle blobs, 1 clipboard hook. No field context, no attribution.

Terminator workflow recorder

events: one TextInputCompleted, one hotkey, two ApplicationSwitch, one BrowserTabNavigation (Created), one BrowserTabNavigation (AddressBar), one ClickEvent, one ClipboardEvent, one final ApplicationSwitch.

Both counts describe the same user session: 57 raw events against 9 typed ones. The typed stream is legible to a replay engine or a language model; the raw one is a log you have to reconstruct before anything downstream can use it.

What a recorded session hands you

A recording you can replay as a selector-driven script, not a pixel trail
Typing steps that survive when the generated text changes length
App switches that replay via the same mechanism a human used
Tab navigation captured without a chromedriver session attached
Clipboard content preserved with size and format
Hotkey events that know whether the shortcut was global or app-scoped
Text selections with the method that produced them (drag, double-click, Ctrl+A)
Every event stamped with role, name, id, and bounds of the target element

Where this fits next to Selenium

The honest answer is: use Selenium when the target is a rendered DOM and nothing else. WebDriver is mature, the tooling is vast, and if the whole test lives in a single app.example.com tab, nothing here beats it. The existing ecosystem of page-object models, Grid distribution, and third-party reporters is a lot to walk away from.

Reach for Terminator when the recording has to leave the tab. A realistic business flow rarely stays inside one. You open the Slack desktop app to find a link, Ctrl+click into Chrome, paste a value, wait for a toast, switch back to Excel, paste into a cell, and save through a native file dialog. A DOM recorder sees only the steps that happen inside the Chrome page. The workflow recorder sees every step, and each one is a first-class typed event.

The selectors the recorder emits (role, name, id, bounds) feed directly into the Terminator selector engine on replay. The same Playwright-shaped locator().click() API works against them on the OS accessibility tree, not the DOM. A recording taken on a Windows 11 machine replays on a Windows 11 VM against the same app, the same way a Selenium test replays on a headless browser.
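A sketch of turning recorded element metadata into a selector string, using only the `role:` / `name:` / `id:` prefixes and the `>>` chaining this page mentions. The helper `selector_for` is hypothetical, and the grammar is an assumption: `prefix:value` is taken to match on one property and `a >> b` to find `b` inside the match for `a`. Check the Terminator selector documentation before relying on it.

```rust
/// Hypothetical helper: pick the most stable single-property selector for a
/// recorded element and scope it under its top-level window with `>>`.
/// Selector grammar here is an assumption, not the documented syntax.
pub fn selector_for(window_name: &str, id: Option<&str>, name: Option<&str>, role: &str) -> String {
    // Automation ids usually survive relabelling and localisation, so prefer them;
    // fall back to the accessible name, then to the bare role (least specific).
    let target = match (id, name) {
        (Some(id), _) if !id.is_empty() => format!("id:{id}"),
        (_, Some(name)) if !name.is_empty() => format!("name:{name}"),
        _ => format!("role:{role}"),
    };
    format!("name:{window_name} >> {target}")
}
```

The preference order is the design choice: a role-only fallback is ambiguous in any window with more than one button, so a replay engine would want to flag those steps for review.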

Record a real session against your workflow

Fifteen minutes to run the recorder against your actual flow, see the event stream, and decide whether it replaces the glue code you have today.

Questions people ask about this

Frequently asked questions

Why does Selenium IDE only record events inside a browser?

Selenium IDE is a browser extension. It listens to DOM events on the rendered page inside Chrome, Firefox, or Edge. Everything it records is a click on a DOM node or a keystroke sent to an input element. The moment a native dialog opens, a menu bar is clicked, or the user switches to another app, the recording stops. That is not a bug, it is the architectural limit of an extension. Terminator's recorder lives outside the browser, on top of Windows UI Automation, so it records every window on the desktop with the same semantics.

What is a semantic event in this context?

A single event that carries the meaning of a user action, not the raw input that produced it. When a user types 'hello world' into a field, the low-level trace is 11 keydown events and 11 keyup events. The semantic event Terminator emits, TextInputCompletedEvent in crates/terminator-workflow-recorder/src/events.rs line 977, carries the final text value, the field's name and type, whether the text was typed or pasted or autofilled, how long the typing took, and how many keystrokes contributed. The 22 low-level entries collapse into one event that a replay engine can actually use.

Which events does the recorder emit by default?

The default config at recorder.rs lines 197 to 229 turns on record_mouse, record_keyboard, record_clipboard, record_hotkeys, record_text_input_completion, record_application_switches, and record_browser_tab_navigation. The three high-level types are TextInputCompletedEvent (typing finished), ApplicationSwitchEvent (the user moved between apps), and BrowserTabNavigationEvent (a tab was created, switched, closed, moved, or refreshed). Mouse clicks and hotkeys are also emitted with UI element context attached, so every event knows the role, name, and bounds of whatever it targeted.

How is the typing event different from a send_keys call?

send_keys in Selenium is an action you write in a test. TextInputCompletedEvent is a record of what a real user did. It is emitted by the recorder after a debounce fires on typing activity, so rapid keystrokes into one field collapse into one event rather than one per letter. The input_method enum distinguishes typed text from pasted text from autofilled text, which matters because autofilled passwords and pasted URLs are common in real sessions and most recorders lose that signal. The focus_method enum tells you how the field was reached (click, Tab key, Shift+Tab) so a replay can reach the same field the same way.

Can the recorder attribute an app switch to Alt+Tab versus a taskbar click?

Yes. ApplicationSwitchEvent in events.rs line 1020 has a switch_method field with six variants: AltTab, TaskbarClick, WindowsKeyShortcut, StartMenu, WindowClick, and Other. It also carries from_window_and_application_name, to_window_and_application_name, the process IDs on both sides, the dwell time in the previous app, and a switch_count that increments during rapid Alt+Tab cycling. That is enough information to replay the switch by the same mechanism a human used, not just by raising the target window.

Does the workflow recorder capture the UI element under a click?

Yes, by default. capture_ui_elements is true in the default config, so every mouse event, keyboard event, and text input event gets an EventMetadata payload that points at the UI element it targeted. Each element carries the same role, name, id, and bounds fields the Terminator selector engine can match against later. That means a recording is replayable as a selector-driven script, not a fragile sequence of pixel coordinates.

What performance modes does the recorder support?

Three. PerformanceMode::Normal captures every event in full detail. PerformanceMode::Balanced filters mouse noise, throttles mouse moves to 200ms, and caps events at 20 per second. PerformanceMode::LowEnergy throttles mouse moves to 500ms, caps events at 5 per second, disables text-input completion, and filters keyboard noise for weak machines. See recorder.rs lines 31 to 76 for the exact config each mode applies. You can override any field independently of the mode.
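A simplified sketch of what each mode changes, plus a fixed one-second window limiter that enforces an events-per-second cap. The throttle and cap numbers come from the answer above; the `ModeSettings` struct, the `RateLimiter`, and the noise-filter flags for cases the answer does not mention (keyboard filtering in Balanced, mouse filtering in LowEnergy) are assumptions, not the values in recorder.rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PerformanceMode {
    Normal,
    Balanced,
    LowEnergy,
}

/// Hypothetical, simplified view of what a mode configures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModeSettings {
    pub mouse_move_throttle_ms: Option<u64>,
    pub max_events_per_sec: Option<u32>,
    pub record_text_input_completion: bool,
    pub filter_mouse_noise: bool,
    pub filter_keyboard_noise: bool,
}

impl PerformanceMode {
    pub fn settings(self) -> ModeSettings {
        match self {
            // Everything, full detail.
            PerformanceMode::Normal => ModeSettings {
                mouse_move_throttle_ms: None,
                max_events_per_sec: None,
                record_text_input_completion: true,
                filter_mouse_noise: false,
                filter_keyboard_noise: false,
            },
            // 200 ms mouse throttle, 20 events/s; keyboard filtering assumed off.
            PerformanceMode::Balanced => ModeSettings {
                mouse_move_throttle_ms: Some(200),
                max_events_per_sec: Some(20),
                record_text_input_completion: true,
                filter_mouse_noise: true,
                filter_keyboard_noise: false,
            },
            // 500 ms mouse throttle, 5 events/s, no text-input completion;
            // mouse filtering assumed on.
            PerformanceMode::LowEnergy => ModeSettings {
                mouse_move_throttle_ms: Some(500),
                max_events_per_sec: Some(5),
                record_text_input_completion: false,
                filter_mouse_noise: true,
                filter_keyboard_noise: true,
            },
        }
    }
}

/// Fixed-window limiter: admits at most `cap` events per 1000 ms window.
pub struct RateLimiter {
    cap: u32,
    window_start_ms: u64,
    seen: u32,
}

impl RateLimiter {
    pub fn new(cap: u32) -> Self {
        Self { cap, window_start_ms: 0, seen: 0 }
    }

    pub fn allow(&mut self, now_ms: u64) -> bool {
        if now_ms.saturating_sub(self.window_start_ms) >= 1000 {
            self.window_start_ms = now_ms; // start a new window
            self.seen = 0;
        }
        if self.seen < self.cap {
            self.seen += 1;
            true
        } else {
            false
        }
    }
}
```

A fixed window is the simplest way to enforce "N per second"; a token bucket would smooth bursts at window boundaries at the cost of a little more state.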

Is this open source?

Yes. Terminator is MIT-licensed and the full workflow recorder lives in crates/terminator-workflow-recorder/ in the mediar-ai/terminator repo on GitHub. The event definitions are in events.rs (1830 lines), the platform-agnostic config and session loop are in recorder.rs (575 lines), and the Windows UIA-specific detection logic lives under recorder/windows/.

Does it work on macOS and Linux?

The core Terminator framework targets Windows UIA and macOS AX, and a subset of functionality is available on Linux through AT-SPI2. The workflow recorder crate currently ships a Windows backend (recorder/windows/) because UIA's event subscription model is the most mature. macOS AX event subscription is on the roadmap. If you need to record from a Mac or Linux machine today, run the recorder on a Windows VM or a Windows host.

How does this compare to running Selenium IDE against a web app?

Selenium IDE is excellent at what it does, inside a browser, with a web app that follows standard DOM patterns. It is the wrong tool when the recording has to cross windows, interact with a Save dialog, wait on a desktop tray icon, or hop between Slack and a web form. Terminator's recorder and Selenium IDE answer different questions. If your test is entirely inside a rendered DOM, use Selenium IDE. If it is not, the workflow recorder exists precisely for the parts Selenium cannot see.

