UI automation shaped like Selenium, recorded as semantic events instead of keystrokes

Selenium IDE listens to DOM events inside one browser tab. Terminator's workflow recorder sits on top of Windows UI Automation and emits typed, high-level events across every window on the OS. One "typed hello world in 1.2 seconds over 11 keystrokes" event replaces twenty-two keydown / keyup entries. Source: crates/terminator-workflow-recorder/src/events.rs.

Matthew Diakonov
11 min read
4.9 from dozens of design partners
record_text_input_completion: true by default in recorder.rs line 203
TextInputCompletedEvent carries 8 typed fields in events.rs line 977
ApplicationSwitchMethod has 6 variants including AltTab and TaskbarClick
BrowserTabNavigationEvent covers 7 tab actions, 8 navigation methods

Every other guide on this topic stops at click() and send_keys()

If you have read three articles about Selenium this week, you know the shape. A driver, a By.id locator, a click, a send_keys. Maybe a Selenium Grid diagram if the writer was feeling generous. Maybe a paragraph about Selenium IDE, the Chrome extension that records clicks and plays them back.

The guides stop there because the API does. WebDriver speaks a protocol that browser engines implement. Inside a rendered DOM, it is precise and boring, the way testing infrastructure should be. Outside a rendered DOM, it is silent. A file upload dialog, a native menu bar, a taskbar, a Slack desktop app, an Excel cell: invisible.

This page is about the recorder most of those guides never cover: Terminator's terminator-workflow-recorder crate, which records UI events the same way Selenium IDE does, but one level deeper, on top of the OS accessibility API instead of the DOM. The events it emits are not keystrokes. They are typed, semantic, and carry the context an LLM or a replay engine needs to do something useful with them.

The anchor fact other guides miss

The default recorder config turns on three semantic event streams simultaneously. It is a one-line claim you can check against the source.

crates/terminator-workflow-recorder/src/recorder.rs

Three flags do the heavy lifting. record_text_input_completion debounces typing into one event per field per session. record_application_switches emits one event every time the foreground window changes, tagged with the method the user used. record_browser_tab_navigation watches every Chromium, Firefox, and Edge process for tab lifecycle changes and emits them as first-class events with URLs.
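To make the three flags concrete, here is a minimal sketch of what such a default config could look like. The flag names come from this page; the struct name `RecorderConfigSketch`, the `high_level_streams` helper, and the field layout are assumptions for illustration and are not copied from recorder.rs.

```rust
/// Hypothetical mirror of the default recorder flags named on this page.
/// The real config in recorder.rs may have more fields and different types.
#[derive(Debug, Clone, PartialEq)]
pub struct RecorderConfigSketch {
    pub record_mouse: bool,
    pub record_keyboard: bool,
    pub record_clipboard: bool,
    pub record_hotkeys: bool,
    pub record_text_input_completion: bool,
    pub record_application_switches: bool,
    pub record_browser_tab_navigation: bool,
    pub capture_ui_elements: bool,
}

impl Default for RecorderConfigSketch {
    /// Everything this page describes as on by default is set to true.
    fn default() -> Self {
        Self {
            record_mouse: true,
            record_keyboard: true,
            record_clipboard: true,
            record_hotkeys: true,
            record_text_input_completion: true,
            record_application_switches: true,
            record_browser_tab_navigation: true,
            capture_ui_elements: true,
        }
    }
}

impl RecorderConfigSketch {
    /// Counts how many of the three high-level semantic streams are enabled.
    pub fn high_level_streams(&self) -> usize {
        [
            self.record_text_input_completion,
            self.record_application_switches,
            self.record_browser_tab_navigation,
        ]
        .iter()
        .filter(|enabled| **enabled)
        .count()
    }
}
```

With the defaults, `RecorderConfigSketch::default().high_level_streams()` evaluates to 3. Turning one flag off with struct update syntax drops it to 2.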

3 high-level event streams on by default
11 WorkflowEvent variants in events.rs
1830 lines in the event definition file
6 ApplicationSwitchMethod variants
1 event / session

Emitted after typing activity debounces, not per keystroke. Captures typing, pasting, and autofill as three different input methods.

Source: TextInputCompletedEvent in crates/terminator-workflow-recorder/src/events.rs at line 977

One event per typing session, not one per keystroke

Here is the full struct. Eight typed fields, each with a specific job. Read the comments and you can see the questions a replay engine can now answer without walking a raw event log.

crates/terminator-workflow-recorder/src/events.rs

Why this matters

A replay step that says "type hello world into Send" survives when an LLM later generates "hello Alice" instead. A replay step that says "keydown H, keydown E, keydown L..." breaks.

Why input_method matters

Passwords are almost always autofilled. URLs are almost always pasted. A recorder that conflates the three cannot tell a password manager event from a user typing, and cannot replay the correct one.
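A rough sketch of how a replay engine might use these fields. The field list follows the descriptions on this page, but the exact names, types, and the `replay_step` helper are assumptions and are not copied from events.rs. The UI element metadata is left out to keep the example short.

```rust
/// How the text ended up in the field (variants as named on this page).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TextInputMethod {
    Typed,
    Pasted,
    Autofilled,
}

/// How the field received focus (variants as named on this page).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldFocusMethod {
    Click,
    Tab,
    ShiftTab,
}

/// Hypothetical shape of a completed text input event.
#[derive(Debug, Clone, PartialEq)]
pub struct TextInputSketch {
    pub text_value: String,
    pub field_name: Option<String>,
    pub field_type: String,
    pub input_method: TextInputMethod,
    pub focus_method: FieldFocusMethod,
    pub typing_duration_ms: u64,
    pub keystroke_count: u32,
    pub process_name: Option<String>,
}

/// Turns one recorded event into a human-readable replay step.
/// `new_text` lets a caller substitute generated text for the recorded value.
pub fn replay_step(event: &TextInputSketch, new_text: Option<&str>) -> String {
    let field = event.field_name.as_deref().unwrap_or("<unnamed field>");
    let text = new_text.unwrap_or(event.text_value.as_str());
    match event.input_method {
        TextInputMethod::Typed => format!("type {:?} into {}", text, field),
        TextInputMethod::Pasted => format!("paste {:?} into {}", text, field),
        // An autofilled value is not typed back; the replay waits for the
        // password manager or browser to fill it.
        TextInputMethod::Autofilled => format!("wait for autofill in {}", field),
    }
}
```

Because the step is keyed on the field and the input method rather than on individual keys, swapping "hello world" for "hello Alice" changes one argument and nothing else.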

An app switch knows whether the user pressed Alt+Tab

Every time the foreground window changes, the recorder classifies how. That classification is what makes a replay faithful. Lifting a window with SetForegroundWindow looks fine in a screenshot; it misses every real path a user took.

crates/terminator-workflow-recorder/src/events.rs
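As an illustration, the six switch methods named on this page can be sketched as a plain enum with a replay hint per variant. The variant names match the ones listed in the FAQ below; the `replay_hint` function and its wording are hypothetical and only show why the attribution is useful.

```rust
/// The six switch methods this page lists for ApplicationSwitchEvent.
/// Payloads and derives in the real events.rs may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApplicationSwitchMethod {
    AltTab,
    TaskbarClick,
    WindowsKeyShortcut,
    StartMenu,
    WindowClick,
    Other,
}

/// All variants, handy for exhaustive checks in tests.
pub const ALL_SWITCH_METHODS: [ApplicationSwitchMethod; 6] = [
    ApplicationSwitchMethod::AltTab,
    ApplicationSwitchMethod::TaskbarClick,
    ApplicationSwitchMethod::WindowsKeyShortcut,
    ApplicationSwitchMethod::StartMenu,
    ApplicationSwitchMethod::WindowClick,
    ApplicationSwitchMethod::Other,
];

/// Hypothetical helper: describes how a replay could reach `target`
/// through the same path the user took.
pub fn replay_hint(method: ApplicationSwitchMethod, target: &str) -> String {
    match method {
        ApplicationSwitchMethod::AltTab => format!("press Alt+Tab until {} is focused", target),
        ApplicationSwitchMethod::TaskbarClick => format!("click the {} taskbar button", target),
        ApplicationSwitchMethod::WindowsKeyShortcut => format!("use a Windows key shortcut to reach {}", target),
        ApplicationSwitchMethod::StartMenu => format!("open {} from the Start menu", target),
        ApplicationSwitchMethod::WindowClick => format!("click inside the {} window", target),
        // No attribution available: fall back to raising the window directly.
        ApplicationSwitchMethod::Other => format!("bring {} to the foreground", target),
    }
}
```

Only the `Other` arm falls back to raising the window directly; every other arm keeps the path the user actually took.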

A short session, recorded live

Here is what a short session looks like when the recorder streams its output. A user switched to Slack via Alt+Tab, clicked the new-message button, typed a short message, submitted with Ctrl+Enter, switched to Chrome via the taskbar, and navigated to a dashboard URL. Eight human actions, nine events.

terminator record

How a raw user action becomes a typed event

Inputs: Mouse hooks, Keyboard hooks, Foreground window watcher, Clipboard listener
Hub: WorkflowRecorder
Outputs: TextInputCompletedEvent, ApplicationSwitchEvent, BrowserTabNavigationEvent, ClickEvent / HotkeyEvent

How the pipeline actually works

Five passes happen on every event before it leaves the recorder. Keeping them separate is what makes the final event stream worth reading.

1. Low-level input arrives

The recorder subscribes to Windows UIA's global event streams and the raw mouse and keyboard hooks. Every click, keydown, and focus change is received.

2. UI element context is attached

For every event, the recorder walks the UIA tree once to capture the target element's role, name, id, bounds, and process. That data is stapled to the event as EventMetadata.

3. Semantic debouncing kicks in

A typing session on a TextBox gets buffered. When the activity debounce fires, the recorder emits one TextInputCompletedEvent carrying the final value, keystroke count, and typing duration, instead of 20 keydown events.

4. High-level detectors run in parallel

A separate watcher tracks the foreground window and emits ApplicationSwitchEvent with the attribution method. A browser-context watcher emits BrowserTabNavigationEvent when the active tab URL or title changes.

5. Events stream over a broadcast channel

All events, low-level and high-level, are multiplexed onto a tokio broadcast::Sender. A consumer (CLI, test runner, or live dashboard) subscribes and writes JSON, or feeds events into a replay engine or an LLM.
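The debouncing in step 3 can be sketched as a small state machine. This is a simplified illustration and not the recorder's implementation: the `TypingDebouncer` name, the caller-supplied timestamps, and the 500 ms quiet period used in the usage note are all assumptions. Every key is treated as an appended character, so backspace and cursor movement are ignored.

```rust
use std::time::Duration;

/// One semantic event per typing session (simplified, hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct TextInputCompleted {
    pub field: String,
    pub text: String,
    pub keystroke_count: u32,
    pub typing_duration: Duration,
}

/// Buffer for the typing session currently in progress.
struct Session {
    field: String,
    text: String,
    keystrokes: u32,
    started_at: Duration,
    last_key_at: Duration,
}

/// Collapses key presses into one event once typing goes quiet.
/// Timestamps are elapsed time since recording started, passed in by the caller.
pub struct TypingDebouncer {
    quiet_period: Duration,
    session: Option<Session>,
}

impl TypingDebouncer {
    pub fn new(quiet_period: Duration) -> Self {
        Self { quiet_period, session: None }
    }

    /// Feeds one key press. Returns the previous session's event if this key
    /// starts a new session (different field, or the quiet period has passed).
    pub fn on_key(&mut self, field: &str, ch: char, at: Duration) -> Option<TextInputCompleted> {
        let expired = self.session.as_ref().map_or(false, |s| {
            s.field != field || at.saturating_sub(s.last_key_at) >= self.quiet_period
        });
        let finished = if expired { self.flush() } else { None };
        let session = self.session.get_or_insert_with(|| Session {
            field: field.to_string(),
            text: String::new(),
            keystrokes: 0,
            started_at: at,
            last_key_at: at,
        });
        session.text.push(ch);
        session.keystrokes += 1;
        session.last_key_at = at;
        finished
    }

    /// Called periodically; emits the session once the quiet period has elapsed.
    pub fn on_tick(&mut self, at: Duration) -> Option<TextInputCompleted> {
        let expired = self
            .session
            .as_ref()
            .map_or(false, |s| at.saturating_sub(s.last_key_at) >= self.quiet_period);
        if expired { self.flush() } else { None }
    }

    /// Ends the current session immediately, if there is one.
    pub fn flush(&mut self) -> Option<TextInputCompleted> {
        self.session.take().map(|s| TextInputCompleted {
            field: s.field,
            text: s.text,
            keystroke_count: s.keystrokes,
            typing_duration: s.last_key_at.saturating_sub(s.started_at),
        })
    }
}
```

Feeding the eleven characters of "hello world" at 100 ms intervals with a 500 ms quiet period produces no events while typing, then a single `TextInputCompleted` on the first tick at or after 1500 ms. In the recorder described above, that event would then go onto the broadcast channel from step 5.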

Side by side with Selenium IDE

Ten rows. Each row is a specific capability, not a feature bullet.

Feature | Selenium IDE | Terminator recorder
Records clicks inside a browser | Yes | Yes
Records clicks in a native app | No, the extension is scoped to the page | Yes, via Windows UIA
Records typing as one semantic event | No, one event per keystroke | Yes, TextInputCompletedEvent with duration and count
Knows typed vs pasted vs autofilled | No | Yes, TextInputMethod enum on every input event
Records app switches | No, the app context is fixed to the browser | Yes, ApplicationSwitchEvent with method attribution
Attributes Alt+Tab vs taskbar click vs Start menu | No | Yes, ApplicationSwitchMethod with six variants
Records browser tab open, close, switch, move, refresh | Partial, only current tab | Yes, BrowserTabNavigationEvent with 7 action types
Captures the UI element under each event | For DOM elements only | For every OS control, with role, name, id, bounds
Replayable as a selector-driven script | Yes, for DOM locators | Yes, with Terminator's role: / name: / id: / >> selectors
Performance modes for weak hardware | No | Normal / Balanced / LowEnergy (5 events/s, 500ms throttle)

Selenium IDE is excellent at what it does, inside a browser. This table is about what happens when the recording has to leave the tab.

The event family, in one glance

The WorkflowEvent enum in events.rs has eleven variants. Three are the high-level semantic events the rest of this page has been about. The others fill in the low-level detail when you need it.

TextInputCompletedEvent

Field name, field type, input method (Typed / Pasted / Autofilled), focus method (Click / Tab / ShiftTab), typing duration in ms, keystroke count, process name, and the UI element metadata.

ApplicationSwitchEvent

From and to window names, process names, process IDs, one of six switch methods, dwell time in the previous app, and a rolling switch_count during Alt+Tab cycling.

BrowserTabNavigationEvent

Tab action (Created, Switched, Closed, Moved, Duplicated, Pinned, Refreshed), navigation method, and from and to URLs. Handles seven action types and eight methods.

ClickEvent (semantic)

Element text, element role, interaction type (Click / Toggle / DropdownToggle / Submit / Cancel), whether the element was enabled, click position, and child text content walked to unlimited depth.

HotkeyEvent

Key combination string, detected action, whether the shortcut is global or app-specific, and the process executable name. Useful for distinguishing Ctrl+C in Slack from Ctrl+C in a terminal.

ClipboardEvent and DragDropEvent

Clipboard Copy / Cut / Paste / Clear with content and size. DragDrop with start and end positions, source and target elements, and success state. Both carry full element metadata.

Every variant the recorder can emit

MouseEvent
KeyboardEvent
ClickEvent
TextInputCompletedEvent
TextSelectionEvent
ClipboardEvent
HotkeyEvent
DragDropEvent
ApplicationSwitchEvent
BrowserTabNavigationEvent
BrowserClickEvent
BrowserTextInputEvent
ButtonClickEvent
PendingActionEvent
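A consumer that only wants the semantic stream can filter on the event kind. The sketch below uses a fieldless enum built from the names in the list above. The `EventKind` type and both helpers are hypothetical, and the real WorkflowEvent variants carry payloads that are omitted here.

```rust
/// Event kinds, named after the list on this page (payloads omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventKind {
    Mouse,
    Keyboard,
    Click,
    TextInputCompleted,
    TextSelection,
    Clipboard,
    Hotkey,
    DragDrop,
    ApplicationSwitch,
    BrowserTabNavigation,
    BrowserClick,
    BrowserTextInput,
    ButtonClick,
    PendingAction,
}

impl EventKind {
    /// True for the three high-level semantic events this page focuses on.
    pub fn is_high_level(self) -> bool {
        matches!(
            self,
            EventKind::TextInputCompleted
                | EventKind::ApplicationSwitch
                | EventKind::BrowserTabNavigation
        )
    }
}

/// Keeps only the high-level events, preserving their order.
pub fn semantic_only(events: &[EventKind]) -> Vec<EventKind> {
    events.iter().copied().filter(|kind| kind.is_high_level()).collect()
}
```

A replay engine or an LLM prompt builder could start from `semantic_only` and pull in the low-level kinds only when a step needs more detail.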

The math on one short session

A user types a four-word Slack message (twenty characters, twenty-two keydown/keyup pairs), switches to Chrome, opens a new tab, navigates to an internal dashboard, and copies a value. Count the events two ways.

A keystroke-level recorder

57 events: 44 keydown/keyup, 6 mouse events, 4 focus changes, 2 tab lifecycle blobs, 1 clipboard hook. No field context, no attribution.

Terminator workflow recorder

9 events: one TextInputCompleted, one hotkey, two ApplicationSwitch, one BrowserTabNavigation (Created), one BrowserTabNavigation (AddressBar), one ClickEvent, one ClipboardEvent, one final ApplicationSwitch.

Both numbers describe the same user session. One is legible to a replay engine or a language model; the other is a log you have to reconstruct before anything downstream can use it.
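The two totals follow directly from the breakdowns above. A small sketch that adds them up, with the counts taken from this section and the labels used only for readability:

```rust
/// Per-category counts for the keystroke-level recording described above.
pub const KEYSTROKE_LEVEL: [(&str, u32); 5] = [
    ("keydown/keyup", 44),
    ("mouse events", 6),
    ("focus changes", 4),
    ("tab lifecycle blobs", 2),
    ("clipboard hook", 1),
];

/// Per-type counts for the same session as semantic events.
pub const SEMANTIC: [(&str, u32); 8] = [
    ("TextInputCompleted", 1),
    ("Hotkey", 1),
    ("ApplicationSwitch", 2),
    ("BrowserTabNavigation (Created)", 1),
    ("BrowserTabNavigation (AddressBar)", 1),
    ("ClickEvent", 1),
    ("ClipboardEvent", 1),
    ("final ApplicationSwitch", 1),
];

/// Sums the count column of a breakdown.
pub fn total(rows: &[(&str, u32)]) -> u32 {
    rows.iter().map(|&(_, count)| count).sum()
}
```

`total(&KEYSTROKE_LEVEL)` is 57 and `total(&SEMANTIC)` is 9, so the semantic stream is roughly a sixth of the size for the same session.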

What a recorded session hands you

  • A recording you can replay as a selector-driven script, not a pixel trail
  • Typing steps that survive when the generated text changes length
  • App switches that replay via the same mechanism a human used
  • Tab navigation captured without a chromedriver session attached
  • Clipboard content preserved with size and format
  • Hotkey events that know whether the shortcut was global or app-scoped
  • Text selections with the method that produced them (drag, double-click, Ctrl+A)
  • Every event stamped with role, name, id, and bounds of the target element

Where this fits next to Selenium

The honest answer is: use Selenium when the target is a rendered DOM and nothing else. WebDriver is mature, the tooling is vast, and if the whole test lives in a single app.example.com tab, nothing here beats it. The existing ecosystem of page-object models, Grid distribution, and third-party reporters is a lot to walk away from.

Reach for Terminator when the recording has to leave the tab. A realistic business flow rarely stays inside one. You open a desktop Slack to find a link, Ctrl+click into Chrome, paste a value, wait for a toast, switch back to Excel, paste into a cell, save with a native file dialog. A DOM recorder sees step two and step five. The workflow recorder sees every step, and each one is a first-class typed event.

The selectors the recorder emits (role, name, id, bounds) feed directly into the Terminator selector engine on replay. The same Playwright-shaped locator().click() API works against them on the OS accessibility tree, not the DOM. A recording taken on a Windows 11 machine replays on a Windows 11 VM against the same app, the same way a Selenium test replays on a headless browser.
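To show how recorded metadata could turn into a selector, here is a hedged sketch. It assumes the `role:`, `name:`, and `id:` prefixes and the `>>` chaining operator mentioned in the comparison table. The `ElementMeta` struct, the preference order (id, then name, then role), and the exact string format are assumptions, not the Terminator selector engine's documented behaviour.

```rust
/// Hypothetical subset of the element metadata attached to each event.
#[derive(Debug, Clone, PartialEq)]
pub struct ElementMeta {
    pub role: String,
    pub name: Option<String>,
    pub id: Option<String>,
}

/// Picks the most specific attribute available for one element.
fn selector_step(meta: &ElementMeta) -> String {
    if let Some(id) = &meta.id {
        format!("id:{}", id)
    } else if let Some(name) = &meta.name {
        format!("name:{}", name)
    } else {
        format!("role:{}", meta.role)
    }
}

/// Joins an ancestor-to-target path into one chained selector string.
pub fn selector_chain(path: &[ElementMeta]) -> String {
    path.iter().map(selector_step).collect::<Vec<_>>().join(" >> ")
}
```

For a path of a window named "Slack" and a button with id "send-btn", this yields `name:Slack >> id:send-btn`. Bounds are deliberately left out of the selector, since pixel coordinates are the fragile part a selector-driven replay tries to avoid.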

Record a real session against your workflow

Fifteen minutes to run the recorder against your actual flow, see the event stream, and decide whether it replaces the glue code you have today.

Frequently asked questions

Why does Selenium IDE only record events inside a browser?

Selenium IDE is a browser extension. It listens to DOM events on the rendered page inside Chrome, Firefox, or Edge. Everything it records is a click on a DOM node or a keystroke sent to an input element. The moment a native dialog opens, a menu bar is clicked, or the user switches to another app, the recording stops. That is not a bug, it is the architectural limit of an extension. Terminator's recorder lives outside the browser, on top of Windows UI Automation, so it records every window on the desktop with the same semantics.

What is a semantic event in this context?

A single event that carries the meaning of a user action, not the raw input that produced it. When a user types 'hello world' into a field, the low-level trace is 11 keydown events and 11 keyup events. The semantic event Terminator emits, TextInputCompletedEvent in crates/terminator-workflow-recorder/src/events.rs line 977, carries the final text value, the field's name and type, whether the text was typed or pasted or autofilled, how long the typing took, and how many keystrokes contributed. The 22 low-level entries collapse into one event that a replay engine can actually use.

Which events does the recorder emit by default?

The default config at recorder.rs lines 197 to 229 turns on record_mouse, record_keyboard, record_clipboard, record_hotkeys, record_text_input_completion, record_application_switches, and record_browser_tab_navigation. The three high-level types are TextInputCompletedEvent (typing finished), ApplicationSwitchEvent (the user moved between apps), and BrowserTabNavigationEvent (a tab was created, switched, closed, moved, or refreshed). Mouse clicks and hotkeys are also emitted with UI element context attached, so every event knows the role, name, and bounds of whatever it targeted.

How is the typing event different from a send_keys call?

send_keys in Selenium is an action you write in a test. TextInputCompletedEvent is a record of what a real user did. It is emitted by the recorder after a debounce fires on typing activity, so rapid keystrokes into one field collapse into one event rather than one per letter. The input_method enum distinguishes typed text from pasted text from autofilled text, which matters because autofilled passwords and pasted URLs are common in real sessions and most recorders lose that signal. The focus_method enum tells you how the field was reached (click, Tab key, Shift+Tab) so a replay can reach the same field the same way.

Can the recorder attribute an app switch to Alt+Tab versus a taskbar click?

Yes. ApplicationSwitchEvent in events.rs line 1020 has a switch_method field with six variants: AltTab, TaskbarClick, WindowsKeyShortcut, StartMenu, WindowClick, and Other. It also carries from_window_and_application_name, to_window_and_application_name, the process IDs on both sides, the dwell time in the previous app, and a switch_count that increments during rapid Alt+Tab cycling. That is enough information to replay the switch by the same mechanism a human used, not just by raising the target window.

Does the workflow recorder capture the UI element under a click?

Yes, by default. capture_ui_elements is true in the default config, so every mouse event, keyboard event, and text input event gets an EventMetadata payload that points at the UI element it targeted. Each element carries the same role, name, id, and bounds fields the Terminator selector engine can match against later. That means a recording is replayable as a selector-driven script, not a fragile sequence of pixel coordinates.

What performance modes does the recorder support?

Three. PerformanceMode::Normal captures every event in full detail. PerformanceMode::Balanced filters mouse noise, throttles mouse moves to 200ms, and caps events at 20 per second. PerformanceMode::LowEnergy throttles mouse moves to 500ms, caps events at 5 per second, disables text-input completion, and filters keyboard noise for weak machines. See recorder.rs lines 31 to 76 for the exact config each mode applies. You can override any field independently of the mode.
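The three modes can be summarised as a lookup from mode to settings. This sketch only encodes the values stated in the answer above. The `ModeSettings` struct and its field names are hypothetical, and `None` means this page gives no number for that mode rather than that the recorder has no limit.

```rust
/// The three modes named on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PerformanceMode {
    Normal,
    Balanced,
    LowEnergy,
}

/// Hypothetical summary of what each mode changes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModeSettings {
    /// Minimum gap between recorded mouse moves, if the page states one.
    pub mouse_move_throttle_ms: Option<u64>,
    /// Cap on emitted events per second, if the page states one.
    pub max_events_per_second: Option<u32>,
    /// Whether text-input completion events stay enabled.
    pub text_input_completion: bool,
}

/// Maps a mode to the settings described in the FAQ answer.
pub fn settings_for(mode: PerformanceMode) -> ModeSettings {
    match mode {
        PerformanceMode::Normal => ModeSettings {
            mouse_move_throttle_ms: None,
            max_events_per_second: None,
            text_input_completion: true,
        },
        PerformanceMode::Balanced => ModeSettings {
            mouse_move_throttle_ms: Some(200),
            max_events_per_second: Some(20),
            text_input_completion: true,
        },
        PerformanceMode::LowEnergy => ModeSettings {
            mouse_move_throttle_ms: Some(500),
            max_events_per_second: Some(5),
            text_input_completion: false,
        },
    }
}
```

Note the trade-off in LowEnergy: it is the only mode that gives up the per-session typing event, so a recording made in that mode loses the typing semantics the rest of this page relies on.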

Is this open source?

Yes. Terminator is MIT-licensed and the full workflow recorder lives in crates/terminator-workflow-recorder/ in the mediar-ai/terminator repo on GitHub. The event definitions are in events.rs (1830 lines), the platform-agnostic config and session loop are in recorder.rs (575 lines), and the Windows UIA-specific detection logic lives under recorder/windows/.

Does it work on macOS and Linux?

The core Terminator framework targets Windows UIA and macOS AX, and a subset of functionality is available on Linux through AT-SPI2. The workflow recorder crate currently ships a Windows backend (recorder/windows/) because UIA's event subscription model is the most mature. macOS AX event subscription is on the roadmap. If you need a cross-platform recorder today, run it on a Windows VM or Windows host.

How does this compare to running Selenium IDE against a web app?

Selenium IDE is excellent at what it does, inside a browser, with a web app that follows standard DOM patterns. It is the wrong tool when the recording has to cross windows, interact with a Save dialog, wait on a desktop tray icon, or hop between Slack and a web form. Terminator's recorder and Selenium IDE answer different questions. If your test is entirely inside a rendered DOM, use Selenium IDE. If it is not, the workflow recorder exists precisely for the parts Selenium cannot see.

terminator: Desktop automation SDK
© 2026 terminator. All rights reserved.