Windows task automation that survives the modal it just opened
Most Windows task automation listicles rank schedulers and macro recorders by triggers and scripts. This one goes a layer down: the single static mutex that lets Terminator record a human's click without the click corrupting the UIA tree the recorder was just reading from. One file, one timestamp, one 5000 ms cutoff.
The Windows task automation problem nobody writes about
Every guide to Windows task automation on the first page of Google names the same nine tools. Task Scheduler for triggers. Power Automate Desktop for GUI flowcharts. AutoHotkey and AutoIt for hand-rolled macros. RoboTask, UiPath, and Blue Prism for enterprise RPA canvases. SikuliX and TinyTask for image and coordinate replay. Every article sorts them by trigger types, pricing, and whether you can draw a flowchart without writing code.
None of them describe the engineering problem that actually determines whether your automation will run twice in a row on the same machine. The problem is not scheduling. The problem is that the moment a recorder observes a click, the click is already changing the UI that the recorder just read from. A Save button click opens a modal. The modal draws on top of the button. The button is now covered. The UIA element reference that described the button may now point at the modal, or at nothing, depending on which frame the lookup happened to return in.
That is the race. It is why macro recorders on Windows often capture a click that turns into a ghost on replay. It is the reason we put a deferred capture in the recorder before we put anything else in it.
Where the race happens
The anchor fact: one static, one timestamp, one cutoff
Here is the declaration. A static Lazy-initialised Mutex wrapping an Option. The press handler sets it. The release handler takes it. The Option semantics mean there is at most one outstanding click at any time, so the recorder cannot leak pending captures across gestures.
And here is the struct it stores. Three fields, one of them a timestamp set to Instant::now() at MouseDown. That field is the entire reason the deferred pattern is safe to use: it lets the release handler decide when to give up on a capture that has gone stale.
“A static Mutex holding at most one pending click, drained on MouseUp, discarded past 5000 ms.”
crates/terminator-workflow-recorder/src/recorder/windows/mod.rs:29
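A minimal sketch of what that declaration and struct could look like, using only the standard library. The field names and the two event types are assumptions made for illustration, not copied from the Terminator source. The source describes a `Lazy` wrapper; because `Mutex::new` is a const fn in current Rust, this sketch omits it.

```rust
use std::sync::Mutex;
use std::time::Instant;

/// Hypothetical stand-in for the recorder's click event.
#[derive(Debug, Clone, PartialEq)]
pub struct ClickEvent {
    pub x: i32,
    pub y: i32,
    /// `None` when the UIA lookup did not return an element in time.
    pub ui_element: Option<String>,
}

/// Hypothetical stand-in for the optional browser-side event.
#[derive(Debug, Clone, PartialEq)]
pub struct BrowserClickEvent {
    pub dom_selector: String,
}

/// Three fields, as described above. Field names are assumptions.
#[derive(Debug)]
pub struct PendingClickCapture {
    pub click_event: ClickEvent,
    pub browser_click_event: Option<BrowserClickEvent>,
    /// Set at MouseDown; read at MouseUp to decide whether the capture is stale.
    pub timestamp: Instant,
}

/// At most one outstanding click: the Option is either None or one capture.
pub static PENDING_CLICK_CAPTURE: Mutex<Option<PendingClickCapture>> = Mutex::new(None);
```

Because the slot is an `Option`, writing a new capture overwrites any capture that was never drained, and `take()` leaves `None` behind.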
Capture on down, emit on up
The press handler runs after the UIA element has been resolved for the point under the cursor. If the lookup produced any click event at all, it is stored with a fresh timestamp. This is the last moment at which the UIA tree is guaranteed to still describe the screen the user saw.
The release handler gates on left button only. It locks the mutex, takes the capture, checks age_ms > 5000, and either discards the stale capture or emits the click. The arithmetic is elementary. The discipline is the interesting part.
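The two handlers can be sketched roughly as follows. This is an illustration under stated assumptions, not the recorder's actual code: the handler names, the `Click` type, and the `MouseButton` enum are hypothetical, and `now` is passed in as a parameter so the staleness rule can be exercised without sleeping (the real handlers are described as using `Instant::now()` and `elapsed()`).

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// The 5000 ms cutoff described above.
const STALE_AFTER: Duration = Duration::from_millis(5000);

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MouseButton {
    Left,
    Right,
}

/// Hypothetical, simplified click event.
#[derive(Debug, Clone, PartialEq)]
pub struct Click {
    pub x: i32,
    pub y: i32,
    pub element: Option<String>,
}

pub struct PendingClickCapture {
    pub click: Click,
    pub timestamp: Instant,
}

pub static PENDING: Mutex<Option<PendingClickCapture>> = Mutex::new(None);

/// MouseDown: store the already-resolved click, emit nothing yet.
pub fn handle_button_press(click: Click, now: Instant) {
    *PENDING.lock().unwrap() = Some(PendingClickCapture { click, timestamp: now });
}

/// MouseUp: drain the slot and return the click only if it is still fresh.
pub fn handle_button_release(button: MouseButton, now: Instant) -> Option<Click> {
    if button != MouseButton::Left {
        return None; // only the left button drains the capture
    }
    // take() leaves None behind, so a capture is emitted at most once.
    let capture = PENDING.lock().unwrap().take()?;
    if now.saturating_duration_since(capture.timestamp) > STALE_AFTER {
        return None; // drag or long press: discard
    }
    Some(capture.click)
}
```

A release at exactly 5000 ms still emits the click, because the comparison is strictly greater-than, matching the `age_ms > 5000` check described above.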
Seven steps, one click
This is what happens inside Terminator between the user pressing and releasing the mouse button on a Windows application. Step six is the one every other macro recorder skips.
User presses the left mouse button
The low-level mouse hook fires handle_button_press_request on the input thread. Position is captured from the raw WM_LBUTTONDOWN payload.
At this instant the target application has not yet received WM_LBUTTONDOWN. The UI is still sitting where the user saw it. The UIA tree under the cursor is stable.
UIA traversal resolves the element under the cursor
get_deepest_element_from_point_with_timeout walks the UIA tree down to the deepest element containing the point. Timeout is configurable; if it fires, the click is recorded without a rich element.
Capture is written to PENDING_CLICK_CAPTURE
The mutex is locked, the Option is replaced with a new PendingClickCapture containing the Click event, the optional BrowserClickEvent, and Instant::now(). Nothing is emitted yet.
The application processes the click
Windows routes the click to the target window. A modal may open. Focus may shift. A toast may appear. The UIA tree begins to mutate. None of that affects the already-captured element reference.
User releases the left mouse button
handle_button_release_request fires on the same input thread. It locks the mutex and takes the capture, leaving None behind.
Staleness check on line 2863
capture.timestamp.elapsed().as_millis() is compared against 5000. If larger, the capture is dropped with a 'Discarding stale pending click capture' debug log. This is the drag/long-press filter.
Click and optional BrowserClick events are emitted
BrowserClickEvent is sent first so downstream listeners can enrich the click with DOM context. Then the Click event is sent on the broadcast channel. A compiler step turns the stream into a replayable YAML workflow.
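The emission order in the last step can be sketched with a standard-library channel. This is an assumption-laden illustration: the source mentions a broadcast channel, while the sketch uses `std::sync::mpsc` as a stand-in, and the `WorkflowEvent` variants, the `DrainedCapture` type, and the `emit_capture` function are hypothetical.

```rust
use std::sync::mpsc::{SendError, Sender};

/// Hypothetical, simplified event enum.
#[derive(Debug, Clone, PartialEq)]
pub enum WorkflowEvent {
    BrowserClick { dom_selector: String },
    Click { element_name: Option<String>, x: i32, y: i32 },
}

/// What the release handler holds after draining a fresh capture.
pub struct DrainedCapture {
    pub click: WorkflowEvent,
    pub browser_click: Option<WorkflowEvent>,
}

/// Sends the optional browser event first, then the click itself,
/// so listeners see the DOM context before the click it belongs to.
pub fn emit_capture(
    tx: &Sender<WorkflowEvent>,
    capture: DrainedCapture,
) -> Result<(), SendError<WorkflowEvent>> {
    if let Some(browser_click) = capture.browser_click {
        tx.send(browser_click)?;
    }
    tx.send(capture.click)
}
```

A listener that reads the channel in order therefore always sees the DOM context before the click it enriches, and a click outside a browser produces a single event.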
The naive path versus the deferred path
Below is the one-handler approach that every Windows-task-automation listicle implicitly assumes. Terminator ships a two-handler approach instead. The difference is twelve lines of code and a timestamp, and it is the difference between a recorded workflow that replays and one that does not.
// The naive "record on click" path every macro recorder takes.
// Fires one handler on WM_LBUTTONDOWN, reads the UI tree right then,
// emits the event. Looks right in isolation. Breaks the moment the
// click opens a modal.
fn on_mouse_down(pos: Position) {
    let element = uia.get_element_from_point(pos); // may block on modal
    let click = ClickEvent { element, pos };
    event_tx.send(WorkflowEvent::Click(click));
    // The modal is already drawing. If the element lookup was still
    // in flight, it now returns a reference into the NEW window.
    // Replay will click the modal's OK button, not the thing the
    // user actually clicked.
}
How the threads talk on one real click
The input thread owns the press and the release. The UIA thread owns the element lookup. PENDING_CLICK_CAPTURE is the single memory location they share. This is one click on one Save button, ending with a modal and the click event safely emitted on mouse up.
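One way to picture that hand-off is a press handler that asks a worker thread for the element, waits for a bounded time, and then writes whatever it got into the shared slot. The sketch below rests on assumptions: `lookup_with_timeout` and `handle_press` are hypothetical names, the element is reduced to a `String`, and a fresh thread is spawned per lookup for brevity, where a real UIA thread would be long-lived and own its COM apartment.

```rust
use std::sync::mpsc::channel;
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

pub struct PendingClickCapture {
    pub pos: (i32, i32),
    pub element: Option<String>,
    pub timestamp: Instant,
}

/// The single memory location the two sides share.
pub static PENDING: Mutex<Option<PendingClickCapture>> = Mutex::new(None);

/// Runs `lookup` on a worker thread and waits at most `budget` for it.
/// Returns None if the lookup found nothing or did not finish in time.
pub fn lookup_with_timeout<F>(lookup: F, budget: Duration) -> Option<String>
where
    F: FnOnce() -> Option<String> + Send + 'static,
{
    let (tx, rx) = channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(lookup());
    });
    rx.recv_timeout(budget).ok().flatten()
}

/// Press side: resolve the element with a bounded wait, then store the capture.
pub fn handle_press<F>(pos: (i32, i32), lookup: F, budget: Duration)
where
    F: FnOnce((i32, i32)) -> Option<String> + Send + 'static,
{
    let element = lookup_with_timeout(move || lookup(pos), budget);
    *PENDING.lock().unwrap() = Some(PendingClickCapture {
        pos,
        element,
        timestamp: Instant::now(),
    });
}
```

When the budget runs out, the capture is still stored with its coordinates and a `None` element, so the click is recorded rather than dropped.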
PENDING_CLICK_CAPTURE lifecycle
What a real recording session prints
With debug logs on, the recorder narrates the capture and emission for every click. Two back-to-back clicks on a Save button and its follow-up OK dialog. The timing numbers are from a real run on a developer laptop.
The subsystems that meet at the mutex
The deferred capture is the simplest part of the recorder. The interesting part is the five subsystems that have to agree on its contract: the input hook and the UIA thread write to it, the browser bridge enriches it, the release handler drains it, and the workflow compiler turns the drained events into a replayable YAML file.
Low-level mouse hook
A Windows SetWindowsHookEx(WH_MOUSE_LL) hook runs on the input listener thread and fires handle_button_press_request on down and handle_button_release_request on up. It owns the 'when' of each click with sub-millisecond precision.
UIA traversal thread
A separate COM apartment (COINIT_APARTMENTTHREADED unless you opt in to multithreaded) runs get_deepest_element_from_point_with_timeout. It owns the 'what' of each click by walking the UIA tree to the deepest element under the coordinates.
PENDING_CLICK_CAPTURE
The static mutex in mod.rs line 29. Holds at most one pending click with its timestamp. The press handler writes it, the release handler reads and drains it. 5000 ms cap on line 2863.
Browser context bridge
browser_context.rs watches focused Chrome and Edge windows and populates an optional BrowserClickEvent alongside the UIA click. When present, replay can target a DOM selector instead of just a UIA path.
Text input tracker
Parallel system for keyboard events. TextInputTracker in structs.rs batches keystrokes into a TextInputCompletedEvent at focus change. Click deferral is the mouse-side equivalent.
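The keyboard-side batching can be sketched in the same spirit. This is an illustration only: the real TextInputTracker and TextInputCompletedEvent are not shown on this page, so the fields and method names below are assumptions.

```rust
/// Hypothetical, simplified completion event.
#[derive(Debug, Clone, PartialEq)]
pub struct TextInputCompletedEvent {
    pub field: String,
    pub text: String,
}

/// Buffers keystrokes for the focused field and flushes them on focus change.
#[derive(Default)]
pub struct TextInputTracker {
    focused: Option<String>,
    buffer: String,
}

impl TextInputTracker {
    /// Records a keystroke; ignored while no field has focus.
    pub fn on_key(&mut self, ch: char) {
        if self.focused.is_some() {
            self.buffer.push(ch);
        }
    }

    /// Focus moved: emit one event for the text typed into the previous field.
    pub fn on_focus_change(&mut self, new_field: Option<String>) -> Option<TextInputCompletedEvent> {
        let text = std::mem::take(&mut self.buffer);
        let previous = std::mem::replace(&mut self.focused, new_field);
        match previous {
            Some(field) if !text.is_empty() => Some(TextInputCompletedEvent { field, text }),
            _ => None,
        }
    }
}
```

Both paths follow the same shape: collect state while the gesture is in progress, then emit a single event at the boundary that ends it (focus change for text, MouseUp for clicks).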
Why this matters for Windows task automation
Four concrete failure modes the pattern rules out
- A recorded Save button click that does not record the modal's OK button is a workflow that will hang on replay.
- A recorded click whose element reference was captured mid-modal-draw will resolve to a disposed UIA element on replay.
- A recorded drag that was accidentally emitted as a click will make the replay engine fire invoke_element on empty space.
- A recorded click emitted before the UI element lookup returns will attach to whatever happened to be under the cursor two frames later.
Each of those has a specific remedy in the deferred capture pattern: MouseDown capture, MouseUp emission, the 5000 ms cap, and a graceful None fallback on slow UIA.
Verify it in the source
Four grep lines. Each one points at a specific line of a specific file. If you can run a grep, you can reproduce the anchor fact on this page without trusting a word of it.
The rest of the Windows task automation field
These tools cover triggering, scheduling, and coordinate replay. None of them ship a split press/release capture for their click recorder, so their recorded workflows are fragile on any UI that opens a modal in the millisecond after the click lands.
One number to remember: 5000 ms
The staleness cutoff on line 2863 of mod.rs is the only number you have to keep in your head to understand why a Terminator-recorded Windows workflow replays cleanly. If a captured click survives longer than 5000 ms before the release arrives, it was a drag or a long press, not a click, and the recorder drops it on the floor. Everything else, including the selector that will be used to replay the click against a moved window or a re-themed UI, rides on that timestamp.
When this does and does not matter
If you are only using Windows task automation to trigger a .exe on a schedule, you do not need this pattern. Task Scheduler runs a program at a time, and the program decides what to do. The deferred capture matters when the automation is recording or replaying a sequence of clicks and keystrokes against a real Windows GUI, which is the failure mode every RPA customer has felt and no marketing page describes.
The healthy pairing is Task Scheduler decides when the workflow runs, and a Terminator execute_sequence call, compiled from a recorded session, decides what the workflow does once it starts. The scheduler cares about calendars. Terminator cares about the millisecond after the click.
Frequently asked questions
Why does a Windows task automation recorder need a separate MouseDown and MouseUp handler?
When a human clicks, the click can fire off a modal, pop a toast, shift keyboard focus, or swap in a new window. If the recorder tries to read the UIA element at the same instant the click lands, the tree it is reading from is already mutating, and the element reference is either stale, wrong, or blocked waiting on the modal to finish drawing. Terminator splits the work. The UI element is captured on MouseDown while the target is still sitting where the user saw it. The Click event is only emitted on MouseUp, after the modal has had its chance to draw. The two moments are connected by a single static mutex called PENDING_CLICK_CAPTURE in crates/terminator-workflow-recorder/src/recorder/windows/mod.rs at line 29.
What is PENDING_CLICK_CAPTURE actually storing?
It is a `static Lazy<Mutex<Option<PendingClickCapture>>>` that holds at most one outstanding click. The PendingClickCapture struct (windows/structs.rs line 560) has three fields: a regular Click event (always present for left clicks), an optional BrowserClickEvent (present when the click lands inside Chrome or Edge and has DOM info), and a timestamp set to Instant::now() at the moment of MouseDown. The timestamp is only there to let the MouseUp handler decide whether the pending click is still fresh.
Why 5000 ms and not some other cutoff?
The cutoff in mod.rs line 2863 treats anything older than 5 seconds as a long press or drag rather than a click. If you press the left button and hold it while dragging a window across the desktop for eight seconds, no click event should fire on release. 5000 ms is the threshold the recorder uses to tell 'slow human click' apart from 'drag gesture.' It is a constant, not a config flag, and grepping the file will show you exactly one line that owns it.
How is this different from Power Automate Desktop or AutoHotkey?
Power Automate Desktop records selectors by watching UIA focus events, not low-level mouse events. It does not face the MouseDown-versus-MouseUp race because it captures on the UIA-side notification after the action has settled, which is also why it tends to miss rapid or nested clicks. AutoHotkey works at the pixel level, so it has no element to capture at all; it replays coordinates and hopes. Terminator sits between the two: low-level mouse hooks tell you exactly when the user pressed and released the button, and a parallel UIA lookup tells you which element they hit. The deferred emission is the stitching.
What happens if the UIA lookup takes longer than the user's click?
The recorder runs the element lookup with a timeout budget in get_deepest_element_from_point_with_timeout (mod.rs line 2922). If the UIA tree traversal does not finish within that budget, the Click event is emitted with a None ui_element and gets a debug log saying 'Storing Click event (no UI element).' The workflow still records the click, just without a rich selector. That is recoverable because the coordinate is still there; it is not a silent drop.
How does this plug into the execute_sequence replay engine?
The recorder emits a WorkflowEvent::Click with a full UI element attached (role, name, automation id, process name). A downstream build step compiles that stream into a YAML workflow whose steps call invoke_element or click_element with a selector derived from the captured element. When execute_sequence replays the workflow, the same element is looked up by selector rather than by coordinate, so the replay survives window moves, DPI changes, and theme swaps. The deferred capture is the reason the selector is correct in the first place.
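To make the selector-versus-coordinate distinction concrete, here is a hedged sketch of a compile step. Everything in it is hypothetical: the `ReplayStep` enum, the `compile_click` function, and especially the `role:...|name:...` selector string, which is an illustrative format and not Terminator's actual selector syntax or YAML layout. Only the four element attributes (role, name, automation id, process name) come from the description above.

```rust
/// The four attributes named above; the struct itself is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct UiElement {
    pub role: String,
    pub name: String,
    pub automation_id: Option<String>,
    pub process_name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ClickEvent {
    pub x: i32,
    pub y: i32,
    pub ui_element: Option<UiElement>,
}

/// Hypothetical replay step; not the real workflow schema.
#[derive(Debug, Clone, PartialEq)]
pub enum ReplayStep {
    ClickElement { process: String, selector: String },
    ClickAt { x: i32, y: i32 },
}

/// Prefers a selector built from the captured element and falls back to
/// raw coordinates only when no element was captured.
pub fn compile_click(event: &ClickEvent) -> ReplayStep {
    match &event.ui_element {
        Some(el) => {
            // Illustrative selector format only.
            let selector = match &el.automation_id {
                Some(id) if !id.is_empty() => format!("role:{}|id:{}", el.role, id),
                _ => format!("role:{}|name:{}", el.role, el.name),
            };
            ReplayStep::ClickElement { process: el.process_name.clone(), selector }
        }
        None => ReplayStep::ClickAt { x: event.x, y: event.y },
    }
}
```

The coordinate branch is the fallback for a click recorded without an element; the selector branch is the one that can survive a moved window, because it never stores where the element was on screen.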
Is this only a Windows problem or does the Mac recorder do the same thing?
The staleness pattern lives in the Windows recorder because Windows UIA and raw input hooks are two separate subsystems that can race. On macOS, the Accessibility API surfaces richer events that include the element under the cursor natively, so the deferred capture dance is not needed. The file path makes the scope explicit: `crates/terminator-workflow-recorder/src/recorder/windows/mod.rs`. No equivalent static lives in the macOS recorder.
Why does Terminator use COINIT_APARTMENTTHREADED by default?
UIAutomation COM objects are happier inside a single-threaded apartment. The recorder's threading model is picked in mod.rs line 201: if the recorder is configured with enable_multithreading the call site passes COINIT_MULTITHREADED, otherwise COINIT_APARTMENTTHREADED. Running the UIA pump and the mouse hook in separate apartments is one of the reasons the deferred capture pattern works at all; each thread owns its own COM state and talks to the shared PENDING_CLICK_CAPTURE via a plain Mutex.