Test automation for desktop applications, with a four-file forensics bundle on every step

Most guides on desktop test automation stop at “send a click, read the value back”. They leave you to build your own failure-triage story: a separate screenshot harness, a separate step log, a separate replay loader. Terminator ships all of it in one place. Every MCP tool call writes four files to disk, right next to each other, under a predictable path. The files show up whether you drive the framework from a TypeScript test runner, a Python script, Claude Desktop, or Cursor. The rest of this page is a tour of the exact file names, the exact directory, and the exact Rust source that produces them.

execution_logger.rs · .json + .ts + _before.png + _after.png · 7-day retention · %LOCALAPPDATA%/mediar/executions · MIT
Matthew Diakonov
10 min read
execution_logger.rs: 2,790 lines, MIT-licensed, grep-able in a fresh clone
Four files per tool call: .json, .ts, _before.png, _after.png
RETENTION_DAYS = 7, cleanup runs on agent startup
Opt-out via TERMINATOR_DISABLE_EXECUTION_LOGS=1

Why desktop test automation needs this primitive

A web test that fails in CI has Playwright traces. You open the trace viewer, you scrub to the failing step, and you see the DOM, the network, and the screenshot together. That is why investigating a web flake is a ten-minute job.

A desktop test that fails in CI, on most frameworks, gives you a stack trace and a line number. Maybe you wrote a custom listener that dumped a screenshot on failure. Maybe you did not. You reach for Remote Desktop, spin up the build agent, try to reproduce by hand, and half the time the state is already gone because the Windows session got recycled.

Terminator refuses to accept that as the default. The MCP agent treats every tool call as an event worth archiving. The archive has four parts because that is what it takes to reconstruct a failing step: a structured record (what was requested and how it returned), a replayable form (the exact SDK call you would write to reproduce it), and the two frames of video that would make a human go “oh, the wrong window was focused” in under a second.

The archive pipeline

One tool call in, four artifacts out. The extractor runs at the MCP dispatch boundary, not in your test code.

[Diagram: MCP tool call to on-disk bundle. A call from the TypeScript SDK, Claude Desktop, a Cursor agent, or execute_sequence flows through execution_logger.rs, which fans out the .json, .ts, _before.png, and _after.png artifacts.]

What lands on disk, exactly

The file prefix is YYYYMMDD_HHMMSS_workflowId_stepId_toolName. workflowId defaults to standalone for ad-hoc calls (Claude Desktop, Cursor, a REPL). stepId defaults to full. toolName is the MCP tool name minus the mcp__terminator-mcp-agent__ prefix so the filename stays readable.
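The prefix scheme above can be sketched in a few lines of TypeScript. This is an illustration of the naming rule, not the shipped Rust; the function name generateFilePrefix is ours.

```typescript
// Sketch of the filename prefix rule: YYYYMMDD_HHMMSS_workflowId_stepId_toolName,
// with "standalone"/"full" defaults and the MCP namespace stripped for readability.
function generateFilePrefix(
  toolName: string,
  workflowId: string = "standalone",
  stepId: string = "full",
  now: Date = new Date(),
): string {
  // Drop the mcp__terminator-mcp-agent__ prefix so the filename stays short.
  const shortTool = toolName.replace(/^mcp__terminator-mcp-agent__/, "");
  const pad = (n: number) => String(n).padStart(2, "0");
  const stamp =
    `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}` +
    `_${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`;
  return `${stamp}_${workflowId}_${stepId}_${shortTool}`;
}

// A click fired from Claude Desktop on 2024-06-01 at 14:30:05 would produce:
//   20240601_143005_standalone_full_click_element
```

All four artifacts of one call share this prefix, which is also what makes the retention sweep's date parsing trivial.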

[Screenshot: mediar/executions after a three-step workflow run — 4 files per step.]

A desktop test failure is no longer just a stack trace. It is a JSON record, a TypeScript snippet, and two screenshots in a folder whose path you already know. Grep the repo for execution_logger.rs. Line 19 is RETENTION_DAYS = 7. Line 79 is get_executions_dir returning dirs::data_local_dir().join("mediar").join("executions"). Line 684 is the TypeScript snippet dispatch table. Every claim on this page maps to a grep hit.

github.com/mediar-ai/terminator, crates/terminator-mcp-agent/src/execution_logger.rs

The four artifacts, broken down

Below is what you actually find in each file, and the exact line of the Rust source that produces it. Read once, then open any real bundle on your own machine and everything lines up.

<prefix>.json

Structured execution record. Contains tool name, workflow_id, step_id, step_index, request arguments, response status, duration_ms, any error message, captured log lines, and references to the screenshots that landed beside it. Written by log_response at execution_logger.rs line 283.

status: executed_without_error
duration_ms: 842
screenshots.before: ..._before.png
screenshots.after: [..._after.png]
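The record's shape, as described above, can be written down as a TypeScript interface. The field names follow the article; the interface itself and the sample values are illustrative, not part of the shipped code.

```typescript
// Illustrative shape of the <prefix>.json record (ours, not the shipped struct).
interface ExecutionRecord {
  timestamp: string;                 // RFC 3339
  workflow_id: string;               // "standalone" for ad-hoc calls
  step_id: string;                   // "full" when not part of a sequence
  step_index: number | null;
  tool_name: string;
  request: Record<string, unknown>;  // full request arguments
  response: {
    status: "executed_without_error" | "executed_with_error";
    duration_ms: number;
    error?: string;
  };
  logs?: { level: string; message: string }[];       // captured tracing lines
  screenshots: { before?: string; after: string[] }; // sibling PNG filenames
}

const sample: ExecutionRecord = {
  timestamp: "2024-06-01T14:30:05Z",
  workflow_id: "standalone",
  step_id: "full",
  step_index: null,
  tool_name: "click_element",
  request: { selector: "role:Button|name:Save" },    // illustrative selector
  response: { status: "executed_without_error", duration_ms: 842 },
  screenshots: { before: "..._before.png", after: ["..._after.png"] },
};
```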

<prefix>.ts

Regenerated TypeScript SDK snippet that reproduces the exact tool call. Every supported tool has its own snippet generator: generate_click_snippet for click_element, generate_type_snippet for type_into_element, generate_validate_snippet for validate_element, and so on. Dispatch table at execution_logger.rs line 684.
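A per-tool dispatch table like the one described can be sketched as follows. The generator names, the desktop.locator call shape, and the fallback are all illustrative assumptions, not the actual 1,400-line implementation.

```typescript
// Minimal sketch of a per-tool snippet dispatch table. Output strings and the
// desktop.locator(...) API shape are assumptions for illustration.
type SnippetGen = (args: Record<string, unknown>) => string;

const generators: Record<string, SnippetGen> = {
  click_element: (a) =>
    `await desktop.locator(${JSON.stringify(a.selector)}).click();`,
  type_into_element: (a) =>
    `await desktop.locator(${JSON.stringify(a.selector)}).typeText(${JSON.stringify(a.text)});`,
};

function generateTypescriptSnippet(tool: string, args: Record<string, unknown>): string {
  const gen = generators[tool];
  // Fall back to a generic call when no dedicated formatter exists.
  return gen
    ? gen(args)
    : `await client.callTool(${JSON.stringify(tool)}, ${JSON.stringify(args)});`;
}
```

The point of a dedicated formatter per tool is exactly what the article claims: the .ts file reads as code you would write, not as a JSON blob dressed up as code.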

<prefix>_before.png

Raw desktop screenshot captured before the action fired. Extracted from the before_screenshot or screenshot_before field on the MCP result, base64-decoded, written as PNG. Field list at lines 476 and 477.

<prefix>_after.png

The after frame. If the tool returned a single screenshot field, it is saved as _after.png. If the MCP content array carried multiple image items, each becomes _after_1.png, _after_2.png, in the order they appeared. Lines 508 to 540.
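The naming rule for the after frames is mechanical enough to sketch directly (helper name is ours):

```typescript
// One after-image keeps the plain _after.png suffix; multiple images from the
// MCP content array get _after_1.png, _after_2.png, ... in arrival order.
function afterScreenshotNames(prefix: string, imageCount: number): string[] {
  if (imageCount === 0) return [];
  if (imageCount === 1) return [`${prefix}_after.png`];
  return Array.from({ length: imageCount }, (_, i) => `${prefix}_after_${i + 1}.png`);
}
```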

Workflow-scoped executions

When the call carried a workflow_id, the bundle lands in %LOCALAPPDATA%/mediar/workflows/<workflow_id>/executions/ instead. One folder per test run, easy to tar-and-attach to a CI failure artifact. get_workflow_executions_dir at line 88.
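The two destination layouts can be sketched as pure string manipulation ("/" used for readability; the real code uses Rust's dirs::data_local_dir and platform separators, and the workflow id shown is hypothetical):

```typescript
// Sketch of the two bundle destinations: standalone vs workflow-scoped.
function getExecutionsDir(dataLocalDir: string, workflowId?: string): string {
  const base = `${dataLocalDir}/mediar`;
  return workflowId
    ? `${base}/workflows/${workflowId}/executions`  // one folder per test run
    : `${base}/executions`;                         // ad-hoc calls
}

// getExecutionsDir("C:/Users/me/AppData/Local", "nightly-042")
//   → "C:/Users/me/AppData/Local/mediar/workflows/nightly-042/executions"
```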

The source that produces it

The logger is 2,790 lines of Rust. The parts that matter for a reader evaluating whether this is real, not marketing, are the path resolver, the file-prefix generator, and the response handler that fans the four files out. All three are below.

crates/terminator-mcp-agent/src/execution_logger.rs

How screenshots get captured with zero test-side wiring

The hard part of this primitive is not writing files. It is deciding what counts as a screenshot inside an MCP result when the result format depends on which tool returned it. The extractor probes a set of well-known screenshot field names, then walks the MCP content array looking for image items, and also parses nested JSON strings inside text items in case a tool wrapped its screenshot there. PNG or JPEG magic bytes are checked before any file is written, so accidental matches do not land on disk.
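The magic-byte check is worth seeing concretely. Base64-encoded PNG always starts with "iVBOR" (the encoding of the \x89PNG header) and base64 JPEG with "/9j/" (the encoding of \xFF\xD8\xFF); this sketch mirrors the described filter, including the minimum-length guard:

```typescript
// Sketch of the screenshot filter: reject short strings, then accept only
// strings whose base64 prefix matches the PNG or JPEG file header.
function looksLikeScreenshot(b64: string): "png" | "jpeg" | null {
  if (b64.length < 80) return null;           // too short to be a real frame
  if (b64.startsWith("iVBOR")) return "png";  // base64 of \x89PNG
  if (b64.startsWith("/9j/")) return "jpeg";  // base64 of \xFF\xD8\xFF
  return null;
}
```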

crates/terminator-mcp-agent/src/execution_logger.rs

The tools that emit bundles

The logger runs before every tool dispatch, so anything in the MCP tool set is covered. The TypeScript snippet generator has a dedicated formatter for each of these, so the .ts file is always readable, not a JSON blob dressed up as code.

click_element, type_into_element, press_key, press_key_global, validate_element, wait_for_element, navigate_browser, get_window_tree, capture_screen, mouse_drag, scroll, set_selected, set_toggled, select_option, invoke_element, record_workflow, execute_sequence, run_javascript, run_command

How it compares to the obvious alternatives

Windows App Driver, AutoIt, pywinauto, Ranorex, TestComplete, and UFT all give you the primitives to build something like this. None of them ship it wired up. The difference between “you could add a reporter that captures screenshots” and “the framework writes a JSON and two PNGs per step by default” is the difference between having test forensics and not having them.

Per-step JSON log of request, response, duration, status
- Typical desktop automation framework: writable in your own reporter, or lost to stdout if you do not wire one
- Terminator: written automatically by log_response at execution_logger.rs:283

Before-screenshot and after-screenshot per action
- Typical desktop automation framework: call TakeScreenshot yourself before and after every step
- Terminator: extract_and_save_screenshots probes the known screenshot field names plus the MCP content array, at lines 464 to 541

Replayable TypeScript snippet per step
- Typical desktop automation framework: not available; you rerun by hand from the test source
- Terminator: generate_typescript_snippet dispatches to 19 per-tool generators from line 684 onward

Automatic retention and cleanup
- Typical desktop automation framework: manual disk management, or it fills up
- Terminator: RETENTION_DAYS = 7, swept by cleanup_old_executions at line 2383, run at startup

Opt-out for sensitive environments
- Typical desktop automation framework: reporter toggles that do not affect framework internals
- Terminator: single env var, TERMINATOR_DISABLE_EXECUTION_LOGS=1, checked at line 108

Works with non-developer drivers (Claude Desktop, Cursor, ChatGPT)
- Typical desktop automation framework: requires a TestRunner class and a scripted harness
- Terminator: runs at the MCP dispatch layer, so any MCP client gets the artifact bundle for free

Retention, not forever

Seven days, then swept

RETENTION_DAYS is a named constant at execution_logger.rs:19. cleanup_old_executions at line 2383 runs at startup in a tokio task, walks the standalone directory and every workflow directory, and deletes any bundle whose prefix date is older than today minus seven. Long retention is on you: copy to CI artifact storage at the end of the run.
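Because every bundle's filename starts with its date, the expiry decision is a prefix parse plus a date comparison. A sketch of that rule (function name is ours, and the real sweep also deletes the files; this only decides):

```typescript
// Sketch of the retention rule: parse YYYYMMDD out of the bundle prefix and
// compare against today minus RETENTION_DAYS. Unparseable names are left alone.
const RETENTION_DAYS = 7;

function isExpired(fileName: string, today: Date): boolean {
  const m = /^(\d{4})(\d{2})(\d{2})_/.exec(fileName);
  if (!m) return false;
  const fileDate = new Date(Number(m[1]), Number(m[2]) - 1, Number(m[3]));
  const cutoff = new Date(today);
  cutoff.setDate(cutoff.getDate() - RETENTION_DAYS);
  return fileDate < cutoff;
}
```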

7 days retained · 4 files per step · 19 per-tool snippet generators

Using the bundle in a failure post-mortem

The whole point of the four-file pattern is that it matches the order you already investigate in. Screenshot first, because it is the fastest signal. Then the structured log, because it tells you why the tool thought what it thought. Then the replay snippet, because by that point you know enough to iterate on the fix.

1. Failure fires in CI

A nightly desktop regression run flags step 7 as failed. Your test log says the click on 'Save' timed out after 3 seconds. You do not know whether the UI never rendered, whether the wrong button was focused, or whether a modal intercepted the click.

2. Pull the four files for the failed step

Grab the bundle at %LOCALAPPDATA%/mediar/workflows/<run_id>/executions/ (or the standalone dir for unscoped calls). Find the four files whose prefix ends with _click_element and was written at the failure timestamp.

3. Open _before.png first

This is the desktop frame captured immediately before the click fired. If the Save button is present and enabled, you already know the find succeeded. If the screen shows an unexpected modal, you have your answer without reading a single log line.

4. Open the .json log

Read selector_used, duration_ms, and the error block. For a find-timeout, selectors_tried lists every selector the race tried, in the order the race actually tried them. For a click that ran but missed, the error is the downstream UIA HRESULT with is_retryable set.

5. Open _after.png

If the after frame matches the before frame, the click did not change state, which usually means it was intercepted or the button was visible but disabled. If the after frame shows a new screen, the click landed. This is the cheap visual equivalent of a diff step in your assertion stack.

6. Replay by editing the .ts

The .ts file next to the PNGs is the exact SDK call that fired. Copy it into a scratch script, add a breakpoint or a retry=0 tweak, and run it against the same app. No need to reconstruct the scenario; the snippet generator already wrote it for you.

Want to see a failing test rebuild itself from the bundle?

Book a 20-minute walkthrough. We will run a real desktop test suite, break it on purpose, and reconstruct the failing step entirely from the on-disk artifacts.

Frequently asked questions

Where exactly do the four artifacts land per tool call?

On Windows, under %LOCALAPPDATA%/mediar/executions/ for standalone calls, or %LOCALAPPDATA%/mediar/workflows/<workflow_id>/executions/ when the call carries a workflow_id. On macOS, it is the dirs::data_local_dir() equivalent, which is ~/Library/Application Support/mediar/executions/. On Linux, it is $XDG_DATA_HOME or ~/.local/share/mediar/executions/. The path resolution is get_executions_dir at execution_logger.rs line 79 and get_workflow_executions_dir at line 88. Each call produces up to four files sharing a prefix of YYYYMMDD_HHMMSS_workflowId_stepId_toolName, followed by .json, .ts, _before.png, and _after.png.

What is inside the JSON file?

An ExecutionLog record with timestamp (RFC 3339), workflow_id, step_id, step_index, tool_name, the full request arguments, and a response block containing status (executed_without_error or executed_with_error), duration_ms, and the result payload. Screenshot base64 is stripped from the result before serialization by strip_screenshot_base64 at line 601, so the JSON stays small. Captured log lines from the tool's own tracing output are attached as a CapturedLogEntry array when present. Struct definitions live at execution_logger.rs lines 29 to 65.
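The base64-stripping step can be sketched as a recursive walk that replaces known screenshot payloads with a marker while keeping everything else. The recursion and the "<stripped>" marker are our illustration; only the behavior (JSON stays small, keys survive) comes from the article.

```typescript
// Sketch of stripping screenshot base64 from a result before serialization.
const SCREENSHOT_KEYS = new Set([
  "screenshot", "image", "screenshot_base64",
  "screenshot_before", "before_screenshot",
  "screenshot_after", "after_screenshot",
]);

function stripScreenshotBase64(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(stripScreenshotBase64);
  if (value && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = SCREENSHOT_KEYS.has(k) && typeof v === "string"
        ? "<stripped>"              // drop the payload, keep the key
        : stripScreenshotBase64(v);
    }
    return out;
  }
  return value;
}
```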

How does the screenshot extraction work without any test-side opt-in?

extract_and_save_screenshots at line 449 probes the MCP tool result for base64 PNGs under a set of well-known field names (screenshot, image, screenshot_base64, screenshot_before, before_screenshot, screenshot_after, after_screenshot). If the result is an MCP content array instead, it walks the items looking for the canonical { type: image, data: base64 } shape, as well as nested JSON strings inside text items. The minimum length check (80 chars) at line 555 and the magic-byte check for iVBOR (PNG) or /9j/ (JPEG) filter out accidental matches. You do not need to annotate your test code; tools that already returned screenshots for AI consumption get recorded automatically.

Can I turn it off for sensitive environments?

Yes. Set TERMINATOR_DISABLE_EXECUTION_LOGS=1 (or =true) before starting the MCP agent. The check is at execution_logger.rs line 108, inside init(). When disabled, log_request returns None and no directory is created. log_response and the logs-capturing variant both short-circuit on is_enabled() at line 223 and line 249, so there is no filesystem side effect. You can also route artifacts to a different drive by running the agent under a user whose dirs::data_local_dir() resolves elsewhere.
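The short-circuit described above is simple enough to sketch. The function names mirror the article's; in real use you would pass process.env.

```typescript
// Sketch of the opt-out: "1" or "true" disables logging, and entry points
// short-circuit so there is no filesystem side effect at all.
function isEnabled(env: Record<string, string | undefined>): boolean {
  const v = env.TERMINATOR_DISABLE_EXECUTION_LOGS;
  return !(v === "1" || v === "true");
}

function logRequest(toolName: string, env: Record<string, string | undefined>): string | null {
  if (!isEnabled(env)) return null;  // disabled: no directory, no files
  return `would write bundle for ${toolName}`;
}
```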

How big does the executions folder get on a real test run?

A typical click + screenshot step writes a roughly 2 KB JSON record, a 300-byte TypeScript snippet, and two PNGs that depend on your monitor resolution (a 1440p screen at moderate compression tends to land around 400 to 800 KB each). Call that 1 to 2 MB per step. A 200-step workflow is 200 to 400 MB. Multiplied across a few days of test runs you can reach a few gigabytes, which is why RETENTION_DAYS is 7 and cleanup_old_executions at line 2383 walks both the standalone dir and every workflow dir, deleting files whose parsed prefix date is older than today minus seven. If you need longer retention, copy the folder off to your CI artifact storage at the end of each run.
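The sizing arithmetic above is worth making explicit. All inputs here are the article's own estimates, not measurements, and the helper name is ours:

```typescript
// Back-of-envelope bundle sizing: JSON (~2 KB) + TS snippet (~0.3 KB)
// + two PNG frames (resolution-dependent, ~400-800 KB each at 1440p).
function estimateRunMB(steps: number, pngKB = 600, jsonKB = 2, tsKB = 0.3): number {
  const perStepKB = jsonKB + tsKB + 2 * pngKB;  // one before + one after frame
  return (steps * perStepKB) / 1024;
}

// 200 steps at ~600 KB per PNG comes to roughly 235 MB, inside the
// 200-400 MB range quoted above.
```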

How is this different from standard tracing or log files?

Tracing tells you that click_element ran for 842ms and returned Ok. That is useful for observability, but useless for reproducing a flake. A replay snippet (the .ts file) and two PNGs tell you what the UI looked like before and after the action in the exact shape the SDK would reproduce. The JSON adds the selectors_tried list and the underlying error code when things fail. Tracing, screenshots, and replay snippets together are what makes a step actually debuggable; this ships all three in the same bundle. The TypeScript snippet generator alone is 1,400 lines from line 684 down, with per-tool formatters for 19 different tool names.

Does this work when I drive Terminator from Claude Desktop or Cursor instead of a test runner?

Yes. The capture lives at the MCP dispatch boundary, not in the SDK. log_request is called at every tool invocation before dispatch, regardless of which MCP client made it. That means a one-off click fired from Claude Desktop gets the same JSON + TS + before.png + after.png bundle on disk as a click inside a scripted workflow. It turns the MCP agent into a passive test runner: run your app by hand, let an AI driver do the work, then inspect the forensics directory afterward to turn the session into a repeatable test script.

Where can I verify every claim on this page in the source?

git clone https://github.com/mediar-ai/terminator, then open crates/terminator-mcp-agent/src/execution_logger.rs. RETENTION_DAYS is at line 19. get_executions_dir is at line 79. generate_file_prefix is at line 193. log_request is at line 211. log_response is at line 242. extract_and_save_screenshots is at line 447. The TypeScript snippet dispatch table is at line 684. cleanup_old_executions is at line 2383. Every line number on this page is grep-able.

terminator — Desktop automation SDK
© 2026 terminator. All rights reserved.