Test automation tools for desktop applications, gated by tsc before any pixel moves
Most desktop test tools shift right. The error surfaces when the suite is already mid-run against a real application: a launch, a login, a navigation, then a stale selector throws. Terminator ships a different primitive on its MCP tool surface, typecheck_workflow. It runs tsc --noEmit on the workflow folder, parses every error with a five-field regex, attaches a seven-line code context with an arrow on the error line, and hands the result back to the AI coding assistant as JSON before any window opens.
“The whole gate, from the regex that parses tsc output to the formatter that draws an arrow on the error line, fits in a single Rust file you can read in an evening.”
crates/terminator-mcp-agent/src/tools/typecheck.rs
Five fields, one regex
The whole "is this a real type error?" decision sits in a single regex. Five capture groups, anchored to the start and end of the line, with an explicit error keyword between the location and the TS code. Tsc banners, file-list previews, and progress notes never match. Anything that does becomes one populated TypeError with five fields the assistant can act on.
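The same decision, transliterated from the Rust into a TypeScript sketch. The pattern is the one quoted later in this guide; the names `TsError` and `parseTscLine` are illustrative, not identifiers from the repo.

```typescript
// Five capture groups, anchored at both ends, with an explicit "error"
// keyword between the location and the TS code.
const TSC_ERROR = /^(.+?)\((\d+),(\d+)\):\s*error\s+(TS\d+):\s*(.+)$/;

interface TsError {
  file: string;
  line: number;
  column: number;
  code: string;
  message: string;
}

function parseTscLine(raw: string): TsError | null {
  const m = TSC_ERROR.exec(raw.trim()); // trim first, as the Rust parser does
  if (!m) return null; // banners, file lists, progress notes: silently dropped
  return {
    file: m[1],
    line: Number(m[2]),
    column: Number(m[3]),
    code: m[4],
    message: m[5],
  };
}
```

A line like `src/login.ts(12,5): error TS2345: ...` becomes a populated object; a tsc summary line returns `null` and never pollutes the result.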
Three lines, an arrow, three lines
The model needs scope. A TS2345 message in isolation tells you which type was mismatched but not which function it was inside or which variable was just declared. The context formatter renders three lines before the error, the error line itself with a -> marker, and three lines after, all with a four-character gutter so the JSON output is column-aligned regardless of how many digits the line numbers have. Total: seven lines of context per error.
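The shape of that formatter, sketched in TypeScript (the production version is the Rust `get_error_context`; `errorContext` here is an illustrative stand-in):

```typescript
// Renders up to 3 lines before, the error line with a " -> " marker, and up
// to 3 lines after. The marker is four characters either way, so the line
// numbers stay column-aligned regardless of digit width.
function errorContext(source: string[], errorLine: number): string {
  // errorLine is 1-indexed, as tsc reports it
  const start = Math.max(1, errorLine - 3);
  const end = Math.min(source.length, errorLine + 3);
  const rows: string[] = [];
  for (let n = start; n <= end; n++) {
    const marker = n === errorLine ? " -> " : "    "; // four-char gutter
    rows.push(`${marker}${String(n).padStart(4)}: ${source[n - 1]}`);
  }
  return rows.join("\n");
}
```

Errors near the top or bottom of a file get fewer than seven lines; the window is clamped to the file, not padded.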
Bun first, npx second, no silent shell fallback
The runner picks bun if it is on PATH, npx if not. Both are tested with the same command_exists helper at line 125 (which itself shells to where on Windows and which elsewhere). When neither is present the function returns an Err with a string the assistant can read, not a panic and not a silent zero-result. That distinction is the difference between "your workflow has type errors" and "the toolchain is missing from this machine", and the agent loop relies on it.
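A TypeScript sketch of the same pick order. The real helper is the Rust `command_exists`; `commandExists` and `pickRunner` here are illustrative names, and the error string matches the one quoted later in this guide.

```typescript
import { spawnSync } from "node:child_process";

// Mirrors the Rust command_exists helper: `where` on Windows, `which`
// elsewhere. A missing probe binary also reads as "not found".
function commandExists(cmd: string): boolean {
  const probe = process.platform === "win32" ? "where" : "which";
  return spawnSync(probe, [cmd]).status === 0;
}

function pickRunner(): string {
  if (commandExists("bun")) return "bun"; // preferred: fast cold start
  if (commandExists("npx")) return "npx"; // fallback: ships with node
  // No panic, no silent zero-result: the assistant gets a readable string.
  throw new Error("Neither bun nor npx found. Install bun or Node.js.");
}
```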
What the assistant gets back
One JSON object, TypecheckResult. Three top-level fields: success, error_count, errors, plus an optional raw_output that is only populated on failure. Every entry in errors is a structured TypeError with the parsed fields and the seven-line context. The model reads JSON, not a wall of compiler text.
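The same shape written as TypeScript interfaces, which is how an MCP client would naturally type it. Field names follow this guide; `TypeErrorEntry` is renamed from `TypeError` only to avoid shadowing the JavaScript built-in.

```typescript
interface TypeErrorEntry {
  file: string;
  line: number;
  column: number;
  code: string;    // e.g. "TS2345"
  message: string;
  context: string; // the seven-line excerpt with the arrow marker
}

interface TypecheckResult {
  success: boolean;
  error_count: number;
  errors: TypeErrorEntry[];
  raw_output?: string; // only populated on failure; absent on green runs
}

// A green run, as it lands on the assistant's side of the socket:
const green: TypecheckResult = { success: true, error_count: 0, errors: [] };
```

Because `raw_output` is omitted on success, the green payload stays compact while a red one carries everything needed to debug.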
All 35 tools live behind one MCP socket
The agent exposes 35 tools to whichever MCP-aware editor the team uses (Cursor, Claude Code, Windsurf, Zed). Most of them act on the UI: click, type, scroll, screenshot. A few act on the workspace: read_file, write_file, edit_file, grep_files, glob_files. One of them, the one this guide is built around, runs the typecheck. Below is a sample of the surface, with typecheck_workflow highlighted as the gate.
typecheck_workflow
Runs tsc --noEmit on the workflow directory, parses the output with a five-field regex, enriches every error with seven lines of context. Returns a TypecheckResult { success, errors, error_count, raw_output } as JSON.
execute_sequence
The runner. Spawns the TypeScript workflow under bun or node, attaches the event pipe, surfaces step events back as MCP notifications. The piece typecheck_workflow gates.
click
Unified click tool with three modes (selector, coords, image). Verifies the action with ui_diff_before_after so the assistant does not need a follow-up tree call.
type_text
Smart-clipboard text entry into a UI element with verification. Trailing keys like {Enter} or {Tab} are auto-detected so the assistant chains type plus submit in one tool call.
wait_for_element
Wait for an element to satisfy a condition (visible, enabled, focused, exists). One of two ways the suite stays sync-correct without a manual sleep.
validate_element
Read-only existence check that never throws. Returns status='success' with exists=true or status='failed' with exists=false. The conditional-branch primitive.
capture_screenshot
Element, window, or full-monitor screenshot. Auto-resizes to a max dimension. Pairs with execution_logger.rs to cut a before/after pair on every tool call.
read_file / write_file / edit_file / grep_files / glob_files
Five workspace tools so the assistant can repair the test workflow itself. Combined with typecheck_workflow this is the full inner loop: read, edit, typecheck, run.
Shift-right is the default; shift-left is the option
Eight concrete points where the gate-as-MCP-tool model and the traditional desktop test tool model diverge. None of these are philosophical: each one is a code path the assistant either has or does not.
| Feature | Typical desktop test tool | Terminator (typecheck_workflow) |
|---|---|---|
| Where the test learns it has a bug | At runtime, against a real window. The locator throws or a button does not appear. Sometimes thirty seconds in, after a launch, a login, and a navigation. | Before any window opens. typecheck_workflow runs tsc --noEmit on the workflow directory and returns parsed errors as JSON. The AI coding assistant fixes the workflow file, then runs the suite. |
| Language the tests are written in | Vendor scripting language (TestComplete script, Ranorex Studio, Squish), VBScript variants, or a thin Python wrapper around an unmaintained driver. | Plain TypeScript. Every workflow is a tsconfig.json plus .ts files the SDK already types end-to-end. The same code you would write in a normal node project. |
| Surface the AI coding assistant talks to | A GUI with a record-and-replay button. The assistant has no way to invoke it programmatically without UI automation against the test tool itself. | An MCP server with 35 tools. Read, edit, grep, glob, screenshot, click, type, validate, execute_sequence, typecheck_workflow. All callable as JSON-RPC from any MCP-aware editor. |
| How tsc errors are reported back to the model | Stdout strings. The model sees a wall of compiler text and has to parse line numbers itself, often badly. | Structured TypeError objects with file, line, column, code, message, and a seven-line context with an arrow on the error line. The model receives JSON, not text. |
| Fallback when the toolchain is missing | Often a silent skip or a confused 'tsc not installed' shell error. Sometimes a vendor licensing prompt. | Bun first, npx second. If neither is on PATH the tool returns 'Neither bun nor npx found. Install bun or Node.js.' as the error string. The assistant can read it and act. |
| Where the gate sits in the run lifecycle | Optional pre-build step the test author has to wire up themselves, usually in a YAML CI file. | A first-class MCP tool the assistant can call before execute_sequence. The decision to gate is made by the model on a per-run basis, not by a CI workflow. |
| License of the gate code | Proprietary. Per-seat or per-runner. The reporter plugin is rarely open source. | MIT. typecheck.rs is 278 lines you can read in an evening, fork in an afternoon. |
| What 'green' means | Vendor reporter says PASS. Often based on stdout scraping with no schema. | TypecheckResult.success is true and error_count is 0 and raw_output is None. Three independent fields the assistant can assert against. |
Producers on the left, consumers on the right
The hub is typecheck_workflow. On the left it pulls from the assistant's tool call, the workflow source, the resolved package manager, and the SDK type definitions. On the right it fans out to a structured result, an enriched per-error context, an optional raw stderr, and a tracing log line. The whole shape is designed so the gate runs on a workspace folder and only a workspace folder, no global state.
typecheck_workflow inputs and outputs
What the assistant sees when it calls the tool
A single round trip from the assistant's point of view. It asks for a typecheck. The agent picks bun, runs tsc, parses the output, returns a structured result. The assistant edits the file, re-asks, and only proceeds to execute_sequence once the result is green.
Four parties, one repair loop
The full handshake from the AI coding assistant out to the workflow files and back. The agent calls bun, bun runs tsc, tsc reads the source, the parser extracts five fields per error, the formatter attaches seven lines of context, the result lands as JSON on the assistant's side. If success is false the assistant edits, then re-runs the same loop. Only after a green result does execute_sequence get called.
assistant -> mcp agent -> tsc -> workflow
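The loop above, sketched from the assistant's side. `callTool` stands in for the editor's MCP client and `repair` for an edit_file pass against the structured error list; both are illustrative stand-ins, not real APIs.

```typescript
type ToolCall = (name: string, args: Record<string, unknown>) => { success: boolean };

// Typecheck, repair, repeat; only a green result unlocks execute_sequence.
function repairLoop(callTool: ToolCall, repair: () => void, path: string, maxTries = 5): boolean {
  for (let attempt = 0; attempt < maxTries; attempt++) {
    const result = callTool("typecheck_workflow", { workflow_path: path });
    if (result.success) {
      callTool("execute_sequence", { workflow_path: path }); // gate is green: run
      return true;
    }
    repair(); // edit_file using the TypeError list, then re-check
  }
  return false; // still red after maxTries; surface to the human
}
```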
Wiring the gate into your suite
Six concrete steps, each one bounded to a real file. The shape is the same for a fresh project and a retrofit. The pattern works because typecheck_workflow is just another tool the assistant can call; there is no special wiring required beyond a system-prompt rule.
Lay the workflow out as a normal TypeScript project
A folder with tsconfig.json at the root and your .ts test files under src/. The SDK package @mediar-ai/terminator is a dev dependency. The same shape any node project uses; no proprietary file format.
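A minimal layout might look like this. File names under src/ are illustrative; the only requirements from the step above are tsconfig.json at the root and the SDK as a dev dependency.

```
workflow/
├── tsconfig.json        # ordinary compilerOptions, no vendor format
├── package.json         # "@mediar-ai/terminator" under devDependencies
└── src/
    └── login.test.ts    # plain TypeScript against the SDK's typed API
```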
Tell the assistant to call typecheck_workflow before execute_sequence
In the system prompt or workflow rules: 'Always invoke typecheck_workflow on workflow_path before execute_sequence. If success is false, fix the file, then re-invoke until green.' That single rule turns a runtime failure into a typecheck failure.
On failure, the assistant edits with edit_file
It receives the structured TypeError list. file, line, column, code, message, and the seven-line context with the error line marked. It calls edit_file with the precise old/new strings, no guessing about location.
Re-run typecheck_workflow until success is true
The agent allows arbitrary loops. The model decides when to stop. In practice 1 to 3 iterations clear most workflow regressions because every error carries its own context.
Only then call execute_sequence
The runner spawns the workflow under bun (preferred) or node, attaches the event pipe, and forwards step events as MCP notifications/progress. By the time UI moves, the workflow is type-safe.
Capture the green TypecheckResult in the run record
execution_logger.rs writes a JSON record per tool call under %LOCALAPPDATA%\mediar\executions\. The typecheck_workflow result is one of those records. Seven-day retention, replayable on demand.
What this primitive unlocks in practice
Eight things you can do with this in place. Every one is a check against the current code, not a marketing slogan. The same primitive replaces what most desktop test tools require a CI pipeline plus a vendor reporter to express.
Capabilities this gate adds
- Catch a typo in a Locator selector before any window opens, not after a 30-second launch and login
- Hand the AI coding assistant a structured TypeError list that already includes a seven-line context with an arrow on the error line
- Refuse to run execute_sequence if typecheck_workflow returns success: false, by policy in the assistant's system prompt
- Use bun preferentially when present, fall back to npx, fail loudly if neither is available, all without bespoke shell logic
- Avoid licensing a proprietary reporter plugin to surface tsc output back to the model
- Loop the typecheck-edit cycle until the workflow is green, then run, with no human in the inner loop
- Keep raw_output out of the result on green runs so the payload stays compact
- Read the gate's source in an evening (278 lines, MIT) and fork it for a different language toolchain
Anchor fact
The whole gate lives in crates/terminator-mcp-agent/src/tools/typecheck.rs. 278 lines, MIT. The five-field regex, ^(.+?)\((\d+),(\d+)\):\s*error\s+(TS\d+):\s*(.+)$, sits in parse_tsc_output at line 54. The seven-line context formatter get_error_context lives at line 90 and emits 3 lines, the error line marked with " -> ", then 3 more lines, all with a four-character gutter via format!("{}{:>4}: {}", marker, line_num, line) at line 111. The runner picks bun, then npx, then errors. The tool is wired into the MCP server at server.rs:9521-9545. The unit tests at the bottom of the file round-trip the regex against four shapes (single, multiple, none, noisy) and the context formatter against a temp file with a known TS2322 error; run cargo test -p terminator-mcp-agent typecheck to see them pass.
Numbers you can verify from the repo
Every figure is a wc -l or a literal count of named items in the source. None of them require running the binary.
278
lines in tools/typecheck.rs
5
fields parsed per tsc error
7
lines of context per error
35
MCP tools the agent ships
Put a tsc gate in front of your desktop test suite
Bring a workflow folder you already have. We will wire typecheck_workflow into your MCP-aware editor, watch the assistant repair a bad selector before any window opens, and hand you the rule to put in your system prompt by the end of the call.
Frequently asked questions
Why is shift-left for desktop test automation tools different from shift-left for browser tests?
Browser tests have a typecheck story by accident: most teams write Playwright in TypeScript, run tsc as a CI step, and the IDE catches the rest. Desktop test tools, on the other hand, sit on three discontinuities. The test process drives Win32 UI through UIAutomation, the assertions live in a vendor scripting language (TestComplete, Ranorex, Squish), and the AI assistant on top of the stack speaks an entirely different protocol. Shift-left typically means 'add a tsc step in CI'. That does not help an AI coding assistant that is editing a workflow file at 2am between two execute_sequence calls. typecheck_workflow makes the gate a tool, not a CI job, which is why it can sit inside the agent loop instead of outside it.
What exactly does the regex at line 54 of typecheck.rs match?
The pattern is ^(.+?)\((\d+),(\d+)\):\s*error\s+(TS\d+):\s*(.+)$. Five capture groups: the relative file path (non-greedy), the 1-indexed line number, the 1-indexed column number, the TS error code (TS2345, TS2304, etc.), and the human-readable message. The anchor on each end and the explicit 'error' keyword keep warnings, info notes, and unrelated tooling output out of the parse result. parse_tsc_output trims each line first, so trailing whitespace from terminal pipes does not break the match. Lines that do not match are silently dropped, which is the right default: tsc emits banners, file lists, and progress lines that have no business becoming TypeError records.
Why seven lines of context, and why an arrow?
Three lines before, the error line, three lines after. That is the standard rg, fd, and modern compiler output shape, and it is what models like Claude Sonnet, GPT-4o, and Gemini 2.0 are best at consuming. Less context and the model loses scope (which function is this in, what variable was just declared); more context and the prompt budget bleeds. The arrow marker (' -> ' on the error line, four spaces on the others) is intentionally a four-character gutter so the JSON output is column-aligned regardless of line-number digit width. get_error_context formats it with format!("{}{:>4}: {}", marker, line_num, line) at line 111. The model can render it inline in the chat without any additional parsing.
Why bun first and npx second?
Two reasons. First, bun spawns a TypeScript runtime in single-digit milliseconds where node + tsx adds 200 to 500ms of cold start, and a typecheck loop the assistant runs three or four times per repair cycle compounds that delay. Second, bun's bundled tsc is fully API-compatible with the npm tsc, so 'bun tsc --noEmit' produces identical output to 'npx tsc --noEmit'; there is no behavior difference for the regex parser. The fallback exists because not every customer environment has bun on PATH (macOS dev machines vs Windows CI runners differ), and silent failures at the tool level would block the agent loop. The match returns a clean error string when both are missing; the model handles the message rather than the agent.
What happens if tsconfig.json is missing or workflow_path does not exist?
Two early-return checks. workflow_path absent returns Err(format!("Workflow path does not exist: {}", workflow_path)). tsconfig.json absent returns Err(format!("No tsconfig.json found in: {}", workflow_path)). Both surface as a CallToolResult::error to the MCP client, which the assistant reads as a tool error rather than a TypecheckResult. That distinction matters: an Err means the gate could not run, so the assistant should fix the workspace; a TypecheckResult { success: false } means the gate ran and the workflow has type errors, so the assistant should fix the source. The two failure modes do not get conflated.
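The two checks, transliterated to a TypeScript sketch (`preflight` is an illustrative name; the error strings match the ones above). An error here means "the gate could not run", which is a different failure mode from a result with success: false.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Returns an error string if the workspace is not checkable, null otherwise.
function preflight(workflowPath: string): string | null {
  if (!existsSync(workflowPath)) {
    return `Workflow path does not exist: ${workflowPath}`;
  }
  if (!existsSync(join(workflowPath, "tsconfig.json"))) {
    return `No tsconfig.json found in: ${workflowPath}`;
  }
  return null; // preflight passed; safe to spawn tsc
}
```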
How does this compare to TestComplete, Ranorex, WinAppDriver, FlaUI, AutoIt, or Squish?
The comparison points are different in kind. Those tools are GUI-first products with a recorder, a script editor, and a runner. The 'tool surface' an AI coding assistant can call is at best a CLI to start the recorder, sometimes a REST endpoint to launch a saved test. None of them ship an MCP server or expose 35 named tools as JSON-RPC, and none ship a typecheck-before-execute primitive that returns structured errors. In practice a team using one of them and an AI assistant ends up writing a wrapper script that scrapes the vendor reporter's stdout, which is exactly the loop typecheck_workflow eliminates. The Terminator answer is to make the model the editor and the agent the runner, with TypeScript as the only language between them.
Does this replace JUnit XML reporting?
No, it sits before it. typecheck_workflow only addresses static type errors in the workflow source. Runtime failures (an element does not appear, an assertion fails, a network call hangs) are still reported through the event pipe as StepFailed events and aggregated into whatever final reporter the team wants. A common pattern is to keep a thin sink on top of the event stream that emits JUnit XML at the end of a run; the typecheck step is a separate gate the assistant can invoke standalone. Picture two layers: type errors caught by tsc before any UI runs, runtime errors reported through the event stream while UI runs.
Can I add my own pre-execute gate alongside typecheck_workflow?
Yes. The MCP tool surface is open. Adding a tool follows the rmcp #[tool] macro pattern in server.rs (35 examples to copy), so a 'lint_workflow' or 'eslint_workflow' or even 'biome_workflow' gate is roughly the same 200 lines of Rust as typecheck.rs: a parser, an enrichment pass, a JSON-shaped result. The workflow rules in the assistant's system prompt then call them in sequence: lint, typecheck, execute_sequence. Because the MCP transport is JSON-RPC, the tools are addressable by name; the model decides the order. The agent does not enforce an order beyond what individual tool descriptions suggest.
What does 'success' mean in TypecheckResult?
Two conditions, both required. output.status.success() (the tsc process exited with code 0). error_count == 0 (the regex parser found zero matching lines). The function returns TypecheckResult { success: output.status.success() && error_count == 0, ... }. A non-zero exit but zero parsed errors is treated as a failure, which guards against tsc emitting an unexpected error format that the regex misses. A zero exit with parsed errors is also treated as a failure, which guards against a regression in tsc's exit-code semantics. Both edges close. The raw_output field is None on success, Some(combined) on failure, so a green run is small and a red run is debuggable.
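The rule reduces to one conjunction. A TypeScript sketch (`isSuccess` is an illustrative name, not the repo's):

```typescript
// Both the process exit code and the parsed-error count must agree.
// Either edge case alone (non-zero exit with no parsed errors, or zero
// exit with parsed errors) is still a failure.
function isSuccess(exitCode: number, errorCount: number): boolean {
  return exitCode === 0 && errorCount === 0;
}
```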
Where in the repo can I read this and prove it runs?
crates/terminator-mcp-agent/src/tools/typecheck.rs, 278 lines, MIT licensed. The unit tests at the bottom (#[cfg(test)] mod tests, lines 206 to 277) cover parse_tsc_output for single, multiple, no, and noisy outputs, and get_error_context against a temp file with a known type error. The integration into the MCP server is at crates/terminator-mcp-agent/src/server.rs lines 9521 to 9545: a #[tool(description = ...)] block that wraps typecheck::typecheck_workflow into a CallToolResult. Run cargo test -p terminator-mcp-agent typecheck to see the parser tests pass. The full file is short enough to read in one sitting.