Pre-flight liveness probe / desktop CI
Every guide to automation testing tools for desktop applications ranks the same names. None of them tell you how to detect that the screen is dead before the suite starts.
The honest reason desktop test runs are flaky in CI is rarely the test code. It is that the agent VM lost its RDP session, the virtual display went to zero dimensions, or the UIAutomation COM host wedged on the previous run, and the test framework noticed only after burning the per-element implicit wait on every locator. Terminator’s MCP agent ships an HTTP /ready endpoint that runs a four-step UIAutomation probe in under five seconds and returns HTTP 200, 206, or 503. Curl it before you start the suite. If it is 503, fail the build and skip the runner; if it is 200, your tests have a real desktop underneath them.
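If the agent is already listening on its default local bind (127.0.0.1:3000, covered later in this guide), the gate is a one-liner. A minimal sketch; the port and the message are illustrative:

# fail the build before the suite starts if the desktop is not ready;
# curl -f exits non-zero on 4xx/5xx, so a 503 kills the job here.
# Note: -f treats 206 (degraded) as success; branch on the exact
# status code (shown later) if you want that state handled separately.
curl -sf http://127.0.0.1:3000/ready > /dev/null || { echo "desktop not ready, failing fast" >&2; exit 1; }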
Why a separate probe matters
Pull up any list of automation testing tools for desktop applications and you will see the same shortlist: WinAppDriver from Microsoft, TestComplete from SmartBear, Ranorex, Katalon Studio, Tosca, T-Plan Robot, UFT One, Winium. They all do roughly the same job: drive Windows controls through some flavor of UI Automation, optionally with image matching as a fallback, and offer a record-and-playback workflow on top. The feature-matrix differences between them are real but small.
What none of them ship is a primitive for the operational reality that destroys most desktop CI fleets. RDP sessions drop. Virtual display drivers misconfigure on Patch Tuesdays. The UIA COM host occasionally wedges after enough consecutive runs. When any of that happens, every locator in every test in the suite goes through its full retry budget before the runner gives up. A 100-test suite that should have failed in two seconds instead burns 47 minutes producing red logs that all say "could not find element", and the next CI run on a fresh agent goes green, so the failure looks intermittent forever.
The probe described below is small, surgical, and callable from any pre-test step, regardless of which test framework you already use. It does not replace any tool on the shortlist. It answers one question that none of them answer: is the screen even alive?
“The whole probe is wrapped in tokio::time::timeout(Duration::from_secs(5)) inside spawn_blocking, so a wedged UIAutomation host cannot block longer than five seconds even if every COM call inside hangs.”
crates/terminator/src/platforms/windows/health.rs lines 35-58
The four checks, in order
The whole probe is one synchronous function, perform_sync_health_check(), that runs four checks in order and updates a single HealthCheckResult as it goes. Each step catches a different failure mode. A test runner that reads the response can tell which step failed and decide what to do.
Step 1. Initialize COM in multi-threaded mode
Calls CoInitializeEx(None, COINIT_MULTITHREADED) and tolerates RPC_E_CHANGED_MODE (0x80010106) so a process that already initialized COM in a different mode is still considered healthy.
This catches the failure where the host process was launched without a thread that ever called CoInitializeEx. Without it the next call to UIAutomation hard-fails with E_NOINTERFACE rather than a useful error, and a normal try/click test would just hang on the COM apartment. The probe records com_initialized: true|false in the response diagnostics so you can tell this case apart from a UIA failure.
Step 2. Build a fresh UIAutomation instance
Runs UIAutomation::new_direct() and flips api_available to true on success. Failure is reported back as 'UIAutomation creation failed' with the underlying error string.
new_direct bypasses the cached automation singleton and forces a real CoCreateInstance for CUIAutomation. If a stale UIA host process is wedged (a common failure on long-lived Azure agent VMs that have been running for weeks), this is the line that throws first. The probe answers in milliseconds and the error message gets logged at tracing::error! level.
Step 3. Get the desktop root element
Calls automation.get_root_element() and flips desktop_accessible to true. On failure it queries virtual_display::is_headless_environment() and adds is_headless and a 'virtual display may be disconnected' diagnostic to the response.
This is the step that catches a dead RDP session. The error message returned literally reads "Cannot access desktop: ... This typically indicates RDP disconnection or virtual display issues." The diagnostic block is what lets a CI script branch: a healthy machine with no RDP gets one error class, a headless agent VM with a misconfigured virtual display gets another.
Step 4. Enumerate desktop children with a true condition
Builds automation.create_true_condition() and walks root_element.find_all(TreeScope::Children, &condition). Counts the result, adds desktop_child_count to diagnostics, and only flips can_enumerate_elements when the count is greater than zero.
Zero children on a logged-in desktop is the silent killer for a desktop test suite. The screen looks alive (Step 3 passed), UIAutomation answered (Step 2 passed), but the tree is empty because the display is connected at the OS level and disconnected at the user-session level. The probe explicitly downgrades to "Desktop has no child windows - display may be disconnected" rather than letting the next test wait for a button that will never appear.
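Because the four flags flip in order, the JSON body pins a failure to a specific step. A hedged sketch of that triage with jq, assuming the agent returns the body alongside non-200 statuses; the field names follow the response shape documented in the next section, and the label strings are illustrative:

# map the /ready body back to the probe step that failed
curl -s http://127.0.0.1:3000/ready | jq -r '
  .automation |
  if .diagnostics.com_initialized == false then "step 1 failed: COM init"
  elif .api_available == false then "step 2 failed: UIAutomation instance"
  elif .desktop_accessible == false then "step 3 failed: desktop root (RDP or display)"
  elif .can_enumerate_elements == false then "step 4 failed: empty element tree"
  else "all four steps passed"
  end'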
What comes back in the JSON
The HTTP body is intentionally flat. Every field on the response has a fixed key and a fixed type, which makes it easy to parse with jq or pipe into a metrics agent. The full shape lives in HealthCheckResult in crates/terminator/src/health.rs and gets nested under the automation key by the MCP agent’s readiness_check() handler.
Fields on the /ready response
- platform: windows | macos | linux (std::env::consts::OS)
- api_available: true after UIAutomation::new_direct() succeeds
- desktop_accessible: true after get_root_element() succeeds
- can_enumerate_elements: true when desktop_child_count > 0
- check_duration_ms: wall-clock time for the four steps in ms
- diagnostics.com_initialized: bool, separates COM failure from UIA failure
- diagnostics.desktop_child_count: usize, lets you alarm on tree shrinkage
- diagnostics.is_headless: bool, true when TERMINATOR_HEADLESS=1 is set
- diagnostics.display_width / display_height: i32, zero means RDP detached
- error_message: a single string with the first failure encountered, or null
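Put together, a healthy body looks roughly like this. The values are illustrative, the nesting follows the readiness_check() handler described above, and any sibling top-level fields the handler adds are omitted:

{
  "automation": {
    "platform": "windows",
    "api_available": true,
    "desktop_accessible": true,
    "can_enumerate_elements": true,
    "check_duration_ms": 412,
    "diagnostics": {
      "com_initialized": true,
      "desktop_child_count": 14,
      "is_headless": false,
      "display_width": 1920,
      "display_height": 1080
    },
    "error_message": null
  }
}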
The HTTP status code is the load-bearing part
The probe maps its three internal health states onto three HTTP status codes via HealthStatus::to_http_status(). That is what makes curl -f from a CI script useful: a single exit code carries the right amount of information.
Healthy
All four checks passed. api_available, desktop_accessible, and can_enumerate_elements are all true. Run the suite.
Degraded
UIAutomation is reachable but the desktop or the element tree is not. Usually means the session dropped after startup. A smart CI script alerts a human rather than restarting blindly.
Unhealthy
COM never initialized or UIAutomation refused to instantiate. The runner is broken. Fail the job and requeue it, ideally on a different agent.
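In script form the three-way branch is a single case statement. A sketch; the policies in the comments are the site-specific part:

# branch on the three /ready states before starting the suite
status=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:3000/ready)
case "$status" in
  200) echo "desktop healthy, starting suite" ;;
  206) echo "degraded: desktop or tree unreachable, alerting a human" >&2; exit 1 ;;
  503) echo "unhealthy: requeue this job on a different agent" >&2; exit 1 ;;
  *)   echo "agent unreachable (curl reported '$status')" >&2; exit 1 ;;
esac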
Naive pre-test step versus a real one
The cost of not doing this is concrete. Compare the naive script below with the gated version that follows it.
pre-test.sh
# pre-test.sh — naive way most desktop CI pipelines do it
# Hope the agent VM is alive. Run the suite. Read the failures.
echo "starting desktop test suite"
pytest tests/desktop/ --maxfail=10
# what actually happens on a dead RDP session:
# - Selenium/WinAppDriver hangs for the full implicit wait
#   on every locator in every test, then ConnectionRefused
# - test runner exits with mixed timeouts and TimeoutException
# - you get 47 minutes of red CI logs that all say "could not
#   find element" and zero indication that the session itself
#   was dead before t=0
# - rerun usually "fixes" it because the next agent has a
#   fresh RDP session, so the bug looks intermittent forever
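And the gated version. The agent launch command, the two-second sleep, and the port are assumptions about how your fleet is wired, not something the agent prescribes; adjust them to taste.

pre-test.sh, gated
# pre-test.sh — the same job with the five-second probe in front
terminator-mcp-agent &   # assumption: the MCP agent binary is on PATH
sleep 2                  # give the HTTP server a moment to bind 127.0.0.1:3000
status=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:3000/ready)
if [ "$status" != "200" ]; then
  # surface the first failure instead of 47 minutes of locator timeouts
  curl -s http://127.0.0.1:3000/ready | jq -r '.automation.error_message // "agent unreachable"' >&2
  echo "desktop dead before t=0 (/ready returned $status), failing fast" >&2
  exit 1
fi
echo "starting desktop test suite"
pytest tests/desktop/ --maxfail=10

What the rest of the desktop testing tool category misses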
To be fair, most of the named players in this category are good at their actual job. WinAppDriver is the canonical Windows UI Automation driver and is genuinely solid for WPF, WinForms, Win32, and UWP. TestComplete and Ranorex have polished object spies and integrate cleanly with CI plugins. Katalon Studio and Tosca layer low-code workflows on top for mixed-skill teams. T-Plan Robot has cross-platform reach via VNC. None of that is wrong.
What the whole category treats as someone else’s problem is the agent-VM operational layer. There is no documented WinAppDriver endpoint to ask “is the desktop alive?” before you POST a session. There is no built-in TestComplete API that tells you whether the COM apartment is wedged before your project loads. The implicit assumption across the category is that the test framework will discover that condition by failing on the first locator, which is exactly the behavior that makes the failure mode look intermittent and turns a five-second config bug into 47-minute red CI runs.
The probe in this guide is a few hundred lines of Rust and a single curl. It does not stop you from using any of the tools above. It just turns the fact of life that a desktop can be dead underneath a running OS into a checkable HTTP status code, so the rest of the tooling can keep doing what it does well.
Honest limitations
The probe is Windows-first. The macOS and Linux paths return HealthStatus::Healthy with a TODO note (“Accessibility API health checks not yet implemented”); on those platforms it tells you the process is alive but does not actually probe the AX or AT-SPI tree. If you are testing macOS desktop apps, treat the macOS branch as a placeholder for now.
The probe is also single-shot, not continuous. Running it before the suite catches the most common failure mode (the screen was dead at t=0). It does not catch the screen dying mid-run. If your suite is long enough that this matters, the typical pattern is to wrap each test in a try-catch that re-hits /ready on first failure and decides whether to retry or to abort the whole run.
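A coarse suite-level variant of that pattern, since per-test hooks are framework-specific: re-check /ready when the suite fails, and use a distinct exit code so CI can requeue instead of filing a hundred phantom bugs. A sketch; the exit-code convention is illustrative:

# distinguish real test failures from a screen that died mid-run
pytest tests/desktop/ --maxfail=10; rc=$?
if [ "$rc" -ne 0 ]; then
  status=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:3000/ready)
  if [ "$status" != "200" ]; then
    echo "screen died mid-run (/ready returned $status); failures are suspect" >&2
    exit 2   # illustrative: tells CI to requeue the job, not report bugs
  fi
fi
exit "$rc"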
And the probe does not run any of your real tests for you. It is a pre-flight check, not a smoke test. A green 200 means the probe’s four steps passed. It does not mean Calculator is installed, your application has launched, or your golden fixtures are correct. Those failures still need real tests.
Want a walkthrough of /ready in your CI?
Show us how your desktop test fleet is wired today and we will sketch where the probe slots in.
More questions about pre-flight liveness for desktop tests
Why does desktop test automation need a separate liveness probe at all? Doesn't a 'test for the existence of a button' already prove the screen is alive?
It proves it after a per-test implicit wait. WinAppDriver, Appium for desktop, pywinauto, AutoIt, and most other Windows automation libraries default to retrying element lookups for tens of seconds before giving up. So when the RDP session has dropped, every test in the suite eats the full retry budget on its first locator, then fails with a generic 'could not find element' or 'TimeoutException'. A 100-test suite can burn an hour producing logs that look like 100 unrelated bugs and look intermittent because the next CI run gets a fresh agent. A single-shot probe before the suite starts converts that into one unambiguous fail-fast: HTTP 503 with com_initialized:true, desktop_accessible:false, and a literal 'RDP disconnection or virtual display issues' string in the error_message field.
What does the four-step probe actually run, and where does it live in the source?
It lives in two files. crates/terminator/src/health.rs (229 lines) defines the cross-platform HealthCheckResult struct, the HealthStatus enum (Healthy / Degraded / Unhealthy), and the to_http_status() mapping (200 / 206 / 503). crates/terminator/src/platforms/windows/health.rs implements perform_sync_health_check() with the four ordered steps: CoInitializeEx(None, COINIT_MULTITHREADED), UIAutomation::new_direct(), automation.get_root_element(), and root_element.find_all(TreeScope::Children, &automation.create_true_condition()). The whole sync function is wrapped in tokio::task::spawn_blocking and then in tokio::time::timeout(Duration::from_secs(5)) so a wedged UIAutomation host can never block the probe past five seconds. The HTTP route is mounted at /ready in crates/terminator-mcp-agent/src/main.rs around lines 866-941.
How is /ready different from /health on the same agent? They sound like the same thing.
They are deliberately split. /health is a 'process is alive and HTTP server is responding' liveness check that returns 200 unconditionally and never touches UIAutomation. It is meant for Azure Load Balancer health probes that fire every five to fifteen seconds, mediar-app monitoring that fires every thirty, and Kubernetes liveness probes. /ready is the deep readiness check: it calls into UIAutomation and can take 500ms to 5s, so it is meant for pre-deployment validation, pre-test gating, Kubernetes readiness probes, and manual diagnostics. The comment block above readiness_check() in main.rs warns explicitly: 'NOT recommended for frequent automated monitoring - use /health instead.' Hitting /ready every five seconds in a load balancer would slow down the actual MCP traffic.
What does HTTP 206 mean in the context of a desktop liveness probe? That status code is for partial content.
It is reused as the 'degraded' state. HealthStatus::to_http_status() returns 200 for Healthy, 206 for Degraded, and 503 for Unhealthy. Degraded is the case where api_available is true but either desktop_accessible or can_enumerate_elements is false. In practice that happens when COM and UIAutomation are both responsive but the desktop session has just dropped: you can talk to the API, you cannot find the screen. Picking 206 instead of 503 lets a smarter CI script branch on three states rather than two. A pre-test gate can decide to retry on 503 (full failure, probably worth restarting the agent) and exit fast on 206 (partial failure, probably worth alerting humans).
Where does the headless detection actually come from? The Windows file references is_headless_environment but doesn't define it.
It lives in crates/terminator/src/platforms/windows/virtual_display.rs. The function checks the TERMINATOR_HEADLESS environment variable and returns true when it is set to 'true' or '1'. It is intentionally minimal because the more interesting signal is downstream: when the probe gets a root element back but the bounding rectangle has width or height of zero, the 'display has zero dimensions' diagnostic fires. That is the actual signal that a virtual display driver is misconfigured rather than a real RDP disconnect, and it is what lets you tell apart 'this CI runner needs to be restarted' from 'the indirectdisplay driver was reinstalled wrong.' The current implementation could check session IDs and remote-session flags too, but the env-var trip wire plus the zero-dimension check has been enough for the failure modes seen in production.
Can I run /ready against a remote desktop test farm, or does it have to be local?
Local to the agent VM. The terminator-mcp-agent binds the HTTP server on 127.0.0.1:3000 by default and the /ready check runs against the UIAutomation API on the same machine, because UIAutomation is a per-session COM object that cannot be reached from another box. The realistic deployment is one MCP agent per CI worker VM, with curl http://127.0.0.1:3000/ready as the pre-test step in the same job. If you have a fleet, point each runner at its own local agent and aggregate the JSON response into your monitoring stack alongside the diagnostics block, which already serializes to a flat shape that fits Prometheus or any other JSON-friendly metrics pipeline.
What happens on macOS and Linux? The Windows file is detailed but the cross-platform code looks like it just returns 'healthy' for those.
Correct. crates/terminator/src/health.rs defines MacOSHealthChecker and LinuxHealthChecker with stubs that return HealthStatus::Healthy and a diagnostic note ('Accessibility API health checks not yet implemented' or 'AT-SPI health checks not yet implemented'). Windows is the primary target because that is where the operationally-painful failure modes live (RDP, virtual displays, indirectdisplay drivers, COM apartments). On macOS the equivalent failure shape is an Accessibility permission revocation or a screen-recording prompt that has been dismissed, and the right probe there would call AXIsProcessTrustedWithOptions and try a real AXUIElementCopyAttributeValue against the system-wide AXUIElement. That work is on the roadmap; today, the Windows probe is the load-bearing one.
How does this fit alongside an existing test framework like Pytest, NUnit, or Mocha? I don't want to rewrite my suite.
It does not replace the test runner. It runs as a pre-step in the same CI job: start the MCP agent in the background, sleep two seconds, curl /ready, exit non-zero on anything except 200, then run your existing pytest / dotnet test / npx mocha command unchanged. The probe is read-only against UIAutomation, so it does not interfere with anything the test does later. The only state it leaves behind is a background MCP agent process, which you either let CI clean up or kill explicitly at the end of the job. The same pattern works whether the suite uses Terminator's own SDK, WinAppDriver, AutoIt, or pywinauto for the actual clicks.
More on Terminator for desktop testing
Related guides
Automation testing for desktop application
Live event telemetry over a Windows Named Pipe per test run. Eight typed WorkflowEvent variants, no stderr scraping.
Automation tools for testing desktop applications
The BLAKE3 element-id hash that survives across releases so your selectors do not break every time the dev team ships.
Test automation tools for desktop applications
An MCP-callable tsc gate that an AI assistant can hit before any pixel moves on screen.