Cross-platform desktop automation: how to verify the runtime is actually healthy before you click
A scripted click that never lands costs you more than a test that fails fast. The fix is to verify the desktop automation runtime itself, on every platform, before any scenario starts. Terminator exposes that verify call as one HTTP endpoint, with one JSON shape on Windows and macOS.
Hit the MCP agent at GET /ready. It returns HTTP 200 with status: "ready" when the platform accessibility API is up, the desktop root is reachable, and the tree enumerates. It returns 206 Partial Content for degraded states and 503 Service Unavailable when the API is unreachable. The body is identical on Windows and macOS: three booleans, a duration in milliseconds, an optional error string, and a diagnostics bag.
Source: crates/terminator/src/health.rs and crates/terminator-mcp-agent/src/main.rs.
The contract is the same on both platforms
Every platform implements one Rust trait, PlatformHealthCheck, and returns one shape, HealthCheckResult. The convenience function check_automation_health() picks the right implementation at compile time using cfg(target_os = "..."). That is the part that makes cross-platform verify actually portable: a single client can poll one URL and reason about one JSON object regardless of whether the agent is sitting on Windows or macOS.
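The contract can be sketched in a few lines. This is illustrative only: the real trait and `check_automation_health()` live in `crates/terminator/src/health.rs`, the field names follow the article, and the stub checker and its return values are stand-ins.

```rust
// Illustrative sketch of the cross-platform contract. The struct fields match
// the shape described in the article; the bodies are stand-ins.
#[derive(Debug)]
pub struct HealthCheckResult {
    pub api_available: bool,
    pub desktop_accessible: bool,
    pub can_enumerate_elements: bool,
    pub check_duration_ms: u64,
    pub error_message: Option<String>,
}

pub trait PlatformHealthCheck {
    fn check_health(&self) -> HealthCheckResult;
}

// Stand-in for the per-platform checkers (WindowsHealthChecker, MacOSHealthChecker).
struct StubChecker;

impl PlatformHealthCheck for StubChecker {
    fn check_health(&self) -> HealthCheckResult {
        HealthCheckResult {
            api_available: true,
            desktop_accessible: true,
            can_enumerate_elements: true,
            check_duration_ms: 12,
            error_message: None,
        }
    }
}

// The real function picks the checker at compile time with cfg(target_os = "...");
// the dispatch is elided here so the sketch runs anywhere.
pub fn check_automation_health() -> HealthCheckResult {
    StubChecker.check_health()
}

fn main() {
    println!("{:?}", check_automation_health());
}
```

The point of the single trait is that the HTTP handler never branches on the OS: it calls one function and serializes one struct.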
What every HealthCheckResult tells you
- api_available — the platform automation runtime loaded (UIAutomation on Windows, AX on macOS).
- desktop_accessible — the root desktop element is reachable from this process.
- can_enumerate_elements — children of the root come back without error.
- check_duration_ms — how long the probe took. CI lanes can alarm on this independently.
- diagnostics — per-platform key/value bag. On Windows you get com_initialized and is_headless. On macOS you currently get a note that AX-level checks are not yet wired up.
“The endpoint maps three booleans to HTTP 200, 206, or 503. CI gates can use status codes; humans get the JSON.”
crates/terminator/src/health.rs
What the call looks like in flight
The CI runner hits one HTTP endpoint. Inside the process, the readiness handler calls the shared probe, which dispatches to the platform implementation, which talks to the native automation API. Every step is bounded.
GET /ready, end to end
The Windows probe is a five-step check with a hard timeout
On Windows the verify path is real work: COM init, UIAutomation construction, desktop root, enumeration, status rollup. The whole sequence runs inside tokio::time::timeout(Duration::from_secs(5), ...), so a hung COM call returns a clean 503 with an error string instead of taking the readiness lane offline.
WindowsHealthChecker::check_health
1. Initialize COM — CoInitializeEx with COINIT_MULTITHREADED. Tolerates RPC_E_CHANGED_MODE (0x80010106) if another part of the process already initialized COM.
2. Create UIAutomation — UIAutomation::new_direct(). Sets api_available = true on success. Fails when the platform service is disabled or the session is non-interactive.
3. Get desktop root — automation.get_root_element(). Sets desktop_accessible = true. The branch also checks is_headless_environment(), so headless CI gets a diagnostic instead of a misleading false.
4. Enumerate children — walk a small fan-out of the root to confirm the tree is queryable. Sets can_enumerate_elements = true. This is the step that catches degraded sessions where the API loads but returns empty trees.
5. Roll up status — update_status() collapses the three booleans to Healthy, Degraded, or Unhealthy and maps them to HTTP 200, 206, or 503. The whole sequence runs inside a 5-second tokio::time::timeout so a hung COM call cannot deadlock the probe.
File: crates/terminator/src/platforms/windows/health.rs. The timeout is on line 35, COM init around line 70, the UIAutomation construction around line 91, the root element retrieval around line 107. Headless detection lives in the same file via is_headless_environment() and surfaces in the diagnostics bag rather than silently failing the check.
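The rollup in step 5 is small enough to reconstruct. This sketch is inferred from the behavior the article documents (200 when all three booleans are true, 503 when the API never loaded, 206 otherwise), not copied from the real update_status() in that file:

```rust
// Reconstruction of the status rollup described above; the enum and the
// exact match arms are inferred from the documented 200/206/503 behavior.
#[derive(Debug, PartialEq)]
enum HealthStatus {
    Healthy,
    Degraded,
    Unhealthy,
}

fn update_status(api: bool, desktop: bool, enumerate: bool) -> (HealthStatus, u16) {
    match (api, desktop, enumerate) {
        // Everything answered: ready, HTTP 200.
        (true, true, true) => (HealthStatus::Healthy, 200),
        // The platform API never loaded: hard fail, HTTP 503.
        (false, _, _) => (HealthStatus::Unhealthy, 503),
        // API up but the root or the tree misbehaves: degraded, HTTP 206.
        _ => (HealthStatus::Degraded, 206),
    }
}

fn main() {
    for probe in [(true, true, true), (true, true, false), (false, false, false)] {
        let (status, code) = update_status(probe.0, probe.1, probe.2);
        println!("{probe:?} -> {status:?} ({code})");
    }
}
```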
A healthy response, and an unhealthy one
The body is small enough to read in one breath. On a working Windows session you get a 200 with three trues and a duration. On a wedged session you get a 503 with the timeout string.
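As a shape illustration only (the exact key names and values here are representative, not captured from a live agent), a healthy Windows body looks roughly like:

```json
{
  "status": "ready",
  "api_available": true,
  "desktop_accessible": true,
  "can_enumerate_elements": true,
  "check_duration_ms": 142,
  "error_message": null,
  "diagnostics": { "com_initialized": "true", "is_headless": "false" }
}
```

and a wedged session roughly like:

```json
{
  "status": "unhealthy",
  "api_available": false,
  "desktop_accessible": false,
  "can_enumerate_elements": false,
  "check_duration_ms": 5000,
  "error_message": "health check timed out after 5s",
  "diagnostics": {}
}
```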
What verify means on macOS today
The MacOSHealthChecker in health.rs currently returns an optimistic Healthy result with a diagnostic note that reads Accessibility API health checks not yet implemented. That is the right behavior for a stub: it is honest, the field is in the JSON, and a caller that cares can branch on it. Until the AX-level probe lands, treat the macOS /ready response as a liveness signal that the agent is up and Cocoa is reachable, and rely on a small canary script (open a known app, query the AX tree of its window) for harder verification.
The two trapdoors that bite on macOS are TCC permissions and a non-interactive session. TCC denials show up as empty AX trees rather than errors, which is why the planned macOS probe will need to actually walk a known window, not just instantiate the AX client. The Linux branch in the same file has the same shape and the same honest TODO.
How this changes your CI lane
The shape of a desktop automation job becomes the same on both operating systems. Boot the agent, poll /ready until the rolled-up status is ready (or treat degraded as acceptable if you have made that call), then run the actual scenario. If a scenario then fails, you have a real bisect: was /ready healthy a moment before? If yes, the failure is in your selector or the app under test. If no, the runtime broke and the scenario was never going to pass. That bisect is the whole point.
The cost of adding this gate is negligible. The probe runs in a couple hundred milliseconds on a warm Windows runner, the timeout caps the worst case at five seconds, and a status-code check is one line of bash or one assertion in your test runner. The cost of not adding it is the class of flake that every long-lived desktop suite eventually accumulates, where one machine in the pool quietly degrades and every scenario fails for reasons that have nothing to do with the code under test.
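The poll itself can stay dependency-free. This sketch writes a raw GET to the agent and reads back the status line; the address, attempt budget, and delay are example values, not Terminator defaults:

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::thread::sleep;
use std::time::Duration;

// Fetch /ready and return the HTTP status code from the status line.
fn ready_status(addr: &str) -> std::io::Result<u16> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(b"GET /ready HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    // Status line looks like "HTTP/1.1 200 OK"; take the second token.
    response
        .split_whitespace()
        .nth(1)
        .and_then(|code| code.parse().ok())
        .ok_or_else(|| std::io::Error::new(std::io::ErrorKind::InvalidData, "malformed status line"))
}

// Retry until the agent reports 200 or the attempt budget runs out.
fn poll_ready(addr: &str, attempts: u32, delay: Duration) -> bool {
    for attempt in 1..=attempts {
        match ready_status(addr) {
            Ok(200) => return true,
            Ok(code) => eprintln!("attempt {attempt}: /ready returned {code}"),
            Err(err) => eprintln!("attempt {attempt}: {err}"),
        }
        sleep(delay);
    }
    false
}

fn main() {
    // Example address; point this at wherever your agent listens.
    let ready = poll_ready("127.0.0.1:3000", 3, Duration::from_millis(200));
    println!("runtime ready: {ready}");
}
```

A one-line curl check against the status code does the same job; the function form is only worth it when you want the retry loop and the logged attempt history inside your test runner.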
Need this gate in your pipeline?
Walk me through your runner setup and we will sketch the exact /ready poll, the timeout, and the alarm thresholds for your platform mix.
Frequently asked questions
What does verify mean for cross-platform desktop automation?
Three things you can actually check from outside the process: the OS-level automation API loaded, the desktop root is reachable, and a tree walk over its children returns without error. Terminator exposes those as api_available, desktop_accessible, and can_enumerate_elements on a single JSON shape that is identical on Windows and macOS. The endpoint that returns them is /ready on the MCP agent, and it maps the rolled-up status to HTTP 200, 206, or 503 so any CI runner can gate on it without parsing the body.
How is /ready different from /health on the same agent?
/health is liveness only. It confirms the process is alive and the HTTP server is responding and that is it; it does not touch UIAutomation or AX, so it cannot block during heavy automation workloads. /ready is the deep check that actually probes the platform API. The split exists because Azure load balancers and Kubernetes liveness probes need a cheap call every 5-15 seconds, but pre-deployment validation and on-call diagnostics want the expensive one. Use /health for keep-alive, /ready for go/no-go.
Does the same probe really run on Windows and macOS?
Same trait, same return shape, different bodies. PlatformHealthCheck::check_health is implemented by WindowsHealthChecker in crates/terminator/src/platforms/windows/health.rs, which does the COM init plus UIAutomation walk described on this page. macOS today returns an optimistic Healthy result with a diagnostic note saying AX-level checks are not yet implemented, so on macOS you treat /ready as a liveness signal until that lands. The README and source are explicit about this, which is the right move; an honest stub beats a green light that lies.
Why is the Windows check wrapped in a 5-second timeout?
UIAutomation is COM, and COM calls can hang when another desktop session is paged out, the WinStation is locked, or a screensaver owns the foreground. A 5-second tokio::time::timeout around the spawned blocking task means the worst case is one 503 with a clear error_message string, not a deadlocked probe that takes the readiness lane down with it. The duration that ships in check_duration_ms lets you alarm on slow probes long before they reach the timeout.
Can I use /ready as a CI gate before a Playwright-style desktop suite runs?
Yes, that is the canonical use. Start the agent on the runner, poll /ready until status is ready (or fail the job after N attempts), then start your scenario. On Windows the probe also writes is_headless to diagnostics; if you are running on a headless agent, you can choose to allow Degraded as success since enumeration in a headless session is its own can of worms. Treat 206 as a decision point, not an automatic fail.
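That "206 is a decision point" policy is easy to make explicit in the gate. A sketch, where the function name and shape are illustrative rather than Terminator API:

```rust
// Encode the gate policy: 200 always passes, 206 passes only on headless
// runners (where Degraded enumeration is expected), everything else fails.
fn gate_passes(status_code: u16, is_headless: bool) -> bool {
    match status_code {
        200 => true,        // fully ready
        206 => is_headless, // accept Degraded only where you have decided to
        _ => false,         // 503 or anything else: fail the job
    }
}

fn main() {
    println!("headless 206 passes: {}", gate_passes(206, true));
    println!("desktop 206 passes: {}", gate_passes(206, false));
}
```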
How is this different from running a sample script and seeing if it passes?
A sample script verifies the script. /ready verifies the runtime that the script depends on. The distinction matters in two cases. First, when a real scenario takes 90 seconds, you want a sub-second gate before you start. Second, when a scenario fails, you want to know whether your locator is wrong or whether the automation runtime is actually broken on this machine. The three booleans give you that bisection for free.
More from Terminator on desktop automation internals.
Related guides
Accessibility tree vs PyAutoGUI for desktop automation
Why structural element lookups beat OCR and pixel matching for cross-platform automation.
Browser automation hits a desktop ceiling
The seven concrete moments Playwright goes silent, and the OS-level tool that takes over.
MCP desktop accessibility automation
How the MCP server gives Claude, Cursor, and VS Code real control over native apps.