Matthew Diakonov, Written with AI

Published May 3, 2026. 11 min read.

LLM desktop automation in April 2026: every model release, plus the one-line replacement that decides whether their clicks actually land

Two stories ran in parallel for LLM-driven desktop automation in April 2026. The loud one is models. Microsoft shipped Fara-7B on April 19, Google shipped four Gemma 4 variants under Apache 2.0, Anthropic previewed Claude Mythos to partners, and Manus kept its long-horizon agent loop iterating in public. Every roundup on this topic stops there. The quieter one is on the OS side: on April 2, Terminator 0.24.31 replaced a single Windows API call that had been silently breaking multi-monitor clicks for every LLM that drove the desktop through it. This page covers both, but spends most of its words on the second, because nobody else does.

Direct answer (verified 2026-05-03)

April 2026 produced two kinds of LLM desktop automation news. The model wave: Microsoft Fara-7B (April 19, 7B parameters, paired with the CUAVerifierBench), Gemma 4 family (Apache 2.0), Manus updates, Claude Mythos preview. The framework wave: Terminator 0.24.31 (April 2, commit e36b9785, closing issue #473) replaced Windows UI Automation's IsOffscreen check, which returns true for elements that sit on a secondary monitor, with a manual bounds-intersection across every connected display. Diff: 71 insertions, 31 deletions, in a single file. Every multi-monitor LLM workflow on Windows depended on it.

The April 2026 model wave, briefly

The model side of LLM desktop automation in April was unusually dense. Five releases worth knowing, and one structural observation: the bottleneck has moved off the model in basically every category. Reasoning is good. Tool-use is good. Visual grounding for interactive screenshots is good. What still breaks is the bridge from “the model picked the right thing” to “the click happened on the right pixel.”

Microsoft Fara-7B (April 19)

Microsoft's first agentic small language model purpose-built for computer use. 7B parameters, state of the art in its size class, paired with the new CUAVerifierBench benchmark for verifying CUA agent traces. The point of Fara-7B is to push computer-use into the size class you can run on a workstation GPU.

Gemma 4 family

Google shipped four Gemma 4 variants under Apache 2.0 in early April. The mid-size variants are competent enough at structured tool-use that the open-weights ceiling for desktop control moved up a clear notch.

Manus updates

Manus continued to be the most aggressively agent-forward general assistant in April, opening browsers, terminals, and files in long autonomous loops without human turns in the middle.

Claude Mythos preview

Anthropic previewed Claude Mythos to a small partner cohort in April. Public benchmarks did not land, but the partner notes describe meaningful tool-use improvements over Sonnet 4.5.

Computer use on the long tail

E2B's open-computer-use, Bytebot's containerized Linux desktop agent, and Microsoft's UFO continued to ship in April. None of them are model releases; they are harnesses around the model wave above.

Frame the rest of this guide as a question: if the models are this good, why does anyone still see flaky desktop automation in April 2026? The answer for at least one entire class of failures is the next section.

The headline OS-side fact: `IsOffscreen` lies on a second monitor

Windows UI Automation exposes a method called IUIAutomationElement::IsOffscreen. Reading the docs, you would assume it returns true when the element is not visible to the user. In practice, it returns true for any element whose bounds extend outside the primary monitor's rectangle. An entirely-visible element on a second monitor reports as offscreen. A button on the right-hand display in a horizontal two-monitor setup, where the second monitor starts at x=1920, will be flagged offscreen by this method even when a human is staring directly at it.
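The geometry behind that failure can be sketched in a few lines. This is a model of the behaviour described above, not the Windows implementation: the `Rect` type and the `offscreen_primary_only` function are hypothetical names invented for illustration, and the monitor layout is the horizontal two-display example from the paragraph.

```rust
/// Axis-aligned rectangle in virtual-desktop pixels (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug)]
struct Rect {
    x: i32,
    y: i32,
    width: i32,
    height: i32,
}

impl Rect {
    fn right(&self) -> i32 {
        self.x + self.width
    }

    fn bottom(&self) -> i32 {
        self.y + self.height
    }

    /// True when the two rectangles overlap by at least one pixel.
    fn intersects(&self, other: &Rect) -> bool {
        self.x < other.right()
            && self.right() > other.x
            && self.y < other.bottom()
            && self.bottom() > other.y
    }
}

/// Models the reported behaviour: any element whose bounds extend outside
/// the primary monitor's rectangle is flagged as offscreen.
/// Illustration only; this is an assumption about the observable result,
/// not a description of how Windows computes the value.
fn offscreen_primary_only(elem: &Rect, primary: &Rect) -> bool {
    let fully_inside_primary = elem.x >= primary.x
        && elem.y >= primary.y
        && elem.right() <= primary.right()
        && elem.bottom() <= primary.bottom();
    !fully_inside_primary
}
```

With a primary monitor at (0, 0) sized 1920x1080 and a second monitor starting at x=1920, a button at x=2400 is flagged by `offscreen_primary_only` even though `intersects` confirms it overlaps the second monitor's rectangle.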

Terminator 0.24.30 trusted this method. It was the first gate in validate_clickable(). The downstream effect: every LLM that drove a multi-monitor Windows desktop through Terminator hit ElementNotVisible("Element is offscreen") on perfectly valid targets. The model never saw the monitor geometry; the model just saw an unhelpful error and either retried (wasting tokens) or gave up.

User issue #473 on March 30 reported the exact symptom: clicks failed with “not visible” until they dragged the application onto the primary monitor. The fix landed three days later as commit e36b9785 on April 2, 2026. Version 0.24.31 went out the same day.

The bug, drawn

Two round-trips to the same desktop. The first is what Terminator 0.24.30 did when the LLM tried to click a button on the second monitor. The second is what 0.24.31 does, with the same input.

Figure: LLM click path, before and after the fix for issue #473

The replacement helper is is_visible_on_any_monitor. It enumerates monitors via xcap::Monitor::all() and tests rectangle intersection against the element's bounds. If any monitor intersects, the element is visible. No call into the misreporting Microsoft API at all.

What changed in `validate_clickable()`

The clearest way to read the patch is to look at the validation function's shape on either side of the diff.

validate_clickable() before the fix for issue #473 (Terminator 0.24.30)

validate_clickable() trusted Windows UIA's IsOffscreen() as its first gate. The Microsoft API returns true for any element whose bounds extend beyond the primary monitor's coordinate range, including elements that are entirely visible on a secondary monitor. The result on a multi-monitor Windows desktop: ElementNotVisible('Element is offscreen') for valid elements, regardless of which LLM picked the selector.

Trusts IUIAutomationElement::IsOffscreen as the first gate
IsOffscreen returns true for any element on a secondary monitor
Returns ElementNotVisible to the LLM with no monitor context
Five validation steps: detached, visible, enabled, viewport, bounds

Four steps inside the new helper

is_visible_on_any_monitor is small: roughly forty lines including tracing. The patch around it comes down to four steps. The first is a one-time deletion; the other three run in order on every call.

The four-step helper at line 316 of element.rs

Drop the lying API: remove both calls to IUIAutomationElement::IsOffscreen from validate_clickable() and is_visible().
Enumerate monitors: call xcap::Monitor::all() and capture each monitor's (x, y, width, height) on the virtual desktop.
Test intersection per monitor: for each monitor, check elem_left < monitor_right && elem_right > monitor_x && elem_top < monitor_bottom && elem_bottom > monitor_y. Return on first match.
Log the monitor that won: tracing::debug! prints the name and bounds of the monitor the element intersected, so future multi-monitor regressions show up in a log line, not as silent click failures.
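The enumerate, intersect, and log steps can be sketched without any platform dependencies. This is a minimal illustration under stated assumptions, not the code in element.rs: `MonitorRect`, `ElementBounds`, `first_intersecting_monitor`, and `is_visible_on_any` are hypothetical names, the monitor list is passed in by the caller where the real helper reads it from xcap::Monitor::all(), and `eprintln!` stands in for `tracing::debug!`.

```rust
/// One display on the virtual desktop. Hypothetical stand-in for the
/// (x, y, width, height) values captured for each monitor.
#[derive(Debug, Clone, PartialEq)]
struct MonitorRect {
    name: String,
    x: i32,
    y: i32,
    width: i32,
    height: i32,
}

/// Element bounds in the same virtual-desktop coordinate space.
#[derive(Debug, Clone, Copy)]
struct ElementBounds {
    x: i32,
    y: i32,
    width: i32,
    height: i32,
}

/// Returns the first monitor whose rectangle overlaps the element, if any.
/// The comparison is the four-way test quoted in the steps above.
fn first_intersecting_monitor<'a>(
    elem: &ElementBounds,
    monitors: &'a [MonitorRect],
) -> Option<&'a MonitorRect> {
    let (elem_left, elem_top) = (elem.x, elem.y);
    let (elem_right, elem_bottom) = (elem.x + elem.width, elem.y + elem.height);

    monitors.iter().find(|m| {
        let (monitor_right, monitor_bottom) = (m.x + m.width, m.y + m.height);
        elem_left < monitor_right
            && elem_right > m.x
            && elem_top < monitor_bottom
            && elem_bottom > m.y
    })
}

/// Visibility check that also reports which monitor matched.
fn is_visible_on_any(elem: &ElementBounds, monitors: &[MonitorRect]) -> bool {
    match first_intersecting_monitor(elem, monitors) {
        Some(m) => {
            // Stand-in for the debug log line described in the last step.
            eprintln!(
                "element intersects monitor {} at ({}, {}) {}x{}",
                m.name, m.x, m.y, m.width, m.height
            );
            true
        }
        None => false,
    }
}
```

Because the test works in virtual-desktop coordinates, monitors placed at negative offsets (a display above or to the left of the primary) need no special casing. A zero-size element can still pass a pure intersection test, so rejecting zero bounds has to remain a separate check.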

Two call sites use it: one inside is_visible() (line 1452, where the result short-circuits the visibility check), and one indirectly via validate_clickable() calling is_visible(). The path that does the actual click work calls validate_clickable from inside every action method on the Windows element implementation.

“One file. 71 insertions, 31 deletions. The Windows API call this replaces is the reason every LLM-driven multi-monitor click on Windows was an ElementNotVisible coin flip.”

commit e36b9785, crates/terminator/src/platforms/windows/element.rs

What now actually works

None of these were broken by the fix. All of them were broken before it. Run any of these in your own multi-monitor Windows setup against the latest terminator-mcp-agent and the click should land first try.

Behaviours unblocked by the fix for issue #473

Clicking a button on a secondary monitor without dragging the window to the primary first.
Activating a window that opens on the right-hand monitor in a 2-monitor setup.
Typing into a text box on a vertical secondary monitor with negative y coordinates.
Highlighting an element on any of three or more monitors arranged left-to-right.
validate_clickable() still rejects elements that have zero bounds, or that genuinely sit outside every monitor (a window dragged into a virtual scroll buffer).

Why every April 2026 LLM benefits, not just one

The fix lives below the MCP boundary, which means every model that consumes Terminator's tool surface inherits it without any changes to its prompts, tool schema, or harness. That includes every model release that landed in April, plus the older models that were already deployed.

Every one of these calls validate_clickable() at the bottom of the click stack. The only thing that changed is which Windows API the validation step trusts.

Models and harnesses that inherit the fix automatically

Claude (Sonnet 4.5, Opus 4.7)

Drives Terminator's MCP server through stdio. Selector strings stay symbolic so multi-monitor coordinates never reach the model.

Cursor

Picks up the same MCP config used in Claude Code. Same dispatch path, same fix.

Microsoft Fara-7B

April 2026's small computer-use model. Returns selector tool calls into the same validate_clickable funnel.

Gemini Computer Use

Vision path goes through gemini_computer_use, but the click that lands at the end is still gated by validate_clickable().

Gemma 4

Open-weights variants benefit from the fix as soon as you wire them into the Terminator MCP loop.

Open Interpreter

When configured with Terminator as its OS bridge, every click flows through the same gate.

Windsurf, VS Code MCP

Both consume the same npm-installed terminator-mcp-agent and inherit the multi-monitor fix without code changes.

Manus

Long-horizon autonomous loops on multi-monitor Windows desktops were the exact shape this bug was hitting.

The point is structural. There is no model-side prompt-engineering workaround for a buggy OS-level visibility check. Telling the agent “please make sure the window is on the primary monitor” is the kind of brittle hack that exists in production agent systems shipping today. The right answer is for the bridge to stop lying. April 2 was when ours did.

Why most April 2026 writeups missed this

Three reasons. First, model news is loud and framework fixes are quiet, and an SEO writer's incentive structure rewards loud. Second, the bug only surfaces on a multi-monitor Windows setup, and most computer-use development happens on a single laptop screen where IsOffscreen happens to behave fine. Third, the symptom looks like a model failure (the agent "couldn't click the button"), which makes it easy to chalk up to model regression and move on.

A reasonable mental model for LLM desktop automation in 2026: the model contributes intent, the framework contributes the OS bridge, and the OS contributes geometry. Most of this year's interesting failures live at the seam between the framework and the OS, where a shipped Microsoft API quietly gets one assumption wrong. April 2 fixed one of those seams.

Frequently asked questions

What actually shipped for LLM desktop automation in April 2026?

Two distinct things, and most writeups only cover the first. The model side: Microsoft shipped Fara-7B (April 19), a 7B agentic small language model paired with the CUAVerifierBench benchmark; Google released the Gemma 4 family under Apache 2.0; Anthropic previewed Claude Mythos to partners; Manus continued shipping incremental updates to its general autonomous agent. The framework side: Terminator 0.24.31 went out on April 2 with the fix for issue #473 (commit e36b9785), which repaired multi-monitor click validation. The first set is what every model-news roundup covered. The second is what determines whether any of those models can actually click a button on a second monitor on Windows.

What was wrong with multi-monitor click validation before April 2?

Windows UI Automation exposes a method called IsOffscreen on every UIElement. It returns true when the element is not visible. The problem is the implementation: IsOffscreen reports based on the primary monitor's bounds, not the virtual desktop. An element living entirely on a second monitor has bounds outside the primary monitor's rectangle, so IsOffscreen returns true even when the element is fully visible to the user. Terminator 0.24.30 trusted this method as the first gate in validate_clickable(), so any LLM driving a multi-monitor Windows workflow saw clicks rejected with ElementNotVisible errors on perfectly valid targets.

What replaced the IsOffscreen check?

A function called is_visible_on_any_monitor at line 316 of crates/terminator/src/platforms/windows/element.rs. It enumerates every monitor via xcap::Monitor::all(), reads each monitor's x, y, width, and height, then runs a rectangle-intersection test against the element's bounds. If the element's bounds intersect any monitor, the element is visible. The function returns Ok(true) on first match and logs which monitor it intersected with. Both validate_clickable() and is_visible() now use this helper. The old IsOffscreen calls are deleted in both places.

How big is the fix?

One file. crates/terminator/src/platforms/windows/element.rs. 71 insertions and 31 deletions. The commit hash is e36b9785, dated 2026-04-02, and it closes GitHub issue #473 (which had been opened on March 30 by a user reporting the exact symptom). Released as Terminator 0.24.31 the same day.

Why does this matter for any LLM, not just one specific model?

Because the LLM never sees the monitor coordinates. Whether the LLM is Claude, Fara-7B, Gemini, Gemma, or Manus, the model returns a selector like role:Button && name:Save. The MCP server resolves that selector to an element, then asks the OS whether it is clickable. If the OS-side answer is false because of a buggy Microsoft API, the click never fires and the LLM gets a generic 'not visible' error back. No prompt-engineering on the model side fixes that. The fix has to be in the validation layer where the OS coordinates actually live.
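To make the point concrete, here is a toy parser for the selector string quoted above. It is a minimal sketch, not Terminator's selector parser: `parse_selector` is a hypothetical function that only handles `key:value` terms joined by `&&`. What it shows is that the payload the model produces is purely symbolic; no monitor coordinates appear anywhere in it.

```rust
/// Splits a selector such as `role:Button && name:Save` into (key, value) pairs.
/// Hypothetical helper for illustration; returns None if any term is malformed.
fn parse_selector(selector: &str) -> Option<Vec<(String, String)>> {
    selector
        .split("&&")
        .map(|part| {
            // Each term must look like `key:value`.
            let (key, value) = part.trim().split_once(':')?;
            let (key, value) = (key.trim(), value.trim());
            if key.is_empty() || value.is_empty() {
                return None;
            }
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}
```

Everything geometric (bounds, monitor layout, the visibility decision) happens after this symbolic description is resolved to an element, which is why the validation layer is the only place the bug could be fixed.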

Is the vision path (gemini_computer_use) affected the same way?

Yes, at the tail. The vision loop returns coordinates in 0-999 normalized space, which Terminator converts to screen pixels using window offset, DPI, and resize scale. Once that conversion produces an absolute (x, y) on the virtual desktop, the click is dispatched the same way every selector-driven click is dispatched, through the same validate_clickable() gate. So a Gemini Computer Use turn that picked a coordinate on a secondary monitor was hitting the same false negative. The fix is in the shared validation code path, not in either branch.
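The conversion described above can be sketched as plain arithmetic. Everything here is an assumption made for illustration: the `ClickTransform` struct and its field names are hypothetical, the 0-999 grid is assumed to map to a fraction by dividing by 1000, and the order in which resize scale and DPI scale are applied is one plausible composition, not a transcription of Terminator's code.

```rust
/// Inputs for mapping a model-space click onto the virtual desktop.
/// All names are hypothetical; they mirror the factors named in the text.
struct ClickTransform {
    /// Top-left corner of the target window on the virtual desktop.
    window_x: f64,
    window_y: f64,
    /// Size of the screenshot the model was shown.
    shot_width: f64,
    shot_height: f64,
    /// Screenshot pixels per window pixel (0.5 means downscaled by half).
    resize_scale: f64,
    /// Physical pixels per logical pixel (1.0 at 100% display scaling).
    dpi_scale: f64,
}

impl ClickTransform {
    /// Converts a normalized (0-999) coordinate pair to virtual-desktop pixels.
    fn to_screen(&self, nx: u16, ny: u16) -> (f64, f64) {
        // 1. Normalized grid -> screenshot pixels (assumes division by 1000).
        let shot_x = f64::from(nx) / 1000.0 * self.shot_width;
        let shot_y = f64::from(ny) / 1000.0 * self.shot_height;
        // 2. Undo the screenshot resize to get window-relative pixels.
        let win_x = shot_x / self.resize_scale;
        let win_y = shot_y / self.resize_scale;
        // 3. Apply DPI scaling (assumed to act on window-relative offsets).
        let phys_x = win_x * self.dpi_scale;
        let phys_y = win_y * self.dpi_scale;
        // 4. Add the window's offset on the virtual desktop.
        (self.window_x + phys_x, self.window_y + phys_y)
    }
}
```

For a window at x=1920 (a second monitor in a horizontal layout), every resulting x coordinate is 1920 or greater, so a validation gate that only trusts the primary monitor's range would reject the click regardless of how accurate the model's pick was.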

Why did this bug exist for so long?

Two reasons. First, IsOffscreen looks like the right method. The Microsoft documentation does not advertise its primary-monitor bias; the method just says 'the element is not on the screen.' Second, the dominant computer-use development setup is a single laptop monitor, often the primary on a docking station. The bug only surfaces when you click into a secondary monitor that is geometrically outside the primary monitor's bounds. The reporter on issue #473 had a wide horizontal layout where the secondary monitor lived at negative x coordinates relative to primary; that is exactly the configuration where IsOffscreen lies most reliably.

Does the fix cover macOS as well?

macOS goes through a different code path. The Windows-specific bug was in crates/terminator/src/platforms/windows/element.rs. The macOS adapter under crates/terminator/src/platforms/macos talks to AXUIElement and uses the system's own multi-display geometry, which has not exhibited the same misreporting. If you are driving macOS apps with an LLM, you do not need this April patch; if you are driving Windows apps, you do.

How do I confirm I am running the fixed version?

Run `terminator --version` if you have the CLI installed, or check `crates/terminator-cli/Cargo.toml` for the workspace version. Anything 0.24.31 or newer has the fix. As of May 2026, the workspace version is 0.24.32. If you are pulling from npm, `npx -y terminator-mcp-agent@latest` will resolve to the most recent published agent. The relevant comment block ('This replaces the old is_offscreen() check which incorrectly returned true for elements on secondary monitors') is the in-source signature of the patch.
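If you want to automate that check, the comparison is a plain three-part version test. The sketch below is a hypothetical helper, not part of Terminator: `parse_version` and `has_multi_monitor_fix` are invented names, and the input is assumed to be a bare `MAJOR.MINOR.PATCH` string that you have already extracted from the CLI output or Cargo.toml.

```rust
/// Parses "MAJOR.MINOR.PATCH" (with an optional leading 'v') into a tuple.
/// Returns None for anything that does not have exactly three numeric parts.
fn parse_version(s: &str) -> Option<(u32, u32, u32)> {
    let mut parts = s.trim().trim_start_matches('v').split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    let patch: u32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// True when the version is 0.24.31 or newer, the release named in the text.
/// Tuples compare lexicographically, which matches version ordering here.
fn has_multi_monitor_fix(version: &str) -> Option<bool> {
    parse_version(version).map(|v| v >= (0, 24, 31))
}
```

Returning `Option<bool>` keeps an unparseable string distinct from a version that is known to be too old.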

Does this fix change the API surface that an LLM sees?

No. The 31 MCP tools (click_element, type_into_element, get_window_tree, and the rest) accept the same arguments and return the same shape. The fix is entirely below the MCP boundary. From the LLM's point of view, the only observable difference is that clicks on secondary monitors now succeed instead of returning ElementNotVisible. That is the kind of fix that does not need a prompt change in any agent that uses Terminator.

Other places the LLM-to-desktop bridge has interesting seams

Adjacent reading

Open source desktop automation projects, April 2026

How four eras of open source desktop automation map onto what an AI coding assistant can actually drive in 2026, and the 753-line selector parser that anchors the newest era.

Open source computer use agents, April 2026

The four-step coordinate transform that turns a Gemini Computer Use 0-999 click into a real desktop pixel, in public Rust.

Run vLLM locally with a desktop agent

One environment variable swaps Mediar's hosted Gemini backend for localhost. The contract every self-hosted LLM has to honor.
