Claude Opus 4.7 desktop automation

Opus 4.7 shipped on April 16, 2026 with two changes that quietly rewrite how a desktop agent should be built. The first is on the screenshot side: image inputs now go up to 2576 pixels on the long edge (about 3.75 megapixels), and the coordinates the model returns are 1:1 with actual pixels, no scale-factor math. The second is on the planning side: the model makes fewer tool calls per turn by default, leaning on reasoning over rapid action. Those two changes pull in opposite directions if you are stuck in a per-click screenshot loop. They line up perfectly if your tools are structural and your workflow is compiled.

Matthew Diakonov, Written with AI

Published May 8, 2026 · 8 min read

Direct answer (verified 2026-05-08)

Two paths to drive a desktop with Opus 4.7. Path A: Anthropic's built-in computer tool. Opus 4.7 sees a screenshot at up to 2576px and emits left_click or type actions; you implement the actual screenshot capture and OS click. Path B: Terminator's MCP server, installed once with claude mcp add terminator "npx -y terminator-mcp-agent@latest". Opus 4.7 then has 35 typed tools that hit the OS accessibility tree directly, no screenshot in the loop. Path B is the better fit for Opus 4.7's "fewer tool calls, more reasoning" default because execute_sequence lets the model ship a whole workflow as one tool call.

Source for the model facts: platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7. Source for the 35 tools: crates/terminator-mcp-agent/src/server.rs.

What actually changed in Opus 4.7

Anthropic's release notes for 4.7 highlight coding and vision wins. For anyone building a desktop automation agent, three of the changes matter more than the rest, and they interact.

Opus 4.7 changes that move desktop automation

Image input ceiling rises to 2576px on the long edge, roughly 3.75MP, up from 1568px / 1.15MP on Opus 4.6.
Coordinates returned by the computer-use tool are 1:1 with actual pixels. No scale-factor math required.
Fewer tool calls per turn at the default effort level. The model reasons more before acting.
New xhigh effort level between high and max. Anthropic recommends xhigh for agentic and coding work.
1M token context window. 128k max output tokens. Adaptive thinking. Same platform features as Opus 4.6.

The pixel-side improvements (2576px input, 1:1 coordinates) make screenshot-driven clicks finally tractable. A 1080p screenshot (1920x1080, about 2.07MP) fits under the ceiling with no downscaling, and you no longer translate model coordinates back to your actual screen. So if you choose Path A, the screenshot loop is now smoother than it has ever been on a Claude model.
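To make the sizing arithmetic concrete, here is a minimal Python sketch of a pre-send check. The function names `fit_to_ceiling` and `to_screen` are hypothetical, and treating the two published limits (2576px long edge, about 3.75MP) as independent caps that must both hold is an assumption rather than documented API behavior. The function returns the scale it applied because coordinates are 1:1 with the image you sent, so any downscaling you do yourself has to be undone on your side.

```python
import math

MAX_LONG_EDGE = 2576      # px, the Opus 4.7 ceiling cited above
MAX_PIXELS = 3_750_000    # ~3.75MP, the rounded figure cited above


def fit_to_ceiling(width: int, height: int) -> tuple[int, int, float]:
    """Return (new_width, new_height, scale) for a screenshot.

    scale == 1.0 means the image already fits and can be sent as-is.
    Assumption: both caps apply, so the tighter one wins.
    """
    edge_scale = MAX_LONG_EDGE / max(width, height)
    area_scale = math.sqrt(MAX_PIXELS / (width * height))
    scale = min(1.0, edge_scale, area_scale)
    return round(width * scale), round(height * scale), scale


def to_screen(x: int, y: int, scale: float) -> tuple[int, int]:
    """Map a model coordinate on the sent image back to the real screen."""
    return round(x / scale), round(y / scale)


print(fit_to_ceiling(1920, 1080))      # 1080p: checked against both caps
print(fit_to_ceiling(3840, 2160)[:2])  # 4K: resized to fit the long edge
```

When `scale` comes back as 1.0, `to_screen` is the identity and the model's coordinates can be passed straight to the OS click.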

But the planning-side change is the one that flips the strategy. Lower default tool-call frequency means Opus 4.7, left to its own taste, will not happily emit thirty left_click events in sequence. It would rather think once and act once. If your tool surface is just screenshot and click(x,y), that preference works against you. If your tool surface includes a single tool that accepts a typed multi-step workflow, that preference works for you.

Path A vs Path B, in code

Both paths are real and supported. The shape of what Opus 4.7 emits is what differs.

Same click, two surfaces

// Opus 4.7 with Anthropic's built-in computer-use tool.
// Model sees a screenshot, picks pixels, returns this JSON.
// You implement the screenshot capture and the click.
// Opus 4.7 specific: image input ceiling is 2576px / 3.75MP
// (4.6 was 1568px / 1.15MP) and coordinates are 1:1 with
// actual pixels, no scale-factor math.

{
  "type": "tool_use",
  "name": "computer",
  "input": {
    "action": "left_click",
    "coordinate": [487, 341]
  }
}

// Cost shape: one model inference per click.
// Latency shape: roundtrip the screenshot every step.
// Failure shape: a tooltip or modal moves the pixel,
// the next click misses, you replay from scratch.

Path A asks the model to be a vision system. Every click costs one screenshot upload and one round-trip. The model has to decide where the button is in pixels, every time the layout shifts. Path B asks the model to be a planner. The selector role:Button && name:Save is resolved locally by the MCP server against the live UIA tree on Windows or AX tree on macOS, in microseconds, without the model in the loop.
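For comparison with the Path A payload above, here is a sketch of what the Path B call for the same kind of click might look like, written in Python for illustration. The tool name click_element and its selector argument are taken from the execute_sequence example in this guide. The tool_use envelope simply mirrors the Path A example and is an assumption, and the `selector` helper is hypothetical.

```python
import json


def selector(role: str, name: str) -> str:
    """Build a selector string in the role:X && name:Y form used in this guide."""
    return f"role:{role} && name:{name}"


# Assumed shape: the same tool_use envelope as Path A, but the input
# names an element instead of a pixel coordinate.
path_b_call = {
    "type": "tool_use",
    "name": "click_element",
    "input": {"selector": selector("Button", "Save")},
}

print(json.dumps(path_b_call, indent=2))
```

Nothing in this payload depends on where the button is drawn, which is why a layout shift does not invalidate it the way it invalidates a coordinate.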

Wire it up in one command

Terminator publishes its MCP server on npm as terminator-mcp-agent. The Claude Code, Cursor, and Windsurf clients all support the same registration command.

Terminator MCP install

claude mcp add terminator "npx -y terminator-mcp-agent@latest"

After registration, Opus 4.7 sees the 35 tools as ordinary tool definitions. There is no special prompting required. The model picks the tool whose schema matches the task; the server resolves selectors and runs the action; the result comes back as structured JSON that Opus reads on its next turn.

The execute_sequence shape

Of the 35 tools in the MCP server, one is the reason Opus 4.7's "fewer tool calls" default becomes an advantage rather than a problem. execute_sequence, defined at crates/terminator-mcp-agent/src/server.rs:7549, accepts a whole typed workflow inside a single tool call: variables, named selectors, an ordered list of steps, retries per step, fallback branches via fallback_id, conditional jumps based on prior step status, and an optional output parser written as JavaScript.

// One MCP call. The whole workflow inside.
// crates/terminator-mcp-agent/src/server.rs:7549

{
  "name": "execute_sequence",
  "input": {
    "variables": {
      "report_path": { "type": "string", "default": "report.xlsx" }
    },
    "selectors": {
      "calc_window": "role:Window && name:Calculator",
      "btn_equals":  "role:Button && name:Equals"
    },
    "steps": [
      { "tool_name": "open_application",
        "arguments": { "path": "calc.exe" }, "id": "launch" },
      { "tool_name": "type_into_element",
        "arguments": { "selector": "${{selectors.calc_window}}",
                       "text_to_type": "42" },
        "retries": 2,
        "fallback_id": "recover_focus" },
      { "tool_name": "click_element",
        "arguments": { "selector": "${{selectors.btn_equals}}" },
        "jumps": [
          { "if": "click_element_status == 'success'",
            "to_id": "capture" }
        ]},
      { "tool_name": "wait_for_element", "id": "capture",
        "arguments": { "selector": "${{selectors.calc_window}}",
                       "condition": "exists",
                       "include_tree": true } }
    ],
    "troubleshooting": [
      { "tool_name": "activate_element", "id": "recover_focus",
        "arguments": { "selector": "${{selectors.calc_window}}" } }
    ]
  }
}

// Opus 4.7 emits this once.
// The MCP server runs every step locally with no model in
// the inner loop. If a step fails, the troubleshooting
// branch fires. Output is structured JSON.

One model inference. The server walks the steps. If type_into_element fails twice, the engine jumps to recover_focus in the troubleshooting block and retries. The model only re-enters the loop when the whole sequence finishes or hits an unrecoverable state.
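Because the whole workflow ships in one call, a dangling fallback_id or jump target only surfaces once the server is already running steps. A small pre-flight check can catch that before dispatch. The sketch below is a hypothetical helper, not part of Terminator. It assumes only the field names shown in the example above (steps, troubleshooting, id, fallback_id, jumps, to_id) and that fallback and jump targets may live in either list.

```python
def check_sequence(workflow: dict) -> list[str]:
    """Return a list of problems; an empty list means every reference resolves."""
    steps = workflow.get("steps", [])
    recovery = workflow.get("troubleshooting", [])
    known_ids = {s["id"] for s in steps + recovery if "id" in s}

    problems = []
    for index, step in enumerate(steps + recovery):
        label = step.get("id", f"step[{index}]")
        target = step.get("fallback_id")
        if target is not None and target not in known_ids:
            problems.append(f"{label}: unknown fallback_id {target!r}")
        for jump in step.get("jumps", []):
            if jump.get("to_id") not in known_ids:
                problems.append(f"{label}: unknown jump target {jump.get('to_id')!r}")
    return problems


# The calculator workflow from the example above, trimmed to the
# fields the check looks at plus the tool arguments.
workflow = {
    "steps": [
        {"tool_name": "open_application", "id": "launch",
         "arguments": {"path": "calc.exe"}},
        {"tool_name": "type_into_element", "fallback_id": "recover_focus",
         "arguments": {"selector": "role:Window && name:Calculator",
                       "text_to_type": "42"}},
        {"tool_name": "click_element",
         "arguments": {"selector": "role:Button && name:Equals"},
         "jumps": [{"if": "click_element_status == 'success'",
                    "to_id": "capture"}]},
        {"tool_name": "wait_for_element", "id": "capture",
         "arguments": {"selector": "role:Window && name:Calculator",
                       "condition": "exists"}},
    ],
    "troubleshooting": [
        {"tool_name": "activate_element", "id": "recover_focus",
         "arguments": {"selector": "role:Window && name:Calculator"}},
    ],
}

print(check_sequence(workflow))
```

Running a check like this in the harness before forwarding the tool call keeps a typo in an id from costing a full model turn.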

What that looks like as a sequence

The contrast with the per-click loop is sharpest when you draw it.

[Sequence diagram: One workflow, one model turn]

When to still use the screenshot path

Path B is not a moral position. There are real cases where Path A wins. Custom-rendered Electron surfaces, canvas-heavy editors, and games expose almost nothing useful through accessibility APIs; their entire UI is a single opaque element with no labels. Opus 4.7's higher-resolution input and 1:1 coordinates are exactly what you want there. Terminator includes capture_screenshot as one of its 35 tools precisely so Opus 4.7 can fall back to vision when the tree is empty.

The healthy split: use validate_element to check whether the accessibility tree exposes what you need. If yes, structural tools. If no, screenshot plus 1:1 click. Opus 4.7 is good enough at routing this decision that you can leave it to the model rather than hardcoding the split.
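If you would rather make the split explicit in your own harness than leave it to the model, the routing reduces to a single check. This is a minimal sketch: `element_exists`, `click_by_selector`, and `click_by_pixels` are hypothetical callables standing in for however your harness invokes validate_element, click_element, and the screenshot-plus-click path, and their signatures are assumptions.

```python
from typing import Callable


def click_with_fallback(
    selector: str,
    element_exists: Callable[[str], bool],
    click_by_selector: Callable[[str], None],
    click_by_pixels: Callable[[str], None],
) -> str:
    """Use the structural path when the tree exposes the element, else vision."""
    if element_exists(selector):
        click_by_selector(selector)
        return "structural"
    click_by_pixels(selector)
    return "screenshot"


# Stub harness: pretend the accessibility tree only exposes the Save button.
exposed = {"role:Button && name:Save"}
log: list[str] = []

for target in ("role:Button && name:Save", "role:Button && name:Brush"):
    path = click_with_fallback(
        target,
        element_exists=lambda s: s in exposed,
        click_by_selector=lambda s: log.append(f"selector click: {s}"),
        click_by_pixels=lambda s: log.append(f"pixel click: {s}"),
    )
    print(target, "->", path)
```

Passing the three operations in as callables keeps the routing testable without a live desktop session.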

Numbers that fit on one row

35 typed MCP tools in server.rs

1 model turn for a multi-step workflow via execute_sequence

2576px screenshot input ceiling on Opus 4.7

1:1 pixel-to-coordinate mapping in computer-use

A practical recipe

If you are starting today, the configuration that gets the most out of Opus 4.7 looks like this. Run the model at xhigh effort for the agentic outer loop, since Anthropic explicitly recommends xhigh for coding and agent work and you want the model to reason hard before dispatching a workflow. Register Terminator's MCP server as the single source of desktop tools. Lean on execute_sequence for any task that has more than two structural steps; reserve direct per-tool calls for short interactive sessions and recovery paths. Keep the computer tool available as an escape hatch for surfaces with no accessibility metadata.

The mental model: Opus 4.7 is the planner; Terminator is the operator. The model reasons, compiles a workflow, and steps back. The MCP server runs the workflow. Failures bounce back to the model only when the troubleshooting branch cannot recover. That is the agent shape Opus 4.7's defaults were tuned for.

Frequently asked questions

How do I use Claude Opus 4.7 for desktop automation?

Two paths. Anthropic exposes a built-in computer-use tool: Opus 4.7 sees a screenshot (now up to 2576px / 3.75MP, with 1:1 pixel-to-coordinate mapping) and returns click or type actions that your code executes. Or wire Terminator's MCP server with `claude mcp add terminator "npx -y terminator-mcp-agent@latest"` so Opus 4.7 calls 35 typed accessibility-tree tools instead of pixel coordinates. The MCP path resolves selectors locally against Windows UI Automation or macOS Accessibility, no screenshot in the loop.

Does Opus 4.7 actually improve over Opus 4.6 for desktop work?

For pixel-driven computer use, yes, in two specific ways. The image input ceiling rose from 1568px to 2576px on the long edge (about 3.75MP), so a Full HD screenshot fits as-is and a 4K screenshot needs only a modest downscale (about 0.67x on the long edge, versus about 0.41x under the old ceiling). Coordinates the model emits are 1:1 with the actual pixels you sent, so there is no scale-factor math. Anthropic also reduced default tool-call frequency on 4.7, which means the model leans on reasoning over rapid-fire actions. For agentic flows, run at high or xhigh effort.

What is the xhigh effort level and when should I use it?

xhigh is a new effort level Opus 4.7 introduced between high and max. Anthropic's docs recommend it for coding and agentic use cases because the model spends more time reasoning before each action, which compensates for the lower default tool-call rate. For a desktop automation agent that has to navigate unfamiliar applications, xhigh tends to produce fewer wasted clicks at the cost of higher per-turn latency.

Why does Terminator give 35 tools instead of just one click(x,y) tool?

Because clicking is only a small part of what an automation actually needs. The 35 tools at crates/terminator-mcp-agent/src/server.rs include get_window_tree, click_element, type_into_element, press_key, validate_element, wait_for_element, scroll_element, select_option, set_value, capture_screenshot, run_command, navigate_browser, execute_browser_script, execute_sequence, and the file primitives read_file / write_file / edit_file / glob_files / grep_files. Each one wraps a real OS or browser primitive. A click(x,y) tool collapses all that into pixel guessing and forces the model back into a screenshot loop.

What is execute_sequence and why is it the right shape for Opus 4.7?

execute_sequence is one MCP tool that accepts a typed workflow: variables, named selectors, an array of steps, fallback branches, conditional jumps, and an optional output parser. The model emits the whole workflow once. The server runs every step locally with no model in the inner loop. Because Opus 4.7 defaults to fewer tool calls per turn, it is naturally inclined to think harder up front and dispatch a bigger unit of work. execute_sequence is the bigger unit. The shape matches.

Can I mix the screenshot path and the accessibility path?

Yes, and you usually want to. Terminator exposes capture_screenshot as one of its tools, so Opus 4.7 can fall back to vision when the accessibility tree is missing labels (common in custom-rendered Electron and game UIs). The healthy split is: structural tools for everything the OS knows the name of, screenshot plus Opus 4.7's 1:1 coordinates for the rest. Use validate_element first to decide which path to take.

What platforms does this work on?

Windows is the primary platform, with full feature support via the UI Automation COM API. macOS works at the core Rust level via the Accessibility API and requires you to grant accessibility permissions in System Settings. The terminator-mcp-agent npm package ships Windows binaries; on macOS you go through the Rust crate instead. Linux uses AT-SPI2 in the core but is not yet packaged as an MCP binary.
