Browser MCP to desktop automation: you don't extend the server, you replace its dispatch root
Every browser MCP (Playwright MCP, Chrome DevTools MCP, Browser MCP, browser-use, Browserbase) binds its tool dispatch to a single Page or Browser handle. That is why your agent goes blind the moment the workflow leaves the tab. The structural fix is not a second MCP. It is one MCP whose dispatch root is the OS accessibility tree, with the browser sitting as one subset.
Direct answer (verified 2026-05-06)
You don't extend a browser MCP. A browser MCP's dispatch table is bound to a Page object: every tool is a method on that page, so OS-level work (Save dialogs, native apps, system shortcuts, shell commands) is structurally out of scope. To control the desktop, you swap to an MCP whose dispatch is bound to the OS accessibility tree, with the browser as one subset reached through an extension. Terminator does this in one Rust match block at crates/terminator-mcp-agent/src/server.rs:9953, where navigate_browser and open_application are sibling arms.
Source: github.com/microsoft/playwright-mcp (browser-only), github.com/mediar-ai/terminator (whole-OS).
The boundary is structural, not a missing feature
Read the source of any browser MCP and you'll find the same shape. Microsoft's playwright-mcp wraps a Playwright Browser; Chrome DevTools MCP wraps a CDP session and exposes 26 tools in 6 categories; browsermcp.io wraps a local Chrome through an extension; browser-use wraps a Playwright browser context; Browserbase wraps a remote browser. The tool registry is whatever methods the underlying object exposes. That is what makes them composable and easy to ship; it is also what makes them unable to leave the tab.
The Save dialog is not in the page DOM. The OAuth code in your desktop authenticator is not in the page DOM. Excel is not in the page DOM. The shell is not in the page DOM. None of those are reachable through a Page handle, so none of them are reachable through a browser MCP. You can bolt on a second MCP server, but two registries do not interleave inside one execute_sequence call, and the LLM has to plan twice.
Browser MCP vs desktop MCP, viewed as dispatch tables
Browser MCP server (Playwright MCP, Chrome DevTools MCP, Browser MCP, browser-use, Browserbase): the tool dispatch table is bound to one Page or Browser object. Every tool is a method on that object. The agent can do anything the page can do, and nothing else. Save dialog, native authenticator, Excel paste, OS hotkey, shell command: all out of scope. The agent goes blind at the tab boundary.
- Tools are methods on a Page or Browser object
- Every tool's reach equals that page's reach
- Save dialog, native apps, OS shortcuts: out of scope
- Mixing surfaces takes two MCPs: two registries, two plans
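To make the boundary concrete, here is a minimal sketch of a page-bound dispatch. It is not taken from any real browser MCP: `Page`, `dispatch_page_bound`, and the two tool names are hypothetical stand-ins. The point it illustrates is only that when every arm is a method on the page, an OS-level tool has nowhere to go.

```rust
// Hypothetical sketch: none of these types come from a real browser MCP.
// `Page` stands in for whatever handle the server wraps.
struct Page {
    url: String,
}

impl Page {
    fn navigate(&mut self, url: &str) -> String {
        self.url = url.to_string();
        format!("navigated to {}", self.url)
    }

    fn evaluate(&self, script: &str) -> String {
        format!("evaluated {} bytes of script on {}", script.len(), self.url)
    }
}

// Every arm is a method on the page, so the registry can only name
// things the page can do.
fn dispatch_page_bound(page: &mut Page, tool: &str, arg: &str) -> Result<String, String> {
    match tool {
        "navigate" => Ok(page.navigate(arg)),
        "evaluate" => Ok(page.evaluate(arg)),
        // There is no handle to route an OS-level tool through,
        // so the only possible answer is an error.
        other => Err(format!("unknown tool: {}", other)),
    }
}
```

Calling `dispatch_page_bound(&mut page, "open_application", "excel.exe")` returns an error here, not because the arm was forgotten but because `Page` has no method that could implement it.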
The match block is the whole story
The fastest way to see what makes a desktop MCP different from a browser MCP is to read the dispatch function. In Terminator's server.rs, the dispatch_tool method is a single Rust match on tool_name, at line 9953. Every tool the agent can call is one arm of that match.
// crates/terminator-mcp-agent/src/server.rs line 9953
// One Rust match block. The LLM picks any of these arms in one session.
let result = match tool_name {
    // Browser surface (overlaps with playwright-mcp / chrome-devtools-mcp)
    "navigate_browser" => self.navigate_browser(...).await,
    "execute_browser_script" => self.execute_browser_script(...).await,
    // Native OS surface (no browser MCP has these arms)
    "open_application" => self.open_application(...).await,
    "get_applications_and_windows_list" => self.get_apps(...).await,
    "get_window_tree" => self.get_window_tree(...).await,
    "click_element" => self.click_element(...).await,
    "type_into_element" => self.type_into_element(...).await,
    "press_key_global" => self.press_key_global(...).await,
    "mouse_drag" => self.mouse_drag(...).await,
    "run_command" => self.run_command(...).await,
    // Vision fallback in the same dispatch block
    "gemini_computer_use" => self.gemini_computer_use(...).await,
    // Sequence runner that mixes any of the above
    "execute_sequence" => self.execute_sequence(...).await,
    // ... around two dozen more arms (file ops, validation,
    // highlighting, scrolling, screenshots, etc.) ...
    _ => Err(McpError::internal_error("Unknown tool", ...)),
};
Around three dozen #[tool(...)] declarations live in server.rs; they all dispatch through that one match. The browser arms (navigate_browser, execute_browser_script) overlap with what a browser MCP gives you. The native arms (open_application, click_element, type_into_element, press_key_global, run_command) cannot exist in a browser MCP, because a browser MCP's dispatch is bound to a Page object, not the OS.
What a unified dispatch looks like in flight
A real workflow that crosses the boundary: scrape a page, paste into Excel, save with a native dialog. The agent emits a series of tool calls, the MCP server dispatches each one through the same match, and the browser extension and the OS accessibility tree do the work in their own lanes.
One execute_sequence, two surfaces
The MCP server is the only process the LLM talks to. Whether the next call lands on the browser extension at ws://127.0.0.1:17373 or on the Windows UIAutomation tree is decided inside the match. The agent does not know and does not have to.
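As a rough illustration of "decided inside the match", the sketch below maps tool names to the surface they land on. The grouping follows the comments in the dispatch excerpt (browser surface, native OS surface, vision fallback, sequence runner); the `Surface` enum and `surface_for` function are hypothetical names for this example and do not appear in Terminator's source.

```rust
// Hypothetical sketch: `Surface` and `surface_for` are illustrative names.
// The grouping mirrors the comments in the dispatch excerpt.
#[derive(Debug, PartialEq)]
enum Surface {
    Browser,  // page-level work
    NativeOs, // accessibility tree, global keys, shell
    Vision,   // screenshot-based fallback
    Sequence, // runner that mixes any of the above
}

fn surface_for(tool_name: &str) -> Option<Surface> {
    match tool_name {
        "navigate_browser" | "execute_browser_script" => Some(Surface::Browser),
        "open_application"
        | "get_applications_and_windows_list"
        | "get_window_tree"
        | "click_element"
        | "type_into_element"
        | "press_key_global"
        | "mouse_drag"
        | "run_command" => Some(Surface::NativeOs),
        "gemini_computer_use" => Some(Surface::Vision),
        "execute_sequence" => Some(Surface::Sequence),
        // Anything else is not in this sketch's registry.
        _ => None,
    }
}
```

The caller only ever supplies a tool name. Which lane does the work is an implementation detail of the arm, which is why the agent never has to choose a server.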
What the YAML looks like when both surfaces are alive
The same shape as a Playwright MCP test, but with two extra kinds of step. Try this in Playwright MCP and the run halts at open_application: the tool is not in the registry. Try it in Terminator and every step routes through the same dispatch.
# crates/terminator-mcp-agent/examples/mixed.yml
# One MCP, one execute_sequence, two surfaces. The dispatch root is
# the OS; the browser is one of the arms.
steps:
  # Browser surface (would also work in Playwright MCP)
  - tool_name: navigate_browser
    arguments: { url: "<your internal orders URL>" }
  - id: rows
    tool_name: execute_browser_script
    arguments:
      script: |
        return Array.from(document.querySelectorAll("tr.order"))
          .map(r => [r.dataset.id, r.querySelector(".total").innerText].join("\t"))
          .join("\n");
  # Native surface (no browser MCP can reach this)
  - tool_name: open_application
    arguments: { path: "excel.exe" }
  - tool_name: type_into_element
    arguments:
      selector: "role:Window && name:Book1 - Excel"
      text_to_type: "${{rows_result}}"
  - tool_name: press_key_global
    arguments: { keys: "Ctrl+S" }
  # Native dialog, filled by selector
  - tool_name: type_into_element
    arguments:
      selector: "role:Edit && window:Save As && name:File name:"
      text_to_type: "q4-orders.xlsx"
  - tool_name: click_element
    arguments:
      selector: "role:Button && window:Save As && name:Save"
stop_on_error: true
The honest scope of each kind of MCP
What every browser MCP covers
- Click and type inside a page DOM
- Navigate, reload, screenshot a tab
- Evaluate JavaScript in a page context
- Inspect network and console (CDP)
- Manage tabs, dialogs (in-page)
What a desktop MCP adds on top
- open_application + get_window_tree + click_element on any UIA-accessible app
- press_key_global for Ctrl+S, Win+R, Cmd+Tab against the OS focus, not the tab
- mouse_drag with screen coordinates across windows
- run_command into bash, cmd, powershell, node, python from the same MCP
- Native file save / open dialogs filled by selector, not coordinates
- execute_sequence YAML interleaves navigate_browser, open_application, run_command in one call
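The last item is the one that depends on a single registry. Below is a minimal sketch of a sequence runner, assuming two behaviors read off the YAML example rather than Terminator's source: a step with `id: rows` makes its output available as `${{rows_result}}`, and `stop_on_error` halts the run at the first failure. `Step`, `substitute`, and `run_sequence` are hypothetical names, and the real runner may differ in both respects.

```rust
use std::collections::HashMap;

// Hypothetical sketch of a sequence runner; not Terminator's implementation.
struct Step {
    id: Option<&'static str>,
    tool_name: &'static str,
    argument: String,
}

/// Replace every `${{key}}` in `text` with the stored value for `key`.
fn substitute(text: &str, vars: &HashMap<String, String>) -> String {
    let mut out = text.to_string();
    for (key, value) in vars {
        let pattern = ["${{", key.as_str(), "}}"].concat();
        out = out.replace(&pattern, value);
    }
    out
}

/// Run steps in order through one dispatch function, whatever surface
/// each tool lands on. Returns the results of the steps that ran.
fn run_sequence<F>(steps: &[Step], stop_on_error: bool, mut dispatch: F) -> Vec<Result<String, String>>
where
    F: FnMut(&str, &str) -> Result<String, String>,
{
    let mut vars: HashMap<String, String> = HashMap::new();
    let mut results = Vec::new();
    for step in steps {
        let argument = substitute(&step.argument, &vars);
        let result = dispatch(step.tool_name, &argument);
        // Assumed convention: a step with id `rows` exposes `rows_result`.
        if let (Some(id), Ok(value)) = (step.id, &result) {
            vars.insert(format!("{}_result", id), value.clone());
        }
        let failed = result.is_err();
        results.push(result);
        if failed && stop_on_error {
            break;
        }
    }
    results
}
```

Because `dispatch` is one function, a browser step's output can feed a native step's argument without leaving the runner. With two MCP servers that hand-off would have to go back through the model.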
If your agent will only ever live inside a tab, a browser MCP is the right tool and you do not need any of this. Playwright MCP is mature, Chrome DevTools MCP is the right pick for inspecting and debugging a real browser, Browser MCP is good for fully local Chrome control, browser-use for self-hosted agent loops. The argument here is only relevant if your workflow leaves the tab even once.
Install and verify the unified dispatch
The two tells that the dispatch root is the OS, not a tab: navigate_browser appears next to open_application in claude mcp list, and execute_sequence accepts a YAML that mixes both without a custom adapter.
# 1. Install Terminator's MCP server (Rust binary, behind npx)
claude mcp add terminator "npx -y terminator-mcp-agent@latest" -s user
# 2. Load the Chrome extension that gives the MCP server an eval channel
#    open chrome://extensions, toggle Developer Mode, "Load unpacked",
#    pick: terminator/crates/terminator/browser-extension/
#    (Manifest V3, name "Terminator Bridge", version 0.24.32)
# 3. Verify the dispatch is unified
claude mcp list
#    terminator  stdio  ~3 dozen tools
#    Includes navigate_browser AND open_application AND run_command
#    in the same registry.
When to use this, and when not to
Use a desktop MCP when the agent has to cross from the page into a native window in a single workflow: download then open in Acrobat, scrape then paste into Excel, complete an OAuth flow that lands in a desktop authenticator, fill a system Save dialog, run a shell command between two browser steps. Use a browser MCP when the workflow stays in the tab, because the smaller registry is cheaper to plan against and the install is one binary.
The mistake is to bolt on a second MCP and assume the model will pick the right one. It will, sometimes; the failures show up at the boundary, where the model has to context-switch between two selector grammars and two state spaces, and they cost more than the migration to a unified dispatch would have.
Browser MCP to desktop FAQ
What is a browser MCP server, exactly, and why does the boundary matter?
A browser MCP server is an MCP process that exposes tools like navigate, click, fill, screenshot, and execute_script, most of which take a CSS or ARIA selector, and all of which dispatch through a single Browser or Page handle (a Playwright Browser, a CDP session, a Browserbase session). The shape of the tool registry mirrors the shape of the underlying object. A browser MCP can do anything that object can do, and nothing it cannot. The OS Save dialog, the OAuth code in a desktop authenticator, the legacy Win32 line-of-business app, the Excel paste, the run_command into a shell: none of those are reachable through a Page handle, so they are not reachable through a browser MCP either. The boundary is structural, not a missing feature.
What does it mean to replace the dispatch root instead of extending the server?
Most people who hit the boundary try to bolt on a second MCP server (a desktop tool, a shell tool, a custom subprocess) and let the LLM decide which to call. That works, sort of, until you need to mix steps in one workflow: scrape a row out of the page, paste it into Excel, and hit Ctrl+S, all inside one execute_sequence so the model only has to plan once. Two MCPs cannot share state cleanly; two selector grammars do not interleave. The structural fix is to use one MCP whose dispatch table is rooted at the OS, not at a tab. Terminator does this. In crates/terminator-mcp-agent/src/server.rs starting at line 9953, the dispatch_tool method is one Rust match block where navigate_browser, execute_browser_script, open_application, click_element, type_into_element, press_key_global, and run_command are all sibling arms.
How does Terminator reach into the browser if its dispatch root is the OS?
It ships a Manifest V3 Chrome extension named Terminator Bridge (manifest at crates/terminator/browser-extension/manifest.json, version 0.24.32) which holds a WebSocket connection to the local MCP server on ws://127.0.0.1:17373 (worker.js line 1, port verified at main.rs:240). The MCP server's execute_browser_script tool sends an eval frame down that socket, the extension uses chrome.scripting + chrome.debugger to run the code in the active tab, and returns the result through the same socket. From the agent's point of view it is just another tool in the same registry. The selector for a click in the page (capture_browser_dom_elements + click) and the selector for a click in a native window (role:Button name:Save) flow through the same dispatch.
Why is the unified selector grammar more important than a unified set of tools?
Because the workflow recorder, the YAML format, the LLM prompt, and the failure modes are all shaped by the selector. Two tools that take two different selector formats are not really one tool surface, even if they live behind the same MCP server. Terminator's selector format (role:Button name:Save window:Save As, or role:Document for the focused tab) works for both surfaces. type_into_element with role:Edit window:Save As fills the file save dialog the same way it fills a DOM input. The LLM does not have to remember which tool to pick when the workflow crosses the boundary; the selector already tells the dispatch where the element lives.
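To show why one grammar can serve both surfaces, here is a small sketch that breaks a selector of the `&&`-joined form used in the YAML example into key/value pairs. It is a hypothetical parser written for this page, not Terminator's; the real grammar may accept more forms than this handles.

```rust
/// Hypothetical parser for selectors like
/// "role:Edit && window:Save As && name:File name:".
/// Splits on "&&", then on the first ':' of each part, so values
/// may themselves contain spaces or colons.
fn parse_selector(selector: &str) -> Result<Vec<(String, String)>, String> {
    selector
        .split("&&")
        .map(|part| {
            let part = part.trim();
            part.split_once(':')
                .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
                .ok_or_else(|| format!("missing ':' in selector part '{}'", part))
        })
        .collect()
}
```

Nothing in the parsed pairs says "browser" or "native". The `window` and `role` values carry that information, which is what lets one tool such as type_into_element serve both surfaces.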
Is this just "computer use" with extra steps?
No, and the difference is important on a slow network. Computer use models (Claude's, Gemini's, OpenAI's) operate on screenshots and emit (x, y) clicks. They are pixel-bound, expensive, and slow. Terminator dispatches by selector against the accessibility tree, which is structural and fast: get_window_tree returns a JSON tree of every named control in a window, the agent picks one by name, and the click lands on the element regardless of resolution, scaling, or theme. The gemini_computer_use tool does exist in the same dispatch block (server.rs has it as one arm) for cases where vision is the right tool, like canvases or PDFs. The point is that selector-based dispatch is the default, and screenshot dispatch is a fallback, not the other way around.
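The contrast with pixel clicks is easier to see in code. The sketch below assumes a simplified tree of named controls, loosely in the spirit of what get_window_tree is described as returning; `UiNode` and `find` are hypothetical, and the real tree carries far more than a role and a name.

```rust
// Hypothetical, simplified accessibility node: role, name, children.
struct UiNode {
    role: &'static str,
    name: &'static str,
    children: Vec<UiNode>,
}

/// Depth-first search for the first node matching both role and name.
/// No coordinates are involved, so resolution, scaling, and theme
/// cannot change the result.
fn find<'a>(node: &'a UiNode, role: &str, name: &str) -> Option<&'a UiNode> {
    if node.role == role && node.name == name {
        return Some(node);
    }
    node.children
        .iter()
        .find_map(|child| find(child, role, name))
}
```

A screenshot-driven agent has to re-derive the button's position on every frame. A lookup like this either finds the named control or reports that it is not there, which is a much cheaper failure to reason about.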
Concretely, what tools does Terminator add that a browser MCP does not have?
Compared with a typical browser MCP (whose tool list is roughly: navigate, click, fill, select, hover, press, screenshot, evaluate, console, network, plus tab-management), Terminator's match block adds open_application, get_applications_and_windows_list, get_window_tree, click_element / type_into_element / set_value / select_option / set_selected with cross-app selectors, press_key_global (Ctrl+S, Win+R, Cmd+Tab against the active OS focus, not the active tab), mouse_drag with screen coordinates, run_command into bash/cmd/powershell/node/python, capture_screenshot at the screen level, gemini_computer_use as a vision fallback, and a file-system block (read_file / write_file / edit_file / glob_files / grep_files / copy_content). The browser tools (navigate_browser, execute_browser_script) are still there, just as siblings.
Will this work on macOS too, or is it Windows-only?
Right now the Node.js, Python, and MCP server packages are Windows-first. Windows UIAutomation is the primary surface and the one with the most depth in tests. macOS support existed in the Rust core for a stretch and was removed on 2025-12-16 (commit 0c11011c) to focus on the Windows path. The browser-extension half of the bridge still works on either OS because it is a Chrome MV3 extension talking to a local WebSocket; the native-app half does not work on macOS in Terminator today. If you are on a Mac and you need both halves, the answer is to run the agent against a Windows VM through the headless mode (TERMINATOR_HEADLESS=true uses a virtual display so UIA works without RDP).
How do I install and verify the unified dispatch in under a minute?
One MCP install line, then load the extension. Run claude mcp add terminator "npx -y terminator-mcp-agent@latest" -s user (or wire it into Cursor / VS Code / Windsurf via the same name). Open chrome://extensions, toggle Developer Mode, click Load unpacked, point at the browser-extension folder inside the terminator-mcp-agent npm package or the GitHub repo (path: crates/terminator/browser-extension), and confirm "Terminator Bridge" v0.24.32 loads. claude mcp list should now show terminator with around three dozen tools. The two tells that the dispatch is unified: navigate_browser shows up next to open_application in the tool list, and execute_sequence accepts a YAML that mixes both without a custom adapter.
What if my agent only needs the browser, ever?
Then a browser MCP is the right answer and you should not migrate. Playwright MCP is excellent for tab-bound work, Chrome DevTools MCP is excellent for inspecting and debugging a real browser, Browser MCP is excellent for fully local browser control through an extension, browser-use is excellent if you want a self-hosted agent loop. Terminator overlaps with all of them on the browser surface but its reason to exist is the OS surface. Use the right tool for the job; the unified dispatch only matters if the job leaves the tab.