Notion MCP server, but for the desktop app
Search results for "notion mcp server" point to one thing: an MCP wrapper around Notion's REST API. Official hosted version, community ports, Docker catalog entries, PulseMCP, Portkey. All the same shape. All useful. All missing a category. This page covers the other kind of Notion MCP server, the one that controls the Notion desktop client directly through Windows UI Automation, with selectors like window:Notion >> role:Button && name:Share. No OAuth. No API rate limits. Access to UI the REST API never exposes.
The short answer
There are two kinds of Notion MCP server, and nearly every article only covers one. The first kind wraps Notion's cloud REST API: tools like retrieve_page, query_database, update_block. That is the default when someone says "Notion MCP".
The second kind wraps the Notion desktop application itself. Same MCP protocol, same JSON-RPC over stdio, same client stack (Claude Code, Cursor, VS Code). Different handlers. Instead of hitting api.notion.com, the server reads the OS accessibility tree, finds UI elements by role and name, clicks them, types into them, and waits for new ones to render. The surface area is every button, input, menu, and view state inside the Notion desktop client, including the parts the REST API cannot reach.
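On the wire, that second kind looks like any other MCP call. Here is a sketch of a `tools/call` request for one of the server's click tools (the `selector` argument key is an assumption for illustration; the tool names are covered later on this page):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "click_element",
    "arguments": {
      "selector": "window:Notion >> role:Button && name:Share"
    }
  }
}
```

The MCP client never knows whether the server behind this request is hitting a cloud API or walking an accessibility tree. That is the whole point.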
That second kind is what Terminator's MCP server is. It is not Notion-specific. It is a general-purpose desktop MCP server whose 31 tools work against any Windows app, and Notion happens to be a target with well-labeled accessibility metadata. The window:Notion scope at the front of a selector is the only Notion-aware thing in the whole pipeline.
What the desktop MCP server ships with
Four counts that matter for Notion: how many tools the agent exposes, how many selector prefixes the selector engine understands, how many chainable combinators, and the reported success rate against desktop UI (from the Terminator README).
One LLM turn, three Notion actions
Four actors, nine messages. The model never touches pixels. It emits tool names and selectors; the MCP server resolves them against the Notion window's accessibility tree.
Create a new page in Notion and share it
The anchor fact: the selector engine that targets Notion UI
Here is the part no REST-API Notion MCP guide talks about, because their server does not have this layer. Terminator's selector engine (source: crates/terminator/src/selector.rs) parses strings like window:Notion >> role:Button && name:Share into a small AST and walks them against the UIA tree. There are 9 prefix types and 9 chainable combinators. Documentation lives in docs/SELECTORS_CHEATSHEET.md in the Terminator repo.
The selector string is the interface. Every MCP tool that touches a UI element (click_element, type_into_element, wait_for_element, scroll_element, activate_element, highlight_element) takes one. The LLM decides the selector; the server resolves it.
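To make the grammar concrete, here is a minimal, illustrative parser sketch in Python. It is not Terminator's implementation (that lives in Rust in crates/terminator/src/selector.rs); it only handles the prefix:value pairs and the `>>` and `&&` combinators from the example above.

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    """One prefix:value pair, e.g. role:Button."""
    prefix: str
    value: str


def parse_selector(selector: str) -> list[list[Predicate]]:
    """Parse a tiny subset of the selector grammar.

    '>>' separates descendant steps; '&&' ANDs predicates within
    a step. Returns a list of steps, each a list of predicates
    that must all match a single node.
    """
    steps = []
    for step in selector.split(">>"):
        preds = []
        for clause in step.split("&&"):
            prefix, _, value = clause.strip().partition(":")
            preds.append(Predicate(prefix, value))
        steps.append(preds)
    return steps


ast = parse_selector("window:Notion >> role:Button && name:Share")
# Two steps: [window:Notion], then [role:Button, name:Share]
```

The real engine also handles `||`, `!`, positional combinators, and parent traversal, but the shape is the same: a selector string becomes a small AST, and the AST is walked against the UIA tree.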
Notion is already a tested target in the Terminator benchmark suite. Open crates/terminator/src/platforms/windows_benchmarks.rs and grep for notion.so. It appears there as a browser test case; because the Notion desktop client is an Electron app built on the same Chromium accessibility plumbing, the pipeline that resolves Notion UI in a browser applies to the desktop client window as well.
The 9 selector prefixes
These are the keys you put before the colon in any selector. The first three cover most Notion cases; nativeid: is the escape hatch for stability.
The 9 combinators
Chain these between prefixes. The positional ones (rightof:, below:, near:) are the hack for Notion controls whose names change or localize.
What get_window_tree returns for Notion
Before the LLM can click anything, it needs to know what is on screen. Vision-based agents would screenshot. Terminator walks the Windows UIA tree under the Notion window and returns structured JSON. Every node has a role, a name, bounds, and children. The LLM now has the full shape of the app's UI without looking at a single pixel.
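The shape is roughly the following. This is a hand-written illustration, not captured output, and the exact field names and nesting in Terminator's JSON may differ:

```json
{
  "role": "Window",
  "name": "Notion",
  "bounds": { "x": 0, "y": 0, "width": 1920, "height": 1080 },
  "children": [
    {
      "role": "Pane",
      "name": "Sidebar",
      "children": [
        { "role": "Button", "name": "Add a page" }
      ]
    },
    { "role": "Button", "name": "Share" }
  ]
}
```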
One dispatch_tool, one handler per Notion action
The Notion-relevant arms of dispatch_tool
Every MCP call lands in one match block. For a Notion workflow, these are the arms you care about. The rest (file I/O, highlighting, meta-operations) stay out of the way.
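Conceptually, the routing is a name-to-handler table. The real implementation is a Rust match block in server.rs; this Python sketch only mirrors the idea, and the handler bodies here are placeholders, not the real behavior:

```python
# Hypothetical mirror of the dispatch idea: tool name -> handler.
# The real server is one Rust match block; these handler bodies
# are stand-ins for illustration only.
def click_element(args):
    return {"status": "clicked", "selector": args["selector"]}


def type_into_element(args):
    return {"status": "typed", "text": args["text"]}


HANDLERS = {
    "click_element": click_element,
    "type_into_element": type_into_element,
}


def dispatch_tool(name, args):
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"unknown tool: {name}")
    return handler(args)
```

One entry point, one table, every tool. Adding a tool means adding one arm, which is why the server's surface area scales cleanly.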
API MCP server vs desktop MCP server for Notion
Same protocol on the wire, two completely different surfaces. Most teams want both eventually. Here is the tradeoff, feature by feature.
| Feature | Official (API) | Terminator (desktop) |
|---|---|---|
| What it controls | Notion's REST API endpoints (pages, blocks, databases) | The Notion desktop app's actual UI, via Windows UIA |
| Auth | OAuth + workspace integration grant | The logged-in desktop client. Whatever the user sees, the agent sees |
| Rate limits | Notion API rate limit: 3 requests/sec average | None at the OS level; the bottleneck is UI paint and accessibility-tree refresh |
| What you can do | Read/write properties, blocks, comments, database rows | Anything a human can do in the client: drag blocks, use slash menus, navigate sidebar, edit templates, activate UI that the API does not expose |
| UI-only views | Cannot reach calendar, gallery, or timeline view state, or keyboard-shortcut-only UI | All of those are just elements in the accessibility tree |
| Offline state | Depends on Notion's sync state | Reads whatever is currently rendered, even in a partially offline session |
| Identifying elements | Stable API IDs and property names | Stable: nativeid / AutomationID. Semantic: role + name. Position: rightof/below/near |
| Cross-app workflows | Must pair with a separate integration per app | Same MCP server drives Notion, Chrome, Slack, Excel, anything on the OS. One dispatch_tool |
Anatomy of a desktop-targeted Notion MCP server
Six things that define this server shape. None of them are Notion-specific; they apply to any Windows UI. Notion is the target because that is what the keyword brought you here for.
Two Notion MCP servers, one protocol
The official Notion MCP server wraps the REST API. Terminator's wraps the desktop client via the OS accessibility tree. Same JSON-RPC 2.0 over stdio, totally different surface. An LLM can use both at once: API for structured reads, desktop MCP for UI that the API cannot touch.
get_window_tree returns structure, not pixels
Windows UI Automation already labels every widget with role, name, bounds, and enabled state. Terminator reads that tree and serializes it to JSON. No vision model needed.
Selectors instead of coordinates
role:Button && name:Share is stable across window sizes, themes, and display scaling. Coordinates are not. Chain with >> to walk the tree, || for fallbacks, near: for geometric hints.
execute_sequence nests the other 30 tools
A single MCP call can drive a whole Notion workflow: open the app, wait for the button, click it, type the title, press Enter, share. Expressed in YAML, executed deterministically.
Works with the logged-in client
The agent uses whatever session Notion already has on the user's machine. No OAuth flow, no integration grant, no shared workspace setup. If you can see the page, the agent can act on it.
Falls back to pixels only when needed
Accessibility tree first. When a widget is custom-drawn and the tree is thin (rare in Notion), press_key_global and mouse_drag are there as escape hatches. Most Notion UI is well-labeled.
What happens when the LLM says "create a Notion page"
Six steps from cold start to a shared page. No framework magic, no hand-waving. Each step is a specific MCP tool call.
Make sure Notion is running
The LLM calls get_applications_and_windows_list. Terminator returns process and window titles. If Notion is not open, it calls open_application with path "Notion".
Read the current UI
get_window_tree("Notion") returns a JSON tree of the live UI: sidebar pane, editor pane, top bar, every button, every block. The LLM now has the structure it needs to reason about what to click.
Pick a selector
Instead of coordinates, the model emits role + name selectors: window:Notion >> role:Button && name:Share. These match on accessibility attributes, so they survive resize, theme changes, and minor UI updates.
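Resolving such a selector is a plain tree walk, which is why it is cheap. Here is a minimal sketch that resolves a role + name pair against a UIA-style JSON tree (dicts with role, name, and children); this is an illustration of the idea, not Terminator's resolver:

```python
# Minimal sketch: depth-first search for the first node matching
# a role + name pair in a UIA-style tree. Illustrative only.
def find(node, role, name):
    if node.get("role") == role and node.get("name") == name:
        return node
    for child in node.get("children", []):
        hit = find(child, role, name)
        if hit is not None:
            return hit
    return None


tree = {
    "role": "Window", "name": "Notion",
    "children": [
        {"role": "Pane", "name": "Sidebar", "children": [
            {"role": "Button", "name": "Add a page"},
        ]},
        {"role": "Button", "name": "Share"},
    ],
}

find(tree, "Button", "Share")  # -> {"role": "Button", "name": "Share"}
```

No pixels, no inference: a deterministic walk over structured data the OS already maintains.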
dispatch_tool matches click_element
The MCP server's match block in server.rs routes to the click_element handler. It resolves the selector against the UIA tree and calls invoke() on the element. No mouse movement required.
type_into_element writes the title
Title fields in Notion are exposed as role:Edit with a name. type_into_element sends characters through the accessibility API. It does not take over the physical keyboard.
Verify with another get_window_tree
The model can re-read the tree to confirm the new page appeared in the sidebar, the title saved, or the Share dialog rendered. Then it proceeds to the next step in execute_sequence.
The whole flow as one MCP call
Terminator's execute_sequence is a workflow DSL inside MCP. You emit a YAML list of steps, each one referencing another tool, and the server runs them in order. The LLM pays latency once, not per click.
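A sketch of what such a sequence could look like for the create-and-share flow. The step and argument key names here are assumptions for illustration; check the Terminator docs for the exact schema:

```yaml
# Illustrative execute_sequence payload -- key names are assumptions,
# not the verified Terminator schema.
steps:
  - tool: open_application
    arguments:
      path: "Notion"
  - tool: wait_for_element
    arguments:
      selector: "window:Notion >> role:Button && name:Add a page"
  - tool: click_element
    arguments:
      selector: "window:Notion >> role:Button && name:Add a page"
  - tool: type_into_element
    arguments:
      selector: "window:Notion >> role:Edit && name:Title"
      text: "April planning"
  - tool: press_key
    arguments:
      key: "Enter"
  - tool: click_element
    arguments:
      selector: "window:Notion >> role:Button && name:Share"
```

One MCP call, six deterministic steps, zero per-click LLM latency.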
Try it yourself in two minutes
This is the actual install command from the Terminator README, followed by a live Claude Code session driving Notion. With Notion already open on Windows, the LLM chains five MCP calls to create and share a new page.
What an LLM can do inside Notion with this MCP server
This is not an exhaustive list; it is the common set for typical workflows. Anything the desktop client can render, the server can target.
- Launch the Notion desktop client from a command
- Read the accessibility tree of the live Notion window
- Click by role and name, not by pixel coordinates
- Type into any titled input (title, block, comment, search)
- Scroll long sidebars and long pages programmatically
- Wait for new UI to appear (new block, share dialog, template picker)
- Chain multiple actions in a single YAML workflow
- Take before/after screenshots without switching focus
- Run inside Claude Code, Cursor, VS Code over stdio
“Every file path, tool name, and selector on this page is grep-able in a fresh clone of mediar-ai/terminator. No invented specs.”
github.com/mediar-ai/terminator
A sample of Notion UI actions driven through MCP
Each pill below is a real Notion interaction an LLM can trigger with a single click_element, type_into_element, or short execute_sequence call. None of these require the Notion REST API.
Same server, any app
Notion is one target. The same dispatch_tool match block drives every app whose window is on the OS. Cross-app workflows (pull rows from Excel, paste into a Notion database, ping the team in Slack) stay inside one MCP namespace.
Why this matters if you are already using the API MCP
The REST API is fast at structured reads and bulk writes. It is slow, or impossible, for anything UI-only: template pickers, sidebar drags, view switches, comment reactions, gallery/calendar state, slash-command flows. An LLM that has both servers connected can pick the cheaper path per step. Most automations are mixed.
Terminator is a developer framework for building that kind of desktop automation. It is not a consumer app. It gives existing AI coding assistants the ability to control your entire OS, not just write code. Like Playwright, but for every app on your desktop, Notion included.
If you came here looking for the hosted Notion MCP, use it. Then add this one alongside for the 20% of work it cannot do.
MCP tools available the moment the server starts
One per match arm in dispatch_tool. All available against the Notion desktop window the moment you install the MCP agent.
Need a Notion workflow that the REST API cannot do?
Show us the UI path and we will map it to a deterministic MCP workflow you can run from any LLM.
Frequently asked questions
What does "Notion MCP server" usually mean, and how is Terminator different?
Usually it means Notion's official hosted MCP server or one of the community wrappers around Notion's REST API (makenotion/notion-mcp-server, suekou/mcp-notion-server, Portkey's hosted version, Docker's MCP catalog entry). Those all let an LLM call Notion's cloud endpoints to search pages, read blocks, update databases, and create content. Terminator is a different kind of MCP server. It does not talk to Notion's cloud at all. It controls the Notion desktop client that is running on your machine, using Windows UI Automation. Instead of tools like retrieve_page or query_database, it exposes tools like click_element, type_into_element, get_window_tree, and execute_sequence. The LLM targets UI elements with selectors like window:Notion >> role:Button && name:Share. The two approaches are complementary: use the API server for structured reads and bulk writes; use the desktop server for UI actions that the API does not expose, or for anything that needs to happen inside the app the user is already signed into.
Why would I want to drive the Notion desktop app instead of hitting the API?
Four reasons, in order of how often they come up. First, the REST API does not cover every action. Keyboard-shortcut-only UI, sidebar reordering, template pickers, some view switches, and various in-app dialogs are only reachable through the UI. Second, no OAuth. The agent uses whatever session the Notion desktop app is already logged into. No integration grant, no per-workspace setup. Third, no rate limit. The Notion API is 3 requests per second average; UI Automation has no such limit, so bulk UI operations are bounded by how fast the app repaints. Fourth, cross-app workflows. The same MCP server that drives Notion also drives Chrome, Excel, Slack, Terminal, and any other app on the machine, from one dispatch_tool match block. If your automation spans Notion plus three other apps, you stay in one tool namespace instead of gluing four integrations together.
How do I target a specific element inside Notion?
With a selector string. Terminator uses a prefix:value format with chainable combinators. Prefixes include role: (accessibility role like Button, Edit, Link), name: (accessible label, case-insensitive), text: (visible text, case-sensitive), id: (accessibility ID), nativeid: (Windows AutomationID or macOS AXIdentifier), classname:, visible:, pos:, and window: as a scope anchor. Combinators: && for AND, >> for descendant traversal, || for OR, ! for NOT, plus positional ones like rightof:, leftof:, above:, below:, near:, and .. to walk to the parent. A typical Notion selector looks like window:Notion >> role:Button && name:Share. That says "inside the window titled Notion, find a Button whose accessible name is Share." The same pattern works on every other Windows app, which is why Terminator calls itself Playwright-shaped for the whole desktop.
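A few worked examples combining those pieces. The grammar matches the cheatsheet; the target names and the nativeid value are illustrative, and the indented lines are annotations, not part of the syntax:

```
window:Notion >> role:Button && name:Share
    scope to the Notion window, then AND two predicates

role:Button && (name:Share || name:Sharing)
    OR fallback for a label that may be renamed

nativeid:page-title-input
    pin to a stable AutomationID (this ID is hypothetical)

role:Button && rightof:role:Text && text:Pages
    positional anchor: the button to the right of the "Pages" label
```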
What is the anchor fact for this page, the thing no other Notion MCP guide covers?
That the Notion desktop app is itself an MCP target, not via the Notion API but via Windows UI Automation. Terminator's MCP server exposes 31 tools from a single dispatch_tool match block in crates/terminator-mcp-agent/src/server.rs (line 9953). The relevant tools for Notion are get_window_tree (walk the app's accessibility tree and return JSON), click_element (invoke a button by selector), type_into_element (send characters through AX), wait_for_element (poll until something new renders), press_key (keyboard shortcuts), scroll_element (long sidebar, long page), and execute_sequence (YAML workflow that nests all of the above). The Notion homepage is already a test target in Terminator's Windows benchmark suite at crates/terminator/src/platforms/windows_benchmarks.rs. You can verify every tool and selector on this page by cloning mediar-ai/terminator and grepping the source.
Does this work on macOS or Linux?
Core automation is Windows-only today. The Terminator framework has macOS Accessibility API and Linux AT-SPI2 adapters in the tree (the selector engine, the MCP server shell, and the SDK bindings are cross-platform), but the stable production target is Windows, where UI Automation is the most complete. Notion's desktop client runs on Windows and macOS; for Windows, Terminator's MCP server is the full experience. On macOS, use the official Notion MCP server (API-backed) and keep an eye on Terminator's macOS progress, which is where cross-app UI automation via the AX API is headed.
How do I install Terminator's MCP server so Claude or Cursor can drive Notion?
One line: claude mcp add terminator "npx -y terminator-mcp-agent@latest" -s user. That registers the stdio server under Claude Code at user scope. For Cursor and VS Code, add a JSON block with command: npx and args: ["-y", "terminator-mcp-agent@latest"] to the MCP config. After that, Claude has the 31 tools available in its tool picker. With Notion already open, ask it to do something UI-shaped ("create a page titled April planning and share it with alice@example.com") and the LLM will chain get_applications_and_windows_list, open_application, get_window_tree, click_element, type_into_element, and click_element in one turn. For the full HTTP alternative, the terminator-mcp-agent binary also serves POST /mcp for JSON-RPC and GET /status for load balancer health.
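For Cursor and VS Code, the JSON block described above looks roughly like this. The mcpServers wrapper key is the common convention for editor MCP configs; confirm the exact file location and shape against your editor's MCP documentation:

```json
{
  "mcpServers": {
    "terminator": {
      "command": "npx",
      "args": ["-y", "terminator-mcp-agent@latest"]
    }
  }
}
```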
Is this more reliable than pixel-based computer use agents?
Yes, for UI that the OS labels. Terminator's README reports >95% success rate and claims 100x the speed of screenshot-based agents like Claude computer use, ChatGPT Agents, BrowserBase, BrowserUse, and Perplexity Comet. The reason is straightforward: the accessibility tree is deterministic structured data the OS already maintains, so finding a button is a tree walk, not an LLM inference on a 2000x2000 image. For Notion specifically, the app's Electron UI exposes roles and names for nearly every interactive element, so role:Button && name:Share is a one-call resolve. The pixel fallback exists for the rare case where a widget is custom-drawn and thin on accessibility metadata; most of Notion is not that.
Can I combine the API Notion MCP server and Terminator's desktop MCP server in the same agent session?
Yes, and it is often the right move. An LLM host like Claude Code can connect to multiple MCP servers at once; the tool namespace is flat but each tool is prefixed by its server. Use Notion's hosted MCP for bulk operations the API is good at: querying a database with filters, fetching page content, batch-updating properties. Use Terminator's MCP for UI actions the API does not expose or that need to happen in the user's real session: activating a specific view, dragging blocks, using a template picker, sharing through the UI, interacting with comments and reactions, and navigating the sidebar. The LLM decides per step which tool is cheaper. On a typical "create a new weekly review page from a template and share it with the team" task, the desktop MCP is often the shorter path because the template picker is UI-only.
How does Terminator handle elements whose name changes (e.g., "Share" vs "Sharing")?
With combinators. The || operator says "match either name," so role:Button && (name:Share || name:Sharing) covers both. The text: prefix matches visible text (case-sensitive) when the accessible name is empty or translated. When the element is stable but the name is noisy, anchor by nativeid: (Windows AutomationID) instead; Notion's Electron builds expose these for many controls. Finally, positional combinators like rightof:, leftof:, above:, below:, near: let you pin to an element whose neighbors are stable even if its own name drifts. Example: click the button to the right of the "Pages" label: role:Button && rightof:role:Text && text:Pages. These patterns come straight from the selector cheatsheet in /docs/SELECTORS_CHEATSHEET.md in the Terminator repo.
What does an end-to-end Notion automation look like in YAML?
Terminator has execute_sequence, an MCP tool whose arguments are a list of steps that nest the other 30 tools. A "open Notion, create a page, share it" workflow is: step 1 open_application with path Notion; step 2 wait_for_element for the Add a page button; step 3 click_element on that button; step 4 type_into_element into the title Edit; step 5 press_key Enter; step 6 click_element on Share. Each step has a selector and arguments, and the whole thing runs as one MCP call, one LLM turn, deterministically. If any step fails, the LLM gets the error on the next turn and can recover. This is what the README means when it says Terminator pre-trains workflows as deterministic code and only calls AI when recovery is needed. The point is you do not pay LLM latency for every click.