What MCP server means once the protocol explainer ends and the process actually runs
Search this phrase and the top ten results all land in the same place: a server is a process that exposes tools, resources, and prompts over JSON-RPC. Correct, and useless the moment you write one. This page is about the other half: what a real MCP server does on every tool call, before and after the dispatch function runs, on a machine where tools actually touch the OS. The reference implementation is Terminator's MCP agent, a 10,912-line Rust server whose job is to let Claude click real buttons in real apps without stealing your keyboard.
The textbook answer, then the one that matters
The textbook: an MCP server is a program that speaks the Model Context Protocol, exposes a set of named tools plus optional resources and prompts, and responds to list_tools and tools/call. That definition is correct and incomplete. It describes an interface, not a process.
The one that matters: an MCP server is a long-running process that hosts that interface and absorbs the operational cost of every tool call. For a desktop automation server, that cost is concrete. It has to capture your focus state, dispatch the tool without letting another call race it, cancel mid-flight if the client disconnects, write structured logs so you can replay what happened, restore your caret at the exact byte offset it was at before, and tell a load balancer it is busy so the next call does not fight the current one. None of that is in the spec; all of it is in the server.
If you want to see the operational layer in one file, open crates/terminator-mcp-agent/src/server.rs and search for async fn call_tool. It starts at line 10541. The actual match block that dispatches to a handler is buried 300 lines further down inside the function. Those intervening 300 lines are what this page is about.
The server, measured
Four numbers, counted in the current open-source repo: the operational layers wrapped around each dispatch (seven), the line count of server.rs (10,912), the default per-machine concurrency limit in HTTP mode (1, via MCP_MAX_CONCURRENT), and the settle delay applied after window management and before dispatch.
What wraps a single tool call
On the left: clients that reach the server. In the middle: call_tool, which is not the dispatcher. It is the wrapper. On the right: the concrete side effects every call produces, regardless of which tool ran.
call_tool is the wrapper, not the dispatcher
The anchor fact: one line decides whether the server touches your keyboard
Scroll to line 10621 of crates/terminator-mcp-agent/src/server.rs. This is the single place the server decides whether it will restore your caret after a tool finishes. Click-like tools are excluded on purpose: if the LLM clicked into a text field, you want focus to land there, not snap back to wherever you were before. Every other tool (read a window tree, run a workflow, grep a file, launch an app) saves focus first, runs, and restores.
The implementation lives in crates/terminator/src/platforms/windows/input.rs:171. It creates an IUIAutomation instance, calls GetFocusedElement, and, if the element supports TextPattern2, also saves the caret range. On restore, it calls SetFocus() on the saved element and, if a caret range was captured, writes it back. This is what "MCP server" means when the tools are UI actions: the server is explicitly responsible for not stealing your keyboard.
Verify in 10 seconds: git clone https://github.com/mediar-ai/terminator, then grep -n restore_focus_default crates/terminator-mcp-agent/src/server.rs. Two hits: one in call_tool, one in dispatch_tool. Both apply the same matches! rule. This is not a hypothetical. It is the reason running Terminator in a background Claude session does not yank your cursor around.
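The decision rule itself is small enough to restate. A minimal sketch of the same matches! exclusion (the real expression sits inside call_tool and dispatch_tool in server.rs; this standalone function is just an illustration of the rule, not Terminator's API):

```rust
// Sketch of the restore_focus_default rule: every tool saves and restores
// focus except the click-like ones, where focus should land where the
// LLM clicked rather than snap back to the previous element.
fn restore_focus_default(tool_name: &str) -> bool {
    !matches!(
        tool_name,
        "click_element" | "invoke_element" | "hover_element"
    )
}

fn main() {
    assert!(!restore_focus_default("click_element")); // focus follows the click
    assert!(restore_focus_default("grep_files"));     // caret snapped back after
    println!("click-like tools skip restore; everything else saves and restores focus");
}
```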
The seven layers wrapped around every dispatch
Here is what happens between the moment call_tool receives a CallToolRequestParam and the moment the handler actually runs. And what happens after. If you only remember one section of this page, make it this one.
Request arrives
Claude Code emits a JSON-RPC tools/call frame with tool_name and arguments. call_tool deserializes it and the operational preamble begins. Nothing has touched your OS yet.
Per-client mode check
The server reads client_info.name from the peer (claude-code, mediar-app, cursor). If the client is in Ask mode and the tool is blocked, call_tool returns error -32002 immediately. No dispatch, no side effects.
Focus state saved
On Windows, unless the tool is click-like, save_focus_state() captures the currently focused UIA element and its caret range. This is the checkpoint the server rewinds to after the tool runs.
Dispatch and cancellation race
dispatch_tool match arm deserializes into a typed Args struct and calls the handler inside tokio::select! against request_context.ct. If the client cancels, the tool stops; otherwise the handler touches UIA or AX and returns.
Log and screenshot capture
execution_logger::log_response_with_logs writes the request, response, duration, and captured logs to executions/. If the tool mutated the UI, screenshots before and after are attached to the record.
Focus restored, activity timestamp updated
restore_focus_state() puts your caret back where it was (unless you were clicking, in which case focus follows the click). In HTTP mode, active_requests is decremented and last_activity is written. The next /status probe will return 200 again.
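The steps above can be condensed into a std-only sketch. Every name here is an illustrative stand-in, not Terminator's actual API; the real code is the ~300 lines of call_tool. One design point worth showing: implementing focus restore as a drop guard means it runs even when the handler returns an error.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Trace = Rc<RefCell<Vec<&'static str>>>;

// Restores focus when dropped, so restore runs even if the handler errors.
struct FocusGuard(Trace);
impl Drop for FocusGuard {
    fn drop(&mut self) {
        self.0.borrow_mut().push("restore_focus");
    }
}

fn call_tool(tool_name: &str, handler_ok: bool, trace: &Trace) -> Result<(), &'static str> {
    let push = |s: &'static str| trace.borrow_mut().push(s);
    push("mode_check");   // per-client Ask/Act policy; blocked tools exit here
    push("log_request");  // execution logger opens its capture window
    // Click-like tools skip the save/restore pair on purpose.
    let _guard = (!matches!(tool_name, "click_element" | "invoke_element" | "hover_element"))
        .then(|| {
            push("save_focus");
            FocusGuard(trace.clone())
        });
    push("dispatch");     // the ~3-line match-arm dispatch to the handler
    let result = if handler_ok { Ok(()) } else { Err("handler failed") };
    push("log_response"); // response log, screenshots, timings
    result // _guard drops here: focus restored even on Err
}

fn main() {
    let trace: Trace = Rc::new(RefCell::new(Vec::new()));
    let _ = call_tool("grep_files", false, &trace);
    assert_eq!(
        *trace.borrow(),
        ["mode_check", "log_request", "save_focus", "dispatch", "log_response", "restore_focus"]
    );
    let clicks: Trace = Rc::new(RefCell::new(Vec::new()));
    let _ = call_tool("click_element", true, &clicks);
    assert!(!clicks.borrow().contains(&"restore_focus"));
    println!("wrapper order verified");
}
```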
Cancellation lives inside every match arm
Each arm is a tiny tokio::select race. If the client hangs up or fires stop_execution, the ct token trips and the tool loses the race. The handler stops instead of finishing in the background and leaving your UI in an in-between state.
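A std-only analogue of that race, assuming nothing beyond the standard library: Terminator's real arms use tokio::select! against request_context.ct, which preempts the handler's await points; here an AtomicBool plays the token and a hypothetical handler polls it cooperatively between units of work. The shape is the same: when the token trips, the tool returns early instead of finishing in the background.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Cooperative-cancellation sketch: the AtomicBool stands in for the
// CancellationToken that tokio::select! races against in each match arm.
fn run_with_cancellation(ct: &Arc<AtomicBool>) -> Result<u32, &'static str> {
    for _step in 0..50u32 {
        if ct.load(Ordering::Relaxed) {
            // Token tripped: stop instead of finishing in the background
            // and leaving the UI in an in-between state.
            return Err("cancelled");
        }
        thread::sleep(Duration::from_millis(2)); // one unit of UI work
    }
    Ok(50)
}

fn main() {
    let ct = Arc::new(AtomicBool::new(false));
    let canceller = {
        let ct = Arc::clone(&ct);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(10));
            ct.store(true, Ordering::Relaxed); // client hangs up / stop_execution
        })
    };
    let result = run_with_cancellation(&ct);
    canceller.join().unwrap();
    assert_eq!(result, Err("cancelled"));
    println!("handler lost the race: {:?}", result);
}
```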
What a real MCP server is responsible for
A protocol interface handles requests and returns responses. A server, the process hosting that interface, is responsible for keeping the machine in a coherent state while it runs. Six of those responsibilities, as implemented in Terminator:
Focus save and restore
Before every non-click tool, the server calls save_focus_state() (platforms/windows/input.rs:171). After the handler returns, restore_focus_state() puts the caret back. The MCP server never steals your keyboard.
Cancellation mid-call
Every match arm wraps the handler in tokio::select! against request_context.ct.cancelled(). If the client cancels the request, the running tool stops instead of finishing in the background.
Per-client mode
claude_code in Ask mode has a blocked_tools set. call_tool rejects -32002 when asked, list_tools filters them out. mediar-app (the UI) never gets filtered. Mode is stored per-client.
Busy 503 for load balancers
HTTP mode binds GET /status. Returns 200 when idle, 503 when active_requests >= MCP_MAX_CONCURRENT. An Azure Load Balancer probe drains a VM the moment one tool starts.
Execution log capture
Every call runs inside a log capture window. Structured and tracing logs from the handler and its descendants are written to executions/ with screenshots before and after the UI action.
Dead peer pruning
on_initialized sends notify_logging_message('ping') to every broadcast peer. Peers that fail the ping are dropped. This keeps the broadcast list from leaking across crashed clients.
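The pruning pattern in that last responsibility is worth a sketch. This is the shape, not the real code: Peer and ping are stand-ins for the broadcast list and notify_logging_message('ping'), and retain does the dropping.

```rust
// Hypothetical peer record; `alive` stands in for "the ping notification
// succeeded" in the real broadcast list.
struct Peer {
    name: &'static str,
    alive: bool,
}

fn ping(peer: &Peer) -> Result<(), ()> {
    if peer.alive { Ok(()) } else { Err(()) }
}

// Same shape as on_initialized: drop any peer whose ping fails, so a
// crashed client does not leak an entry in the broadcast list.
fn prune_dead_peers(peers: &mut Vec<Peer>) {
    peers.retain(|p| ping(p).is_ok());
}

fn main() {
    let mut peers = vec![
        Peer { name: "claude-code", alive: true },
        Peer { name: "crashed-session", alive: false },
        Peer { name: "mediar-app", alive: true },
    ];
    prune_dead_peers(&mut peers);
    let names: Vec<_> = peers.iter().map(|p| p.name).collect();
    assert_eq!(names, ["claude-code", "mediar-app"]);
    println!("dead peer pruned, {} remain", names.len());
}
```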
Under load: 503 is the correct answer
In HTTP mode, the server exposes three endpoints: GET /health for liveness, GET /status for busy-aware routing, and POST /mcp for the JSON-RPC body. /status is the operational primitive: it returns 200 when idle and 503 when a tool is in flight, so Azure Load Balancer (or any L7 probe) can take the VM out of rotation instantly.
The limit is governed by a single environment variable, MCP_MAX_CONCURRENT. Raise it if your tool set is actually parallel-safe (file I/O, read-only queries) and leave it at the default of 1 if any tool touches a GUI. The behavior is documented in the README at lines 36-45.
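The gate logic is simple enough to sketch. Assuming a hypothetical ConcurrencyGate type (the real gate lives in the agent's HTTP layer; names here are illustrative), the busy-aware behavior reduces to a counter compared against the limit:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch of the busy-aware gate: active_requests vs MCP_MAX_CONCURRENT.
struct ConcurrencyGate {
    active_requests: AtomicUsize,
    max_concurrent: usize, // MCP_MAX_CONCURRENT, default 1
}

impl ConcurrencyGate {
    fn new(max_concurrent: usize) -> Self {
        Self { active_requests: AtomicUsize::new(0), max_concurrent }
    }

    // What GET /status would report right now.
    fn status(&self) -> u16 {
        if self.active_requests.load(Ordering::SeqCst) >= self.max_concurrent { 503 } else { 200 }
    }

    // Called when POST /mcp receives a tools/call; false means reply 503.
    fn try_begin(&self) -> bool {
        self.active_requests
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| {
                if n < self.max_concurrent { Some(n + 1) } else { None }
            })
            .is_ok()
    }

    // Called when the tool call finishes, in the same cleanup that
    // restores focus and updates last_activity.
    fn finish(&self) {
        self.active_requests.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    let gate = ConcurrencyGate::new(1); // default: serialize GUI access
    assert_eq!(gate.status(), 200);
    assert!(gate.try_begin());      // first tool call admitted
    assert_eq!(gate.status(), 503); // LB probe now drains the VM
    assert!(!gate.try_begin());     // second call rejected while busy
    gate.finish();
    assert_eq!(gate.status(), 200); // back in rotation
}
```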
What the spec defines vs what the server actually does
The protocol body tells you how a request is framed. The server body tells you what happens while that request is in flight. Both are part of what MCP server means.
| Feature | MCP spec body | Terminator MCP server |
|---|---|---|
| What the word describes | A protocol interface (tools, resources, prompts) | A long-running OS process that hosts the interface plus all the side effects |
| On every tool call | Spec is silent; 'server handles it' | Save focus, set in_sequence flag, start log capture, start tracing span |
| Cancellation | Not in the spec body | tokio::select on request_context.ct for every single match arm |
| Per-client policy | Tools are either listed or not | claude-code client in Ask mode gets blocked_tools filtered on list_tools and call_tool |
| Under load | Queue, retry, or return 429 | MCP_MAX_CONCURRENT=1 by default, GET /status returns 503 so LBs drain the VM |
| Observability | Up to the implementer | execution_logger::log_request + log_response_with_logs wrap every dispatch, PostHog timings per tool |
| Session lifecycle | initialize, then tools/call | on_initialized prunes dead peers via notify_logging_message('ping') before adding new ones |
“Every file, function, and line number on this page is grep-able in a fresh clone of mediar-ai/terminator. Nothing is composited.”
github.com/mediar-ai/terminator
Why this distinction is worth the ink
If you are building an MCP server for a remote API (Slack, Linear, GitHub, Postgres), most of this page does not apply to you. Your handlers are network calls; your server is a thin wrapper over an SDK; the spec alone describes 90% of the work.
If you are building an MCP server that runs on the same machine your user is working on, and whose tools have real side effects on that machine, the operational layer is most of the work. Terminator is the extreme case: its MCP server is the product, its handlers drive the Windows UI Automation and macOS Accessibility APIs, and every single tool call has to be polite about the user's session. That politeness is implemented line by line in call_tool, not inherited from the protocol.
When you read "mcp server means" in search results and see only the protocol definition, you have been shown half of it. This page is the other half.
Lines of wrapper around each dispatch
Counted from the top of call_tool (server.rs line 10541) down to the match-arm dispatch. The dispatch itself is ~3 lines. The rest is operational policy.
Install the Terminator MCP server
Runs over stdio under Claude Code, Cursor, VS Code, or over HTTP behind a load balancer. MIT-licensed. The 300 lines of wrapper around each dispatch are yours to read.
claude mcp add terminator

Frequently asked questions
What does 'MCP server' actually mean, in one sentence?
An MCP server is a long-running process that speaks the Model Context Protocol over stdio or HTTP and hosts a set of named tools an LLM can call. The word 'server' is doing a lot of work here: it is not just a protocol endpoint. It is a process with state, a log pipe, cancellation tokens, focus-saving side effects (if the tools touch a UI), and, in Terminator's case, per-client Ask/Act policy. Every top result for this query stops at the protocol interface; the operational layer is equally part of what the word means in practice.
Why does the definition 'a protocol interface for tools, resources, and prompts' feel incomplete?
Because an MCP server in production is a process with side effects. A GitHub MCP server that makes HTTP requests is fine to describe purely as a protocol interface. A desktop MCP server like Terminator cannot be. The moment a tool call types a keystroke or clicks a window, the server becomes responsible for save/restore of the caret, for focus policy per tool (click_element, invoke_element, and hover_element deliberately skip focus restore because the user wanted focus to land on the clicked element), for cancellation so the LLM can stop a long-running action, and for concurrency so your machine does not run two conflicting automations at once. Those responsibilities are what 'server' means in full.
What is the concrete thing Terminator's MCP server does that a REST API does not?
It saves and restores your keyboard focus around every tool call. The exact line is in crates/terminator-mcp-agent/src/server.rs around line 10621: let restore_focus_default = !matches!(tool_name.as_str(), "click_element" | "invoke_element" | "hover_element"). For every tool that is not click-like, the server calls save_focus_state() via the Windows UI Automation API before the tool runs and restore_focus_state() after it returns. A REST API has no concept of this because a REST endpoint is not embedded inside your desktop session. It is a side-effecting process living next to your editor.
What are the operational layers wrapped around an MCP tool call?
Seven, in order: (1) client mode check, which rejects blocked tools in Ask mode; (2) cancellation reset, so a prior stop_execution does not cancel the next call; (3) workflow context extraction for execute_sequence steps; (4) execution logging start, with tracing span and log capture; (5) focus state save, unless the tool is click-like; (6) the dispatch itself via tokio::select on the cancellation token; (7) response logging, screenshot attachment, PostHog timing, and focus restore. All seven live inside async fn call_tool in server.rs. You can see them sequentially if you read from line 10541 down.
Why does MCP_MAX_CONCURRENT default to 1?
Because a desktop MCP server serializes access to the GUI. Two tool calls running in parallel would fight each other for window focus, keyboard state, and clipboard. Terminator's main.rs reads MCP_MAX_CONCURRENT from the environment and defaults to 1. If a second tools/call arrives while one is in flight, POST /mcp returns 503 and GET /status returns 503 so an Azure Load Balancer can route traffic to a different VM. You can raise the limit if your tools are genuinely independent (read_file, grep_files, glob_files) but for UI-touching tools, serial is correct.
How do I inspect an MCP server's actual behavior, not just its tool list?
Run it in HTTP mode and probe the status endpoint. terminator-mcp-agent -t http starts it on localhost; curl localhost:8765/health returns 200 while the process is alive, and curl localhost:8765/status returns busy-aware status: 200 when idle, 503 while a tool is in flight. The log lines written to executions/ for each call show exactly what the server did, with before/after screenshots. The tool list is the easy part; the operational behavior is the part the README lines 36-45 actually describe.
Does every MCP server do this much work on a tool call?
No. A minimal MCP server written against the official Python or TypeScript SDK can be 50 lines that handle tool dispatch and nothing else. The operational layer becomes necessary when the server has real-world side effects (UI state, clipboard, focus, filesystem) and multiple clients with different trust levels. Terminator has both, which is why call_tool in server.rs is ~300 lines of bookkeeping around each dispatch. If your MCP server wraps a stateless remote API, you probably do not need most of this. If it touches a running OS session, you need all of it.
What is the practical difference between 'MCP server' and 'MCP endpoint'?
Colloquially nothing; practically everything. 'MCP endpoint' implies a URL that accepts JSON-RPC. 'MCP server' implies the host process that owns that endpoint and everything around it: the tool registry, the cancellation tokens, the log capture, the focus policy, the per-client mode state, the broadcast peer list. Reading modelcontextprotocol.io you learn what an MCP endpoint accepts. Reading crates/terminator-mcp-agent/src/server.rs you learn what an MCP server is.