Windows program automation starts with a three-backend launcher

Every Windows program automation tutorial skips the first hard problem: how do you actually start the program? Terminator's open_application takes a single call and routes it to ShellExecuteW for settings URIs, IApplicationActivationManager for UWP and Start-menu apps, or CreateProcessW for legacy .exe binaries. The reasons for the split matter, and the whole launch path, about 470 lines of code, lives in a single file.

Matthew Diakonov
  • open_application at applications.rs line 459 branches into three Win32 launchers
  • Get-StartApps is cached for 30 seconds, saving ~566ms per repeat launch
  • EnumWindows binds a PID to its HWND 10-100x faster than a UIA tree walk

The missing first step in every guide

Read any Windows program automation tutorial on the internet and the first instruction is some variation of "make sure the program is open." PyAutoGUI shows a screenshot of Notepad already running. AutoHotkey scripts assume the target window exists and hop straight to WinActivate. Power Automate Desktop offers a Run application action that is thin syntactic sugar over ShellExecute and flat out fails on packaged Microsoft Store apps.

Launching a Windows program is not one operation. It is three. Modern Store apps need IApplicationActivationManager, an obscure COM interface that returns a PID for a packaged app. Settings pages are URIs like ms-settings:display and have to go through ShellExecuteW with the right verb. Everything else is still CreateProcessW. An automation framework that pretends otherwise will silently break on the first machine where Calculator is the UWP one.

Terminator's open_application knows this. The whole routing tree, including the retries and the cache, lives in one file: crates/terminator/src/platforms/windows/applications.rs. This page walks through it.

The three-backend router, in Rust

Pass a name. Terminator classifies the string and picks the right Win32 API. If the first choice misses, it falls through to the next: three branches, top to bottom, with explicit fallthrough.

crates/terminator/src/platforms/windows/applications.rs
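As a rough illustration of the routing order only (not the code in applications.rs), here is a portable Rust sketch. The Win32 calls are replaced by an enum that names which one would run, and the Get-StartApps listing is passed in as a plain map from display name to AppID. The names Backend and route, and the case-insensitive name match, are assumptions made for this sketch, not Terminator's API.

```rust
use std::collections::HashMap;

/// Which launcher a name would be routed to (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Backend {
    /// Branch 1: ms-settings: URIs go to ShellExecuteW.
    ShellExecute { uri: String },
    /// Branch 2: a Start-menu hit goes to IApplicationActivationManager with its AppID.
    Activate { app_id: String },
    /// Branch 3: everything else goes to CreateProcessW with the raw command line.
    CreateProcess { command_line: String },
}

/// Classify `name` in the order the article describes: settings URI first,
/// then a Start-menu lookup, then the legacy fallthrough.
/// `start_apps` stands in for the Get-StartApps listing (display name -> AppID).
pub fn route(name: &str, start_apps: &HashMap<String, String>) -> Backend {
    let trimmed = name.trim();

    // Branch 1: settings pages are URIs, so they only need a shell execute.
    if trimmed.to_ascii_lowercase().starts_with("ms-settings:") {
        return Backend::ShellExecute { uri: trimmed.to_string() };
    }

    // Branch 2: look the name up among Start-menu entries.
    // Case-insensitive exact matching is an assumption of this sketch.
    let needle = trimmed.to_lowercase();
    if let Some(app_id) = start_apps
        .iter()
        .find(|(display, _)| display.to_lowercase() == needle)
        .map(|(_, id)| id.clone())
    {
        return Backend::Activate { app_id };
    }

    // Branch 3: no match anywhere, so treat the string as a command line.
    Backend::CreateProcess { command_line: trimmed.to_string() }
}
```

With a listing that maps "Calculator" to its AppID, route("ms-settings:display", …) picks ShellExecute, route("calculator", …) picks Activate, and route("notepad.exe", …) falls through to CreateProcess. The real lookup may also accept partial names or AppIDs; this sketch only shows the order of the checks.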

What a name turns into, visually

Figure: the kind of string you pass (left), the single open_application() function (middle), and the Windows API that ends up running (right). Inputs such as ms-settings:network, Calculator, notepad.exe, and chrome all enter the same call, which dispatches to ShellExecuteW, IApplicationActivationManager, CreateProcessW, or the shell:AppsFolder fallback.

"Fast window finder using Win32 EnumWindows API. This is 10-100x faster than UI Automation tree traversal." (inline comment at applications.rs line 60)

The anchor fact: a static Mutex that saves 566ms

Resolving a Start-menu name like "Calculator" into the AppUserModelID that IApplicationActivationManager needs goes through PowerShell. Specifically powershell -NoProfile -Command "Get-StartApps | ConvertTo-Json". Spawning PowerShell and parsing its JSON takes about 566ms on the reference hardware the repo was tuned on. The comment at line 541 says so explicitly.

So the file ships a static Mutex over an Option<StartAppsCache> with a 30-second TTL. The first launch pays the 566ms; later launches within those 30 seconds read from the cache. In a recorded workflow that launches three programs back to back, only the first one pays for the shell-out.

applications.rs (cache path)
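A minimal sketch of that cache shape in std-only Rust, assuming the listing is a map from display name to AppID. StartAppsCache, CACHE_TTL, and the static Mutex<Option<...>> follow the names this page cites; the fields, start_apps_with, and its fetch closure (standing in for the PowerShell shell-out) are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Same 30-second TTL the article cites for CACHE_TTL.
const CACHE_TTL: Duration = Duration::from_secs(30);

/// Hypothetical cache entry: the parsed Start-menu listing plus when it was fetched.
struct StartAppsCache {
    apps: HashMap<String, String>, // display name -> AppID
    fetched_at: Instant,
}

/// One process-wide cache shared by every thread. Mutex::new is a const fn,
/// so no lazy-initialization crate is needed.
static START_APPS_CACHE: Mutex<Option<StartAppsCache>> = Mutex::new(None);

/// Return the Start-menu listing, calling `fetch` (the expensive
/// PowerShell shell-out in the real code) only when the cache is empty
/// or older than CACHE_TTL.
fn start_apps_with<F>(fetch: F) -> HashMap<String, String>
where
    F: FnOnce() -> HashMap<String, String>,
{
    // Recover the data even if another thread panicked while holding the lock.
    let mut guard = START_APPS_CACHE.lock().unwrap_or_else(|e| e.into_inner());

    if let Some(cache) = guard.as_ref() {
        if cache.fetched_at.elapsed() < CACHE_TTL {
            return cache.apps.clone(); // cache hit: no shell-out
        }
    }

    // Cache miss or expired entry: pay the fetch cost once and store the result.
    let apps = fetch();
    *guard = Some(StartAppsCache { apps: apps.clone(), fetched_at: Instant::now() });
    apps
}
```

Holding the lock across fetch means concurrent callers wait for one shell-out instead of each spawning PowerShell. Whether applications.rs makes the same trade-off is not something this page states.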
  • 3 Win32 launch APIs under one function
  • 566ms Get-StartApps cost skipped per cache hit
  • 30s Start-menu cache TTL
  • 300ms post-launch sleep (down from 1000ms)

The UWP branch has its own fallback

IApplicationActivationManager.ActivateApplication is the correct call for a packaged app, but it can fail. Group Policy can block activation, AppContainer rules can reject the caller, and some enterprise Windows 11 builds simply refuse. Terminator's launch_app catches that failure and retries via ShellExecuteExW with the special shell:AppsFolder\{AppID} path, which asks the Windows shell to treat the app like a file and open it. That makes two tries inside one branch.

applications.rs (UWP with shell fallback)
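The fallback order can be sketched as a chain of attempts. In this hypothetical launch_packaged, each closure stands in for one real call (ActivateApplication, ShellExecuteExW on shell:AppsFolder\{AppID}, and the final lookup by window name), so the control flow can be read and tested without Windows. The Launched type and the error strings are illustrative assumptions.

```rust
/// Outcome of a successful launch attempt (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum Launched {
    /// A process ID came back from one of the launchers.
    Pid(u32),
    /// No PID, but a window with this name was found.
    WindowByName(String),
}

/// Try direct activation, then the shell:AppsFolder route, then a window lookup.
pub fn launch_packaged<A, S, W>(
    app_id: &str,
    display_name: &str,
    activate: A,
    shell_open: S,
    find_window: W,
) -> Result<Launched, String>
where
    A: FnOnce(&str) -> Result<u32, String>,
    S: FnOnce(&str) -> Result<u32, String>,
    W: FnOnce(&str) -> Option<String>,
{
    // Attempt 1: stands in for IApplicationActivationManager::ActivateApplication.
    let first_err = match activate(app_id) {
        Ok(pid) => return Ok(Launched::Pid(pid)),
        Err(e) => e,
    };

    // Attempt 2: stands in for ShellExecuteExW on shell:AppsFolder\<AppID>.
    let path = format!(r"shell:AppsFolder\{}", app_id);
    let second_err = match shell_open(&path) {
        Ok(pid) if pid != 0 => return Ok(Launched::Pid(pid)),
        Ok(_) => String::from("shell launch returned no process"),
        Err(e) => e,
    };

    // Attempt 3: look the window up by name; report every failure if that misses too.
    find_window(display_name)
        .map(Launched::WindowByName)
        .ok_or_else(|| {
            format!(
                "activation failed: {first_err}; shell fallback failed: {second_err}; no window named {display_name}"
            )
        })
}
```

Keeping each earlier error string means the final error explains all three misses, which is more useful in a log than only the last one.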

Then: bind the new PID to its window, the fast way

After any of the three launchers returns a PID, Terminator needs to hand you back the UIElement for the window, not the process. The naive way is a UI Automation tree walk from the root: ask Windows for every top-level element, read each one's ProcessId, stop when you match. On a crowded desktop that is thousands of cross-process COM calls.

The file uses Win32 EnumWindows instead. Same address space, no COM, and the callback stops the moment it finds a visible HWND with a matching PID. The comment right above the function reports the win: 10-100x faster than UIA traversal.

applications.rs (PID to HWND)
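EnumWindows invokes a callback once per top-level window and stops as soon as the callback returns FALSE. The sketch below reproduces that contract with a closure over a slice of fake windows, so the early exit is visible. TopLevelWindow, enum_windows, and find_hwnd_by_pid are illustrative stand-ins; real code would pass its state through the LPARAM argument to an extern "system" callback and query each HWND with GetWindowThreadProcessId and IsWindowVisible.

```rust
/// Stand-in for what the EnumWindows callback can learn about one HWND.
#[derive(Debug, Clone, Copy)]
pub struct TopLevelWindow {
    pub hwnd: isize,
    pub pid: u32,      // what GetWindowThreadProcessId would report
    pub visible: bool, // what IsWindowVisible would report
}

/// Mimics EnumWindows: calls `callback` per top-level window until it returns false.
fn enum_windows<F: FnMut(&TopLevelWindow) -> bool>(windows: &[TopLevelWindow], mut callback: F) {
    for w in windows {
        if !callback(w) {
            break; // returning FALSE from the callback ends the enumeration
        }
    }
}

/// Hypothetical PID-to-HWND lookup: the first visible window owned by `pid`.
/// Also returns how many windows were inspected, to show the early exit.
pub fn find_hwnd_by_pid(windows: &[TopLevelWindow], pid: u32) -> (Option<isize>, usize) {
    let mut found = None;
    let mut visited = 0;
    enum_windows(windows, |w| {
        visited += 1;
        if w.pid == pid && w.visible {
            found = Some(w.hwnd);
            return false; // match found: skip the remaining windows
        }
        true // keep enumerating
    });
    (found, visited)
}
```

Each step is a cheap check on one HWND rather than a cross-process UI Automation query, and the loop ends at the first match, which is where the speedup over a full tree walk comes from.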

Which Windows programs this covers

Every kind of installed program on a modern Windows machine lands in one of these buckets. The router maps each to a specific Win32 call.

ms-settings: URIs

Settings pages like ms-settings:network, ms-settings:display, ms-settings:bluetooth. Shell-executed directly, then the Settings window is grabbed by name. One branch, one Win32 call.

Microsoft Store apps (UWP)

Calculator, Snipping Tool, Photos, Terminal, the new Notepad. Resolved through Get-StartApps to their AUMID, launched through IApplicationActivationManager, with shell:AppsFolder as a fallback.

Classic .exe

notepad.exe, cmd.exe, or any legacy 32-bit or 64-bit Windows binary. CreateProcessW with CREATE_NEW_CONSOLE. Full command lines are supported through the app_name string itself.

Start-menu-listed Win32 apps

Anything Get-StartApps enumerates, including desktop apps such as Office, Slack, Notion, and VS Code. The name is matched against the cached Start-menu entries to get an AppID, then ApplicationActivationManager takes over.

Browsers, treated specially downstream

chrome, firefox, msedge, edge, brave, opera, vivaldi, arc. Launched by the same function, but the KNOWN_BROWSER_PROCESS_NAMES constant flags them so tools can attach the Chrome extension bridge.

Apps with a shortcut but no AppID

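Programs from legacy installers that predate AppUserModelID. They fall through to launch_legacy_app, which calls CreateProcessW with the raw path. A bare name is resolved the way CreateProcessW resolves command lines: the launching program's directory, the current directory, the system and Windows directories, then PATH. CreateProcessW does not follow .lnk shortcuts.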
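For the browser bucket, a check against the names this page attributes to KNOWN_BROWSER_PROCESS_NAMES might look like the sketch below. The constant's type here, the helper is_known_browser, and its normalization (keep the last path component, drop ".exe", compare case-insensitively) are assumptions, not the code at applications.rs line 52.

```rust
/// The browser names listed on this page; the real constant may differ in type or contents.
const KNOWN_BROWSER_PROCESS_NAMES: &[&str] = &[
    "chrome", "firefox", "msedge", "edge", "brave", "opera", "vivaldi", "arc",
];

/// Hypothetical helper: does a process name or full path refer to a known browser?
pub fn is_known_browser(name_or_path: &str) -> bool {
    // Keep only the final path component, accepting both '\' and '/' separators.
    let file = name_or_path
        .rsplit(|c: char| c == '\\' || c == '/')
        .next()
        .unwrap_or(name_or_path);
    // Compare case-insensitively, ignoring a trailing ".exe".
    let lower = file.to_ascii_lowercase();
    let stem = lower.strip_suffix(".exe").unwrap_or(lower.as_str());
    KNOWN_BROWSER_PROCESS_NAMES.iter().any(|&b| b == stem)
}
```

Splitting on both separators by hand, rather than using std::path::Path, keeps the check correct for Windows paths even when the code is built and tested on another OS.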

Same Python, three different Win32 code paths

Your script does not switch APIs. Terminator does. Each of these three lines executes a different Win32 launcher under the hood, decided by the shape of the string you pass.

launch_a_program.py

What this replaces

How most automation tools handle launching, without a router like Terminator's:

Your script decides on launch policy. You figure out when a Store app needs a shell:AppsFolder prefix, when ms-settings URIs need explorer.exe, and when a CreateProcess call is enough. You poll GetForegroundWindow in a sleep loop after launching and hope the right window is on top.

  • UWP apps fail silently because nothing resolves their AppID
  • Every ms-settings page needs a separate case
  • You poll for the new window after launching
  • No cache: PowerShell shell-out on every lookup

The full launch sequence, one call at a time

What happens between calling open_application("Calculator") and your next click landing.

What the launcher actually does


1. Classify the name

Starts with ms-settings:? Branch 1. Matches a cached or fresh Get-StartApps entry? Branch 2. Neither? Branch 3.

Five steps to your first program launch

1. Install the SDK or the MCP agent

pip install terminator for Python. npm i @mediar-ai/terminator for TypeScript. Or wire the MCP agent into Claude Code with claude mcp add terminator 'npx -y terminator-mcp-agent@latest' and drive everything from chat.

2. Call open_application once

desktop.open_application('Calculator') from Python, or desktop.openApplication('Calculator') from TypeScript. From the MCP side, the open_application tool takes the same string. No command-line gymnastics, no subprocess arguments.

3. Let the router pick the Win32 backend

Your string is classified. ms-settings URIs take the ShellExecute path. Start-menu entries hit the cache and then ApplicationActivationManager. Anything else goes through CreateProcessW.

4. Receive a UIElement bound to the new window

EnumWindows finds the HWND. Terminator wraps it. The returned UIElement is scoped to that window and ready for selector queries like role:Button && name:Equals.

5. Drive the app with selectors

calc.locator('role:Button && name:Seven').click(). Every subsequent call goes through the Windows UI Automation COM API against the window you just launched. No pixels, no OCR.

Feature by feature

Feature | Typical Windows program automation | Terminator
Single API for UWP, Win32, and ms-settings | One API per launch type, or none | open_application(name) handles all three
Activates packaged apps by AppID | Runs the .exe and misses AppContainer setup | IApplicationActivationManager.ActivateApplication
Fallback when UWP activation is blocked | Hard error | Retries via shell:AppsFolder\{AppID}
Caches the Start menu listing | Shells out on every launch | 30-second Mutex<Option<StartAppsCache>>, saves ~566ms
How it finds the launched window | UIA tree walk or GetForegroundWindow poll | EnumWindows + PID match (10-100x faster)
Language bindings around the same launcher | Rebuilt for each language | Rust core; Node, Python, and MCP all call one function
Open source | Proprietary or bundled | MIT on github.com/mediar-ai/terminator

Verify every anchor fact against source

Every number on this page comes from a line in one file. Clone the repo and grep for yourself.

  • 3: Win32 launch APIs reachable through a single open_application() call in applications.rs.
  • 30s: Get-StartApps cache TTL, defined as CACHE_TTL = Duration::from_secs(30) at line 546.
  • 100x: upper-bound speedup of EnumWindows over a UIA tree walk for PID-to-HWND binding. Source comment at line 60.


Frequently asked questions

What is Windows program automation and why is launching the program the hard part?

Windows program automation is the practice of driving desktop applications (not browsers) through code instead of by hand. Every tutorial jumps straight to clicks and keystrokes, but the first call in any real workflow is launching the target program, and Windows has three separate APIs for that. ShellExecuteW opens URIs and documents, ApplicationActivationManager.ActivateApplication is the only reliable way to start modern packaged apps without splash screens or AppContainer issues, and CreateProcessW is still correct for legacy Win32 .exe binaries. A Python script calling subprocess.Popen covers only the legacy case, and os.startfile, a ShellExecute wrapper, adds URIs and documents but still cannot turn a name like Calculator into an AppID. Terminator's open_application picks the right backend by inspecting the name you pass.

Which file contains the three-way launch router?

crates/terminator/src/platforms/windows/applications.rs. The entry point open_application is at line 459. If the name begins with ms-settings: it calls ShellExecuteW at line 471 and then opens the Settings window by name. Otherwise it tries get_app_info_from_startapps, and on a hit it calls launch_app, which wraps ApplicationActivationManager at lines 764 through 780. On a miss it falls through to launch_legacy_app at line 855, which calls CreateProcessW. Three branches, three Win32 APIs, one Rust function.

Why cache Get-StartApps for 30 seconds?

Get-StartApps is the PowerShell cmdlet that enumerates everything in the Start menu, including all UWP apps and their AppUserModelIDs. It is how Terminator resolves a name like notepad or Calculator into the AppID ApplicationActivationManager needs. Spawning PowerShell to run it is the expensive step. The source comment at line 541 says this saves ~566ms on every cached launch. CACHE_TTL is 30 seconds, which is long enough to cover a burst of launches in one workflow and short enough that new Store installs become visible quickly. The cache lives in a static Mutex<Option<StartAppsCache>> so every thread in the MCP agent shares it.

Why not use a UIA tree walk to find the window after launching?

Because it is slow. A UIA walk over a cold desktop can touch thousands of elements and issue a cross-process COM call for each one. The Win32 EnumWindows API simply iterates top-level HWNDs without any COM marshaling, calling GetWindowThreadProcessId on each and comparing the result with the target PID. Terminator's find_hwnd_by_pid_fast uses it (applications.rs lines 58 through 101). The inline comment at line 61 calls out the win: 10x to 100x faster than UI Automation tree traversal. That single change let the post-launch sleep at line 832 drop from 1000ms to 300ms while still catching slow-starting apps.

What happens when ApplicationActivationManager.ActivateApplication fails on a UWP app?

Terminator catches the error and retries via ShellExecuteExW with the special shell:AppsFolder\{AppID} path (lines 780 through 816). shell:AppsFolder is the Windows shell namespace that exposes every installed app as if it were a file. This handles the case where a Store app is installed but activation policy rejects the direct COM call, a common failure on enterprise Windows 11 builds. If that also fails to return a valid process handle, the code falls through to looking up the UI element by window name. With three attempts in total, a single open_application call survives most real-world Windows configurations.

Does this work on Windows 10 and Windows 11 both?

Yes. ApplicationActivationManager, Get-StartApps, ShellExecuteW, EnumWindows, and CreateProcessW are all present on Windows 10 1809 and later and every shipping version of Windows 11. The Windows UI Automation COM API the rest of Terminator uses has shipped since Windows 7. None of this applies on macOS or Linux: these launch APIs are Windows-only.

How does this compare to PyAutoGUI, AutoHotkey, and Power Automate Desktop for launching programs?

PyAutoGUI has no launch primitive at all. Its docs assume you use subprocess.Popen or os.startfile, which only cover legacy .exe cleanly. AutoHotkey v2 has Run with a FailIfNotFound parameter, which shells out via ShellExecute under the hood and works for .exe and document URIs but does not activate UWP apps by AppID. Power Automate Desktop has a Run application action, which is ShellExecute with extra wrapping, and a separate unofficial workflow for UWP that uses explorer.exe shell:AppsFolder. Terminator is the only framework in this list that tries ApplicationActivationManager first, which is the only API Microsoft actually recommends for packaged apps.

Can I call open_application from a single script, or do I have to use MCP?

Both work. From Python: import terminator; terminator.Desktop().open_application('calc.exe'). From TypeScript: import { Desktop } from '@mediar-ai/terminator'; new Desktop().openApplication('calc.exe'). From Claude Code or Cursor: the open_application MCP tool (registered in crates/terminator-mcp-agent/src/server.rs) does the same. Every language binding resolves to the same Rust function in applications.rs, so the three-backend behaviour is identical.

Can the agent launch a browser and an app in the same workflow?

Yes. open_application treats chrome, firefox, msedge, edge, brave, opera, vivaldi, and arc as known browser process names (the KNOWN_BROWSER_PROCESS_NAMES constant at applications.rs line 52). It still routes them through the same three backends, but downstream tools use that list to decide whether to attach the Chrome extension bridge for DOM-level automation versus treating the window as an opaque Win32 program. So a single workflow can launch Excel, Notepad, and Chrome and drive all three through the same selector language.

terminator: desktop automation SDK
© 2026 terminator. All rights reserved.