
Shipping a PyTorch Model as a Windows Installer: Electron + PyInstaller + AES-256 Model Decryption

Shipping a PyTorch model as a Windows installer: the stack, the pitfalls, and the war stories nobody else writes about.

Saianiruth M

Most production-ML content stops at the API. You train a model, you wrap it in a container, you put it behind a load balancer, you call it done. That's the easy case. The hard case is when the deployment target is a Windows desktop sitting in a hospital, the user is a radiologist who shouldn't have to think about Python, the models are proprietary weights you don't want sitting on disk in plaintext, and the network might be flaky or absent.

I've spent a meaningful slice of my year on this exact problem. The stack that emerged — Electron frontend, PyInstaller-frozen Python backend, AES-encrypted model bundle, Inno Setup installer with a network precheck wizard, an Electron-driven auto-updater that handles admin elevation and mandatory-vs-grace-period upgrade policy — is the kind of combination almost nobody writes about. Most tutorials cover one piece. The interaction between pieces is where everything goes wrong.

This post is the writeup I wish I'd had on day one.

A note before we get into it: this work was done with my colleague Abhijay, with technical guidance from the broader engineering team at 5C Network. I'll use "we" for the work itself and "I" for the specific debugging moments and lessons.


The constraint

Before any code: the constraint that shapes everything.

  • The product is a desktop ML app for non-technical clinical users. They install it once and use it. They don't open a terminal. They don't pip install. They don't think about CUDA versions.
  • The models — multi-gigabyte PyTorch checkpoints — must be encrypted on disk. Plain .pt files in Program Files are a non-starter.
  • Inference runs locally. No remote GPU, no calling out to a cloud endpoint for the core ML. Some auxiliary services (license validation, telemetry sync) hit the internet, but the inference path stays local.
  • The OS is Windows. Specifically Windows 10/11. Linux and macOS aren't supported.
  • Updates have to ship without re-installing. A new model version, a frontend fix, a backend patch: all need to land on the user's machine through an auto-updater they trust.
  • Auto-update needs to handle the case where the running user doesn't have admin rights, because the Electron app is launched from a non-admin shortcut even though the install required admin.

None of these constraints is exotic. All of them together are uncommon enough that the Stack Overflow answers stop being useful pretty quickly.


The stack, in one paragraph

Electron is the shell. It owns the window, the system tray, the auto-updater, and the IPC bridge to the Python backend. The frontend inside Electron is React/TypeScript/Vite — same code that runs in the web deployment. The Python backend is FastAPI serving inference endpoints (/predict, /predict/stream, /health, plus a fleet of feedback and analytics routes). PyTorch loads the models. PyInstaller freezes the whole Python side into a standalone executable so the user doesn't need a Python install. SQLite holds local state — license info, queue status, feedback awaiting cloud sync, model thresholds (more on that in a future post). Inno Setup builds the installer. Models live AES-encrypted in the app bundle and decrypt at first launch into a writable ProgramData workspace.

That's the diagram in prose. The rest of this post is what happens between each of those pieces.


Model encryption: AES at rest, decrypt at first launch

The models cannot live on disk in plaintext. This is non-negotiable for the use case. Two options exist:

  1. Encrypt the models in the installer, ship the encrypted blobs, decrypt on every load.
  2. Encrypt the models in the installer, decrypt once on first launch into a protected workspace, run inference against the decrypted files.

We picked (2). Reason: PyTorch's model loading is slow enough that adding AES decryption to every load is a noticeable latency hit. Doing it once at first launch gives you a one-time cost and then plaintext-speed loading from then on. The tradeoff is that you now have decrypted weights sitting on disk in a controlled location.

The "controlled location" matters. Three Windows directories to know:

  • Program Files\<App> — read-only after install for non-admin users. Models can't be decrypted here unless the app runs elevated, and we want non-elevated runtime.
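  • Per-user app data (%LOCALAPPDATA%): writable, but per-user. Multi-user installs get fragmented.
  • ProgramData\<App>: writable, system-wide, persists across users. The right answer. One caveat: under the default permissions a standard user can create files here but cannot modify files another user created, so set the folder's permissions at install time if several accounts need to write the same files.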

Decrypted models, runtime logs, the SQLite DB, and the auto-updater's state file all live under ProgramData\<AppName>\. The Electron main process creates this directory at first launch, runs the AES decryption pass against the bundled encrypted blobs, and then never touches the encryption again unless an updater pushes new encrypted models.

Implementation details that bit me:

  • Initial versions tried to write logs into the install directory under Program Files. That worked in development and crashed on non-admin runtime with permission errors. Every logging call now uses an absolute path under ProgramData.
  • The AES key cannot live in plaintext in the app. It's embedded via build-time obfuscation, which is not real security but is enough to deter casual extraction. Anyone with a debugger and motivation can extract the key — for the threat model here (preventing accidental copy, not preventing a determined attacker), it's enough.
  • First-launch decryption takes about a minute for multi-gigabyte models with the current streaming implementation. The UI needs an explicit "Setting up..." state with a progress indicator. Otherwise users assume the app froze and force-quit. I learned this the hard way — see below.

The war story. Our first cut of the decryption pass loaded each entire encrypted model into memory in one read, decrypted the whole buffer, then wrote the decrypted bytes out. Functionally correct, monumentally slow — multi-gigabyte models, huge allocations, swap pressure on machines without enough RAM. First-launch decryption was taking four hours. The fix was switching to a streaming pattern — read 4 MB, decrypt 4 MB, write 4 MB, repeat — which brought the same operation down to about a minute. Same algorithm, same key, same models, a ~200× speedup from one architectural shift. The lesson: any time you're working with multi-GB data and your runtime is in hours, the question to ask is "is this work actually atomic, or did I just make it atomic?" Decryption is naturally streamable; we'd just defaulted to load-and-transform because it was easier to write the first time.
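As a rough illustration of the read-decrypt-write loop, here is a minimal Python sketch. It is not the shipped code (the decryption pass described above runs from the Electron main process), and `stream_decrypt`, `Decryptor`, and the `.partial` temp-file convention are hypothetical names. The decryptor is assumed to be any object exposing `update()` and `finalize()`, which is the shape of the context returned by `Cipher(...).decryptor()` in the `cryptography` package. This post doesn't state the cipher mode, so that choice is left to the caller.

```python
from pathlib import Path
from typing import Protocol

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB per read, matching the fix described above


class Decryptor(Protocol):
    """Anything with incremental update()/finalize(), e.g. a cipher context."""

    def update(self, data: bytes) -> bytes: ...
    def finalize(self) -> bytes: ...


def stream_decrypt(src: Path, dst: Path, decryptor: Decryptor,
                   chunk_size: int = CHUNK_SIZE) -> int:
    """Decrypt src into dst one chunk at a time; return plaintext bytes written."""
    written = 0
    tmp = dst.with_name(dst.name + ".partial")  # hypothetical temp-file naming
    with src.open("rb") as fin, tmp.open("wb") as fout:
        while chunk := fin.read(chunk_size):
            written += fout.write(decryptor.update(chunk))
        written += fout.write(decryptor.finalize())
    tmp.replace(dst)  # expose the final name only once the file is complete
    return written
```

Peak memory stays at roughly one chunk regardless of model size, and the rename at the end means an interrupted first launch never leaves a half-decrypted file under the final name.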


The installer: Inno Setup, network precheck, disk-spanning

Inno Setup is the right choice for this kind of installer. It's free, it handles administrator elevation cleanly, it supports custom wizard pages in Pascal-style scripting, and it generates a single .exe users can double-click without thinking about it.

A few specifics from the build that aren't standard:

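Disk-spanning for >4GB installers. PyTorch + transformers + your model weights add up fast. Once the bundled installer exceeds 4GB, you need DiskSpanning=yes in the Inno Setup script (Inno Setup's documentation sets the requirement lower, at roughly 2.1GB of compressed installation size, so check it for your version). With spanning enabled, the output is a setup.exe plus one or more .bin slices that must ship together rather than a single file. Without it, the single-file installer hits the 4GB FAT32 limit and silently corrupts during download on certain user setups.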

A network precheck wizard page. Before the user clicks "Install," the wizard verifies connectivity to the hosts the app will need at runtime: the license validator endpoint, the update server, and a couple of telemetry sinks. Failure here surfaces a clear "your network blocks X" message before the user wastes ten minutes installing an app that won't activate. The host list is currently hardcoded into the installer, which means adding a new host requires a new installer build — one of the open items on our list to fix.
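The wizard page itself is written in Inno Setup's Pascal scripting, but the check it performs is simple enough to sketch in Python for illustration. The host names below are placeholders (the real list is hardcoded in the installer and not given here), and `tcp_reachable` and `blocked_hosts` are hypothetical names.

```python
import socket
from typing import Callable, Iterable

# Placeholder endpoints; substitute the hosts your app needs at runtime.
REQUIRED_HOSTS = [
    ("license.example.com", 443),
    ("updates.example.com", 443),
]


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def blocked_hosts(hosts: Iterable[tuple[str, int]],
                  probe: Callable[[str, int], bool] = tcp_reachable) -> list[str]:
    """Return 'host:port' for every required endpoint that cannot be reached."""
    return [f"{host}:{port}" for host, port in hosts if not probe(host, port)]
```

An empty result means the install can proceed. Anything else is the list to show in the "your network blocks X" message.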

A pre-flight frozen-build test. Before the installer ships, a test_frozen.py step runs against the PyInstaller-frozen backend to confirm it actually starts. This caught a real regression once: a setuptools version (82.0.1) had introduced a behavior change that broke our frozen backend imports. Pinning back to 79.0.1 fixed it. Without the pre-flight test, that would have shipped to users.
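The contents of test_frozen.py aren't shown in this post, so the following is only a sketch of what such a smoke test might look like: start the frozen executable, poll the /health route until it answers, and fail if the process dies first. The function names, the deadline, and the URL passed in are assumptions.

```python
import subprocess
import time
import urllib.error
import urllib.request


def health_ok(url: str, timeout: float = 2.0) -> bool:
    """True if GET url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def smoke_test(cmd: list[str], health_url: str, deadline_s: float = 120.0) -> bool:
    """Start the backend, wait for a healthy response, then always shut it down."""
    proc = subprocess.Popen(cmd)
    try:
        end = time.monotonic() + deadline_s
        while time.monotonic() < end:
            if proc.poll() is not None:
                return False  # backend exited during startup, e.g. a missing import
            if health_ok(health_url):
                return True
            time.sleep(1.0)
        return False  # never became healthy within the deadline
    finally:
        if proc.poll() is None:
            proc.terminate()
        proc.wait(timeout=30)
```

In a build script this would be called with the path to the frozen executable and the local health URL, failing the build when it returns False.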

Unicode hell on Windows. When stdout and stderr are piped or redirected, which is typically how a backend spawned by another process runs, Python on Windows encodes them with the system's ANSI code page (cp1252 on many machines) rather than UTF-8. Any logger output containing a character outside that code page (medical reports include patient names with diacritics, Greek letters and symbols in units, you name it) crashes the frozen Python process with a UnicodeEncodeError. The fix is to reconfigure stdout, stderr, and every logger module to use UTF-8 at process startup. Tedious. Easy to miss in a module added six weeks after launch. Worth a code-review checklist item.
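A minimal sketch of the stream reconfiguration, assuming Python 3.7+ where text streams have a `reconfigure()` method. `force_utf8` and `configure_console_streams` are hypothetical helper names.

```python
import sys


def force_utf8(stream) -> bool:
    """Switch a text stream to UTF-8 in place; return True if it was reconfigured."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # The stream is None (no console attached) or a custom object: nothing to do.
        return False
    # With UTF-8 the only unencodable input is a lone surrogate; escape it, don't raise.
    reconfigure(encoding="utf-8", errors="backslashreplace")
    return True


def configure_console_streams() -> list[bool]:
    """Call once, as early as possible in the frozen entry point."""
    return [force_utf8(s) for s in (sys.stdout, sys.stderr)]
```

File handlers are a separate concern: each one still needs an explicit `encoding="utf-8"`, since it does not inherit anything from stdout.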


The auto-updater: the hardest part

The auto-updater is where most of the time went. Electron ships a built-in autoUpdater module (built on Squirrel and geared toward code-signed builds), but the requirements here weren't standard:

  • The user might not be running the app as administrator. The installer was admin, but the daily launch isn't.
  • Updates can be mandatory (a security or model fix) or optional (a UI improvement). Mandatory updates need to force the user's hand without nagging them every 10 minutes.
  • A failed download must not leave the app in a broken state.
  • The update must survive an app restart mid-download.

The architecture we landed on:

Scheduled checks against object storage. The app polls the update server every few hours under normal conditions; once an update is detected, the polling cadence increases so the system can act on the result promptly. The server is just a Google Cloud Storage bucket with version manifests; nothing fancy. Authentication uses HMAC interop keys with AWS CLI tooling for upload/list/delete from the ops side. (Credential rotation is one of the open items — currently the keys are checked into .env files, which works but isn't ideal. The rotation procedure is something the ops team is still defining.)

Three-attempt download, validate, then announce. When an update is found, the updater downloads the new installer to a temp location. After download, it validates the executable by checking PE magic bytes (the first two bytes of a valid Windows EXE are MZ). If validation fails, it retries up to three times before surfacing anything to the user. Only after a validated download does the UI show the "update available" banner. The reason: a half-downloaded .exe showing in the user's notifications, then failing to install, is a worse experience than a slightly delayed announcement.
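The download-validate-retry loop can be sketched like this. It is written in Python for illustration (the updater itself lives in the Electron main process), and `looks_like_exe`, `download_validated`, and the `fetch` callback are hypothetical names.

```python
from pathlib import Path
from typing import Callable


def looks_like_exe(path: Path) -> bool:
    """Cheap sanity check: a Windows executable starts with the bytes b'MZ'."""
    try:
        with path.open("rb") as f:
            return f.read(2) == b"MZ"
    except OSError:
        return False


def download_validated(fetch: Callable[[Path], None], dest: Path,
                       attempts: int = 3) -> bool:
    """Run fetch(dest) up to `attempts` times; True once dest passes validation."""
    for _ in range(attempts):
        try:
            fetch(dest)
        except OSError:
            pass  # network or disk error: counts as a failed attempt
        if looks_like_exe(dest):
            return True
        dest.unlink(missing_ok=True)  # never leave a bad file between attempts
    return False
```

The caller shows the "update available" banner only when this returns True. Note the limits of the check: the MZ test rejects empty files and HTML error pages, but a truncated download still starts with MZ. If the manifest carried a size or hash (an assumption, not something this post states), comparing against it would be a stronger gate.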

Persistent pending-update state. The updater writes a pending-update.json to ProgramData containing the download path, version, mandatory flag, and timestamp. If the app is killed during download or restarted before the user clicks "Install," the next launch reads this file and picks up where it left off. Without this, killing the app mid-download would leave an orphan .exe taking up disk space with no state to clean it up from.
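Here is a minimal sketch of how such a state file could be written and recovered, again in Python for illustration only. The field names are guesses based on the description above (download path, version, mandatory flag, timestamp), and the write-to-temp-then-rename step is an assumption about how to make the write crash-safe, not a description of the shipped code.

```python
import json
import os
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class PendingUpdate:
    download_path: str
    version: str
    mandatory: bool
    detected_at: float  # Unix timestamp of first detection


def save_pending(state_file: Path, update: PendingUpdate) -> None:
    """Write the state atomically so a kill mid-write cannot leave half a file."""
    tmp = state_file.with_name(state_file.name + ".tmp")
    tmp.write_text(json.dumps(asdict(update)), encoding="utf-8")
    os.replace(tmp, state_file)


def load_pending(state_file: Path) -> Optional[PendingUpdate]:
    """Return the saved update, or None if there is no usable state."""
    try:
        data = json.loads(state_file.read_text(encoding="utf-8"))
        update = PendingUpdate(**data)
    except (OSError, ValueError, TypeError):
        return None  # missing, unreadable, or malformed: start clean
    if not Path(update.download_path).exists():
        return None  # state points at an installer that is gone
    return update
```

On launch, a non-None result means there is a downloaded installer to resume prompting for. A None result means any leftover file in the temp location is an orphan that can be deleted.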

Mandatory vs grace-period UX. When the new version is marked mandatory, the UI shows a native modal dialog blocking interaction. When it's a grace-period update, a bottom-pinned banner appears with a 45-minute reminder cadence. The reminder doesn't reset on app restart — it's anchored to the time of first detection — so users can't dismiss-and-restart to avoid the prompt.
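Anchoring the cadence to first detection comes down to a little date arithmetic. The sketch below assumes both the first-detection time and the time the banner was last shown are persisted across restarts. `slots_elapsed` and `reminder_due` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional

REMINDER_INTERVAL = timedelta(minutes=45)


def slots_elapsed(first_detected: datetime, at: datetime) -> int:
    """Number of whole reminder intervals between first detection and `at`."""
    return max(0, (at - first_detected) // REMINDER_INTERVAL)


def reminder_due(first_detected: datetime, last_shown: Optional[datetime],
                 now: datetime) -> bool:
    """True if a new reminder slot has started since the banner was last shown."""
    if last_shown is None:
        return True
    return slots_elapsed(first_detected, now) > slots_elapsed(first_detected, last_shown)
```

Because the slots are counted from `first_detected` rather than from the last dismissal or the last app start, restarting the app does not push the next reminder back.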

Admin elevation handling. Installing the new version requires admin. If the user is non-admin, the updater spawns the new installer with elevation request (runas), and Windows handles the UAC prompt. The current implementation caches the admin check at app start, which means if a user elevates mid-session it doesn't know — minor risk, on the open-items list to fix.
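For readers who want to see the Windows calls involved, here is a hedged Python sketch using `ctypes`. The real updater does this from Electron, and the function names are hypothetical. The two shell32 functions, `IsUserAnAdmin` and `ShellExecuteW` with the `runas` verb, are real Windows APIs.

```python
import ctypes
import sys


def is_admin() -> bool:
    """True if the current process is elevated. Always False off Windows."""
    if sys.platform != "win32":
        return False
    try:
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    except OSError:
        return False


def run_installer_elevated(installer_path: str, args: str = "") -> bool:
    """Launch the installer via the 'runas' verb so Windows shows the UAC prompt."""
    if sys.platform != "win32":
        raise RuntimeError("elevation via ShellExecuteW is Windows-only")
    # Last argument 1 is SW_SHOWNORMAL; ShellExecuteW returns > 32 on success.
    result = ctypes.windll.shell32.ShellExecuteW(
        None, "runas", installer_path, args, None, 1
    )
    return result > 32
```

Calling `is_admin()` right before launching the installer, rather than once at app start, is one way to avoid the stale cached check mentioned above.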

The auto-updater story worth telling is a chain of bugs that compounded.

We migrated the update artifact from one cloud storage bucket to another. I updated the manifest JSON to point at the new path. What I didn't catch: a service in the backend was still polling the old bucket path, baked in from before the JSON was the source of truth. So new updates uploaded to the new bucket were never being announced. Push-to-update was silently broken for some time before anyone noticed.

Once that was fixed, the next bug surfaced. The updater polls for available updates every 5 minutes. A real download takes 7-8 minutes. So a download would start, the 5-minute timer would fire mid-download, and the updater would start a second download into the same temp location — corrupting the first. When the first download completed, validation would fail because the bytes weren't actually the bytes the server had served. The install would fail downstream.

The fix was small: a downloading=true flag persisted alongside the pending-update state. While set, the periodic check skips initiating a new download even if it sees a fresh manifest. Cleared on download completion, on failure, on app restart. Three lines of logic that closed a class of race I should have anticipated and didn't.

The meta-lesson: anywhere you have a periodic checker and a long-running operation, the period needs to be longer than the operation — or you need an explicit "in progress" gate. I had neither at the start.
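The "in progress" gate can be sketched as follows. This is an illustration in Python, not the Electron code. The state file layout and function names are assumptions, and the read-then-write is only safe when a single process runs its timer callbacks one at a time.

```python
import json
from pathlib import Path
from typing import Callable


def _read(state_file: Path) -> dict:
    try:
        data = json.loads(state_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def _write(state_file: Path, state: dict) -> None:
    state_file.write_text(json.dumps(state), encoding="utf-8")


def try_begin_download(state_file: Path) -> bool:
    """Set the gate and return True, or return False if a download is running."""
    state = _read(state_file)
    if state.get("downloading"):
        return False
    _write(state_file, {**state, "downloading": True})
    return True


def end_download(state_file: Path) -> None:
    """Clear the gate: call on completion, on failure, and once at app start."""
    _write(state_file, {**_read(state_file), "downloading": False})


def periodic_check(state_file: Path, download: Callable[[], None]) -> bool:
    """Timer callback: start a download unless one is already in flight."""
    if not try_begin_download(state_file):
        return False  # previous download still running; skip this tick
    try:
        download()
    finally:
        end_download(state_file)
    return True
```

Clearing the flag in a `finally` block covers both the success and failure paths. The clear at app start covers the case where the process was killed while the flag was set.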


Logging that survives Windows

A surprising amount of time on this kind of project goes into logging. Not the what to log part — the where to log part.

Three rules we've converged on:

  1. Every log file path is absolute, rooted in ProgramData. Relative paths break the moment a different working directory is set, which happens silently when Electron spawns the Python backend.
  2. Every Python module reconfigures its stdout/stderr to UTF-8 at module import time. Boilerplate, but the failure mode (silent crash on first non-ASCII log line, three weeks after launch) is bad enough to be worth it.
  3. Every log line has a request ID. When a clinician reports "the app gave me wrong results on this case," tracing the path from frontend click to backend inference to model output requires the request ID to thread through every log entry on the way. Structured logging with structlog makes this trivial; ad-hoc print() calls make it impossible.
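The three rules can be sketched together using only the standard library. This is a stand-in for the structlog setup mentioned above, not a copy of it, and `request_id_var`, `log_dir`, and `build_logger` are hypothetical names.

```python
import contextvars
import logging
import os
from pathlib import Path

request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record for the formatter."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def log_dir(app_name: str) -> Path:
    """Log directory under ProgramData (Windows); falls back to the usual path."""
    base = os.environ.get("PROGRAMDATA", r"C:\ProgramData")
    return Path(base) / app_name / "logs"


def build_logger(name: str, log_file: Path) -> logging.Logger:
    if not log_file.is_absolute():
        raise ValueError(f"log path must be absolute, got {log_file}")  # rule 1
    log_file.parent.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(log_file, encoding="utf-8")  # rule 2
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s [%(request_id)s] %(message)s")
    )
    handler.addFilter(RequestIdFilter())  # rule 3
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

A request handler would call `request_id_var.set(...)` on entry, and every log line written while handling that request then carries the ID without each call site passing it along.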

There's still a diagnostic [Diag] trace in the Electron main process that I added during a debugging session for a Windows 11 silent-close issue. It hasn't been removed because the silent-close had multiple potential root causes (Chromium GPU process behavior, accessibility dialog interactions, font rendering) and I want a few more weeks of clean runs before I'm sure the fix held. This is an honest admission: production code has diagnostic instrumentation that outlives its purpose, and removing it too early is how you lose the data you need when it recurs.


Pitfalls — the concrete list

Every Windows ML packaging project hits these. Saving you the rediscovery cost:

  1. Windows Defender false positives during build. PyInstaller-frozen executables behave enough like generic malware (a compressed Python interpreter unpacking at runtime) that Defender will quarantine them mid-build, breaking the build with confusing errors. The workaround we use now is disabling real-time protection during build (requires admin). That's not great; the better fix is a configured folder exclusion, which we haven't moved to yet.

  2. PyInstaller missing submodules. Dynamic imports inside libraries (especially transformers, torch.distributions, huggingface_hub) need --collect-submodules flags or they'll silently miss code at freeze time and crash at first call. Build a test_frozen.py smoke test that exercises every major code path before shipping.

  3. setuptools version regressions. A setuptools version bump can break PyInstaller's pkg_resources integration. Pin setuptools to a known-working version (79.0.1 in our case as of writing). Don't take floating setuptools versions in your requirements.txt.

  4. Code signing. Without a code-signing certificate, Windows SmartScreen warns users that your installer is "from an unknown publisher." Some won't proceed past the warning. A certificate from a reputable CA solves it. Cheaper EV options exist; figure out the legal-entity requirements early since they take days to clear.

  5. HuggingFace cache layouts in frozen builds. If your code relies on huggingface_hub cache layouts (e.g., loading via from_config), a single file change in the HF cache structure breaks the frozen path. Treat HF caches as build-time artifacts, not runtime downloads.

  6. License Authorization headers. Whatever scheme you use for the license check needs more than a static embedded token. A shared token is acceptable as a starting point but a per-device challenge-response is what you should be moving toward. (We're currently on the shared-token approach. Migration is on the roadmap.)

  7. Date-tampering attacks. If your license has an expiry, users can roll their system clock back to bypass it. The fix is to record last_seen_time in your local DB on every app start and refuse to launch if the current time is earlier than the last seen time. Cheap, effective, catches the obvious cases.
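The last_seen_time check from item 7 can be sketched with the standard-library `sqlite3` module. The table name, the key, and the five-minute tolerance (so an ordinary clock correction doesn't lock anyone out) are assumptions added for illustration.

```python
import sqlite3


def clock_rolled_back(conn: sqlite3.Connection, now: float,
                      tolerance_s: float = 300.0) -> bool:
    """Record `now`; return True if it is earlier than the last recorded start."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS app_state (key TEXT PRIMARY KEY, value REAL)"
    )
    row = conn.execute(
        "SELECT value FROM app_state WHERE key = 'last_seen_time'"
    ).fetchone()
    if row is not None and now < row[0] - tolerance_s:
        return True  # keep the old high-water mark; the caller refuses to launch
    # Never move the stored time backwards, even within the tolerance window.
    high_water = now if row is None else max(now, row[0])
    conn.execute(
        "INSERT OR REPLACE INTO app_state (key, value) VALUES ('last_seen_time', ?)",
        (high_water,),
    )
    conn.commit()
    return False
```

At startup the app would call this with `time.time()` and the connection to its local database, and refuse to launch when it returns True.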


What I'd build differently

If I were starting from scratch today:

I'd use Tauri instead of Electron. Electron's Chromium overhead is real and the bundle size penalty is meaningful when you're already shipping multi-gigabyte models. Tauri's Rust-based shell is a fraction of the size and has matured significantly in the last year. The migration cost from Electron isn't trivial, but for a greenfield project the choice is clearer than it was even 12 months ago.

I'd build the network precheck host list into the DB, not the installer. The current hardcoded host list means a new endpoint requires a new installer build. Reading it from a manifest checked at first launch (and updated alongside model thresholds) would close that loop.

I'd put more effort into AV / Windows Defender mitigations upfront. Signed builds, proper folder exclusions, submission to AV vendors for whitelisting. These are operationally annoying to set up but they're a one-time investment that prevents an entire class of "the installer is being flagged at customer site X" emergencies.

I'd write the auto-updater state machine in a typed language first. The current implementation is in Electron's main process (Node.js/JavaScript), and the implicit state machine — checking, downloading, validating, prompting, installing, restarting, recovering from partial states — is the kind of thing where types catch real bugs. TypeScript on the Electron main process would have caught at least three real bugs I shipped and patched.


Closing

The post that doesn't exist on the internet is the one about how the pieces fit together. Each individual piece has documentation. Electron has docs. PyInstaller has docs. Inno Setup has docs. AES encryption has docs. What you don't get from any of those is what happens when you stitch them together for a real product where the user is a non-technical clinician, the network might fail, the disk might be full, and the AV might quarantine your installer at 3am.

If you're shipping ML to a Windows desktop, the model is maybe 20% of the work. The other 80% is the boundary between your code and the operating system, the user, and the network. Most of that 80% isn't taught anywhere.

I'll be writing more about specific pieces — the SQLite-backed inference queue, the hot-reloadable threshold system, the SSE streaming for incremental inference results — in separate posts. Each is a thing we had to build, and each has its own gotchas worth a full writeup.

If you're building something similar and hit a wall, reach out — we've probably hit the same wall.

Thanks to Abhijay for partnering on this end-to-end, and to the wider engineering team at 5C Network for the architectural guidance at the hard points.


This is part of an ongoing series on production medical AI engineering. The companion year-one reflection post is here.