
QCon London 2026: Fixing the AI Infra Scale Problem by Stuffing 1M Sandboxes in a Single Server


At QCon London 2026, Felipe Huici, CEO and co-founder of Unikraft, stood before an audience and woke up virtual machine number 1,000,000 on a commodity server. It responded with "Kubernetes rocks" in about ten milliseconds. The demo capped a talk that traced a decade-long journey from academic research into unikernels to a cloud platform that can cold-boot and respond from a VM in single-digit milliseconds, and cram over a million of those VMs scaled to zero onto off-the-shelf hardware.

Huici started with a primer on isolation primitives, cutting through what he called "isolation soup": containers, VMs, microVMs, unikernels, and language-level isolates. The industry has plenty of options, and most people have stopped caring about the distinctions. But the distinctions matter. His argument centred on the trusted computing base: the shared software everything sits on top of. For VMs, that is the hypervisor, small and purpose-built. For containers, it is the entire Linux kernel, tens of millions of lines of code. For language runtimes, it is larger still. A microVM, he pointed out, is just a VM launched by Firecracker; the name is marketing, not architecture. A unikernel is also a VM, but one with a custom OS that includes only what the application needs. And sticking a container inside a VM gives you VM-grade isolation; the question is whether you can make that efficient.

That question has been the thread running through Huici's career. As a PhD student at UCL, he worked on virtualized packet processing, building custom VMs on top of the Xen hypervisor's Mini-OS. That work led to ClickOS, published at NSDI 2014, which demonstrated VMs booting in thirty milliseconds and processing packets at ten gigabits per second. When Docker arrived and everyone declared containers the silver bullet, his team published "My VM is Lighter (and Safer) than Your Container" at SOSP 2017, showing that with careful engineering you could pack 8,000 VMs onto a server, achieving millisecond boot times that matched container performance while retaining VM-grade isolation.

The trouble was that each of those specialised VMs was handcrafted. Every new application meant starting over. So Huici's team created Unikraft, a Linux Foundation open-source project that provides an SDK for building unikernels: stripped-down, single-purpose VMs containing only the OS components the application actually needs. The project won the Best Paper Award at EuroSys 2021 and has been under development for seven years, with the core goal of achieving sufficient Linux API compatibility for unmodified applications to run on it.
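
In practice, Unikraft builds are driven by a project manifest consumed by the project's tooling. A minimal sketch of what such a manifest might look like (field names and versions are illustrative, drawn from the project's public documentation, and may differ across releases):

```yaml
spec: v0.6          # manifest schema version (illustrative)
name: helloworld    # the application to turn into a unikernel
unikernel:
  version: stable   # pin the Unikraft core release
targets:
  - plat: qemu      # build a QEMU/KVM-bootable image
    arch: x86_64
```

Building against a manifest like this links the application with only the Unikraft libraries it needs, producing a single-purpose VM image rather than a general-purpose OS install.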

Getting a fast image was only part of the problem, though. Huici described the moment of naivety when he deployed one of these lean images to AWS, expecting ten-millisecond boot times and getting thirty seconds instead. The image was fast, but everything around it was not: load balancers, proxies, controllers, and the virtual machine monitor. The team had to apply unikernel-style efficiency thinking to the entire stack, rebuilding components with shared-memory communication and stripping out unnecessary layers. They chose Firecracker as the VMM after rejecting both QEMU (too slow) and Intel's Cloud Hypervisor (too much CPU jitter at high VM counts).

The real breakthrough was VM snapshots. Instead of cold-booting an application every time, the platform takes a snapshot once the application has finished initialising. From then on, every new instance resumes from that pre-warmed state in milliseconds. This enables stateful scale-to-zero: a VM goes to sleep when idle, consuming no CPU and minimal memory, and wakes up exactly where it left off when a request arrives. Huici showed that for a typical workload where maybe twenty out of a thousand users are active at any moment, the other 980 can be sleeping, turning server density from a hardware constraint into a scheduling problem.
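
The density argument is easy to make concrete with back-of-the-envelope arithmetic. The 20-in-1,000 active ratio comes from the talk; the per-VM memory figures below are illustrative assumptions, not measured numbers:

```python
# Provisioning for active VMs rather than total VMs.
# ACTIVE_FRACTION is from the talk; the memory figures are assumptions.

TOTAL_VMS = 1_000          # tenants on the box
ACTIVE_FRACTION = 0.02     # ~20 of 1,000 active at any moment
ACTIVE_MEM_MB = 128        # assumed RAM for a running microVM
SLEEPING_MEM_MB = 1        # assumed resident footprint of a sleeping VM

active = int(TOTAL_VMS * ACTIVE_FRACTION)
sleeping = TOTAL_VMS - active

naive_mb = TOTAL_VMS * ACTIVE_MEM_MB                       # everyone always on
scale_to_zero_mb = active * ACTIVE_MEM_MB + sleeping * SLEEPING_MEM_MB

print(active, sleeping)            # 20 980
print(naive_mb, scale_to_zero_mb)  # 128000 3540
```

Under these assumptions, memory demand drops by well over an order of magnitude, which is the sense in which density becomes a scheduling problem rather than a hardware one.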

Scaling up required solving a string of Linux kernel headaches: limits on tap devices and bridge ports, TCP lock contention at tens of thousands of VMs, and snapshot storage that grows linearly. The team developed compressed and differential snapshots, bringing storage for a million sleeping VMs to roughly twelve terabytes, feasible with commodity NVMe drives.
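
The storage numbers show why differential snapshots were necessary. The ~12 TB total is from the talk; the full-snapshot size below is an illustrative assumption:

```python
# Why differential snapshots matter at a million VMs.
# DIFF_SNAPSHOT_MB is implied by ~12 TB across 1M VMs (from the talk);
# FULL_SNAPSHOT_MB is an assumed full-memory snapshot size.

TOTAL_VMS = 1_000_000
FULL_SNAPSHOT_MB = 128     # assumed: full memory image per VM
DIFF_SNAPSHOT_MB = 12      # ~12 TB / 1M VMs, per the talk's figures

full_tb = TOTAL_VMS * FULL_SNAPSHOT_MB / 1_000_000
diff_tb = TOTAL_VMS * DIFF_SNAPSHOT_MB / 1_000_000
print(full_tb, diff_tb)    # 128.0 12.0
```

Naive full snapshots would need over a hundred terabytes at this scale; compressed, differential snapshots bring it down to something a few commodity NVMe drives can hold.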

In the live demo, Huici spun up scaled-to-zero NGINX instances, ran AI agent sandboxes powered by Claude that resumed from sleep to answer questions in milliseconds, then demonstrated the million-VM box responding on demand. He also showed a virtual kubelet integration that presents microVMs as Kubernetes pods, always reporting "running" to the scheduler while silently sleeping and waking underneath, preserving Kubernetes semantics without sacrificing millisecond performance.
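
The reported kubelet behaviour, always answering "Running" while the VM silently sleeps and wakes underneath, can be sketched with a toy model. This is a hypothetical illustration of the semantics, not the actual virtual kubelet provider API (which is a Go interface):

```python
# Toy model of the reported behaviour: the Kubernetes scheduler always sees
# "Running", while the microVM actually sleeps and resumes underneath.

class MicroVMPod:
    def __init__(self, name: str):
        self.name = name
        self.asleep = True          # scaled to zero by default

    def handle_request(self, payload: str) -> str:
        if self.asleep:
            self.asleep = False     # resume from snapshot (milliseconds)
        return f"{self.name} served {payload}"

    def idle_timeout(self) -> None:
        self.asleep = True          # snapshot and release CPU/memory

    @property
    def phase(self) -> str:
        # Reported to the scheduler regardless of the real sleep state,
        # preserving Kubernetes semantics.
        return "Running"

pod = MicroVMPod("nginx-0")
print(pod.phase, pod.asleep)        # Running True
print(pod.handle_request("GET /"))  # nginx-0 served GET /
print(pod.phase, pod.asleep)        # Running False
```

The key point is that the sleep/wake cycle is invisible to the control plane: `phase` never changes, so no Kubernetes machinery needs modifying.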

During Q&A, an audience member asked about resource exhaustion when many tenants spike at once. Huici's answer was pragmatic: you dimension for peak concurrency, not total tenant count, and queue what you can't serve. The whole point of scale-to-zero is that you're no longer provisioning for every tenant simultaneously, just for the ones that are actually active. Another attendee asked about credential exfiltration from agent VMs. Huici was blunt: never put good credentials inside the agent. An agent that breaks through the user/kernel boundary only gets root on its own kernel, not anyone else's. But the real defence is architectural: keep secrets in a host-side proxy that injects them into outbound requests, with firewall rules ensuring the agent has no access to the proxy or the credentials it holds.
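
The credential-injection pattern Huici described can be sketched in a few lines. The names and data shapes here are hypothetical; the point is only that the secret lives on the host side and the agent's request never contains it:

```python
# Sketch of host-side credential injection: the agent VM builds requests
# without secrets; a proxy it cannot inspect attaches them before forwarding.
# All names and the request format are hypothetical.

SECRETS = {"api.example.com": "Bearer s3cr3t"}   # lives only on the host

def inject_credentials(request: dict) -> dict:
    """Proxy step: attach the credential for the target host."""
    token = SECRETS.get(request["host"])
    if token is None:
        raise PermissionError(f"no credential for {request['host']}")
    forwarded = dict(request)                    # never mutate the agent's copy
    forwarded["headers"] = {**request.get("headers", {}),
                            "Authorization": token}
    return forwarded

# The agent only ever constructs secret-free requests:
agent_request = {"host": "api.example.com", "path": "/v1/query"}
forwarded = inject_credentials(agent_request)
print(forwarded["headers"]["Authorization"])     # Bearer s3cr3t
print("headers" in agent_request)                # False
```

Even a fully compromised agent can only ask the proxy to make requests on its behalf; it never holds a credential worth exfiltrating.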

His closing message: speed, scale, and strong isolation used to be a pick-two trade-off in cloud infrastructure. With enough engineering care, he argued, you can have all three.
