Dynamic process isolation, a technique developed at Cloudflare to safeguard its systems from Spectre-like attacks, provides effective protection and fully mitigates Spectre attacks between multiple tenants, joint research by Cloudflare and Graz University of Technology has recently shown.
The idea behind dynamic process isolation is easy to understand if you take into account that, as Cloudflare engineer Kenton Varda explains, Spectre attacks try to create "pathological performance scenarios" to take advantage of defects in the processor microarchitecture.
These effects tend to show up in metrics such as CPU performance counters, even more so when a partial mitigation has been deployed that forces the attacker to multiply the number of attempts. When such an attack is detected, the affected workers can be rescheduled into their own process to defeat it.
Isolating a worker basically means preventing it from accessing memory outside of its "isolate", a technique that is used in the V8 JavaScript engine. In a multi-tenant scenario, multiple customers share the same process, with each customer using their own isolate to ensure some level of protection.
Now, while processes can be hardened against Spectre attacks at the OS or kernel level, isolates within the same process cannot offer the same level of protection. Unless, that is, a suspicious worker is rescheduled into its own process.
As I described above, we can't do this with every Worker, because the overhead would be too high. But, it's totally fine to process-isolate just a few Workers, defensively. If the Worker is legitimate, it will keep operating just fine, albeit with a little more overhead. [...] Once a Worker is isolated, then we can rely on the operating system's Spectre defenses, just like, for example, most desktop web browsers now do.
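The rescheduling step Varda describes can be pictured with a toy model. The following Python sketch is purely illustrative and is not Cloudflare's implementation: the `Scheduler` class, its method names, and the process id counter are all hypothetical, and no real processes are spawned. It only shows the bookkeeping idea that workers share one process by default and that a flagged worker is moved into a dedicated one.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Toy model: workers share one process until one is flagged as suspicious."""

    shared: set[str] = field(default_factory=set)            # workers in the shared process
    isolated: dict[str, int] = field(default_factory=dict)   # worker -> dedicated process id
    next_pid: int = 1000                                      # hypothetical id counter, not a real PID

    def add_worker(self, name: str) -> None:
        """New workers start out in the shared, multi-tenant process."""
        self.shared.add(name)

    def isolate(self, name: str) -> int:
        """Move a worker out of the shared process into its own (simulated) process."""
        if name in self.isolated:
            return self.isolated[name]  # already isolated, nothing to do
        if name not in self.shared:
            raise KeyError(f"unknown worker: {name}")
        self.shared.remove(name)
        pid = self.next_pid
        self.next_pid += 1
        self.isolated[name] = pid
        return pid


scheduler = Scheduler()
for worker in ("tenant-a", "tenant-b", "tenant-c"):
    scheduler.add_worker(worker)

# Only the suspicious worker pays the cost of a dedicated process.
scheduler.isolate("tenant-b")
```

In this model a legitimate worker that gets isolated keeps running, only in a different place, which mirrors the point in the quote that isolation is applied defensively and costs some overhead rather than correctness.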
Since Cloudflare described this approach one year ago, dynamic process isolation has been fully deployed in production. To assess the validity of this novel approach, Cloudflare worked with a team at Graz University of Technology to develop a Spectre-like attack and launch it against a system protected by dynamic process isolation.
We developed a detector based on measuring branch mispredictions. Spectre variant 1 attacks — the fastest and easiest kind of Spectre attack — work by fooling the CPU's branch predictor to trigger speculative code execution. Such an attack, when running in our environment, must trigger repeated mispredictions in a loop, in order to get enough data to apply statistics to overcome the noise floor.
This detection technique achieved a false positive rate of 2.83%, leading to a total overhead of 50% in the production environment. An alternative technique, based on setting a threshold for the CPU time consumed by workers, achieved a false positive rate of 0.61% with a total overhead of 2%, which made it a viable solution.
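To make the two detection strategies more concrete, here is a minimal Python sketch of how workers might be flagged either by branch misprediction rate or by CPU time. Everything in it is an assumption for illustration: the `WorkerStats` fields, the function names, the thresholds, and the sample numbers are invented, and the false positive rate is computed in the conventional way (benign workers flagged divided by all benign workers), which may differ from the definition used in the research.

```python
from dataclasses import dataclass


@dataclass
class WorkerStats:
    """Hypothetical per-worker counters collected over one sampling window."""

    name: str
    cpu_ms: float       # CPU time consumed in the window
    branches: int       # branch instructions retired
    mispredicted: int   # branch mispredictions


def flag_by_cpu_time(stats: list[WorkerStats], threshold_ms: float) -> list[str]:
    """Flag workers whose CPU time exceeds a fixed threshold."""
    return [w.name for w in stats if w.cpu_ms > threshold_ms]


def flag_by_mispredictions(stats: list[WorkerStats], max_rate: float) -> list[str]:
    """Flag workers whose branch misprediction rate exceeds max_rate."""
    return [
        w.name
        for w in stats
        if w.branches > 0 and w.mispredicted / w.branches > max_rate
    ]


def false_positive_rate(flagged: list[str], malicious: list[str], total: int) -> float:
    """Share of benign workers that were flagged (and would be isolated needlessly)."""
    benign = total - len(malicious)
    false_positives = len(set(flagged) - set(malicious))
    return false_positives / benign if benign else 0.0


# Invented sample data: "b" behaves like an attacker, "c" is merely CPU-heavy.
window = [
    WorkerStats("a", cpu_ms=5, branches=1000, mispredicted=10),
    WorkerStats("b", cpu_ms=80, branches=1000, mispredicted=400),
    WorkerStats("c", cpu_ms=60, branches=1000, mispredicted=20),
]

by_cpu = flag_by_cpu_time(window, threshold_ms=50)
by_branch = flag_by_mispredictions(window, max_rate=0.1)
```

Every flagged worker would then be a candidate for isolation in its own process, so a false positive costs extra overhead for a benign worker rather than a missed attack.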
According to the researchers, applying dynamic process isolation to workers surpassing the threshold provided the same guarantee as strict process isolation with better performance.
Dynamic process isolation is a general technique, but Cloudflare applies it in its systems under a strong condition: only JavaScript and WebAssembly code can run, because the platform relies on V8 isolates. This means their approach might not be as easy to apply on a different platform. For a better understanding of how dynamic process isolation is deployed at Cloudflare, do not miss the original article by Varda.