Cloudflare recently introduced its Gen 13 servers, marking a shift in how its network handles traffic. Instead of relying on large CPU caches for speed, the company redesigned its software to leverage many more processor cores working in parallel in its latest AMD-based servers.
The change is an example of hardware–software co-design: earlier generations had used very large CPU caches to compensate for software that did not scale well across many cores. Together, the hardware and software changes increase capacity per server, improve performance for edge applications, and improve energy efficiency.
According to the specifications, Gen 13 is built around a 192-core AMD EPYC Turin 9965 processor, 768 GB of DDR5-6400 memory, 24 TB of PCIe 5.0 NVMe storage, and a dual 100 GbE network interface card. With these specs, a Gen 13 server can handle up to twice as much traffic as the previous Gen 12, which runs on the AMD Genoa-X 9684X, while meeting the same response-time targets. The changes deliver around 60% more capacity per rack without increasing power use, while also increasing available memory, storage, and network bandwidth.
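As a rough illustration of what those figures mean per core, the short Rust sketch below derives the memory-per-core and network-per-core ratios from the published numbers. The `ServerSpec` type and its method names are hypothetical, introduced only for this example, and it assumes "dual 100 GbE" means two 100 Gb/s ports.

```rust
// Illustrative arithmetic only. The numbers come from the Gen 13 spec list;
// the type and function names are hypothetical.
struct ServerSpec {
    cores: u32,
    memory_gb: u32,
    nic_ports: u32,
    nic_port_gbps: u32,
}

impl ServerSpec {
    /// Memory available per core, in GB.
    fn memory_gb_per_core(&self) -> f64 {
        f64::from(self.memory_gb) / f64::from(self.cores)
    }

    /// Aggregate line rate across all NIC ports, in Gb/s.
    fn network_gbps_total(&self) -> u32 {
        self.nic_ports * self.nic_port_gbps
    }

    /// Aggregate line rate divided evenly across cores, in Gb/s.
    fn network_gbps_per_core(&self) -> f64 {
        f64::from(self.network_gbps_total()) / f64::from(self.cores)
    }
}

/// Gen 13 as described in the article: 192 cores, 768 GB, dual 100 GbE.
/// Assumption: "dual 100 GbE" is modeled as two 100 Gb/s ports.
fn gen13() -> ServerSpec {
    ServerSpec {
        cores: 192,
        memory_gb: 768,
        nic_ports: 2,
        nic_port_gbps: 100,
    }
}

fn main() {
    let spec = gen13();
    println!("{:.1} GB of memory per core", spec.memory_gb_per_core());
    println!("{} Gb/s aggregate network line rate", spec.network_gbps_total());
    println!("{:.2} Gb/s of line rate per core", spec.network_gbps_per_core());
}
```

Under these assumptions the configuration works out to 4 GB of memory per core and roughly 1 Gb/s of network line rate per core.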
In a separate article, "Inside Gen 13: how we built our most powerful server yet," Syona Sarma, JQ Lau, Ma Xiong, and Victor Hwang explain the engineering choices behind the new platform. They cover the AMD EPYC 9965 server layout and components, the ideal GB-per-core configuration, thermal efficiency, and the transition to 100 GbE networking. In the post focusing on how the company aligned hardware with its redesigned Rust-based FL2 software stack, they write:
The goal was to support workloads that now scale with parallelism rather than cache, enabling significantly higher request capacity and better performance-per-watt across Cloudflare’s global edge infrastructure.
According to the authors, Cloudflare had previously relied on processors with very large L3 caches to keep latency low because parts of its software were not fully optimized. When the team tested newer Turin Dense CPUs, which have about one-third of that cache, latency initially increased by around 50%. By working with AMD to analyze the issue and rewriting key parts of its software, Cloudflare eliminated this latency penalty and unlocked significant gains. The team adds:
FL2's cleaner architecture, with better memory access patterns and less dynamic allocation, might not depend on massive L3 caches the way FL1 did. This gave us an opportunity to use the FL2 transition to prove whether Gen 13's throughput gains could be realized without the latency penalty.
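To make "less dynamic allocation" concrete, the following minimal Rust sketch contrasts allocating a scratch buffer on every request with reusing one buffer per worker. It is not Cloudflare's FL2 code: the `Worker` type, the function names, and the uppercase transform standing in for real request processing are all hypothetical. The general idea it illustrates is that a reused buffer stays at the same addresses, which tends to keep a worker's working set smaller and friendlier to a reduced cache.

```rust
// Hypothetical sketch of the general technique; not Cloudflare's FL2 code.

/// Baseline: allocates a fresh heap buffer for every request.
fn handle_allocating(payload: &[u8]) -> usize {
    let mut scratch = Vec::with_capacity(payload.len());
    scratch.extend(payload.iter().map(|b| b.to_ascii_uppercase()));
    scratch.len()
}

/// Alternative: each worker owns one scratch buffer and reuses it.
struct Worker {
    scratch: Vec<u8>,
}

impl Worker {
    fn new(capacity: usize) -> Self {
        Worker {
            scratch: Vec::with_capacity(capacity),
        }
    }

    /// Processes a request in the reused buffer and returns the result.
    fn handle(&mut self, payload: &[u8]) -> &[u8] {
        // `clear` drops the old contents but keeps the allocated capacity,
        // so no new allocation happens while payloads fit in the buffer.
        self.scratch.clear();
        self.scratch
            .extend(payload.iter().map(|b| b.to_ascii_uppercase()));
        &self.scratch
    }
}

fn main() {
    let requests: [&[u8]; 2] = [b"get /index.html", b"get /api/v1"];

    let mut worker = Worker::new(4096);
    for req in requests.iter() {
        let out = worker.handle(req);
        println!("{}", String::from_utf8_lossy(out));
    }

    // The allocating variant produces the same bytes, at the cost of one
    // heap allocation per call.
    println!("{} bytes processed", handle_allocating(requests[0]));
}
```

Whether this particular pattern played a role in FL2 is not stated in the post; it is shown here only as one common way a rewrite can reduce allocation pressure.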
In a popular Hacker News thread, many readers found the architectural shift interesting but questioned how much of the improvement came from the hardware versus the software rewrite. Several asked for clearer benchmarks and more technical details, with user gdwatson commenting:
I don’t think they explained how they solved the cache issue except to say they rewrote the software in Rust (...) They talked about Rust's greater memory safety; it would have been nice to know whether there were specific language features that played into the cache difference or whether it just made the authors comfortable using a systems language in this application and that made the difference.
Beyond the core architectural changes, the announcement also introduces PCIe encryption hardware support and improved support for thermally demanding PCIe accelerators.