Software architects from the .NET runtime team recently presented several .NET 5 runtime improvements and how they achieved them.
At .NET Conf 2020, Rich Lander, principal program manager on the .NET team, and Stephan Toub and Jan Kotas, software architects on the .NET team, conducted an online session entitled ".NET 5 Runtime Deep Dive with Rich Lander and the Architects". During this no-slides session, they covered various .NET 5 runtime improvements, including ARM64 support, HTTP/3, and support for single-file applications.
The three described the strategy they use for implementing these improvements. Typically, when the team adds a new feature to the runtime, they first build what they call a "functional" implementation. This implementation is not optimized for performance, but it lets developers and other .NET teams start using the feature and provide feedback. Then, in a later release cycle, the implementation is optimized for performance, sometimes through a complete rewrite that uses a different strategy tailored to the required use cases. This process often yields notable performance gains.
The talk was well received by the tech community. Konrad Kokosa, author of Pro .NET Memory Management and a Microsoft MVP, created a mind map summarizing the discussion.
Source: https://twitter.com/konradkokosa/status/1326635315616952321
The first area of runtime improvement discussed was .NET's ARM64 support. While .NET Core 3 already supported ARM64, that support was only functional. A key driver of the ARM64 performance improvements in .NET 5 was the introduction of hardware intrinsics for ARM. A prominent example of the hardware capabilities these intrinsics expose is SIMD (single instruction, multiple data), a class of instructions that perform the same operation on multiple data points simultaneously, exploiting data-level parallelism. Toub describes using hardware intrinsics as the final step in a progression toward lower-level code:
You first write your basic "for" loop, and you're not taking advantage of those extra transistors you were talking about. Then you can opt to go a little bit lower down and write a little bit more complicated code and use the "Vector" types that exist where then the JIT tries to sort of translate from that into whatever intrinsics are available. That gets you a certain degree further. When you want to hyper optimize for a particular set of available operations, you can then dive even lower and go down to the intrinsics yourself.
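The three levels Toub describes can be illustrated with a small sketch. It is written in C rather than C# and is only an analogue. In .NET, the middle level corresponds to `System.Numerics.Vector<T>`, and the lowest level corresponds to hardware-specific classes such as `System.Runtime.Intrinsics.Arm.AdvSimd`. Here, the middle level assumes the GCC/Clang vector extensions. The lowest level uses ARM NEON intrinsics when compiled for ARM and SSE2 on x86; otherwise it falls back to the scalar loop. The function names are hypothetical, and summing an array is just an illustrative workload.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Level 1: a plain scalar loop, one element per iteration. */
uint32_t sum_scalar(const uint32_t *data, size_t n) {
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) total += data[i];
    return total;
}

/* Level 2: portable vector types (GCC/Clang extension, assumed available);
   the compiler maps them onto whatever SIMD instructions the target has. */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

uint32_t sum_vector_ext(const uint32_t *data, size_t n) {
    v4u32 acc = {0, 0, 0, 0};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4u32 chunk;
        memcpy(&chunk, data + i, sizeof chunk); /* unaligned-safe load */
        acc += chunk;                           /* four additions at once */
    }
    uint32_t total = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; i++) total += data[i];        /* scalar tail */
    return total;
}

/* Level 3: explicit, platform-specific intrinsics. */
uint32_t sum_intrinsics(const uint32_t *data, size_t n) {
    size_t i = 0;
    uint32_t total = 0;
#if defined(__ARM_NEON)
    uint32x4_t acc = vdupq_n_u32(0);
    for (; i + 4 <= n; i += 4)
        acc = vaddq_u32(acc, vld1q_u32(data + i));
    total = vgetq_lane_u32(acc, 0) + vgetq_lane_u32(acc, 1) +
            vgetq_lane_u32(acc, 2) + vgetq_lane_u32(acc, 3);
#elif defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(data + i)));
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Remaining elements (or all of them when no SIMD is available). */
    for (; i < n; i++) total += data[i];
    return total;
}
```

All three functions return the same result. Each lower level trades portability and readability for more explicit control over which instructions actually run, which mirrors the trade-off Toub describes.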
Other areas of improvement were HTTP/2 performance and functional support for HTTP/3 over QUIC. Many of the HTTP/2 performance gains are related to reimplementing code from unmanaged C++ in managed C#. Lander notes that there "still is this kind of idea that managed languages are not quite up to the task for some of those low-level super performance sensitive components," and Toub explains:
When it came to something that is pure CPU raw computation doing nothing but number crunching, in general, you can still eke out better performance if you really focus on "pedal to the metal" with your C/C++ code. For everything else with networking and I/O, it's a lot of just shuffling bits and bytes around and so that you're not dealing with getting the maximum number of instructions per cycle. You're dealing with copying data from here to there. You're dealing with "how can I express my operations as simply as possible?" and allow more experimentation to be done. That allows your code to iterate faster.
Similarly, the team currently has several QUIC implementations in flight. One is an unmanaged library that allows things to work and be functional. However, they also have an implementation rewritten in C# that is highly optimized. Toub notes that building these implementations on their own platform improves the platform itself. Previously, the work on the HTTP stack led them to greatly enhance TCP operations. Now they are finding various deficiencies in the UDP APIs, which they will fix for everyone's benefit.