In Case You Missed It: JIT Enhancements in .NET 3.5 SP1
In the.NET platform, most compiler optimizations are not performed by the VB and C# compilers. Rather, they are delayed until the CLR’s Just in Time compiler takes the IL and coverts it into native machine code. Because of this, changes to the JIT can have a significant impact on previously compiled assemblies.
One area that has a major impact is inlining function calls. Previously the JIT was very conservative with inlining methods, Vance Morrison explains why,
It is not always better to inline. Inlining always reduces the number of instructions executed (since at a minimum the call and return instructions are not executed), but it can (and often does), make the resulting code bigger. Most of us would intuitively know that it does not make sense to inline large methods (say 1Kbytes), and that inlining very small methods that make the call site smaller (because a call instruction is 5 bytes), are always a win, but what about the methods in between?
Interestingly, as you make code bigger, you make it slower, because inherently, memory is slow, and the bigger your code, the more likely it is not in the fastest CPU cache (called L1), in which case the processor stalls 3-10 cycles until it can be fetched from another cache (called L2), and if not there, in main memory (taking 10+ cycles). For code that executes in tight loops, this effect is not problematic because all the code will ‘fit’ in the fastest cache (typically 64K), however for ‘typical’ code, which executes a lot of code from a lot of methods, the ‘bigger is slower’ effect is very pronounced. Bigger code also means bigger disk I/O to get the code off the disk at startup time, which means that your application starts slower.
For service pack 1, Microsoft has introduced a new heuristic based on code size and whether or not the call is in a loop. Under normal circumstances a function will only be inlined if the resulting machine code is smaller than the original version at the call site. This is done to ensure that as much code as possible fits in the CPU’s cache, as cache misses can have a significant impact on performance.
A partial exception is when working inside a loop. Because the function will presumably be called a lot more often, the CLR is allowed to inline functions that are up to 5 times larger than the original call site. Other conditions such as value type optimizations may further increase the allowed size.
So what is the final bottom line?
Mike Amundsen May 29, 2015
Ben Linders May 28, 2015