On Server-Side Performance, .NET 4.5, and Bing

With over 33% of the market share for US web searches, the servers that power Bing and Yahoo represent one of the largest .NET 4.5 RC applications in continuous production use. The close work between Microsoft’s Bing and .NET teams have resulted in a set of enhancements that should prove useful to anyone running large scale .NET servers.

In a Channel 9 interview, “.NET GC developer Maoni Stephens, Performance Architect Vance Morrison and Bing front end developer Mukul Sabharwal join us for a conversation about .NET 4.5 in practice.” One of the more interesting highlights is the multi-threaded JIT compiler. One wouldn’t think that JIT is important for a server, as it tends to stay online for long periods of time. But with the ASP.NET framework, techniques such as Ngen don’t work so well. There could be hundreds or even thousands of views written in Razor or Web Forms that NGen simply cannot see. And they all have to be re-compiled each time the server is restarted or the process is recycled.

With .NET 4.5 and multi-threaded JIT, Bing is seeing startup times reduced by half. Much of this gain comes from keeping a list of previously JITed components. This list is used to preemptively JIT code in the background at startup. In order to avoid interfering with threads that are actually trying to execute code, the JITing threads are limited to two or three cores even when many more are available.

A major emphasis for .NET 4.5 is Event Tracing for Windows or ETW. Part of this support comes from the newly introduced EventSource class, which allows .NET applications to create their own ETW events. Another improvement is in the area of stack traces. In prior versions, ETW couldn’t offer accurate stack traces on 64-bit servers for code that wasn’t precompiled (e.g. .NET or JavaScript). With .NET 4.5 and Windows Server 2012 the stack traces are now available without attaching a debugger.

Unfortunately a new wrinkle has been introduced with the rise in asynchronous programming. Under the async model, the thread processing a request isn’t necessarily the thread that created the request. It has a new stack trace that can be quite difficult to pair up with the originator of the resource request. In addition to adding more instrumentation around this area, Microsoft has been using PrefView to prototype visualization tools in the hopes of shedding light on this issue. Eventually Microsoft would like to move support for this into Visual Studio.

The next part of the interview covered the background GC, which will be turned on by default in .NET 4.5 server applications. The usually concerns about pinned memory were touched on. In order to determine how much pinning really costs, there is now an ETW event is raised each time a pinning handle is created or modified.

Pinned objects in .NET are not necessarily expensive. They only cause problems if they happen to be in a block of memory that the GC wants to move. So Microsoft is now recommending that objects that frequently need to be pinned (e.g. async buffers) should be reused. If you do that, and only pin them when necessary, they will eventually find their way into the generation 2 heap with other long-lived objects that won’t need to be moved.

Another option is to simply allocate an unmanaged buffer to give to the OS instead of pinning a managed buffer. The tradeoff here is that you have to pay the price for copying the filled buffer into managed memory every time as opposed to occasionally paying the penalty for running the GC against pinned memory.

Long term Microsoft would like to provide some sort of framework support creating a reusable buffer pool.

Some more notes on .NET’s GC:

Under 32-bit operating systems the .NET heap is about 2 GB. With 64-bit operating systems Microsoft is starting to see 10 GB heaps are not uncommon and even have a few customer reports of 50 GB heaps. But if you want individual arrays that are larger than 2 GB you need to turn on the gcAllowVeryLargeObjects setting.
In .NET server GC, there is there is one heap per logical processor. The small object heaps are rebalanced as necessary, but prior to .NET 4.5 the large object heap was not.
When using a NUMA architecture with multiple CPU groups, the GCCpuGroup setting should be turned on.
The GC can be temporarily turned off during performance-sensitive operations using SustainedLowLatency mode.

InfoQ Software Architects' Newsletter

Write for InfoQ

Rate this Article

This content is in the Performance & Scalability topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter