On Server-Side Performance, .NET 4.5, and Bing

With over 33% of the market share for US web searches, the servers that power Bing and Yahoo represent one of the largest .NET 4.5 RC applications in continuous production use. The close work between Microsoft’s Bing and .NET teams has resulted in a set of enhancements that should prove useful to anyone running large-scale .NET servers.

In a Channel 9 interview, “.NET GC developer Maoni Stephens, Performance Architect Vance Morrison and Bing front end developer Mukul Sabharwal join us for a conversation about .NET 4.5 in practice.” One of the more interesting highlights is the multi-threaded JIT compiler. One wouldn’t think that JIT is important for a server, as it tends to stay online for long periods of time. But with the ASP.NET framework, techniques such as NGen don’t work so well. There could be hundreds or even thousands of views written in Razor or Web Forms that NGen simply cannot see, and they all have to be recompiled each time the server is restarted or the process is recycled.

With .NET 4.5 and multi-threaded JIT, Bing is seeing startup times reduced by half. Much of this gain comes from keeping a list of previously JITed components. This list is used to preemptively JIT code in the background at startup. In order to avoid interfering with threads that are actually trying to execute code, the JITing threads are limited to two or three cores even when many more are available.
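
For code that is not hosted by ASP.NET, the recorded list of JITed methods has to be requested explicitly through the System.Runtime.ProfileOptimization class. The sketch below shows roughly what that looks like; the helper name, the directory and the profile file name are assumptions made for illustration, not details from the interview.

```csharp
using System;
using System.IO;
using System.Runtime;

static class JitProfileSetup
{
    // Hypothetical helper: tells the runtime where to keep JIT profiles and
    // starts recording (first run) or replaying (later runs) the named profile.
    public static string EnableMultiCoreJit(string profileName)
    {
        // Assumed location; any directory the process can write to will do.
        string profileDir = Path.Combine(Path.GetTempPath(), "JitProfiles");
        Directory.CreateDirectory(profileDir);

        ProfileOptimization.SetProfileRoot(profileDir);
        ProfileOptimization.StartProfile(profileName);
        return profileDir;
    }

    static void Main()
    {
        // Call as early as possible so later startup code benefits.
        string dir = EnableMultiCoreJit("Startup.profile");
        Console.WriteLine("JIT profiles stored under " + dir);
    }
}
```

The call belongs at the very start of the process, since only methods compiled after StartProfile are recorded or pre-compiled.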

A major emphasis for .NET 4.5 is Event Tracing for Windows or ETW. Part of this support comes from the newly introduced EventSource class, which allows .NET applications to create their own ETW events. Another improvement is in the area of stack traces. In prior versions, ETW couldn’t offer accurate stack traces on 64-bit servers for code that wasn’t precompiled (e.g. .NET or JavaScript). With .NET 4.5 and Windows Server 2012 the stack traces are now available without attaching a debugger.
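
To give a sense of what creating custom ETW events involves, here is a minimal sketch of an EventSource subclass. The source name, event IDs and method names are invented for this example; only the EventSource base class and its WriteEvent method come from the framework.

```csharp
using System.Diagnostics.Tracing;

// Hypothetical event source; the name is what an ETW listener would subscribe to.
[EventSource(Name = "Demo-RequestEvents")]
sealed class RequestEventSource : EventSource
{
    public static readonly RequestEventSource Log = new RequestEventSource();

    [Event(1, Level = EventLevel.Informational)]
    public void RequestStarted(string url)
    {
        // Skip the work of building the payload when nobody is listening.
        if (IsEnabled()) WriteEvent(1, url);
    }

    [Event(2, Level = EventLevel.Informational)]
    public void RequestCompleted(string url, int elapsedMs)
    {
        if (IsEnabled()) WriteEvent(2, url, elapsedMs);
    }
}
```

Application code would then call RequestEventSource.Log.RequestStarted("/search") at the relevant points, and a tool that listens to ETW can collect the events.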

Unfortunately a new wrinkle has been introduced with the rise in asynchronous programming. Under the async model, the thread processing a request isn’t necessarily the thread that created the request. It has a new stack trace that can be quite difficult to pair up with the originator of the resource request. In addition to adding more instrumentation around this area, Microsoft has been using PerfView to prototype visualization tools in the hopes of shedding light on this issue. Eventually Microsoft would like to move support for this into Visual Studio.

The next part of the interview covered the background GC, which will be turned on by default in .NET 4.5 server applications. The usual concerns about pinned memory were touched on. In order to determine how much pinning really costs, there is now an ETW event that is raised each time a pinning handle is created or modified.

Pinned objects in .NET are not necessarily expensive. They only cause problems if they happen to be in a block of memory that the GC wants to move. So Microsoft is now recommending that objects that frequently need to be pinned (e.g. async buffers) should be reused. If you do that, and only pin them when necessary, they will eventually find their way into the generation 2 heap with other long-lived objects that won’t need to be moved.
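
The reuse recommendation could be sketched as follows. This is not Microsoft's implementation; BufferPool and its members are hypothetical names, and the pool sizes are arbitrary. The idea is simply to allocate the buffers once, hand them out repeatedly, and pin only around the native call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;

// Hypothetical pool: buffers are allocated once and reused, so over time
// they age into generation 2 where the GC rarely needs to move them.
sealed class BufferPool
{
    private readonly ConcurrentBag<byte[]> _buffers = new ConcurrentBag<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int count, int bufferSize)
    {
        _bufferSize = bufferSize;
        for (int i = 0; i < count; i++) _buffers.Add(new byte[bufferSize]);
    }

    public byte[] Rent()
    {
        byte[] buffer;
        // Fall back to a fresh allocation if the pool is temporarily empty.
        return _buffers.TryTake(out buffer) ? buffer : new byte[_bufferSize];
    }

    public void Return(byte[] buffer)
    {
        if (buffer != null && buffer.Length == _bufferSize) _buffers.Add(buffer);
    }
}

static class PinDemo
{
    static void Main()
    {
        var pool = new BufferPool(4, 4096);
        byte[] buffer = pool.Rent();

        // Pin only for the duration of the native operation.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            // 'address' would be handed to the OS or native API here.
            Console.WriteLine(address != IntPtr.Zero);
        }
        finally
        {
            handle.Free();
            pool.Return(buffer);
        }
    }
}
```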

Another option is to simply allocate an unmanaged buffer to give to the OS instead of pinning a managed buffer. The tradeoff here is that you have to pay the price for copying the filled buffer into managed memory every time as opposed to occasionally paying the penalty for running the GC against pinned memory.
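
A rough sketch of the unmanaged alternative, under the assumption that the native side fills the buffer (simulated here with Marshal.WriteByte): the class and method names are made up for the example. The Marshal.Copy call is the per-operation copying cost the tradeoff refers to.

```csharp
using System;
using System.Runtime.InteropServices;

static class UnmanagedBufferDemo
{
    // Hypothetical helper: fills an unmanaged buffer and copies it to managed memory.
    public static byte[] ReadViaUnmanagedBuffer(int size)
    {
        // Memory outside the managed heap never needs pinning.
        IntPtr native = Marshal.AllocHGlobal(size);
        try
        {
            // Stand-in for the OS writing into the buffer.
            for (int i = 0; i < size; i++)
                Marshal.WriteByte(native, i, (byte)(i % 256));

            // The cost of this approach: one copy per completed operation.
            byte[] managed = new byte[size];
            Marshal.Copy(native, managed, 0, size);
            return managed;
        }
        finally
        {
            Marshal.FreeHGlobal(native);
        }
    }

    static void Main()
    {
        byte[] data = ReadViaUnmanagedBuffer(300);
        Console.WriteLine(data.Length);
    }
}
```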

Long term, Microsoft would like to provide some sort of framework support for creating a reusable buffer pool.

Some more notes on .NET’s GC:

  • Under 32-bit operating systems the .NET heap is about 2 GB. With 64-bit operating systems, Microsoft is finding that 10 GB heaps are not uncommon and has even received a few customer reports of 50 GB heaps. But if you want individual arrays that are larger than 2 GB you need to turn on the gcAllowVeryLargeObjects setting.
  • In .NET server GC, there is one heap per logical processor. The small object heaps are rebalanced as necessary, but prior to .NET 4.5 the large object heap was not.
  • When using a NUMA architecture with multiple CPU groups, the GCCpuGroup setting should be turned on.
  • The GC can be temporarily turned off during performance-sensitive operations using SustainedLowLatency mode.
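
For readers who want to try these settings: gcAllowVeryLargeObjects, GCCpuGroup and server GC are switched on in the application configuration file, while the latency mode is set from code. The sketch below uses a hypothetical helper name and assumes the caller wants the previous mode restored afterwards. Going by the framework documentation, SustainedLowLatency does not stop collections altogether; it asks the collector to avoid blocking generation 2 collections where it can.

```csharp
using System;
using System.Runtime;

static class GcModeDemo
{
    // Configuration-file settings mentioned in the notes (under <runtime>):
    //   <gcServer enabled="true" />
    //   <GCCpuGroup enabled="true" />
    //   <gcAllowVeryLargeObjects enabled="true" />

    // Hypothetical helper: runs 'work' under SustainedLowLatency,
    // then restores whatever latency mode was active before.
    public static void RunWithSustainedLowLatency(Action work)
    {
        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            work();
        }
        finally
        {
            GCSettings.LatencyMode = previous;
        }
    }

    static void Main()
    {
        Console.WriteLine("Server GC: " + GCSettings.IsServerGC);
        RunWithSustainedLowLatency(() => Console.WriteLine("latency-sensitive work"));
    }
}
```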

Community comments

  • Isn't that old news?

    by peter lin,

    For many years, serious java server side developers have known this. Native code is great, but there's always going to be a need for JIT. This is especially true of large complex applications with lots of lines of code. Once an application gets over 1/4 million lines of code, it is very difficult to optimize it for a wide variety of use cases. Often, what "we" think is the critical code path isn't.

  • Whats Bing?

    by Mac Noodle,

    :)

  • Re: Isn't that old news?

    by Mac Noodle,

    Peter, glad you pointed this out. While many people tend to only point out where .NET has excelled as opposed to Java (i.e. C# with things like closure versus Java [the language]), most tend to ignore the lessons that .NET (and those using it) could have learned from Java and didn't. Besides the one you pointed out, there is .NET's "EJB 1/2" - Entity Framework.

  • SustainedLowLatency doesn't turn the GC off

    by Samuel Jack,

    SustainedLowLatency mode doesn't turn the GC off. Instead it minimises the time that managed threads need to be paused for by avoiding Generation 2 collections wherever possible. And it is intended for long-term use in processes that have large amounts of memory available, not as a temporary measure. Details here.

  • Re: Isn't that old news?

    by Faisal Waris,

    I may be wrong but I believe that you are missing the main point.

    Native code generation still happens but it's not in the path of startup code. Given enough time, all server code (that has not been updated) will be native compiled in the background. The native code persists so that at next startup it does not need to be recompiled if it has not changed.

    This technique has been used on the client side for a long time. For example, Visual Studio 2012 startup time on my machine is about 4 seconds (add 2 sec for loading a simple solution). It was much longer when VS2012 was first installed but the background JITing and caching has reduced that to a minimum.

  • Really?

    by Cameron Purdy,

    With over 33% of the market share for US web searches, the servers that power Bing and Yahoo ..


    Interesting. StatCounter shows a solid 80% share for Google in the US. Are you suggesting that Google uses Yahoo! and Bing to do its searches? Otherwise the numbers don't add up ..

    Now for the real question: What part(s) of Bing and Yahoo! run on the CLR? I didn't realize any of it did, so that would be of interest!

    Peace,

    Cameron Purdy | Oracle
    (Working for Oracle but opinions expressed are my own.)

  • Re: Really?

    by Jonathan Allen,

    I've seen a wide range in numbers and just used the first set I saw for June 2012. According to this article, which does put Bing+Yahoo at only 26.6%, Google has a record US marketshare of 66.8%. To be honest though, I seriously question the accuracy of any search data not presented by the companies in question.

    searchengineland.com/google-bing-hit-all-time-h...

    As for a breakdown of CLR use in Bing, I'll see what I can dig up.

  • Re: Isn't that old news? Yes, and unimportant

    by M Vleth,

    I fail to see why we should care about startup times for long running server processes. Hence, we don't want to restart that much.

    What is just seriously missing in the CLR is the ability to discard previously compiled (jitted) code with the knowledge of the actual execution path, like most JVMs do. That is what seriously can make a performance difference in long running code.

  • Re: Isn't that old news?

    by peter lin,

    I got that just fine. My point is that JIT often does a better job of generating optimized code based on actual code path at runtime, rather than compiling ahead of time. Doesn't matter if it's compiled in the background or before the application is deployed. Persisting JIT code is an old technique and JVM/CLR weren't the first to do it either.

    My point is that in many cases, delaying compilation or optimization can improve performance much more. Optimizing the code path that is actually used most frequently based on runtime statistics is very powerful.

  • Re: Isn't that old news?

    by Faisal Waris,

    .Net 4.5 has something called "Profile Guided Optimization". My understanding is that the native code compiler can generate optimized code based on data from runtime code analysis.

    I wonder, the hotspot VM was first created for Smalltalk - a dynamic language. For a statically typed language such as Java the optimization needs to be quite different because much more is known at compile time. Sun must have reworked it extensively then.

  • Re: Isn't that old news?

    by peter lin,

    Dr. Cliff Click has talked about this in several talks over the last 5 years. Just search on google tech talks and infoq talks. Having worked on a high performance RETE engine the last 8 years, doing things lazily like JIT are huge wins in terms of performance. We can see parallels in OLAP servers the last decade. OLAP cubes used to be pre-calculated leading to database explosion. All of the major OLAP products moved to lazy calculation + bitmap + caching measures to improve performance. To me, lazily loading + compiling on demand makes a lot of sense. The only downside of profile guided optimization that I can see is usage pattern changes over time. It means you'd have to recompile with the new profile. The type of JIT in the JVM is very powerful and has many years of real world testing. I hope .Net moves towards that model over time.

  • Re: Isn't that old news?

    by Faisal Waris,

    Ok I did read Dr. Click's blog posts on various topics and I am not entirely convinced. His arguments (maybe they are outdated) especially for C++ vs. Java performance are not validated by independent tests.

    Firstly, Google did a famous benchmark study between C++, Java, Scala and Go and C++ was way ahead of the JVM languages (for both versions of the algorithm implementation). Note a study like Google's is rare and valuable because the same algorithm was implemented by respective experts in the languages.

    Secondly, I personally ported the Scala code to F# and optimized it using the VS Profiler. The F# and Scala performance are not directly comparable because different hardware was used. However, the comparisons between C++ vs. Scala on the same machine and C++ vs. F# also on the same machine are comparable.

    Pentium machine:
    C++ (original algorithm): 23 sec
    C++ (updated algorithm) : 5 sec
    Scala (updated algorithm): 58 sec
    Java (updated algorithm): 89 sec


    Core i7 machine:
    C++ (original algorithm): 23.5 sec (this was not as optimized as the google implementation)
    F# (updated algorithm): 18.4 sec

    I recently ran the F# code again on a Core i5 with Windows 8 and .Net 4.5 and consistently get less than 12 seconds.

    Also note that the F# time is for the complete .EXE, start to finish (includes the time for the OS to launch the process). Scala time is only calculated after a few runs have been made to warm the JIT caches.

    More info (including code) is here: fwaris.wordpress.com/2011/07/11/googles-multi-l....

  • Re: Isn't that old news?

    by Cameron Purdy,

    Cliff Click's arguments are validated by many independent tests and experts.

    Look, in the field of benchmarking, you can easily show just about whatever it is that you want to show, even that a language like F# is fast. ;-)

    But let's stop pretending that C++ is even in the picture. In the real world, with multi-threaded code on complex multi-core processors and multi-processor servers, C++ is a very expensive choice to optimize (let alone stabilize), so even in the few cases in which it is demonstrably faster for server workloads, it would still be a poor practical choice.

    The future of software became pretty obvious in the late 90s when Hotspot first emerged, and dynamic runtime compilation showed much greater promise compared to static compilation and linking (even "dynamic" linking, which only links to the boundary of a module). There's still plenty of optimization potential left in this area, which means we'll continue to see significant gains in JVM and CLR based languages for a decade or more to come.

    Peace,

    Cameron Purdy | Oracle
    (Employed by Oracle, but the opinions are mine alone.)

  • Re: Isn't that old news?

    by Faisal Waris,

    Cameron,

    We all recognize the contributions that Java/JVM/CLR have made and will make to the world of computing but C++ is coming back in the picture and it should be duly noted.

    The new C++ ’11 standard makes C++ almost a new language. C++ now has lambdas and closures (we are still waiting, for Java) and it now has a C#-like ‘feel’ to it. There is still a dearth of C++ libraries/frameworks when compared to Java or .Net but that’s being addressed. Consider the AMP library for parallel computing (blogs.msdn.com/b/vcblog/archive/2011/06/15/intr...) as an example.

    A big reason ‘C’ style languages (Objective-C, C++) are coming back is because of client-side computing or ‘app’s (IOS, WinRT) – to support immediacy of response, lower memory footprint, and lower power consumption.

    Also, I understand the point about benchmarking however we should not dismiss the Google study so quickly. Google’s was an earnest attempt to understand many parameters of programming languages, including performance (but not exclusively that). Google picked a benchmark problem first - which they felt offered a good variety of workloads - and then asked experts in each language to implement the algorithm. It is a rare and a valuable study (even Dr. Click mentions that a study like this is needed, in one of his blog posts).

    A difference of 5 seconds for C++ and 89 seconds for Java is huge and is at odds with Dr. Click's claims about the JVM.

    The F#/.Net port was my earnest attempt to see how they fared on the benchmark. F# code is almost a straight port of Scala’s (the language closest to F#), anyone can verify that. The results are what they are. And if C++ is faster than F# then I am OK with it.

    Regards,

    Faisal

  • Re: Isn't that old news?

    by Cameron Purdy,

    A big reason ‘C’ style languages (Objective-C, C++) are coming back is because of client-side computing or ‘app’s (IOS, WinRT) – to support immediacy of response, lower memory footprint, and lower power consumption.


    Yes, this is a very reasonable point. Java (the JVM) in particular, and the CLR-based languages in a large way are optimal for server side application development; there is very little C/C++ in that area. However, C/C++ have some very distinct advantages (memory footprint, start-up time, native integration with OS) that Java/JVM is very poor at, and the CLR is only slightly better at (with excellent Windows integration).

    In general though, I am very keen to help developers avoid the pain of C++. Yes, it's just a tool, but it's got to be the ugliest golden hammer in the history of computer science. ;-)

    Peace,

    Cameron Purdy | Oracle
    (Employed by Oracle, but these opinions are my own.)

  • Re: Isn't that old news?

    by peter lin,

    I think we can all agree that C/C++ in expert hands is a scalpel. The difficulty is finding developers that have the skills and experience to write top notch code. Even then for business applications I wouldn't choose C++ unless latency and memory requirements were absolutely critical. Things like high frequency trading, military command control and hard real-time systems need absolutely predictable behavior, so those cases C++ is the only way to go. For most business applications that run on a server and the code base has millions of lines of code, JIT will likely provide better optimization with less manual tweaking.

    Bad C++ will seg fault and blow up hard, whereas bad Java code will still run. JIT does wonders to bad java code. Just to be clear, I'm not recommending developers write crappy code. Even good developers write crappy code at times.
