
Performance is a Key .NET Core Feature

May 06, 2018 13 min read


Key Takeaways

.NET Core is cross-platform and runs on Windows, Linux, Mac OS X and many more. In comparison to .NET the release cycle is much shorter. Most of .NET Core ships in NuGet packages and can be easily released and upgraded.
The faster release cycle is particularly helpful for performance improvement work, and a great deal of work is going into improving the performance of framework types and methods such as SortedSet<T> and LINQ’s .ToList() method.
Faster cycles and easier upgrades also bring the opportunity to iterate over new ideas for improving .NET Core performance, by introducing types like System.ValueTuple and Span<T>.
These improvements can then be fed back into the full .NET framework once proven.

With the release of .NET Core 2.0, Microsoft has the next major version of the general purpose, modular, cross-platform and open source platform that was initially released in 2016. .NET Core has been created to have many of the APIs that are available in the current release of .NET Framework. It was initially created to allow for the next generation of ASP.NET solutions, but now drives and is the basis for many other scenarios including IoT, cloud and next generation mobile solutions. In this series, we will explore some of the benefits of .NET Core and how it can help not only traditional .NET developers but all technologists that need to bring robust, performant and economical solutions to market.

This InfoQ article is part of the series ".NET Core". You can subscribe to receive notifications via RSS.

Now that .NET Core is on the streets, Microsoft and the open-source community can iterate more quickly over new features and enhancements in the framework. One of the areas of .NET Core that gets continuous attention is performance: .NET Core brings along many optimizations, both in execution speed and in memory allocation.

In this article, we’ll go over some of these optimizations and how the continuous stream (or Span<T>, more on that later) of performance work helps us in our lives as developers.

.NET and .NET Core

Before we dive in deeper, let’s first look at the main difference between the full .NET Framework (let’s call it .NET for convenience) and .NET Core. To simplify things, let’s assume both frameworks implement the .NET Standard, essentially a spec that defines the base class library baseline for all of .NET. That makes both worlds very similar, except for two main differences:

First, .NET is mostly a Windows thing, whereas .NET Core is cross-platform and runs on Windows, Linux, Mac OS X and many more. Second, the release cycle is very different. .NET ships as a full framework installer that is system-wide and often part of a Windows installation, making the release cycle longer. For .NET Core, there can be multiple .NET Core installations on one system, and there is no long release cycle: most of .NET Core ships in NuGet packages and can be easily released and upgraded.

The big advantage is that the .NET Core world can iterate faster and try out new concepts in the wild, and eventually feed them back into the full .NET Framework as part of a future .NET Standard.

Very often (but not always), new features in .NET Core are driven by the C# language design. Since the framework can evolve more rapidly, the language can, too. A prime example of both the faster release cycle and a performance enhancement is System.ValueTuple. C# 7 and VB.NET 15 introduced “value tuples”, which were easy to add to .NET Core thanks to the faster release cycles. On the full .NET Framework, they were available as a NuGet package for versions before 4.7, and only became part of the framework itself in .NET Framework 4.7.

Now let’s have a look at a few of these performance and memory improvements that were made.

Performance improvements in .NET Core

One of the advantages of the .NET Core effort is that many things had to be either rebuilt or ported from the full .NET Framework. Having all of the internals in flux for a while, combined with the fast release cycles, provided an opportunity to make performance improvements in code that was almost considered to be “don’t touch, it just works!” before.

Let’s start with SortedSet<T> and its Min and Max implementations. A SortedSet<T> is a collection of objects that is kept in sorted order by leveraging a self-balancing tree structure. Previously, getting the Min or Max object from that set required traversing the tree, calling a delegate for every element visited and setting the return value to the current element, until eventually reaching the bottom (or top) of the tree. Calling that delegate and passing objects around meant there was quite some overhead involved. That changed when one developer saw the tree for what it was and removed the delegate call, which provided no value: in a sorted tree, the minimum and maximum can be found by simply following child links. His own benchmarks show a 30%-50% performance gain.
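To see why a per-element delegate is unnecessary, consider how the extremes of a binary search tree can be found. The following is a minimal sketch, not the actual SortedSet<T> source: the Node<T> and TreeMinMax types are hypothetical and ignore the balancing information a real red-black tree keeps. The minimum is found by following Left links and the maximum by following Right links, with no callback per node.

```csharp
#nullable enable
using System;

// Hypothetical, simplified tree node; SortedSet<T>'s real nodes carry more state.
public sealed class Node<T>
{
    public T Item;
    public Node<T>? Left;
    public Node<T>? Right;

    public Node(T item, Node<T>? left = null, Node<T>? right = null)
    {
        Item = item;
        Left = left;
        Right = right;
    }
}

public static class TreeMinMax
{
    // In a binary search tree the smallest item is the leftmost node:
    // keep following Left until there is none. No delegate call per node.
    public static T Min<T>(Node<T> root)
    {
        Node<T> current = root;
        while (current.Left != null)
        {
            current = current.Left;
        }
        return current.Item;
    }

    // Symmetrically, the largest item is the rightmost node.
    public static T Max<T>(Node<T> root)
    {
        Node<T> current = root;
        while (current.Right != null)
        {
            current = current.Right;
        }
        return current.Item;
    }
}
```

Because the loop only follows child links, its cost is proportional to the height of the tree, which a self-balancing tree keeps logarithmic in the number of elements.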

Another nice example is found in LINQ, more specifically in the commonly used .ToList() method. Most LINQ methods are extension methods on IEnumerable<T> that provide querying, sorting and conversions like .ToList(). Because they build on IEnumerable<T>, we don’t have to care about the implementation of the underlying collection, as long as we can iterate over it.

A downside is that when calling .ToList(), we have no idea of the size of the list to create, so we just enumerate all objects in the enumerable, doubling the size of the list we’re about to return whenever its capacity is reached. That’s slightly insane, as it potentially wastes memory (and CPU cycles). So a change was made to create a list or array of the right size up front if the underlying IEnumerable<T> is in fact a List<T> or an array (T[]) with a known size. Benchmarks from the .NET team show a ~4x increase in throughput for these cases.
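The idea behind this optimization can be sketched as follows. ToListPresized is a hypothetical extension method, not the real LINQ implementation (which handles more source types): it checks whether the source implements ICollection<T>, which both List<T> and arrays do, and if so allocates a list with exactly the right capacity before copying.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Hypothetical helper illustrating the idea; not the actual LINQ source.
    public static List<T> ToListPresized<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // Sources that know their size (List<T>, T[], other ICollection<T>)
        // get a single allocation of exactly the right capacity.
        if (source is ICollection<T> collection)
        {
            var list = new List<T>(collection.Count);
            list.AddRange(collection); // capacity already fits, so no regrowth
            return list;
        }

        // Otherwise the size is unknown up front: List<T> grows its
        // backing array (roughly doubling) as items are added.
        var result = new List<T>();
        foreach (var item in source)
        {
            result.Add(item);
        }
        return result;
    }
}
```

For an array of three items the resulting list has a capacity of exactly 3, whereas the fallback path lets List<T> resize its backing array as items arrive.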

When looking through pull requests in the CoreFX repository on GitHub, we can see tons of performance improvements that have been made, both by Microsoft and by the community. Since .NET Core is open source, you can contribute performance fixes too. Most of these are just that: fixes to existing classes in .NET. But there is more: .NET Core also introduces several new concepts around performance and memory that go beyond fixing existing classes. Let’s look at those for the remainder of this article.

Reducing allocations with System.ValueTuple

Imagine we want to return more than one value from a method. Previously, we had a few options. We could resort to out parameters, which are not very pleasant to work with and are not supported in async methods. We could use System.Tuple<T1, T2> as a return type, but that allocates an object and has rather unpleasant property names to work with (Item1, Item2, …). A third option would be to use specific types or anonymous types, but that introduces overhead when writing the code, as we’d need the type to be defined, and it also makes unnecessary allocations in memory if all we need is a value embedded in that object.

Meet tuple return types, backed by System.ValueTuple. Both C# 7 and VB.NET 15 added a language feature to return multiple values from a method. Here’s a before and after:

// Before:
private Tuple<string, int> GetNameAndAge()
{
    return new Tuple<string, int>("Maarten", 33);
}

// After:
private (string, int) GetNameAndAge()
{
    return ("Maarten", 33);
}

In the first case, we are allocating a Tuple<string, int>. While in this example the effect will be negligible, the allocation is done on the managed heap, and at some point the Garbage Collector (GC) will have to clean it up. In the second case, the compiler-generated code uses the ValueTuple<string, int> type, which is a struct: it is returned by value and, when stored in a local variable, lives on the stack. That gives us access to the two values we want to work with, while making sure the GC has nothing to clean up for the containing data structure.

The difference also becomes visible if we use ReSharper’s Intermediate Language (IL) viewer to look at the code the compiler generates in the above examples. Here are just the two method signatures:

// Before:
.method private hidebysig instance class [System.Runtime]System.Tuple`2<string, int32> GetNameAndAge() cil managed
{
  // ...
}

// After:
.method private hidebysig instance valuetype [System.Runtime]System.ValueTuple`2<string, int32> GetNameAndAge() cil managed 
{
  // ...
}

We can clearly see the first example returns an instance of a class and the second example returns an instance of a value type. The class instance is allocated on the managed heap (tracked by the CLR and subject to garbage collection), whereas the value type lives on the stack (fast, with less overhead). Or in short: System.ValueTuple itself is not tracked by the garbage collector and merely serves as a simple container for the embedded values we care about. As for mutability, System.Tuple is immutable (its Item1, Item2, … are read-only properties), while System.ValueTuple exposes its elements as mutable public fields.

Note that besides their optimized memory usage, features like tuple deconstruction are quite pleasant side effects of making this part of the language as well as the framework.
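A short sketch of these language features, assuming C# 7 or later. The element names Name and Age are our own choice: they exist only at compile time and map onto the underlying Item1 and Item2 fields of ValueTuple<string, int>. The last method also shows value semantics: copying a value tuple copies its fields, so changing the copy leaves the original untouched.

```csharp
using System;

public static class TupleDemo
{
    // Named tuple elements: the compiler maps Name/Age onto Item1/Item2.
    public static (string Name, int Age) GetNameAndAge() => ("Maarten", 33);

    // Deconstruction splits the returned tuple into two locals in one statement.
    public static string Describe()
    {
        var (name, age) = GetNameAndAge();
        return $"{name} is {age}";
    }

    // ValueTuple is a struct with public fields: assigning it copies the
    // fields, so modifying the copy does not affect the original.
    public static (int Original, int Modified) CopyAndModify()
    {
        var original = GetNameAndAge();
        var copy = original;
        copy.Age = 34;
        return (original.Age, copy.Age);
    }
}
```

Because the names are compile-time aliases, GetNameAndAge().Item1 still works as well.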

Allocationless substrings with Span<T>

We already touched on stack vs. managed heap in the previous section. Most .NET developers use just the managed heap, but .NET has three types of memory we can use, depending on the situation:

Stack memory – the memory space in which we typically allocate value types like int, double, bool, … It’s very fast (very often lives in the CPU’s cache), but limited in size (on the order of 1 MB per thread). The adventurous use the stackalloc keyword to allocate blocks of memory (arrays of simple value types such as int or char) on the stack, but know they are in dangerous territory: a StackOverflowException can occur at any time, and since it cannot be caught, it crashes our entire application.
Unmanaged memory – the memory space where there is no garbage collector and we have to reserve and free memory ourselves, using methods like Marshal.AllocHGlobal and Marshal.FreeHGlobal.
Managed memory / managed heap – the memory space where the garbage collector frees up memory that is no longer in use, and where most of us live our happy programmer lives with few memory issues.
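The three kinds of memory can be seen side by side in the following sketch (the MemoryKinds class is purely illustrative). It uses Span<int> (discussed below) to hold stackalloc memory without an unsafe block, which requires C# 7.2 or later; Array.Fill requires .NET Core 2.0 or later. Note how only the unmanaged block needs an explicit free.

```csharp
using System;
using System.Runtime.InteropServices;

public static class MemoryKinds
{
    public static int SumAll()
    {
        // 1. Stack memory: released automatically when the method returns.
        Span<int> onStack = stackalloc int[4];
        onStack.Fill(1);

        // 2. Managed heap: a regular array, reclaimed later by the garbage collector.
        int[] onHeap = new int[4];
        Array.Fill(onHeap, 2);

        // 3. Unmanaged memory: we reserve it and must free it ourselves.
        IntPtr native = Marshal.AllocHGlobal(4 * sizeof(int));
        try
        {
            for (int i = 0; i < 4; i++)
            {
                Marshal.WriteInt32(native, i * sizeof(int), 3);
            }

            int sum = 0;
            for (int i = 0; i < 4; i++)
            {
                sum += onStack[i] + onHeap[i] + Marshal.ReadInt32(native, i * sizeof(int));
            }
            return sum; // 4 * (1 + 2 + 3) = 24
        }
        finally
        {
            Marshal.FreeHGlobal(native); // forgetting this would leak memory
        }
    }
}
```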

All have their own advantages and disadvantages, and have specific use cases. But what if we want to write a library that works with all of these memory types? We’d have to provide methods for each of them separately: one that takes a managed object, and another one that takes a pointer to an object on the stack or in the unmanaged heap. A good example is creating a substring of a string. To handle the managed version, we would need a method that takes a System.String and returns a new System.String that represents the substring. The unmanaged/stack version would take a char* (yes, a pointer!) and the length of the string, and would return similar pointers to the result. Unmanageable…

The System.Memory NuGet package (still in preview at the time of writing) introduces a new Span<T> construct. It’s a value type, more precisely a ref struct that can only live on the stack and is never allocated on the managed heap, and it tries to unify access to any underlying memory type. It provides a few methods, but in essence it holds:

A reference to T
An optional start index
An optional length
Some utility functions to grab a slice of the Span<T>, copy the contents, …

Think of it as this (pseudo-code):

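public readonly ref struct Span<T>
{
    readonly ref T _reference;  // reference to the first element of the underlying memory
    readonly int _length;       // number of elements covered by the span
    // returns a reference to the element at 'index', bounds-checked against _length
    public ref T this[int index] { get {...} }
}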

No matter whether we create a span from a string, a char[] or even an unmanaged char*, the span provides us with the same functions, such as returning the element at an index. Think of it as a T[] whose backing memory can be of any kind. If we wanted to write a Substring() method that handles all types of memory, all we have to care about is working with a Span<char> (or its read-only counterpart, ReadOnlySpan<char>):

ReadOnlySpan<char> Substring(ReadOnlySpan<char> source, int startIndex, int length);

The source argument here can be a span that is based on a System.String, or on an unmanaged char* – we don’t have to care.
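As a sketch of that memory-type agnosticism, the following illustrative class feeds a single span-based Substring helper (the same Slice-based approach shown later in this article) with spans over a string, a char[] and stack memory. AsSpan() is the MemoryExtensions extension method from System.Memory. For unmanaged memory, the unsafe Span<T>(void*, int) constructor could be used in the same way; it is left out here to keep the example free of unsafe code.

```csharp
using System;

public static class SpanSources
{
    // Memory-type agnostic: works on any contiguous run of chars.
    public static ReadOnlySpan<char> Substring(ReadOnlySpan<char> source, int startIndex, int length)
        => source.Slice(startIndex, length);

    public static string[] FromDifferentSources()
    {
        // Backed by a System.String on the managed heap.
        ReadOnlySpan<char> fromString = "Hello World!".AsSpan();

        // Backed by a char[] on the managed heap (implicit conversion).
        char[] array = "Hello World!".ToCharArray();
        ReadOnlySpan<char> fromArray = array;

        // Backed by stack memory.
        Span<char> stackBuffer = stackalloc char[12];
        "Hello World!".AsSpan().CopyTo(stackBuffer);

        // Same helper for all three; ToString() copies only when we ask for a string.
        return new[]
        {
            Substring(fromString, 6, 5).ToString(),
            Substring(fromArray, 6, 5).ToString(),
            Substring(stackBuffer, 6, 5).ToString(),
        };
    }
}
```

All three calls return "World"; the helper never needs to know where the characters live.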

But let’s forget about the memory-type agnostic aspect of Span<T> for a bit and focus on performance. If we were to write a Substring() method for System.String, this is probably what we would come up with:

string Substring(string source, int startIndex, int length)
{
    var result = new char[length];

    for (var i = 0; i < length; i++)
    {
        result[i] = source[startIndex + i];
    }

    return new string(result);
}

That’s great, but we are in fact creating a copy of the substring. If we call Substring("Hello World!", 0, 5), we’d have two strings in memory, "Hello World!" and "Hello" (plus the intermediate char[]), potentially wasting memory space, and our code still has to copy data from one array to another to make this happen, consuming CPU cycles. Our implementation is not bad, but it is not ideal either.

Imagine implementing a web framework, and having to use the above code to grab the request body from an incoming HTTP request that has headers and a body. We’d have to allocate big chunks of memory that have duplicate data: one that has the entire incoming request and the substring that holds just the body. And then there’s the overhead of having to copy data from the original string into our substring.

Now let’s rewrite that using (ReadOnly)Span<T>:

static ReadOnlySpan<char> Substring(ReadOnlySpan<char> source, int startIndex, int length)
{
    return source.Slice(startIndex, length);
}

OK, that is shorter, but there is more. Due to the way Span<T> is implemented, our method does not return a copy of the source data; instead, it returns a span that refers to a subset of our source. Or, in the example of splitting an HTTP request into headers and body, we’d have three spans: one for the incoming HTTP request, one pointing to the original data’s header part, and another pointing to the request body. The data would be in memory only once (the data from which the first span is created); all the others would just point to slices of the original. No duplicate data, and no overhead from copying it.
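The HTTP scenario could look roughly like this. HttpSplitter.TrySplit is a hypothetical helper, and it assumes the request has already been decoded into chars; a real server would more likely work on the raw bytes (for example a ReadOnlySpan<byte>), but the slicing idea is the same. The method looks for the blank line that separates headers from body and returns two slices of the original span without copying any characters.

```csharp
using System;

public static class HttpSplitter
{
    // Hypothetical helper: splits a raw HTTP message at the first blank line
    // ("\r\n\r\n") into header and body slices. Both results point into the
    // original request's memory; nothing is copied.
    public static bool TrySplit(
        ReadOnlySpan<char> request,
        out ReadOnlySpan<char> headers,
        out ReadOnlySpan<char> body)
    {
        int separator = request.IndexOf("\r\n\r\n".AsSpan());
        if (separator < 0)
        {
            headers = default;
            body = default;
            return false;
        }

        headers = request.Slice(0, separator);
        body = request.Slice(separator + 4); // skip the 4-character separator
        return true;
    }
}
```

Because ReadOnlySpan<T> is a ref struct, the slices can be handed back through out parameters like this, but they cannot be stored in fields of a class or kept alive across an await.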

Conclusion

With .NET Core and its faster release cycle, Microsoft and the open-source community around .NET Core can iterate faster on new features related to performance. We have seen that a lot of work went into improving existing code and constructs in the framework, such as LINQ’s .ToList() method.

Faster cycles and easier upgrades also bring the opportunity to iterate over new ideas of improving .NET Core performance, by introducing types like System.ValueTuple and Span<T> that make it more natural for .NET developers to use the different types of memory we have available in the runtime, while at the same time avoiding the common pitfalls related to them.

Imagine if some .NET base classes were reworked to use Span<T>: things like UTF string parsing, crypto operations, web parsing and other typical CPU- and memory-intensive tasks. That would bring great improvements to the framework, and all of us .NET developers would benefit. As it turns out, that is precisely what Microsoft is planning to do! .NET Core’s performance future is bright!

About the Author

Maarten Balliauw loves building web and cloud apps. His main interests are in ASP.NET MVC, C#, Microsoft Azure, PHP and application performance. He co-founded MyGet and is Developer Advocate at JetBrains. He's an ASP Insider and MVP for Microsoft Azure. Maarten is a frequent speaker at various national and international events and organizes Azure User Group events in Belgium. In his free time, he brews his own beer. Maarten's blog.
