Task Parallel Library Improvements in .NET 4.5

Microsoft has been working on ways to improve the performance of parallel applications in .NET 4.5, specifically those using the Task Parallel Library. Here is a preview of what you can expect to see:

Task, Task<TResult>

At the core of .NET’s parallel programming APIs is the Task object. With such an important class, Microsoft took great pains to ensure it is as small as possible. Most of the properties for Task are stored not in the class itself, but rather in a secondary object called ContingentProperties. This secondary object is created only when it is needed, which reduces the memory footprint for the most common scenarios.

When .NET 4.0 was released, the most common scenario was fork-join style programming, as seen with Parallel.ForEach and Parallel LINQ. With .NET 4.5 and the introduction of async, continuation style programming takes the forefront. Microsoft is so confident that this will be the predominant style that they are moving ContinuationObject into Task and the other fields into ContingentProperties. The end result is faster continuations and a smaller Task object.

Task<TResult> shed some unwanted weight as well. It originally had four fields, but as Joseph E. Hoag explains:

It turned out that with some smart restructuring, only the m_result field was truly necessary. By repurposing fields already on the base Task class, m_valueSelector and m_futureState could be made obsolete, and the information stored by m_resultWasSet could instead be stored in the base type’s aforementioned state flags.

The net result was a 49 to 55% reduction in the time it takes to create a Task<Int32> and a 52% reduction in size.

Task.WaitAll, Task.WaitAny

Imagine waiting for 100,000 tasks at the same time. On an x64 machine that would introduce 12,000,000 bytes of overhead above and beyond the size of the tasks themselves. With .NET 4.5 that overhead has dropped to a mere 64 bytes. WaitAny likewise dropped from 23,200,000 bytes of overhead to 152 bytes.

This dramatic reduction comes from a change in how kernel synchronization primitives are used. In previous versions, one primitive was needed per task. Now only one is needed per wait operation, regardless of the number of tasks involved.
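For readers who have not used these calls, here is a minimal sketch of waiting on a batch of tasks. The names WaitDemo and StartTasks are made up for this example, and the code only shows the API shape; it does not measure the overhead figures quoted above.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WaitDemo
{
    // Hypothetical helper: starts 'count' trivial tasks; task i returns i * 2.
    public static Task<int>[] StartTasks(int count)
    {
        return Enumerable.Range(0, count)
            .Select(i => Task.Factory.StartNew(() => i * 2))
            .ToArray();
    }

    public static void Main()
    {
        Task<int>[] tasks = StartTasks(1000);

        // Blocks until at least one task has completed and returns
        // the array index of a completed task.
        int finished = Task.WaitAny(tasks);
        Console.WriteLine("Task at index {0} has completed.", finished);

        // Blocks until every task in the array has completed.
        Task.WaitAll(tasks);
        Console.WriteLine("Sum of results: {0}", tasks.Sum(t => (long)t.Result));
    }
}
```

Both calls take the whole array in a single wait operation, which is the case the .NET 4.5 change targets.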

ConcurrentDictionary

In .NET, only reference types and small value types can be assigned atomically. Larger value types such as Guid are not read and written atomically. To work around this in .NET 4.0, the node objects used by the ConcurrentDictionary are recreated each time the value associated with a key is changed. In .NET 4.5, new nodes are only created if the values cannot be atomically written.

Another change is the ability to create new locks dynamically. Igor Ostrovsky writes,

In practice, a large number of locks is often desirable for maximum throughput. On the other hand, we don’t want to allocate too many lock objects, especially if the ConcurrentDictionary only ends up storing only a small number of items.
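For context, callers can already give the dictionary a hint through its constructor. The sketch below uses the ConcurrentDictionary(concurrencyLevel, capacity) overload; treating concurrencyLevel as the value that sizes the initial set of locks is an assumption about the implementation, since the documentation describes it only as the estimated number of threads that will update the dictionary concurrently. DictionaryDemo, Create, and the chosen numbers are made up for this example.

```csharp
using System;
using System.Collections.Concurrent;

public static class DictionaryDemo
{
    // Hypothetical helper: builds a dictionary with an explicit concurrency hint.
    public static ConcurrentDictionary<string, int> Create(int concurrencyLevel, int capacity)
    {
        return new ConcurrentDictionary<string, int>(concurrencyLevel, capacity);
    }

    public static void Main()
    {
        // Arbitrary values chosen for illustration only.
        var counts = Create(Environment.ProcessorCount * 4, 31);

        // Adds "hits" with value 1 if absent, otherwise increments the stored value.
        counts.AddOrUpdate("hits", 1, (key, old) => old + 1);
        counts.AddOrUpdate("hits", 1, (key, old) => old + 1);

        Console.WriteLine(counts["hits"]); // prints 2
    }
}
```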

To Improve Performance, Reduce Memory Allocations

Joseph writes,

As can be seen in the results of our benchmarks, there is a direct correlation between the amount of memory allocated in a test and the time taken for that test to complete. When viewed individually, memory allocations are not very expensive. The pain comes when the memory system occasionally cleans up unused memory, and this happens at a rate proportional to the amount of memory being allocated. So the more memory that you allocate, the more frequently the memory is garbage collected, and thus the worse your code’s performance becomes.

One way to reduce memory usage is to avoid using closures. Rather than capturing a local variable inside an anonymous function, one can pass in that information to the Task’s constructor as its “state object”. Starting with .NET 4.5, Task.ContinueWith will also support state objects.
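The difference between the two styles can be sketched as follows. StateObjectDemo and its methods are hypothetical names for this example. The allocation comments describe how the C# compiler typically handles capturing and non-capturing lambdas, not a guarantee. The ContinueWith overload that takes a state object requires .NET 4.5.

```csharp
using System;
using System.Threading.Tasks;

public static class StateObjectDemo
{
    // Closure version: the lambda captures 'text', so a closure object
    // and a new delegate are typically allocated on every call.
    public static Task<int> LengthWithClosure(string text)
    {
        return Task.Factory.StartNew<int>(() => text.Length);
    }

    // State-object version: the lambda captures nothing and receives
    // 'text' through the task's state parameter instead.
    public static Task<int> LengthWithState(string text)
    {
        return Task.Factory.StartNew<int>(state => ((string)state).Length, text);
    }

    // .NET 4.5: ContinueWith also accepts a state object, so the
    // continuation does not need to capture 'label' either.
    public static Task<string> Describe(Task<int> lengthTask, string label)
    {
        return lengthTask.ContinueWith<string>(
            (antecedent, state) => (string)state + ": " + antecedent.Result,
            label);
    }

    public static void Main()
    {
        Task<string> described = Describe(LengthWithState("hello"), "length");
        Console.WriteLine(described.Result); // prints "length: 5"
    }
}
```

The trade-off is a cast from object inside the delegate, which gives up some compile-time type safety in exchange for fewer allocations.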

Another technique to reduce memory usage is to cache commonly used tasks. For example, consider a function that accepts an array and returns a Task<int>. Since the result for the empty array case will always be the same, it would make sense to cache the Task representing the empty array.
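A minimal sketch of that caching idea follows. SumService and SumAsync are hypothetical names, and summing the array is just a stand-in for whatever work the real function would do. Task.FromResult is available starting with .NET 4.5.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class SumService
{
    // One already-completed task, shared by every call that receives an empty array.
    private static readonly Task<int> EmptySum = Task.FromResult(0);

    // Hypothetical function: returns a task that yields the sum of 'values'.
    public static Task<int> SumAsync(int[] values)
    {
        if (values.Length == 0)
            return EmptySum; // no new Task allocated for this case

        // Non-empty input: schedule the work, passing the array as the state object.
        return Task.Factory.StartNew<int>(state => ((int[])state).Sum(), values);
    }

    public static void Main()
    {
        Console.WriteLine(SumAsync(new[] { 1, 2, 3 }).Result); // prints 6

        // Both calls return the very same cached Task instance.
        Console.WriteLine(ReferenceEquals(SumAsync(new int[0]), SumAsync(new int[0]))); // prints True
    }
}
```

Sharing the instance is safe here because a completed Task<int> is immutable from the caller's point of view.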

The next tip is to avoid unnecessarily “inflating” tasks. A task is inflated when something triggers the creation of its ContingentProperties object. The most common causes for this are:

The Task is created with a CancellationToken
The Task is created from a non-default ExecutionContext
The Task is participating in “structured parallelism” as a parent Task
The Task ends in the Faulted state
The Task is waited on via ((IAsyncResult)Task).AsyncWaitHandle.WaitOne()

It should be noted that task inflation isn’t necessarily a bad thing. Rather, it is something to be aware of so that one doesn’t do unnecessary things such as pass in a CancellationToken that isn’t ever used.
