The Cost of Async and Await
Asynchronous techniques can offer significant improvements in an application’s overall throughput, but they aren’t free. An asynchronous function is often slower than its synchronous counterpart, and unless care is taken it can also add significant memory pressure. Stephen Toub of MSDN Magazine recently covered this topic in an article titled “Async Performance: Understanding the Costs of Async and Await”.
One of the most significant advantages of managed code over native C++ is the ability to inline functions at runtime. The CLR’s JIT compiler can even inline functions across assemblies, significantly reducing the overhead of the fine-grained methods that OOP programmers prefer. Unfortunately, the very nature of an asynchronous call means that delegates cannot be inlined. Furthermore, there is quite a bit of boilerplate code involved in setting up an asynchronous call. This leads to Stephen’s first suggestion: “Think Chunky, Not Chatty”. Just as when crossing a COM or P/Invoke boundary, you should favor a few large async calls over many small ones.
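The chunky-versus-chatty trade-off can be sketched as follows. This is an illustrative example, not code from Stephen’s article; the method names are hypothetical, but both use the standard `Stream.ReadAsync` API. The chatty version pays the async machinery’s overhead once per byte, while the chunky version pays it once per buffer:

```csharp
using System.IO;
using System.Threading.Tasks;

static class StreamSums
{
    // "Chatty": one async call per byte read.
    public static async Task<long> SumChattyAsync(Stream stream)
    {
        var oneByte = new byte[1];
        long sum = 0;
        while (await stream.ReadAsync(oneByte, 0, 1) == 1)
            sum += oneByte[0];
        return sum;
    }

    // "Chunky": one async call per 4 KB block, then cheap synchronous work.
    public static async Task<long> SumChunkyAsync(Stream stream)
    {
        var buffer = new byte[4096];
        long sum = 0;
        int read;
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            for (int i = 0; i < read; i++)
                sum += buffer[i];
        return sum;
    }
}
```

Both methods return the same result; the difference is how many times the await boilerplate runs.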
There are numerous ways in which the asynchronous patterns can allocate memory without the developer explicitly using the new operator. If left unchecked, these allocations can lead to excessive memory pressure and unwanted delays as the garbage collector tries to catch up. Consider this signature from a subclass of Stream:
public override async Task<int> ReadAsync(…)
Not shown is the implicitly created Task<int> object used to wrap the integer the method returns. In his article, Stephen shows how to reduce the memory overhead by caching the last Task<int> object and reusing it when the same value is returned again.
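A simplified sketch of that caching technique is shown below. It assumes the read completes synchronously and omits the cancellation and error handling a real override would need; `_lastReadTask` is a name invented here. Because streams often return the same byte count repeatedly (e.g., full buffers), the cached task frequently matches:

```csharp
using System.Threading;
using System.Threading.Tasks;

// Inside a subclass of Stream:
private Task<int> _lastReadTask;  // most recently returned task, reused when possible

public override Task<int> ReadAsync(
    byte[] buffer, int offset, int count, CancellationToken cancellationToken)
{
    int bytesRead = Read(buffer, offset, count);  // synchronous fast path

    // Reuse the cached Task<int> if it wraps the same value;
    // otherwise allocate a new one and cache it.
    var cached = _lastReadTask;
    return (cached != null && cached.Result == bytesRead)
        ? cached
        : (_lastReadTask = Task.FromResult(bytesRead));
}
```

Note that returning a completed task directly, rather than marking the method `async`, also avoids the state-machine allocation entirely on this path.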
Another cause of unexpected object allocation and retention is the use of closures. In C# and VB, closures are implemented as anonymous classes that contain the anonymous and asynchronous functions declared in the method. Local variables needed by those functions are said to be “closed over” or “lifted” into the anonymous class. An instance of this class has to be created each time the parent method is called.
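The transformation looks roughly like this. The lowered form below is written by hand for illustration; the compiler’s actual generated class names are unpronounceable, and `ProcessAsync` and `DisplayClass` are invented here:

```csharp
using System;
using System.Threading.Tasks;

static class ClosureExample
{
    // What the developer writes: 'id' is closed over by the lambda.
    public static Task ProcessAsync(int id)
    {
        return Task.Run(() => Console.WriteLine(id));
    }

    // Approximately what the compiler emits instead:
    private sealed class DisplayClass     // the anonymous closure class
    {
        public int id;                    // the "local", lifted to a field
        public void Invoke() => Console.WriteLine(id);
    }

    public static Task ProcessAsyncLowered(int id)
    {
        var closure = new DisplayClass { id = id };  // one heap allocation per call
        return Task.Run(closure.Invoke);
    }
}
```

The allocation is invisible at the call site, which is exactly why it tends to go unnoticed in hot paths.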
However aggravating the extra allocations may be, the problems don’t end there. Normally objects referenced by local variables are eagerly reclaimed; the GC can collect them as soon as it is clear they won’t be used again in the current function. Since the “locals” used by an asynchronous function are actually fields on an anonymous class, they must be retained for the duration of the call. If the call takes several seconds, which is not uncommon for asynchronous operations, the anonymous class may be inadvertently promoted to the more expensive generation 1 or 2. If this becomes a problem, Stephen recommends explicitly nulling out locals as soon as they are no longer needed.
The third issue that Stephen discusses is the concept of contexts, specifically the synchronization context and the execution context. His article shows how library code can obtain a performance boost by intentionally ignoring the synchronization context via the ConfigureAwait method and avoiding things that require capturing the execution context.