Asynchronous Message Processing using Task Parallel Library and Reactive Extensions
Since we last reported on TPL Dataflow, more information has come to light. First and foremost, the goals of the project have been clarified. At a high level, the primary purpose is to support high-performance producer/consumer scenarios using asynchronous programming techniques. These can be used alone or in conjunction with the task, query, and loop-based parallelism offered by the TPL.
Just as important is what this is not. According to Stephen Toub, this is not a direct replacement for full actor/agent style languages and libraries such as Erlang. TPL Dataflow is just a set of building blocks; developers still have to design the infrastructure around it. (There is a separate research project called Axum that is attempting to fill that role.)
At first glance TPL Dataflow may seem to overlap with Reactive Extensions. While they both involve moving data around, Reactive Extensions focuses on the ability to express complicated push-based data streams in a very succinct fashion. TPL Dataflow is more about providing the fundamental building blocks for constructing actors and agents, with an emphasis on controlling aspects such as where to do buffering and when to block producers.
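To make the idea of controlling buffering and blocking producers concrete, here is a minimal sketch in Python's asyncio. It is an analogy only and does not use the TPL Dataflow API; the names `producer`, `consumer`, and `run_pipeline` are hypothetical. A bounded queue stands in for a block with limited capacity, and the producer is suspended whenever that buffer is full.

```python
import asyncio

async def producer(queue, items, log):
    for item in items:
        # put() suspends the producer while the bounded queue is full,
        # which is the "block the producer" behavior in miniature.
        await queue.put(item)
        log.append(("produced", item))

async def consumer(queue, count, log):
    for _ in range(count):
        item = await queue.get()
        log.append(("consumed", item))

async def run_pipeline(capacity, items):
    # The capacity decides where buffering happens and how much is allowed.
    queue = asyncio.Queue(maxsize=capacity)
    log = []
    await asyncio.gather(
        producer(queue, items, log),
        consumer(queue, len(items), log),
    )
    return log
```

Calling `asyncio.run(run_pipeline(2, [1, 2, 3, 4, 5]))` returns a log in which the producer never gets more than two items ahead of the consumer.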
These libraries are meant to be complementary, with the April CTP of TPL Dataflow offering direct integration that adds “the ability to expose dataflow sources as observables and dataflow targets as observers.” The intention is that developers can seamlessly move messages back and forth between dataflow and reactive code as necessary. It should be noted that it is possible to do everything in one or the other, but each offers some useful capabilities that would otherwise have to be built up from scratch.
Stephen also mentioned that much of TPL Dataflow has an analog on the native side, the Asynchronous Agents Library. Aside from slightly different naming conventions, TPL Dataflow tends to offer a richer API than its unmanaged counterpart. For example, there is built-in support for telling a block that it won’t be receiving any more data so that it can shut itself down. TPL Dataflow also has the advantage that the C# and VB languages are being modified to better support it, something that isn’t feasible for C++.
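The completion signal mentioned above can be illustrated with a small sketch, again in Python's asyncio rather than the real TPL Dataflow API. The class `ActionBlockSketch` and its members `post`, `complete`, and `completion` are hypothetical names chosen for this example. Once the block is told no more data is coming, it drains what it already holds, refuses new messages, and finishes.

```python
import asyncio

_DONE = object()  # hypothetical sentinel meaning "no more messages"

class ActionBlockSketch:
    """Hypothetical stand-in for a dataflow block; not the TPL Dataflow API."""

    def __init__(self, action):
        # Must be constructed while an event loop is running.
        self._queue = asyncio.Queue()
        self._action = action
        self._completed = False
        self.completion = asyncio.create_task(self._run())

    def post(self, message):
        # Messages offered after completion are declined.
        if self._completed:
            return False
        self._queue.put_nowait(message)
        return True

    def complete(self):
        # Tell the block that no more data will arrive.
        if not self._completed:
            self._completed = True
            self._queue.put_nowait(_DONE)

    async def _run(self):
        while True:
            message = await self._queue.get()
            if message is _DONE:
                return  # everything queued earlier has been processed
            self._action(message)

async def demo():
    seen = []
    block = ActionBlockSketch(seen.append)
    for n in range(3):
        block.post(n)
    block.complete()
    accepted = block.post(99)  # declined, the block is already completed
    await block.completion
    return seen, accepted
```

Awaiting `completion` gives downstream code a single place to learn that the block has shut down.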
Due to customer feedback, a major emphasis with TPL Dataflow is reducing the number of object allocations needed for processing messages. While actually allocating memory is very cheap in .NET, creating too many objects can result in a significant garbage collection cost down the road. Some strategies, such as reusing active tasks, have been supported all along. The newest CTP adds further enhancements, such as replacing the DataflowMessage<T> class with the DataflowMessageHeader struct. Another improvement is making the cloning function of WriteOnceBlock<T> and BroadcastBlock<T> optional, allowing more efficient use of immutable messages.
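Why an optional cloning function helps with immutable messages can be seen in a short sketch. This is Python, not the BroadcastBlock<T> API, and `BroadcastSketch` with its `link_to` and `post` methods is a hypothetical illustration. When no clone function is supplied, every target receives the same object, so nothing extra is allocated; that is only safe if the message cannot be mutated.

```python
import copy

class BroadcastSketch:
    """Hypothetical broadcaster; the clone function is optional."""

    def __init__(self, clone=None):
        self._clone = clone
        self._targets = []

    def link_to(self, target):
        self._targets.append(target)

    def post(self, message):
        for target in self._targets:
            # Without a clone function the same instance is shared,
            # avoiding one allocation per target.
            if self._clone is None:
                target(message)
            else:
                target(self._clone(message))

def broadcast_to_two(message, clone=None):
    received = []
    block = BroadcastSketch(clone)
    block.link_to(received.append)
    block.link_to(received.append)
    block.post(message)
    return received
```

For an immutable tuple, `broadcast_to_two((1, 2))` hands both targets the very same object. For a mutable list, passing `clone=copy.deepcopy` gives each target its own copy so that one consumer cannot affect another.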
TPL Dataflow can be downloaded as part of the Visual Studio Async CTP. There is no timeline for its release, but the heavy reliance on the new syntax from VB 11 and C# 5 implies that it will ship when those do.