InfoQ

InfoQ

News

My Bookmarks

Login or Register to enable bookmarks for unlimited time.

The content has been bookmarked!

There was an error bookmarking this content! Please retry.

TPL Dataflow – The Successor to CCR

Posted by Jonathan Allen on Jan 06, 2011

Sections
Architecture & Design,
Development
Topics
.NET ,
Languages ,
Concurrency ,
Parallel Programming ,
Programming ,
Task Parallel Library ,
TPL Dataflow ,
CCR

TPL Dataflow is Microsoft’s new library for highly concurrent applications. Using asynchronous message passing and pipelining, it promises to offer more control than thread pools and better performance than manual threading. The downside is that you have to adhere to design patterns that may be unfamiliar to .NET programmers.

A dataflow consists of a series of “blocks”. Each block, known as a port in the older CCR, can be a source and/or target for data. Data usually enters into a dataflow by being posted to a propagation block, which is block that implements both ISourceBlock<T> and ITargetBlock<T>. Since this block is a source, it may be linked to other target or propagation blocks. Data flows from one block to the next in an asynchronous fashion, often being buffered at the source or target until such time as it is needed.

As the name implies, the infrastructure beneath TPL Dataflow is .NET 4’s Task Parallel Library. And like the TPL you can replace the default scheduler with a custom implementation. Out of the box you one based on .NET’s thread pool system and one that uses the synchronization context framework. The latter would be used when you want the dataflow to run on a particular thread, for example when using dataflow to manipulate a GUI. The documentation is unclear on this point, but it appears that you can set the scheduler on a block-by-block basis. If true, this would provide an excellent means for marshaling data back into the GUI thread.

By default a dataflow is tuned for performance over fairness. Implementation wise, this means that once a block is activated it will continue to process data until it runs out. To prevent one block from consuming all available resources one can put a message count limit on it. When set, the block will only process that many pieces of data before terminating the current task and allowing another to take over.

Blocks can consume data on a greedy or non-greedy fashion. The latter is very important for use in situations where multiple targets are vying for messages from the same source. For example, when using multiple targets to load-balance a single source you want to ensure those targets are non-greedy lest everything be dumped into the first greedy block. Another reason to use non-greedy consumers is to allow messages to be dropped when they are replaced by newer versions. An example of this would be the Broadcast block. It offers each message to every one of its target. If the targets don’t accept the block it will only remain available until the next time the Broadcast block receives a message.

Turning to the issue of locking, TPL Dataflow has an interesting design to avoid deadlocks. When a given block needs input from multiple sources, it will wait until all sources have data available. At that point it will use a two-phase commit to ensure it can actually acquire data from each source. It is highly recommended that one relies on this mechanism and not try to manually lock data that is inside the workflow.

This design, especially when combined with blocks like Broadcast, can lead to race conditions. Essentially the problem is that one message be delivered and acted upon by multiple blocks. Since locks can interfere with the scheduler, it is much safer to use immutable data types as messages.

The TPL Dataflow library is available with the Async CTP.

No comments

Watch Thread Reply

Educational Content

Evolution in Data Integration From EII to Big Data

Approaches to integrating data are changing with emergence of cloud computing.

Winning Hearts and Minds: How to Embed UX from Scratch in a Large Organization

Michele Ide-Smith presents the lessons learned in the process of introducing UX principles and techniques into a large organization through a series of small steps.

LMAX Disruptor: 100K TPS at Less than 1ms Latency

Dave Farley and Martin Thompson discuss solutions for doing low-latency high throughput transactions based on the Disruptor concurrency pattern.

Thoughts on Test Automation in Agile

Rajneesh Namta shares his thoughts, experiences, and some of the critical lessons learned while implementing software test automation on a recent Agile project.

Actor Interaction Patterns

Dale Schumacher presents several patterns of actor interaction that can be used in collaborative programs written in any language.

Scalaz: Functional Programming in Scala

Rúnar Bjarnason discusses Scalaz, a Scala library of pure data structures, type classes, highly generalized functions, and concurrency abstractions to perform functional programming in Scala.

Faster, Better, Higher – But How?

One of the main challenges when designing software architecture is considering quality attributes. Not only their design turns out to be difficult, but also the specification of these attributes.

Software Naturalism - Embracing the Real Behind the Ideal

Michael Feathers analyzes real code bases concluding that code is not nearly as beautiful as designers aspire to, discussing the everyday decisions that alter the code bit by bit.