Posted by Jonathan Allen on Jan 02, 2012
Dataflow blocks are the backbone of TPL Dataflow, .NET 4.5’s new high-performance parallel processing library. While they offer a lot of functionality out of the box, there will be times when a custom block is necessary. Zlatko Michailov has put together a document that outlines the process and many of the traps you may encounter. The full guide, Guide to Implementing Custom TPL Dataflow Blocks, is available on the Parallel Programming with .NET blog, so we’ll just hit the highlights.
Before you begin, Zlatko asks you to consider whether you can simply glue together an existing ITargetBlock and ISourceBlock. If so, the Encapsulate method will create a new IPropagatorBlock for you. It handles most of the boilerplate code, but you still have to state explicitly how messages are propagated from the target block to the source block.
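To make the idea concrete, here is a minimal Python model of what encapsulation does. It is not the .NET API: the real entry point is `DataflowBlock.Encapsulate(target, source)`, which returns an `IPropagatorBlock<TInput, TOutput>`, while `CallbackTarget`, `ListSource`, `Encapsulated` and `encapsulate` below are hypothetical stand-ins. The wrapper only forwards target-side calls to the inner target and source-side calls to the inner source; the hop from target to source is something you wire up yourself.

```python
class CallbackTarget:
    """Hypothetical target: hands every offered message to a callback and accepts it."""

    def __init__(self, on_message):
        self._on_message = on_message

    def offer(self, message):
        self._on_message(message)
        return True  # accepted


class ListSource:
    """Hypothetical source: pushes messages to linked targets, buffering if none accept."""

    def __init__(self):
        self._targets = []
        self._buffer = []

    def post(self, message):
        for target in self._targets:
            if target.offer(message):
                return
        self._buffer.append(message)

    def link_to(self, target):
        self._targets.append(target)
        # Re-offer anything that arrived before this link existed.
        pending, self._buffer = self._buffer, []
        for message in pending:
            self.post(message)


class Encapsulated:
    """Presents one target and one source as a single propagator."""

    def __init__(self, target, source):
        self._target = target
        self._source = source

    def offer(self, message):  # target side: delegate to the inner target
        return self._target.offer(message)

    def link_to(self, target):  # source side: delegate to the inner source
        self._source.link_to(target)


def encapsulate(target, source):
    return Encapsulated(target, source)


# Propagation from target to source is explicit: the callback posts into the source.
source = ListSource()
times_ten = encapsulate(CallbackTarget(lambda x: source.post(x * 10)), source)

received = []
times_ten.offer(1)  # buffered in the source: nothing is linked yet
times_ten.link_to(CallbackTarget(received.append))
times_ten.offer(2)
print(received)  # [10, 20]
```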
For more control, you can implement ITargetBlock and ISourceBlock directly. Unlike most interfaces, not all of their methods are meant to be exposed on the implementing class’s public surface. Some, such as LinkTo and Complete, are meant to be called by general code, while others, such as OfferMessage, should only be called by another block through the interface (in C#, via explicit interface implementation). Section 5.1 of the guide lists the recommended visibility for each method.
Zlatko then walks through two detailed examples: a synchronous filter block and a synchronous transformation block. Although they involve quite a bit of boilerplate, they illustrate a lot about how TPL Dataflow works internally.
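The shape of those two blocks can be sketched in a few lines of Python. This is a simplified model, not the guide’s C# code. `SyncFilterBlock`, `SyncTransformBlock` and `Collector` are hypothetical names, and the `Status` enum mirrors only two values of .NET’s `DataflowMessageStatus` (`Accepted` and `Declined`). Because both blocks are synchronous and keep no buffer, each offer is either passed straight through to a linked target or declined on the spot.

```python
from enum import Enum


class Status(Enum):
    ACCEPTED = "accepted"
    DECLINED = "declined"


class _Forwarder:
    """Shared linking logic: offer to targets in link order, stop at the first acceptance."""

    def __init__(self):
        self._targets = []

    def link_to(self, target):
        self._targets.append(target)

    def _forward(self, message):
        for target in self._targets:
            if target.offer(message) is Status.ACCEPTED:
                return Status.ACCEPTED
        return Status.DECLINED  # nobody downstream took it, and there is no buffer


class SyncFilterBlock(_Forwarder):
    def __init__(self, predicate):
        super().__init__()
        self._predicate = predicate

    def offer(self, message):
        if not self._predicate(message):
            return Status.DECLINED  # filtered out
        return self._forward(message)


class SyncTransformBlock(_Forwarder):
    def __init__(self, transform):
        super().__init__()
        self._transform = transform

    def offer(self, message):
        return self._forward(self._transform(message))


class Collector:
    def __init__(self):
        self.items = []

    def offer(self, message):
        self.items.append(message)
        return Status.ACCEPTED


evens = SyncFilterBlock(lambda n: n % 2 == 0)
square = SyncTransformBlock(lambda n: n * n)
sink = Collector()
evens.link_to(square)
square.link_to(sink)

statuses = [evens.offer(n) for n in range(1, 6)]
print(sink.items)  # [4, 16]
```

Keeping no internal buffer is what lets this sketch stay synchronous and lock-free. Once a block buffers messages and processes them asynchronously, the locking concerns below come into play.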
The truly tricky code comes into play when Zlatko turns to asynchronous blocks. Right from the start you have to consider issues such as a lock hierarchy. Zlatko recommends the approach used by the built-in blocks, which involves three locks: an outgoing lock, an incoming lock, and a value lock.
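A lock hierarchy prevents deadlock by requiring every thread to acquire locks in one fixed order. The Python sketch below enforces such an order at runtime. The rank order it uses (outgoing, then incoming, then value) is an assumption based only on the order in which the article lists the locks. Consult the guide for the hierarchy the built-in blocks actually use. `BlockLocks` and the rank constants are hypothetical names.

```python
import threading
from contextlib import ExitStack, contextmanager

# Assumed ranks: a lower rank must always be taken before a higher one.
OUTGOING, INCOMING, VALUE = 0, 1, 2


class BlockLocks:
    def __init__(self):
        self._locks = {rank: threading.Lock() for rank in (OUTGOING, INCOMING, VALUE)}
        self._held = threading.local()  # ranks held by the current thread

    @contextmanager
    def acquire(self, *ranks):
        held = getattr(self._held, "ranks", [])
        wanted = sorted(set(ranks))  # always lock in hierarchy order
        if held and wanted and wanted[0] <= held[-1]:
            raise RuntimeError("lock hierarchy violation")
        with ExitStack() as stack:
            for rank in wanted:
                stack.enter_context(self._locks[rank])
            self._held.ranks = held + wanted
            try:
                yield
            finally:
                self._held.ranks = held


locks = BlockLocks()

with locks.acquire(VALUE, OUTGOING):  # taken as OUTGOING, then VALUE
    pass

with locks.acquire(OUTGOING):
    with locks.acquire(INCOMING, VALUE):  # moving down the hierarchy is fine
        pass

try:
    with locks.acquire(VALUE):
        with locks.acquire(INCOMING):  # going back up would risk deadlock
            pass
    violation = False
except RuntimeError:
    violation = True
print(violation)  # True
```

Sorting the requested ranks means a caller can never take two of the block’s locks out of order within one call, and the per-thread record catches nested calls that would climb back up the hierarchy.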
Marking a block as completed seems simple: you just complete the task exposed by the Completion property when Complete or Fault is invoked. But with asynchronous blocks even this can be tricky. For example, completing that task must be done without holding a lock, because it may synchronously run other code, such as continuations registered on the task.
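The following Python sketch shows why. `Completion` is a hypothetical stand-in for the `Task` behind a block’s `Completion` property: it invokes its continuations on the thread that resolves it. `CompletableBlock`, also hypothetical, records the completion request under its lock but resolves the completion only after releasing that lock. Because of that ordering, a continuation that calls back into the block does not deadlock on the non-reentrant `threading.Lock`.

```python
import threading


class Completion:
    """Hypothetical stand-in for a Task: continuations run on the resolving thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._continuations = []
        self.done = False
        self.exception = None

    def add_continuation(self, fn):
        with self._lock:
            if not self.done:
                self._continuations.append(fn)
                return
        fn(self)  # already resolved: run immediately, outside the lock

    def resolve(self, exception=None):
        with self._lock:
            if self.done:
                return
            self.done, self.exception = True, exception
            continuations, self._continuations = self._continuations, []
        for fn in continuations:  # user code runs with no lock held
            fn(self)


class CompletableBlock:
    def __init__(self):
        self._value_lock = threading.Lock()
        self._completion_requested = False
        self.completion = Completion()

    def is_completion_requested(self):
        with self._value_lock:
            return self._completion_requested

    def complete(self):
        self._request_completion(None)

    def fault(self, exception):
        self._request_completion(exception)

    def _request_completion(self, exception):
        with self._value_lock:
            if self._completion_requested:
                return  # in this sketch, only the first Complete/Fault counts
            self._completion_requested = True
        # Resolve only after the lock is released: continuations may re-enter the block.
        self.completion.resolve(exception)


block = CompletableBlock()
observed = []
# This continuation takes the block's lock; resolving while holding it would deadlock.
block.completion.add_continuation(lambda c: observed.append(block.is_completion_requested()))
block.complete()
block.fault(ValueError("ignored"))
print(observed, block.completion.exception)  # [True] None
```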
Another concern is whether to consume messages greedily or non-greedily. A non-greedy block must take additional steps, such as reserving a message before consuming it, to avoid colliding with other targets linked to the same source.
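In TPL Dataflow, a non-greedy target answers an offer with `DataflowMessageStatus.Postponed` and later claims the message through the source’s `ReserveMessage`, `ConsumeMessage` and `ReleaseReservation` methods. The Python model below imitates that two-phase protocol with hypothetical classes (`ReservingSource`, `NonGreedyTarget`). It shows how a reservation stops a second target linked to the same source from taking a message that the first target has claimed.

```python
from enum import Enum


class Status(Enum):
    ACCEPTED = "accepted"
    POSTPONED = "postponed"


class ReservingSource:
    """Hypothetical source supporting reserve / consume / release."""

    def __init__(self):
        self._available = {}    # message id -> value
        self._reserved_by = {}  # message id -> target holding the reservation
        self._targets = []
        self._next_id = 0

    def link_to(self, target):
        self._targets.append(target)

    def post(self, value):
        self._next_id += 1
        message_id = self._next_id
        self._available[message_id] = value
        for target in self._targets:
            if target.offer(message_id, value, self) is Status.ACCEPTED:
                del self._available[message_id]  # a greedy target took it outright
                break
        return message_id

    def reserve(self, message_id, target):
        if message_id in self._available and message_id not in self._reserved_by:
            self._reserved_by[message_id] = target
            return True
        return False

    def consume(self, message_id, target):
        # Only the reservation holder may consume a reserved message.
        if self._reserved_by.get(message_id) is not target:
            return False, None
        del self._reserved_by[message_id]
        return True, self._available.pop(message_id)

    def release(self, message_id, target):
        if self._reserved_by.get(message_id) is target:
            del self._reserved_by[message_id]


class NonGreedyTarget:
    """Postpones every offer and claims messages later, when it is ready."""

    def __init__(self):
        self.postponed = []  # (source, message id) pairs, oldest first
        self.received = []

    def offer(self, message_id, value, source):
        self.postponed.append((source, message_id))
        return Status.POSTPONED

    def try_claim(self):
        while self.postponed:
            source, message_id = self.postponed.pop(0)
            if source.reserve(message_id, self):
                consumed, value = source.consume(message_id, self)
                if consumed:
                    self.received.append(value)
                    return True
        return False


source = ReservingSource()
a, b = NonGreedyTarget(), NonGreedyTarget()
source.link_to(a)
source.link_to(b)

message_id = source.post("m1")         # both targets postpone
print(source.reserve(message_id, a))   # True: a holds the reservation
print(source.reserve(message_id, b))   # False: b cannot claim it too
source.release(message_id, a)          # a backs out, e.g. another input was missing
print(b.try_claim(), b.received)       # True ['m1']
print(a.try_claim())                   # False: the message is gone
```

Reserving before consuming matters most when a target needs several messages at once, as a non-greedy join does. Such a target can reserve one message from each source and consume them only if every reservation succeeds, releasing the reservations otherwise.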
Finally, Zlatko covers offering messages and linking targets. Fortunately, this part is much easier because all of these operations are intended to be synchronous.
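Linking and offering can likewise be modeled synchronously. In .NET, `LinkTo` returns an `IDisposable` that removes the link, and `DataflowLinkOptions.MaxMessages` limits how many messages flow through a link. The Python sketch below mirrors those two ideas with a returned `unlink` callable and a `max_messages` parameter. `LinkingSource`, `Link`, `Recorder` and `offer_to_targets` are illustrative names, and the sketch is single-threaded. A real block would still need to guard its link list against concurrent access.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class Status(Enum):
    ACCEPTED = "accepted"
    DECLINED = "declined"


@dataclass(eq=False)  # compare links by identity so unlink removes exactly this one
class Link:
    target: Any
    remaining: Optional[int]  # None means no limit


class LinkingSource:
    def __init__(self):
        self._links = []

    def link_to(self, target, max_messages=None):
        link = Link(target, max_messages)
        self._links.append(link)

        def unlink():
            if link in self._links:
                self._links.remove(link)

        return unlink

    def offer_to_targets(self, message):
        """Offer synchronously to linked targets in link order; stop at the first acceptance."""
        for link in list(self._links):
            if link.target.offer(message) is Status.ACCEPTED:
                if link.remaining is not None:
                    link.remaining -= 1
                    if link.remaining == 0:
                        self._links.remove(link)  # limit reached: drop the link
                return True
        return False


class Recorder:
    def __init__(self):
        self.items = []

    def offer(self, message):
        self.items.append(message)
        return Status.ACCEPTED


source = LinkingSource()
first, second = Recorder(), Recorder()
source.link_to(first, max_messages=1)
unlink_second = source.link_to(second)

source.offer_to_targets("a")              # first accepts, then its link is dropped
source.offer_to_targets("b")              # now only second is linked
unlink_second()
delivered = source.offer_to_targets("c")  # no targets left
print(first.items, second.items, delivered)  # ['a'] ['b'] False
```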
For more information, see the TPL Dataflow site on DevLabs.
InfoQ is currently looking for writers for our educational content section. If you know your way around TPL Dataflow and would be interested in writing a 4- to 6-page article on the subject, contact Jonathan Allen at jonathan@infoq.com.