Posted by Jonathan Allen on Jan 19, 2012
We briefly interviewed Zlatko Michailov, author of the Guide to Implementing Custom TPL Dataflow Blocks.
InfoQ: What types of applications do you think are especially suited for TPL Dataflow? Which do you see as inappropriate?
TPL Dataflow is a stream processing platform, suited to workloads such as streams of audio/video frames or streams of price quotes. It is especially useful when messages arrive at a high frequency, because that is where the difference between an efficient platform and an inefficient one shows.
An additional benefit of using a dataflow platform in general is that the topology of the dataflow network takes part in the processing. The application therefore consists of small, narrowly focused delegates, which makes it easier to maintain.
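To make the point about topology concrete, here is a minimal sketch of the idea. TPL Dataflow is a .NET library, and this is not its API: it is a rough Python `asyncio` analogue in which queues stand in for links between blocks. Every name in it (`stage`, `run_pipeline`, `parse`, `add_fee`, `label`) is hypothetical and chosen only for illustration.

```python
import asyncio

_DONE = object()  # hypothetical end-of-stream marker passed from stage to stage


async def stage(func, inbox, outbox):
    """Apply one small function to every message, then pass completion on."""
    while True:
        item = await inbox.get()
        if item is _DONE:
            await outbox.put(_DONE)
            return
        await outbox.put(func(item))


async def run_pipeline(items, funcs):
    """Link one stage per function, in order, and collect the final outputs."""
    queues = [asyncio.Queue() for _ in range(len(funcs) + 1)]
    workers = [
        asyncio.create_task(stage(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for item in items:
        await queues[0].put(item)
    await queues[0].put(_DONE)

    results = []
    while True:
        out = await queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    await asyncio.gather(*workers)
    return results


# Each step is a small, narrowly focused function; the order of the list
# (the "topology") decides how a message is processed.
def parse(text):
    return float(text)


def add_fee(price):
    return price + 1.0


def label(price):
    return f"{price:.2f}"


def demo():
    return asyncio.run(run_pipeline(["1.5", "2"], [parse, add_fee, label]))
```

Calling `demo()` pushes two price strings through the three stages. Changing the behavior means editing the list of functions rather than any one function, which is the maintenance benefit described above.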
InfoQ: Do you see TPL Dataflow as an advanced technique that is only going to be used by a few? Or do you think it will largely replace working directly with Tasks much like Tasks superseded threads?
Neither. TPL Dataflow doesn’t replace tasks. (I don’t think tasks replaced threads either. Tasks filled in a gap in the concurrent programming space.) TPL Dataflow implements patterns using tasks. While the main pattern is stream processing, each block is very general and could be used for other purposes. For instance, WriteOnceBlock was designed to be used as a request-response mechanism: a WriteOnceBlock is instantiated upon a request, and once the response data is written to it, it automatically completes, so that the requestor can continue asynchronously. Another example is ActionBlock in combination with the MaxDegreeOfParallelism option, which could be used as a throttling mechanism that prevents more than a given number of processing tasks from executing at the same time. A third example is BufferBlock in combination with the BoundedCapacity option, which throttles the data feed. So I see TPL Dataflow as generally applicable.
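The three patterns mentioned in this answer can be sketched outside .NET as well. The code below is not TPL Dataflow code. It is a hedged Python `asyncio` analogue: a semaphore plays the role of a maximum degree of parallelism, a bounded queue plays the role of a bounded-capacity buffer, and a future plays the role of a write-once cell. All function and class names are hypothetical.

```python
import asyncio


async def run_throttled(handler, items, max_parallel):
    """Run handler on every item, never more than max_parallel at a time."""
    gate = asyncio.Semaphore(max_parallel)

    async def guarded(item):
        async with gate:
            return await handler(item)

    return await asyncio.gather(*(guarded(item) for item in items))


async def measure_peak(max_parallel, count):
    """Return the highest number of handlers seen running together."""
    running = 0
    peak = 0

    async def handler(item):
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # simulated work
        running -= 1
        return item * 2

    results = await run_throttled(handler, range(count), max_parallel)
    return peak, results


async def bounded_feed(capacity, count):
    """Feed count items through a bounded buffer; report its deepest fill."""
    buffer = asyncio.Queue(maxsize=capacity)
    max_depth = 0

    async def producer():
        nonlocal max_depth
        for i in range(count):
            await buffer.put(i)  # waits while the buffer is full
            max_depth = max(max_depth, buffer.qsize())
        await buffer.put(None)  # hypothetical end-of-stream marker

    async def consumer():
        received = []
        while (item := await buffer.get()) is not None:
            received.append(item)
            await asyncio.sleep(0)  # yield so the producer can run
        return received

    _, received = await asyncio.gather(producer(), consumer())
    return max_depth, received


class WriteOnce:
    """Holds the first value offered to it and declines later ones."""

    def __init__(self):
        self._future = asyncio.get_running_loop().create_future()

    def offer(self, value):
        if self._future.done():
            return False
        self._future.set_result(value)
        return True

    async def receive(self):
        return await self._future


async def request_response():
    """The requestor awaits the cell; the responder writes to it once."""
    cell = WriteOnce()

    async def responder():
        await asyncio.sleep(0.01)
        return cell.offer("response"), cell.offer("late response")

    task = asyncio.create_task(responder())
    value = await cell.receive()
    return value, await task
```

In each case the limit is declared once (the semaphore size, the queue size, the single future) and the awaiting code simply pauses when the limit is reached, so no thread is blocked while waiting.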
InfoQ: For a novice just starting out with TPL Dataflow, what would you say are the most important things for them to learn?
This is purely my opinion, but the most important thing is to realize that threads are expensive and that the OS should not be pressured into creating unnecessary threads. Developers should focus on the dependencies among tasks and rely on the OS/framework to schedule those tasks.
Specifically about TPL Dataflow, I’d advise developers to experiment with each block individually. Chances are you’ll discover a block implements a pattern you frequently use. If you see a pattern that is close to the one you use but not quite like it, consider encapsulating multiple built-in blocks to make up that pattern. If that still doesn’t do it, you may be able to write a simple synchronous block that will fill in the gap.
InfoQ: Would you recommend mixing TPL Dataflow and Windows Workflow together?
WF’s goal is to enable persistent flows that usually take days or even months to complete. Its focus is on reliability, not on performance. TPL Dataflow is aimed purely at performance. Its goal is to utilize the available hardware cores as efficiently as possible. So technically you can mix the two technologies. My guess would be that you can use TPL Dataflow within a WF step.