New-age Transactional Systems - Not Your Grandpa's OLTP
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
The content has been bookmarked!
There was an error bookmarking this content! Please retry.
Posted by Jonathan Allen on Apr 21, 2009
In an essay on Cilk Arts, Professor Guy Blelloch argues that parallel programming is not intrinsically hard, but rather a question of abstraction. The three problems identified by Blelloch are a lack of training in parallel thinking, separating parallel implementation from algorithms, and determinism. After detailing each, he explains why he thinks they can be overcome.
In the question of “Parallel Thinking”, he puts forth the Quicksort algorithm. Quicksort is intrinsically parallel, both in its recursive call and in comparing the pivot element to each element in the list.
Yet most of our classes teach Quicksort without pointing out it is parallel. We carefully go through the sequential execution and analysis and argue it takes O(n log n) expected time. Most students are left clueless of the amount of parallelism in the algorithm, and certainly on how to analyze the parallelism. For example, what if one only took advantage of one or the other form of parallelism, how much is left.
Blelloch believes that since the algorithms are already at the correct level of abstraction, it is just a matter of training programmers to think in those terms.
The second issue he addresses is the separation of low-level and high-level details. Forty years ago, most algorithm books would include notes on memory allocation, simulating recursion with stacks, and a non-trivial amount of assembly. These days, programmers don’t think twice about recursion and most don’t even think about memory management. Yet many parallel libraries still require developers to “specify stack sizes for forked tasks, implement their own schedulers or specify scheduling policies, or understand mapping of memory to processors.”
Blelloch promotes Cilk++ as a good example of “separating the wheat from the chaff”. Of course other initiatives are also going in that direction such as Microsoft’s CCR/DSS, Parallel LINQ, Java’s fork-join framework and specialized languages like Erlang.
The third issue is that of determinism, also known as concurrent programming. Blelloch argues that non-deterministic code should be very rare, and where it exists, it should be hidden inside a component that is externally deterministic.
Unfortunately, this is where his argument falls flat. Systems that deal with external connections are inherently non-deterministic, both in sequence and in behavior. No amount of abstraction is going to change the fact that you cannot predict when, if at all, another player in a video game is going to move his character. Nor can you predict if a trader at your various partner companies is going to hit or pass on a particular stock offer.
Still, two things can be taken away from this essay. Schools really need to spend more time teaching students to think in terms of parallel algorithms and the next generation of libraries are showing great promise for parallel, though not concurrent, systems.
Improve Java Garbage Collection, Runtime Execution, and JVM visibility with Zing
Why NoSQL? A primer on Managing the Transition from RDBMS to NoSQL
Using Drools? See what you're missing! Get the Power of Drools with the Assurance of Red Hat
Monitor your Production Java App - includes JMX! Low Overhead - Free download
Getting Started with Stratos - an Open Source Cloud Platform
Members of academia sometimes seem tempted to claim that something considered difficult is in fact easy. Not long ago Peter Van Roy argued that shared-memory concurrency is a solved problem and the solution is dataflow programming; see www.info.ucl.ac.be/~pvr/vanroy-mc-panel.pdf. An interesting discussion that ensued on LtU shows it is a controversial claim: lambda-the-ultimate.org/node/2874.
For the sake of making a point, short essays like Blelloch's or Van Roy's do tend to oversimplify matters and gloss over some issues that would be addressed in a proper scientific paper. I suspect that "determinism" remark would require some more clarification.
When speaking of determinism in a parallel program, one can assume it is implied "given the same inputs." Sources of non-determinism in parallel programs most often come from race conditions, where a synchronization is missing and the program becomes potentially dependent on things other than the inputs the programmer intended. A non-deterministic program can be incredibly difficult to debug, since it is relatively unpredictable.
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.
Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.
Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).
Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.
Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.
One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.
InfoQ spoke to the authors of Software Systems Architecture on a couple of new topics, the System Context viewpoint and Agile, which have been added to the second edition.
2 comments
Watch Thread Reply