New-age Transactional Systems - Not Your Grandpa's OLTP
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
The content has been bookmarked!
There was an error bookmarking this content! Please retry.
Posted by Abel Avram on May 07, 2009
Dryad and DryadLINQ are two Microsoft Research projects which facilitate large data processing on computer clusters or data centers for the C# developer.
Dryad is an infrastructure that runs sequential programs in parallel on a cluster of computers or on a data center. The parallel computation is organized as a directed acyclic graph where the programs are the graph’s vertices while the edges are the communication channels between the programs as shown in the figure below:

The graph simply shows the relationship between programs, where the input comes from and where output is directed to. The graph must be acyclic to avoid program scheduling deadlocks. The Job Manager (JM) holds the graph and schedules the execution of a program when its input channels are ready and when there is a computer available. JM gets an available computer from the Name Server (NS) and schedules the program for running through a daemon (D). The communication channels between programs (vertices) are represented by files, shared memory, or TCP pipes. The graph can be changed dynamically during runtime and is fault tolerant. The entire graph can be run on a single system for debugging purposes. Dryad is already used in production by Microsoft for their AdCenter.
DryadLINQ is “a compiler which translates LINQ programs to distributed computations which can be run on a PC cluster”. This process is achieved through the following:
- C# and LINQ data objects become distributed partitioned files.
- LINQ queries become distributed Dryad jobs.
- C# methods become code running on the vertices of a Dryad job.
DryadLINQ has the following features:
- Declarative programming: computations are expressed in a high-level language similar to SQL
- Automatic parallelization: from sequential declarative code the DryadLINQ compiler generates highly parallel query plans spanning large computer clusters. For exploiting multi-core parallelism on each machine DryadLINQ relies on the PLINQ parallelization framework.
- Integration with Visual Studio: programmers in DryadLINQ take advantage of the comprehensive VS set of tools: Intellisense, code refactoring, integrated debugging, build, source code management.
- Integration with .Net: all .Net libraries, including Visual Basic, and dynamic languages are available.
- Type safety: distributed computations are statically type-checked.
- Automatic serialization: data transport mechanisms automatically handle all .Net object types.
- Job graph optimizations
- static: a rich set of term-rewriting query optimization rules is applied to the query plan, optimizing locality and improving performance.
- dynamic: run-time query plan optimizations automatically adapt the plan taking into account the statistics of the data set processed.
Useful links: Paper: Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks, Presentation: Dryad, Paper: DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, DryadLINQ program examples (PDF).
Using Drools? See what you're missing! Get the Power of Drools with the Assurance of Red Hat
A practical guide to choosing the right agile tools
SOA All-In-One Guide: KPIs & Best Practices, ESB Report
Improve Java Garbage Collection, Runtime Execution, and JVM visibility with Zing
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.
Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.
Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).
Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.
Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.
One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.
InfoQ spoke to the authors of Software Systems Architecture on a couple of new topics, the System Context viewpoint and Agile, which have been added to the second edition.
No comments
Watch Thread Reply