InfoQ

InfoQ

News

My Bookmarks

Login or Register to enable bookmarks for unlimited time.

The content has been bookmarked!

There was an error bookmarking this content! Please retry.

DryadLINQ, Distributed Computing Made Easy

Posted by Abel Avram on May 07, 2009

Sections
Operations & Infrastructure,
Enterprise Architecture,
Development,
Architecture & Design
Topics
Grid Computing ,
.NET ,
Architecture ,
Research
Tags
Microsoft

Dryad and DryadLINQ are two Microsoft Research projects which facilitate large data processing on computer clusters or data centers for the C# developer.

Dryad is an infrastructure that runs sequential programs in parallel on a cluster of computers or on a data center. The parallel computation is organized as a directed acyclic graph where the programs are the graph’s vertices while the edges are the communication channels between the programs as shown in the figure below:

dryad

The graph simply shows the relationship between programs, where the input comes from and where output is directed to. The graph must be acyclic to avoid program scheduling deadlocks. The Job Manager (JM) holds the graph and schedules the execution of a program when its input channels are ready and when there is a computer available. JM gets an available computer from the Name Server (NS) and schedules the program for running through a daemon (D). The communication channels between programs (vertices) are represented by files, shared memory, or TCP pipes. The graph can be changed dynamically during runtime and is fault tolerant. The entire graph can be run on a single system for debugging purposes. Dryad is already used in production by Microsoft for their AdCenter.

DryadLINQ is “a compiler which translates LINQ programs to distributed computations which can be run on a PC cluster”. This process is achieved through the following:

  • C# and LINQ data objects become distributed partitioned files.
  • LINQ queries become distributed Dryad jobs.
  • C# methods become code running on the vertices of a Dryad job.

DryadLINQ has the following features:

  • Declarative programming: computations are expressed in a high-level language similar to SQL
  • Automatic parallelization: from sequential declarative code the DryadLINQ compiler generates highly parallel query plans spanning large computer clusters. For exploiting multi-core parallelism on each machine DryadLINQ relies on the PLINQ parallelization framework.
  • Integration with Visual Studio: programmers in DryadLINQ take advantage of the comprehensive VS set of tools: Intellisense, code refactoring, integrated debugging, build, source code management.
  • Integration with .Net: all .Net libraries, including Visual Basic, and dynamic languages are available.
  • Type safety: distributed computations are statically type-checked.
  • Automatic serialization: data transport mechanisms automatically handle all .Net object types.
  • Job graph optimizations
    • static: a rich set of term-rewriting query optimization rules is applied to the query plan, optimizing locality and improving performance.
    • dynamic: run-time query plan optimizations automatically adapt the plan taking into account the statistics of the data set processed.

Useful links: Paper: Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks, Presentation: Dryad, Paper: DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, DryadLINQ program examples (PDF).

No comments

Watch Thread Reply

Educational Content

New-age Transactional Systems - Not Your Grandpa's OLTP

John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.

Cool Code

Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.

Collaboration: At the Extremities of Extreme

Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.

Yesod Web Framework

Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).

Transactions without Transactions

Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.

Attila Szegedi on JVM and GC Performance Tuning at Twitter

Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.

10 tips on how to prevent business value risk

One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.

Interview: Software Systems Architecture: Working With Stakeholders Using Viewpoints and Perspectives

InfoQ spoke to the authors of Software Systems Architecture on a couple of new topics, the System Context viewpoint and Agile, which have been added to the second edition.