Starfish Brings Google-Style Distributed Processing to Ruby

Google developed MapReduce, a tool written in C++, to make it easy to run parallel computations over large datasets across many networked machines. The system suits tasks that can be expressed as an algorithm in which one step is repeated for every element of an array or dataset. Multiple machines can perform that step in parallel, and the partial results are then combined into a useful conclusion at the end. Google uses this tool to collate and analyze large sets of data (such as Web page contents) rapidly on its server farm.
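The map-then-combine idea can be sketched in a few lines of plain Ruby. This is only an illustration of the concept, not Google's implementation: the sample documents and the method names count_words and merge_counts are made up for the example, and threads on one machine stand in for the separate machines a real system would use.

```ruby
# Hypothetical input: each string stands in for one document (e.g. a Web page).
documents = [
  "the quick brown fox",
  "the lazy dog",
  "the fox jumps"
]

# Map step: count the words of ONE document. It needs nothing from the
# other documents, which is what makes it safe to run in parallel.
def count_words(document)
  document.split.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

# Reduce step: merge two partial word counts into one.
def merge_counts(left, right)
  left.merge(right) { |_word, a, b| a + b }
end

# Run the map step once per document, one thread each
# (an assumption for the sketch; threads stand in for machines).
partial_counts = documents.map { |doc| Thread.new { count_words(doc) } }.map(&:value)

# Combine the partial results into the final answer.
totals = partial_counts.reduce({}) { |acc, counts| merge_counts(acc, counts) }

puts totals.inspect
```

Because each call to count_words depends only on its own document, the order in which the workers finish does not affect the final totals.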

Lucas Carlson has taken the MapReduce concept and brought it to Ruby. Lucas recently gave a presentation called "Ridiculously easy ways to distribute processor intensive tasks using Rinda and DRb", in which he looked at using Ruby's DRb system to get separate machines to run tasks such as log processing or database management. By borrowing concepts from MapReduce, however, he has gone a step further: his new library, Starfish, makes it possible to replace standard map calls with calls that can be processed in parallel.
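To give a feel for what swapping a standard map call for a parallel one looks like, here is a minimal sketch. It does not use Starfish and does not reflect its API: parallel_map is a hypothetical helper written for this example, the log lines are invented, and it spreads work over local threads rather than over separate machines via DRb.

```ruby
# Hypothetical helper (NOT part of Starfish): behaves like Array#map,
# but splits the items into slices and processes each slice in its own thread.
def parallel_map(items, workers: 4, &block)
  return [] if items.empty?

  slice_size = (items.size / workers.to_f).ceil
  items.each_slice(slice_size)
       .map { |slice| Thread.new { slice.map(&block) } }
       .flat_map(&:value) # joining in slice order keeps results in input order
end

# Invented sample data standing in for a log file.
log_lines = [
  "GET /index.html 200",
  "GET /missing 404",
  "POST /login 200",
  "GET /about.html 200",
  "GET /broken 500"
]

# Standard map call...
sequential = log_lines.map { |line| line.split.last.to_i }

# ...and the same call with the hypothetical parallel version swapped in.
parallel = parallel_map(log_lines, workers: 2) { |line| line.split.last.to_i }

puts parallel == sequential
```

The calling code barely changes, which is the appeal of the approach. Note that in the standard Ruby interpreter threads do not run CPU-bound Ruby code simultaneously, so real speedups for processor-intensive work come from spreading the slices over separate processes or machines.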

You can learn more and see some examples in Lucas's article about Starfish.
