Tracking change and innovation in the enterprise software development community
Posted by Sebastien Auvray on Jan 29, 2008 12:00 PM
The MapReduce design pattern for distributed data processing was introduced by Google in 2004, along with a C++ implementation. A Ruby implementation named Skynet has now been released by Adam Pisoni. There are two notable differences between Google's design paper and Skynet.
First, Skynet is an adaptive, self-upgrading, fault-tolerant, and fully distributed system with no single point of failure. If a worker dies or fails for any reason, another worker will notice and pick up that task. Skynet also has no special ‘master’ servers, only workers which can act as a master for any task at any time. Even these master tasks can fail and will be picked up by other workers.
Second, Skynet is very easy to use and set up, which is the real strength of the MapReduce concept. Skynet also extends ActiveRecord with MapReduce features such as distributed_find:
> Model.distributed_find(:all, :conditions => "id > 20").each(:somemethod)
As long as you're running Skynet, it will execute :somemethod on each model, but in a distributed manner (on as many workers as you have). It does this without instantiating the models before distributing them, or even fetching all the ids ahead of time, so it can work on arbitrarily large data sets.
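One way to picture how work can be distributed without loading models or ids up front is to hand each worker only a pair of id bounds and let it fetch its own slice. The sketch below is a hypothetical illustration in plain Ruby, not Skynet's actual implementation: `Record`, `TABLE`, `id_batches`, and `process_batch` are invented names, an in-memory array stands in for the database table, and threads stand in for remote workers.

```ruby
# Hypothetical sketch (not Skynet's code): each worker receives only a
# [low, high] id range, never the records or the full list of ids.
Record = Struct.new(:id, :name)

# Stand-in for a database table; with ActiveRecord this would be a model.
TABLE = (1..100).map { |i| Record.new(i, "record-#{i}") }

# Build [low, high] id ranges covering min_id..max_id in steps of batch_size.
def id_batches(min_id, max_id, batch_size)
  (min_id..max_id).step(batch_size).map do |low|
    [low, [low + batch_size - 1, max_id].min]
  end
end

# What a worker would do: load only its own slice, then call the method
# named by the symbol on every record in that slice.
def process_batch(low, high, method_name)
  TABLE.select { |r| r.id.between?(low, high) }
       .map { |r| r.public_send(method_name) }
end

# Equivalent in spirit to the condition "id > 20" on a table of 100 rows.
batches = id_batches(21, 100, 25)

# Threads stand in for remote workers; each gets two integers and a symbol.
results = batches.map { |low, high| Thread.new { process_batch(low, high, :name) } }
                 .flat_map(&:value)
```

Because only the bounds and a method name travel to each worker, the coordinating side never holds more than the list of ranges, whatever the table size.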
Any comment about performance? This is usually a problem in Ruby applications.
If my understanding is not completely wrong, this sounds like an Actors implementation.
Also, the following comment is a bit confusing to me:
"Also there is some question as to how well starfish actually distributes tasks since Ruby actually can't marshal and send code blocks over the wire, only references to it."
I don't know what Google's implementation looks like, but I am having a hard time understanding how C++ would distribute code blocks and execute them on various machines.
./alex
--
.w( the_mindstorm )p.
The complaint about performance is leveled against almost all dynamic languages, including Java. In almost all cases the critics have a point that these languages are slower than C, but they miss the point that the performance tradeoff is made consciously, for two reasons. The first is the belief that engineering resources are more valuable than computer resources. Obviously this argument has a limit, which brings us to the second: Ruby is being used in plenty of large-scale production environments where performance is important. Ruby is not likely to be slower than any other interpreted language, be it Java or Perl. A map/reduce framework written in Ruby will be slower than one written in C, but not necessarily slower than one written in Java.
Good point, and to be honest I'm not sure how they do this internally at Google. I guess there's just an assumption that you should be able to pass code blocks in a dynamic language... though this is a poor assumption. I actually haven't given up on implementing this in Skynet, though I still question how much utility it has. In a dynamic language like Ruby, you tend to write very little code that relies on a great deal of other code. How much code would you really want to send with each data slice? How much code would that code rely on? How would you know whether that code is on the worker machines? Given all of this ambiguity, it seems the idea of passing code has limited real-world value. That said, having a system that is self-upgrading is very useful. Skynet is self-upgrading in a rudimentary way, but we have big ideas for how it might upgrade more intelligently in the future.
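The point about code blocks discussed in this thread can be checked with Ruby's standard Marshal module: dumping a Proc raises a TypeError, whereas a class name, a method name, and the data serialize without trouble. The sketch below is a minimal illustration under stated assumptions. `Doubler` and the three-element task payload are hypothetical and do not reflect Skynet's wire format, and the approach only works if the worker already has the named class loaded.

```ruby
# A Proc cannot be serialized with Marshal; the attempt raises TypeError.
block = proc { |x| x * 2 }
begin
  Marshal.dump(block)
  proc_dumped = true
rescue TypeError
  proc_dumped = false
end

# Hypothetical alternative: ship only a *reference* to the code (class and
# method names) plus the data. The worker must already have Doubler loaded.
class Doubler
  def self.call(x)
    x * 2
  end
end

# Sender side: strings, symbols, and arrays of integers marshal cleanly.
task = Marshal.dump(["Doubler", :call, [1, 2, 3]])

# Worker side: resolve the reference locally and apply it to each item.
class_name, method_name, data = Marshal.load(task)
handler = Object.const_get(class_name)
output = data.map { |item| handler.public_send(method_name, item) }
```

This is why the questions in the reply above matter: the reference is only useful if every worker has the same version of the referenced code installed.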
4 comments