Posted by Ron Bodkin on Feb 18, 2011
Arun Murthy, the Yahoo! Hadoop Map-Reduce development team lead, recently announced and presented a redesign of the core map-reduce architecture for Hadoop to allow for easier upgrades, larger clusters, faster recovery, and support for programming paradigms in addition to Map-Reduce. The redesign splits the map-reduce engine into a resource manager that supports a variety of cluster computing paradigms, and makes map-reduce a user library, allowing an organization to run multiple versions of map-reduce code in the same cluster. The new design is quite similar to the open source Mesos cluster management project, and both Yahoo! and the Mesos team commented on the differences and opportunities.
The main benefits of the new approach are:
The core element of this redesign is to split responsibilities into a generic cluster resource management system, and a separate application master for map-reduce, or indeed any other programming model. This will replace the Job Tracker and the Task Tracker. The resource management system consists of the following cluster-wide controllers:

Next Generation MapReduce Architecture. Source: Yahoo
Distributed across the worker nodes, there will be:
In a follow-up discussion, Murthy said that the system will not use Linux containers to enforce resource limits, since their tests showed the overhead was too high, but that it will use Linux cgroups to sandbox processes, ensuring that container processes adhere to the limits they requested. The system will allow locality-sensitive requests (e.g., to run tasks on a machine or a rack near a file) to maintain locality in map-reduce jobs, which is also desirable for other distributed programming paradigms. To support the map-reduce shuffle phase, the NodeManager will allow remote tasks to issue requests to read local data on its disk. Initially, to improve efficiency, the map-reduce reduce containers will ask a single NodeManager for the splits from each map task that ran on that node. Murthy mentioned that in the future they want to add a coalescing operation that will sort together all the map task data on a single node, and possibly also on a single rack, to further improve the efficiency of shuffles.
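The locality-sensitive requests described above can be illustrated with a small sketch. This is not Yahoo!'s implementation or any Hadoop API: the names `Node`, `ContainerRequest`, and `allocate` are hypothetical, and the policy shown (try a preferred node, then a preferred rack, then any node with enough free memory) is only an assumption about how such a request might be satisfied.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A worker node (hypothetical model, not a Hadoop class)."""
    name: str
    rack: str
    free_mb: int


@dataclass
class ContainerRequest:
    """A resource request carrying optional locality hints (hypothetical)."""
    memory_mb: int
    preferred_nodes: List[str] = field(default_factory=list)
    preferred_racks: List[str] = field(default_factory=list)


def allocate(request: ContainerRequest, nodes: Dict[str, Node]) -> Optional[str]:
    """Return the name of the node chosen for the request, or None.

    Preference order (an assumed policy): node-local, rack-local, anywhere.
    """
    def fits(node: Node) -> bool:
        return node.free_mb >= request.memory_mb

    # 1. Node-local: one of the explicitly preferred machines.
    candidates = [
        nodes[name]
        for name in request.preferred_nodes
        if name in nodes and fits(nodes[name])
    ]
    # 2. Rack-local: any machine on a preferred rack.
    if not candidates:
        candidates = [
            n for n in nodes.values()
            if n.rack in request.preferred_racks and fits(n)
        ]
    # 3. Off-rack: any machine with enough free memory.
    if not candidates:
        candidates = [n for n in nodes.values() if fits(n)]
    if not candidates:
        return None

    chosen = candidates[0]
    chosen.free_mb -= request.memory_mb  # reserve the memory on that node
    return chosen.name


if __name__ == "__main__":
    cluster = {
        "n1": Node("n1", "rackA", 1024),
        "n2": Node("n2", "rackA", 4096),
        "n3": Node("n3", "rackB", 8192),
    }
    req = ContainerRequest(2048, preferred_nodes=["n1"], preferred_racks=["rackA"])
    # n1 lacks the memory, so the request falls back to another rackA node.
    print(allocate(req, cluster))
```

A real resource manager would also weigh queue capacity, fairness, and how long a request should wait for a local slot before relaxing its preference; the sketch ignores all of that to keep the fallback order visible.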
Murthy noted that the technology is currently in a prototype stage of development. Yahoo! is eager to deploy it this year to support larger clusters, although this depends on internal processes to release the code and on coordinating a release for Apache Hadoop. After Murthy presented the technology at the February Bay Area Hadoop User Group meeting, Ted Dunning of MapR Technologies asked whether Mesos doesn't already do all this. Murthy responded that Mesos needs both a JobTracker and a TaskTracker to run map-reduce, whereas this implementation replaces the need for a JobTracker or a TaskTracker. We asked the Mesos team to comment.
I think it would be best to take advantage of this refactoring of MapReduce to let it support multiple resource managers -- not just for Mesos's sake but also because people naturally want to run Hadoop on things like Sun Grid Engine, LSF, Condor, etc. Resource scheduling for generic applications is a much harder problem than implementing MapReduce, and there's likely to be a lot of experimentation and different systems for this. Yahoo's refactoring will likely make it easier to run Hadoop on Mesos... In any case, we will remain committed to supporting Hadoop on our end.
...We plan on continuing to grow Mesos into a powerful system. One advantage that I think we'll have with this respect is that Mesos is continuing to be built, from the ground up, to run applications beyond just data processing. At Twitter we are running more and more general-purpose "server" like applications everyday.
In response to the question about how the proposed Hadoop architecture differs from Mesos, Zaharia said:
...one of the major differences is state management in the Mesos master. In Mesos, we have designed the system to let the applications do their own scheduling, and have the master contain only soft state about the currently active applications and running tasks. In particular, the master doesn't need to know about pending tasks that the application hasn't launched yet. ... In the Hadoop next generation design, as far as I understand, the master keeps a list of all pending tasks in ZooKeeper, which leads to more state to manage. Apart from this technical issue, Mesos has been designed to support non-MapReduce frameworks from the beginning, and we have been using it for MPI, long running services, and our in-memory parallel processing system called Spark. One last difference from a practical point of view is that Mesos can be used with Hadoop 0.20 (with a patch) instead of requiring one to wait for a new Hadoop release.
InfoQ spoke to Eric Baldeschwieler, Yahoo!'s VP of Hadoop Development, to understand more about the project's origins and its differences from Mesos. Baldeschwieler said there are many projects that provide cluster scheduling capabilities, including Mesos. However, Yahoo! felt that none of the existing options were designed to solve the map-reduce problem as they conceive of it. He noted that some of the options were new and immature, and those that were mature were missing key features. He commented that the Next Generation MapReduce design reflects years of experience running the largest Hadoop installation in the world, and is a natural evolution as Yahoo! works incrementally to make map-reduce better.
With respect to Mesos, Baldeschwieler said that it is good work, but that Yahoo! wanted a component integrated into Hadoop, not a meta-layer. He also noted that the Next Generation MapReduce architecture has been in design for years, and that the Yahoo! team has been collaborating with Mesos team members throughout their design process and is watching their work with interest. Baldeschwieler said he'd welcome contributions from the Mesos team to the Hadoop project, and would be glad to help them better integrate with Hadoop. He noted that in the past Yahoo! has adopted externally developed Hadoop technologies that have proven their value by building a community, such as HBase and Hive.
It will be interesting to see the dynamic as the Hadoop ecosystem evolves to a world of resource schedulers supporting multiple programming paradigms, and addresses increased scalability, reliability, and efficiency.
Ron Bodkin is the Founder of Think Big Analytics, which builds big data solutions using Hadoop and NoSQL.