Posted by Ron Bodkin on Jul 13, 2010
The 2010 Hadoop Summit included a keynote from Peter Sirota, General Manager of Amazon Elastic MapReduce (EMR), Amazon's hosted Hadoop offering with web-based management tools. Sirota outlined the use cases most common among EMR customers.
Sirota noted that customers can store hundreds of petabytes on Amazon's S3 storage system. He announced that Amazon now supports a new stack based on Hadoop 0.20, alongside the existing stack based on Hadoop 0.18, which they "won't retire any time soon." The Amazon EMR software is integrated with the AWS management console and works natively with Amazon's S3 cloud storage.
| New Stack | Old Stack |
| --- | --- |
| Hadoop 0.20 | Hadoop 0.18 |
| Pig 0.6 | Pig 0.3 |
| Hive 0.5 | Hive 0.4 |
| Cascading 1.1 | Cascading 1.1 |
Sirota noted that customers had asked for more flexibility in running clusters, better application development tools, improved analytics and improved support options. He then announced new capabilities and partnerships in each area. First, customers can now add nodes to and remove nodes from running clusters, which changes the runtime of jobs already underway: doubling the computing capacity of a job that is expected to take 6 more hours could cut the remaining time to 3 hours. He also noted that this lets customers resize a cluster as its workload changes, using a smaller set of nodes to answer Hive queries and ramping up for larger batch processes that update a Hadoop system, all while keeping the same EMR cluster up.
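The 6-hour example can be checked with a short back-of-the-envelope calculation. The sketch below assumes the remaining work scales perfectly linearly with node count, which real jobs only approximate; the function name and the node counts are illustrative and are not part of the EMR API.

```python
def remaining_hours_after_resize(remaining_hours, current_nodes, new_nodes):
    """Estimate remaining runtime after a resize.

    Assumption: remaining work divides evenly across nodes, so runtime is
    inversely proportional to node count (idealized linear scaling).
    """
    if remaining_hours < 0:
        raise ValueError("remaining_hours must be non-negative")
    if current_nodes <= 0 or new_nodes <= 0:
        raise ValueError("node counts must be positive")
    return remaining_hours * current_nodes / new_nodes


# Hypothetical cluster: 10 nodes with 6 hours left, doubled to 20 nodes.
print(remaining_hours_after_resize(6, 10, 20))  # 3.0
```

In practice the speedup would likely be smaller, since startup time for new nodes and non-parallel phases of a job are ignored here.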
Sirota also pre-announced the coming availability of spot pricing for Elastic MapReduce, extending Amazon's market pricing for excess EC2 capacity to EMR. Customers will be able to bid a certain amount for additional nodes. The nodes will be added to the EMR cluster if capacity is available at the bid price, although they can be removed if the market price rises above the bid. He gave the example of a job using four on-demand nodes, with five additional spot nodes being added. This option can provide cost savings in environments where there is flexibility in how quickly calculations complete.
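The bidding rule described here can be sketched as a small simulation. This is only an illustration of the rule as stated in the keynote, not Amazon's implementation: the prices are made up, the function names are hypothetical, and it assumes spot nodes are granted or reclaimed all at once.

```python
def spot_nodes_running(bid_price, market_price, requested_spot_nodes):
    """Spot nodes are held only while the market price is at or below the bid."""
    return requested_spot_nodes if market_price <= bid_price else 0


def cluster_size_over_time(on_demand_nodes, requested_spot_nodes,
                           bid_price, market_prices):
    """Total node count at each point in a sequence of market prices."""
    return [
        on_demand_nodes
        + spot_nodes_running(bid_price, price, requested_spot_nodes)
        for price in market_prices
    ]


# Four on-demand nodes plus five spot nodes, as in Sirota's example.
# The bid and the hourly market prices below are hypothetical.
sizes = cluster_size_over_time(4, 5, 0.10, [0.08, 0.09, 0.12, 0.10])
print(sizes)  # [9, 9, 4, 9]
```

The on-demand nodes stay up throughout, so the job keeps making progress when the spot nodes are reclaimed; it just finishes later.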
Sirota also announced the availability of new silver and gold premium support levels for EMR; gold support is available 24x7 and guarantees a 1-hour response time for urgent issues. Sirota then demonstrated Amazon's partnerships with Karmasphere for developer tools and monitoring, Datameer for business-user analytics, and MicroStrategy, which is providing Hadoop support in general, including EMR, and integrates its business intelligence tools through Hive.
Amazon hosted an Elastic MapReduce customer panel at the Hadoop Summit, which featured case studies from Razorfish, Netflix, Spiral Genetics, and Coldlight Solutions, as summarized by James Hamilton.
Amazon demonstrated significant continued investment in improving Elastic MapReduce and gave some interesting insights into the kinds of large-scale applications that are being built on the hosted offering.
Ron Bodkin is the Founder of Think Big Analytics, which builds big data solutions using Hadoop and NoSQL.