Posted by Michael Hunger on Apr 21, 2011
The US-EAST Region of Amazon's Elastic Compute Cloud experienced a severe outage today. Many high-profile sites were down or at least affected, among them Reddit, Foursquare, Quora, Hootsuite, Heroku, Assembla and Codespaces. The outage was caused by failing EBS (Elastic Block Store) volumes, which also power the Relational Database Service (RDS), in multiple Availability Zones of the US-EAST data center in Virginia. It seems probable that resilience and recovery mechanisms, triggered by network problems, overloaded the EBS controllers.
8:54 AM PDT: A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it’s difficult to create new EBS volumes and EBS backed instances. -- from Amazon AWS Dashboard
News sites like eWeek, InformationWeek and CNN picked up the issue quickly. GigaOm discussed the situation for the equally vulnerable PaaS providers (Heroku, EngineYard and DotCloud) that leverage EC2.
At 1:41 AM PDT today, April 21, Amazon's AWS status page reported: "We are currently investigating latency and error rates with EBS volumes and connectivity issues reaching EC2 instances in the US-EAST-1 region." As of this writing (1:48 PM PDT), the issue has not been fully resolved.
Besides jokes about the timing (the Skynet attack from the Terminator movies was "scheduled" for April 21, 2011) and helpful hints to Amazon engineers on Twitter, there have also been some thorough responses to the unexpected outage.
@scottmcnealy: I said the Network is the Computer, I did not say it had 100% uptime.
@torrenegra: Today is Terminator's Judgment Day (4/21/2011). Skynet was supposed to kill us all. Fortunately for us Skynet runs on Amazon EC2.
@Nicolethebear: Dear Amazon EC2 - have you tried turning it on & off again?
Usually the Availability Zones within one EC2 Region do not affect each other, as they are physically separate data centers with optimized connections to ensure low latency. Architecting systems to span multiple AZs should therefore provide enough redundancy to compensate for the outage of one or more of those zones. Because this outage hit several zones at once, a number of sources questioned the availability guarantees of the zones. PCWorld discusses this with Gartner analyst Drue Reeves and Reuven Cohen, founder and CTO of Enomaly. Competing cloud provider DotCloud, which also relies on Amazon EC2, reports on its experience with the failure and points out technical issues with disaster recovery.
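A back-of-the-envelope sketch of why spanning zones helps, under a loudly stated assumption: each zone is hypothetically treated as failing independently with the same probability, so the chance that every zone is down at once shrinks geometrically with the number of zones. That independence assumption is exactly what a single event affecting several zones undermines. The function names and the 0.1% figure are illustrative only.

```python
def prob_all_zones_down(p_zone_down: float, zones: int) -> float:
    """Probability that every zone is down at the same time.

    Assumes (hypothetically) independent, identically distributed zone
    failures. Correlated failures, such as a shared control plane being
    overloaded, break this assumption.
    """
    if not 0.0 <= p_zone_down <= 1.0:
        raise ValueError("p_zone_down must be a probability")
    if zones < 1:
        raise ValueError("need at least one zone")
    return p_zone_down ** zones


def availability(p_zone_down: float, zones: int) -> float:
    """Probability that at least one zone is still serving traffic."""
    return 1.0 - prob_all_zones_down(p_zone_down, zones)


if __name__ == "__main__":
    # Hypothetical figure: each zone is unavailable 0.1% of the time.
    for n in (1, 2, 3):
        print(n, availability(0.001, n))
```

Under independence, each added zone multiplies the all-down probability by the per-zone figure; with correlated failures the real gain can be far smaller.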
Netflix engineers are quoted in a Hacker News thread as having had few problems with the outage because their deployment spans multiple Availability Zones: "Netflix is deployed in three zones, sized to lose one and keep going. Cheaper than cost of being down."
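"Sized to lose one and keep going" implies simple capacity arithmetic: with N zones, the N - 1 survivors must carry the full peak load, so each zone needs peak / (N - 1) capacity and the total provisioned is N / (N - 1) times the peak. A minimal sketch follows; the function names, request rates and per-instance throughput are hypothetical and not taken from Netflix.

```python
import math


def per_zone_capacity(peak_load: float, zones: int, tolerated_failures: int = 1) -> float:
    """Capacity each zone needs so the surviving zones can carry the full peak."""
    surviving = zones - tolerated_failures
    if surviving < 1:
        raise ValueError("at least one zone must survive")
    return peak_load / surviving


def instances_per_zone(peak_load: float, per_instance: float, zones: int,
                       tolerated_failures: int = 1) -> int:
    """Whole instances per zone, rounded up."""
    return math.ceil(per_zone_capacity(peak_load, zones, tolerated_failures) / per_instance)


if __name__ == "__main__":
    # Hypothetical: 1200 req/s peak, 100 req/s per instance, three zones.
    n = instances_per_zone(1200, 100, 3)
    print(n, "per zone,", 3 * n, "total")  # 18 total vs. 12 with no redundancy
```

For three zones this means 50% extra capacity, which is the cost the quote weighs against "being down".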
Keith from backdrift.org gives some simple and effective advice on how to cope with such outages, for instance using configuration management systems (e.g. Puppet) for image setup and updates, synchronizing your cloud-based data and securing your DNS configuration. A post by Clay Loveless goes into further detail.
For early status updates about AWS issues, Eric Hammond (Alestic), who describes how to get affected servers back online, recommends following @ylastic.
In the aftermath of today's event there will be many questions to answer about the reliability of cloud-based applications, the necessary architectural precautions and risk management, not just by Amazon but also by other cloud providers such as VMware's Cloud Foundry or Google App Engine. Another topic will be the SLAs offered by cloud providers: Amazon's SLA for EC2 promises 99.95% availability of external connectivity for multi-AZ deployments. Neither EBS nor RDS has an SLA.
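To put a 99.95% figure in perspective, the sketch below converts an availability percentage into a downtime budget. It assumes a plain 365-day measurement year; the SLA's own definitions of the measurement period and of "unavailable" are what actually apply, and the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day measurement year


def allowed_downtime_minutes(sla_percent: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Downtime permitted by an availability percentage over a period."""
    if not 0.0 <= sla_percent <= 100.0:
        raise ValueError("sla_percent must be between 0 and 100")
    return (100.0 - sla_percent) / 100.0 * period_minutes


if __name__ == "__main__":
    minutes = allowed_downtime_minutes(99.95)
    print(f"{minutes:.1f} minutes (~{minutes / 60:.2f} hours) per year")
```

Under these assumptions, 99.95% leaves roughly 263 minutes, or about 4.4 hours, of downtime per year.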