Cloud Foundry: Design and Architecture
Derek Collison discusses the goals, the design premises and patterns employed in creating the architecture of Cloud Foundry, VMware’s open source PaaS, unveiling internal architectural details.
Posted by Jean-Jacques Dubray on Apr 25, 2011
While many high-profile sites reported being affected by the AWS issues, Twilio's APIs and services were not affected, even though the company relies heavily on AWS to grow and scale its cloud telephony platform. For Evan Cooke, co-founder and CTO of Twilio, this shows both the amazing success of cloud services in enabling the current Internet ecosystem and the importance of solid distributed architectural design when building cloud services.
As we’ve grown and scaled Twilio on Amazon Web Services, we’ve followed a set of architectural design principles to minimize the impact of occasional, but inevitable issues in underlying infrastructure.
By building simple services composed of a single host, rather than multiple dependent hosts, one can create replicated service instances that can survive host failures.
When failures happen, have software quickly identify those failures and retry requests. By running multiple redundant copies of each service, one can use quick timeouts and retries to route around failed or unreachable services.
If the API of a dependent service is idempotent, that means it is safe to retry failed requests.
Separate business logic into small stateless services that can be organized in simple homogeneous pools.
When strict consistency is not required, create pools of replicated and redundant read data.
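The retry and idempotency principles above can be illustrated with a minimal sketch. This is not Twilio's code: the names `call_with_retries`, `make_replica`, and `ServiceUnavailable` are hypothetical, the replicas are simulated in-process functions, and the timeout value is an arbitrary assumption. It only shows how a client might route around a failed host in a pool of redundant replicas while reusing one idempotency key so that retries are safe.

```python
import uuid


class ServiceUnavailable(Exception):
    """Raised when a replica is down, unreachable, or too slow (hypothetical)."""


def make_replica(store, healthy=True):
    """Build a simulated single-host service instance (illustration only)."""
    def handle(request, timeout_s):
        # A real client would pass timeout_s to its network call; here an
        # unhealthy host simply fails at once.
        if not healthy:
            raise ServiceUnavailable("host down")
        key = request["idempotency_key"]
        # Idempotent interface: a repeated key returns the stored result
        # instead of performing the work a second time.
        if key not in store:
            store[key] = f"sent:{request['to']}"
        return store[key]
    return handle


def call_with_retries(replicas, request, timeout_s=0.2, max_attempts=3):
    """Try successive replicas with a short timeout until one succeeds."""
    # The same key is attached to every attempt, which makes retrying safe.
    request = dict(request, idempotency_key=str(uuid.uuid4()))
    last_error = None
    for attempt in range(max_attempts):
        replica = replicas[attempt % len(replicas)]
        try:
            return replica(request, timeout_s)
        except ServiceUnavailable as err:
            last_error = err  # route around the failed host
    raise ServiceUnavailable(f"all {max_attempts} attempts failed") from last_error


if __name__ == "__main__":
    store = {}  # stands in for whatever state the service keeps per request
    pool = [make_replica(store, healthy=False), make_replica(store)]
    print(call_with_retries(pool, {"to": "+15550100"}))
```

Because each replica is a single self-contained host and the request carries one key across attempts, a failure of the first replica costs only one short timeout, and a retry that reaches a healthy replica cannot duplicate the work.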
In light of the details of the outage, Evan also explained that Twilio uses EBS only for non-critical, non-latency-sensitive tasks because it doesn’t satisfy the “unit-of-failure is a single host” principle. If EBS were to experience a problem, all dependent services could also experience failures. Instead, they have focused on using the ephemeral disks present on each EC2 host for persistence. If an ephemeral disk fails, that failure is scoped to that host. Evan will publish a follow-on post describing how they are doing RAID0 striping across ephemeral disks to improve I/O performance.
This is in line with the approach taken by SmugMug, which also elected not to use EBS, as explained by Don MacAskill.
Mike Kavis, CTO of M-Dot Network, explained that Amazon's IaaS has become a PaaS:
Amazon has numerous services that a developer can call that can take time consuming and human resource intensive tasks, and simplify and automate them in a simple call. Cloudwatch (monitoring and autoscaling) and RDS (http://aws.amazon.com/rds/, database administration) are just two of many services that come to mind. Once you start using these services you are essentially in a PaaS scenario where you are leveraging services that are proprietary to the vendor’s stack.
For him, these kinds of dependencies and possible outages have to be factored into your architecture and business model, as building a cloud-provider-agnostic architecture is rarely practical without rebuilding these services yourself.
Clearly, a Disaster Recovery plan is not optional even in the Cloud, and Architecture is and will remain essential for building Cloud-based solutions; this is not new. Are Twilio's principles enough? How do you see Cloud Architecture evolving from here? More redundancy? Home-grown services? More architecture principles? How will this translate to PaaS-based solutions?