Amazon Web Services Stability and the September 13th US East 1 Outage

Amazon Web Services (AWS) suffered another outage of its US East 1 region during the morning of Friday 13th September. A number of popular applications such as Heroku, GitHub and CMSWire were disrupted, along with many other customers in Amazon's largest, oldest and busiest location.

A few days before this most recent failure, cloud commentator Ben Kepes wrote, 'Every time AWS has an outage it seems to be the Eastern zone that brings the service down.' Kepes goes on to refer to a post from analyst René Büst that describes US East 1 as 'old, cheap and fragile'.

Amazon hasn't released a detailed post mortem, but the problems last Friday are attributed to networking issues. A previous outage in April 2011 was also network related, though more recent issues in December 2012 and October 2012 were traced back to problems with services such as Elastic Load Balancing (ELB) and Elastic Block Store (EBS). Network and EBS failures have been particularly pernicious because they have caused disruption across availability zones (which are supposed to be fault boundaries) or brought down higher level services (like ELB) that are supposed to provide fault tolerance.

Typically application owners have used traditional architectures rather than designing for cloud and its inherent instability, with many applications failing to use multiple availability zones in a region, or multiple regions. Design for failure doesn't always save the day, however. Netflix, with its 'simian army' of chaos monkeys, is often paraded as a paragon of cloud ready design. The company deliberately causes faults in its platform on a continuous basis to prove that it can keep working, but sometimes (such as during the Christmas Eve outage) there just isn't enough capacity to absorb load elsewhere, and some customers are left with a degraded service.
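The capacity point can be made concrete with some simple arithmetic. The following is a minimal sketch, not anything taken from AWS or Netflix: the function names and the assumption that load is spread evenly across zones are hypothetical, chosen only to show how much headroom is needed for the surviving zones to absorb the load of a failed one.

```python
import math


def instances_per_zone(peak_instances: int, zones: int, zones_lost: int = 1) -> int:
    """Instances to run in every zone so the survivors can still carry peak load.

    Hypothetical helper: assumes load is balanced evenly across zones and
    that peak_instances is the fleet size needed at peak with no failures.
    """
    surviving = zones - zones_lost
    if surviving <= 0:
        raise ValueError("at least one zone must survive")
    # Each surviving zone must hold an equal share of the whole peak load.
    return math.ceil(peak_instances / surviving)


def overprovision_ratio(zones: int, zones_lost: int = 1) -> float:
    """Total provisioned capacity relative to peak demand."""
    surviving = zones - zones_lost
    if surviving <= 0:
        raise ValueError("at least one zone must survive")
    return zones / surviving


if __name__ == "__main__":
    # A service needing 12 instances at peak, spread over 3 zones,
    # sized to tolerate the loss of any single zone.
    print(instances_per_zone(12, 3))   # instances to run in each zone
    print(overprovision_ratio(3))      # provisioned capacity / peak demand
```

Under these assumptions, tolerating the loss of one zone out of three means running half as much capacity again as peak demand requires. An application sized without that headroom can be spread across zones and still degrade when one of them fails.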

The succession of outages in US East 1, and the failure of services that are supposed to help (like ELB), provide an opportunity for Amazon's competitors in the infrastructure as a service market. Google has recently released its own load balancing service for Google Compute Engine, along with recommendations for designing robust systems.
