
Amazon Web Services Stability and the September 13th US East 1 Outage

by Chris Swan on Sep 20, 2013

Amazon Web Services (AWS) suffered another outage of its US East 1 region during the morning of Friday 13th September. Popular applications such as Heroku, GitHub and CMSWire were disrupted, along with many other customers in Amazon's largest, oldest and busiest location.

A few days before this most recent failure, cloud commentator Ben Kepes wrote, 'Every time AWS has an outage it seems to be the Eastern zone that brings the service down.' Kepes goes on to refer to a post from analyst René Büst that describes US East 1 as 'old, cheap and fragile'.

Amazon hasn't released a detailed post mortem, but the problems last Friday are attributed to networking issues. A previous outage in April 2011 was also network related, though more recent issues in December 2012 and October 2012 were traced back to problems with services such as Elastic Load Balancing (ELB) and Elastic Block Store (EBS). Network and EBS failures have been particularly pernicious because they have caused disruption across availability zones (which are supposed to be fault boundaries) or brought down higher level services (like ELB) that are supposed to provide fault tolerance.

Typically, application owners have used traditional architectures rather than designing for cloud and its inherent instability, with many applications failing to use multiple availability zones in a region, or multiple regions. Design for failure doesn't always save the day, however. Netflix, with its 'simian army' of chaos monkeys, is often paraded as a paragon of cloud-ready design. The company deliberately causes faults in its platform on a continuous basis to prove that it can keep working, but sometimes (such as during the Christmas Eve outage) there just isn't enough capacity to absorb load elsewhere, and some customers are left with a degraded service.
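The capacity point can be made concrete with a little arithmetic. The following is a minimal sketch, assuming load is spread evenly across zones and redistributed evenly when some fail; the function names and figures are hypothetical illustrations, not taken from Netflix or AWS.

```python
def surviving_utilization(zones: int, utilization: float, failed: int = 1) -> float:
    """Per-zone utilization after the load of `failed` zones is spread
    evenly over the survivors. Assumes every zone started at the same
    `utilization` (a fraction of its capacity, e.g. 0.6 for 60%)."""
    if not 0 < failed < zones:
        raise ValueError("failed must be at least 1 and less than zones")
    return utilization * zones / (zones - failed)


def can_absorb(zones: int, utilization: float, failed: int = 1) -> bool:
    """True if the surviving zones can carry the full load without
    exceeding 100% of their capacity."""
    return surviving_utilization(zones, utilization, failed) <= 1.0


def max_safe_utilization(zones: int, failed: int = 1) -> float:
    """Highest steady-state utilization that still tolerates losing
    `failed` zones under the even-spread assumption."""
    if not 0 < failed < zones:
        raise ValueError("failed must be at least 1 and less than zones")
    return (zones - failed) / zones


if __name__ == "__main__":
    # Hypothetical numbers: three zones, each running at 60% or 70%.
    for load in (0.6, 0.7):
        print(load, can_absorb(zones=3, utilization=load))
```

Under these assumptions, a three-zone deployment can survive the loss of one zone only if each zone normally runs at no more than two thirds of its capacity. An application running hotter than that has spread itself across zones without leaving room to fail over.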

The succession of outages in US East 1, and the failure of services that are supposed to help (like ELB), provide an opportunity for Amazon's competitors in the infrastructure as a service market. Google has recently released its own load balancing service for Google Compute Engine, along with recommendations for designing robust systems.
