Posted by Michael Bushe on Feb 28, 2008 09:00 AM
Amazon Web Services' Simple Storage Service (S3), a cloud-based storage platform used by many popular websites including Twitter, G.ho.st, and 37signals' Basecamp, suffered a major outage last week. The outage occurred in one of S3's three geographical sites and lasted a little over two hours.

"The s3 service is great but this just proves you can't rely on it, this is a major issue especially since it's been down for so long."

Other users were quick to point to S3's long, reliable track record:
"This is the first outage I have experienced since I joined the service nearly a year ago."

InfoQ interviewed a number of longtime S3 users and found a consistent story on S3's reliability. Over the past year there have been only one or two minor hiccups lasting less than two minutes.
Amazon explained the cause of the outage:

"In one of our locations we started seeing elevated levels of authenticated requests from multiple users. While we carefully monitor our overall request volumes and these remained within normal ranges, we had not been monitoring the proportion of authenticated requests. Importantly, these cryptographic requests consume more resources per call than other request types. Within a short amount of time, we began to see several other users significantly increase their volume of authenticated calls. The last of these pushed the authentication service over its maximum capacity before we could complete putting new capacity in place. In addition to processing authentication requests, the authentication service also performs account validation on every request Amazon S3 handles. This caused Amazon S3 to be unable to process any requests in that location."

Meanwhile, some users were frustrated by the lack of communication during the outage. Rien Swagerman, owner of Viewbook.com, told InfoQ:
"What's quite amazing is that ... Amazon is giving very little status information when something like this happens. You have to dive deep in a forum somewhere to get some info. And this forum was down for posting [during] the first hour of the outage."

Amazon's spokesperson told us that Amazon.com and its developer boards were affected by the outage. Amazon eats its own dog food, which is usually a good sign, but cloud computing may be changing the calculus.
"There's no other vendor yet that delivers the combination of these services for this quality and price. Actually, I'm happy that this happened ... it will challenge them to provide an even better service."

Amazon is indeed going to be challenged in the burgeoning cloud computing market. Earlier this year, EMC launched EMC Fortress, a SaaS storage platform built on its Mozy acquisition and initially targeting backup. This week, EMC announced that it has hired Paul Maritz, a former Microsoft executive, to lead its new Cloud Infrastructure and Storage Division. EMC will likely target a higher-end market segment than Amazon, giving customers more options on the price/reliability scale.
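Amazon's explanation of the outage notes that it watched overall request volume but not the proportion of authenticated requests, even though those cryptographic requests cost more per call. The Python below is a minimal sketch of what tracking that proportion could look like, assuming each request can be flagged as authenticated or not; it is not Amazon's actual monitoring. The class name `RequestMixMonitor`, the window size, and the 30% threshold are hypothetical values chosen for illustration.

```python
from collections import deque


class RequestMixMonitor:
    """Hypothetical monitor: tracks the share of authenticated requests over a
    sliding window of the most recent requests, not just the total volume."""

    def __init__(self, window_size: int, max_auth_fraction: float) -> None:
        # Each entry is True for an authenticated request, False otherwise.
        self.window = deque(maxlen=window_size)
        self.max_auth_fraction = max_auth_fraction  # illustrative threshold
        self.auth_count = 0

    def record(self, authenticated: bool) -> None:
        # When the window is full, append() will evict the oldest entry,
        # so adjust the running count for that entry first.
        if len(self.window) == self.window.maxlen and self.window[0]:
            self.auth_count -= 1
        self.window.append(authenticated)
        if authenticated:
            self.auth_count += 1

    def auth_fraction(self) -> float:
        return self.auth_count / len(self.window) if self.window else 0.0

    def should_alert(self) -> bool:
        return self.auth_fraction() > self.max_auth_fraction


monitor = RequestMixMonitor(window_size=1000, max_auth_fraction=0.3)

# Steady state: 1000 requests, 20% of them authenticated.
for i in range(1000):
    monitor.record(authenticated=(i % 5 == 0))
print(monitor.auth_fraction(), monitor.should_alert())  # prints 0.2 False

# Same total volume, but the mix shifts to 50% authenticated requests.
for i in range(1000):
    monitor.record(authenticated=(i % 2 == 0))
print(monitor.auth_fraction(), monitor.should_alert())  # prints 0.5 True
```

In the second phase the request count per window never changes, so a volume-only check would stay quiet; only the ratio reveals the shift toward the more expensive request type.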
SendAlong.com uses S3 heavily - very heavily. Any files that go through the system go through S3 in some form. And right now there's no caching (like on an EC2 instance, for example). But the net effect of a two-hour outage for our business model wasn't such a big deal (well... especially since we were not officially launched at the time). If you ask most startups/microISVs/small businesses using S3, they may tell you the same thing: downtime sucks, but a two-hour outage every year is a lot less downtime than you'd see if I was managing my own set of storage servers! Granted, Amazon does need to do a better job of communicating downtime, but it looks like they'll be doing that soon. Whether it's cloud computing or not, developers need to assume that the resource, whatever it is, is going to suffer from downtime. If it hadn't been S3 itself, it might have been a network backbone instead, or one of a hundred other things. The point is that external resources fail. Most developers know that, so if they were caught off guard that S3 could go down, they shouldn't have been. And I'd say that for an external resource, S3 does a pretty good job at uptime :)

Jon Chase
http://www.sendalong.com - Send large files to anyone
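The comment above argues that developers should assume any external resource will fail, and mentions that SendAlong has no caching in front of S3. One common way to act on that is to retry with exponential backoff and then fall back to a locally cached copy. The Python below is a minimal sketch of that pattern under stated assumptions: `fetch_with_fallback` is a hypothetical helper, `fetch` stands in for whatever client call actually reads from S3, a plain dict plays the role of the local cache, and `ConnectionError` is assumed to be the failure signal.

```python
import time
from typing import Callable, Dict, Optional


def fetch_with_fallback(
    fetch: Callable[[str], bytes],
    key: str,
    cache: Dict[str, bytes],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Hypothetical helper: try the remote store a few times with exponential
    backoff; if it stays unavailable, return the last cached copy (or None)."""
    for attempt in range(retries):
        try:
            data = fetch(key)
            cache[key] = data  # refresh the local copy on every success
            return data
        except ConnectionError:
            if attempt < retries - 1:
                # Delay doubles each time: base_delay, 2 * base_delay, ...
                sleep(base_delay * (2 ** attempt))
    # Remote store still down: serve possibly stale data rather than fail.
    return cache.get(key)


def flaky_store(key: str) -> bytes:
    # Stand-in for a storage service that is currently unreachable.
    raise ConnectionError("storage service unavailable")


cache = {"report.pdf": b"cached bytes"}
print(fetch_with_fallback(flaky_store, "report.pdf", cache, sleep=lambda s: None))
# prints b'cached bytes'
```

Passing `sleep` as a parameter keeps the example fast to exercise without real waits; in production the default `time.sleep` applies. Whether serving stale data is acceptable depends on the application.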
What I found bothersome about Amazon's responses was that they only describe what went on inside their walls. "We only lost one site out of three," "In a few minutes load was back to normal," and so on. The clear tip-off was the difference between the reports from users and the reports from Amazon. Listening to Amazon, you'd think the outage was no big deal; listening to users, it was a significant problem. Openness is an important part of building trust, but empathy comes first. Convince me that you know what it's like on my end, then I'm interested in hearing about what happened. The next time this happens, report brutally frankly what it was like for users, then explain yourselves.
Kent, does common sense like what you are asking for still exist in business today? Even with books like "Human SIGMA", which pretty much describes how a well-run service organization should behave towards its customers, I don't think many companies will do what you asked. The simple reason is that their legal departments won't allow it for fear of being sued. Unfortunately, being honest opens the door to lawsuits.
Hi, I just posted some thoughts on "Cloud Availability" at http://mukulblog.blogspot.com/2008/07/cloud-availability.html . Your thoughts are welcome. Thanks, Mukul.