Posted by Dionysios G. Synodinos on Jan 14, 2010
Several users in the Amazon EC2 community have reported that their instances are suffering from poor performance as a result of high internal network latency. This has led to speculation that Amazon's cloud might be oversubscribed.
Alan Williamson of aw2.0 Ltd has written a report about his experiences with Amazon EC2, in which he claims that Amazon, like every cloud provider his company has tried, seems to scale well at the beginning but eventually reaches a tipping point:
Amazon in the early days was fantastic. Instances started up within a couple of minutes, they rarely had any problems and even their SMALL INSTANCE was strong enough to power even the moderately used MySQL database. For a good 20 months, all was well in the Amazon world, with really no need for concern or complaint.
...
However, in the last 8 or so months, the chinks in their armour have begun to show. The first signs of weakness came from the performance of the newly spun up Amazon SMALL instances. According to our monitoring, the newly spun up machines in the server farm, were under performing compared to the original ones. At first we thought these freaks-of-nature, just happened to [be] beside a "noisy neighbor". A quick termination and a new spin up would usually, through the laws of randomness, have us in a quiet neighborhood where we could do what we needed.
...
However, in the last month [or] two, we've even noticed that these "High-CPU Medium Instance" have been suffering a similar fate of the Small instances, in that, new instances coming up don't seem to be performing anywhere near what they should be. After some investigation, we discovered a new problem that has crept into Amazon's world: Internal Network Latency.
Similarly, Cloudkick has reported high network latency for its instances:
A couple of weeks ago we noticed that our ping latency graphs on Cloudkick looked very odd.
...
...our monitoring node on EC2 is pinging four different servers on Slicehost. The average ping latency is all over the place.
...
The conclusion? Alan Williamson's post on EC2 oversubscription seems to make a lot of sense. The network behind EC2 appears to be experiencing very sporadic latency issues.
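To illustrate the kind of measurement these reports describe, here is a minimal Python sketch of how sporadic latency might be detected from a series of round-trip samples. It is not Cloudkick's or aw2.0's actual tooling. The function names, the use of TCP connect time as a stand-in for ICMP ping, and the 0.5 threshold on the coefficient of variation are all assumptions made for illustration.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port=22, timeout=2.0):
    """Time one TCP connect to host:port, in milliseconds.

    Assumption: connect time is used as a rough proxy for ping latency,
    because ICMP usually needs elevated privileges.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def summarize_latency(samples_ms):
    """Return mean, sample standard deviation, max, and coefficient of variation."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples_ms)
    stdev = statistics.stdev(samples_ms)
    cv = stdev / mean if mean > 0 else 0.0
    return {"mean": mean, "stdev": stdev, "max": max(samples_ms), "cv": cv}


def is_erratic(samples_ms, cv_threshold=0.5):
    """Flag a series whose spread is large relative to its mean.

    The 0.5 threshold is an arbitrary, hypothetical cut-off.
    """
    return summarize_latency(samples_ms)["cv"] > cv_threshold
```

In use, a monitoring node would call `tcp_connect_ms` against each target at a fixed interval and pass the collected samples to `is_erratic`. A series such as 1 ms, 40 ms, 2 ms, 120 ms would be flagged, while one hovering around 1 ms would not.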
EC2 customers have also posted on the AWS discussion forums about networking issues they are experiencing:
We have an instance which started to become EXTREMELY unresponsive at 9:15 AM CST today. You could sometimes log into it, sometimes not. While the situation did not resolve itself, another instance was started (assuming there was a hardware problem on that instance) which has the same issue. I'm thinking there may be a network issue.
I've been able to log in once or twice, and once everything was normal for a bit and then it became unresponsive again. Any clue?
Instance IDs are i-c4921fad and i-a0e3d7c8. I am seeing the same network issues when attempting to connect to our machines from machines in other EC2 zones.
Alan reports that during an emergency he tried to cope by rapidly deploying new instances, but it didn't work for him:
In one particular "fire fighting mode", we spent an hour literally spinning up new instances and terminating them until we found ourselves on a node that actually responded to our network traffic.
In virtualized environments, and specifically in the case of “Noisy Neighbors”, where a neighboring instance on the same physical node is computationally heavy, repeatedly terminating and relaunching instances doesn't seem to be good practice, since there is a "tendency for EC2 to assign fresh instances to the same small set of machines" [PDF].
You can find more information about cloud computing and Amazon EC2 right here on InfoQ.
Dionysios G. Synodinos is a Web Engineer and a freelance consultant, focusing on Web technologies.