InfoQ

News

Amazon Brings Virtualized Storage to the Cloud with Elastic Block Storage

Posted by Scott Delap on Aug 21, 2008 12:04 PM

In April of this year, Amazon CTO Werner Vogels announced the development of persistent storage for Amazon EC2. The lack of persistent storage has long been an Achilles' heel of the EC2 platform: server instances start up with the contents of their parent image, and upon server failure or restart the disk reverts to the original image definition. Today Amazon moved to address this issue with the release of Elastic Block Store (EBS). Vogels outlines how the offering completes Amazon's suite of common storage patterns:
We had to make sure that the infrastructure storage solutions we were going to develop would be highly effective for developers by addressing the most common patterns first. That analysis led us to three top patterns:
  • Key-Value storage. The majority of the Amazon storage patterns were based on primary key access leading to single value or object. This pattern led to the development of Amazon S3.
  • Simple Structured Data storage. A second large category of storage patterns were satisfied by access to simple query interface into structured datasets. Fast indexing allows high-speed lookups over large dataset. This pattern led to the development of Amazon SimpleDB. A common pattern we see is that secondary keys to objects stored in Amazon S3 are stored in SimpleDB, where lookups result in sets of S3 (primary) keys.
  • Block storage. The remaining bucket holds a variety of storage patterns ranging from special file systems such as ZFS to applications managing their own block storage (e.g. cache servers) to relational databases. This category is served by Amazon EBS which provides the fundamental building block for implementing a variety of storage patterns.

Amazon has also provided details on pricing, durability, and performance. Highlights include:

  • Volumes can be between 1GB and 1TB in size.
  • Volumes behave like raw unformatted block devices.
  • Access is limited to instances in the same availability zone, similar to a SAN in a data center.
  • A volume can only be attached to one EC2 instance at a time.
  • One EC2 instance can have several attached volumes.
  • Snapshots of a volume can be backed up to S3. Snapshots are incremental, storing only the data that has changed since the previous snapshot.
  • Because data is replicated, the annual failure rate of a volume is expected to be between 0.1% and 0.5%, depending on volume size, compared to about 4% for commodity hard disks.
  • Pricing is $0.10 per allocated GB per month and $0.10 per million I/O requests.
Given this pricing, Amazon estimates that a medium-sized database with 100GB of storage would cost $10 per month in storage and about $26 per month in I/O request charges. A tutorial is available for running MySQL with EBS. RightScale has written an overview providing further analysis of the specifications, including a number of best practices and formulas for cost estimation. Regarding I/O rates, they share the following practical experience:

...As a point of reference, our main database server is pretty busy and chugs along at an average of 17 transactions per second, which should total to around $4.40 per month. But our monitoring servers, prior to some recent optimizations, hammered the disks as fast as they would go at over 1000 random writes per second sustained 24×7. That would end up costing over $250 per month! As far as I can tell, for most situations the EBS transaction costs will be in the noise, but you can make it expensive if you’re not careful...
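
The cost figures quoted above can be reproduced from the two published rates. The following is a minimal sketch, not an official calculator: the function name, the 30-day month, and the average of 100 I/O requests per second for the 100GB database are assumptions (the last one is chosen because it yields the roughly $26 request charge cited above).

```python
# Rough monthly EBS cost estimate from the two rates listed in the article.
# Assumption: a 30-day month; real bills depend on the actual billing period.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

STORAGE_RATE_PER_GB = 0.10   # USD per allocated GB per month
IO_RATE_PER_MILLION = 0.10   # USD per million I/O requests


def monthly_ebs_cost(allocated_gb, avg_io_per_second):
    """Return (storage_cost, io_cost) in USD for one month.

    monthly_ebs_cost is a hypothetical helper written for illustration.
    """
    storage_cost = allocated_gb * STORAGE_RATE_PER_GB
    io_requests = avg_io_per_second * SECONDS_PER_MONTH
    io_cost = io_requests / 1_000_000 * IO_RATE_PER_MILLION
    return storage_cost, io_cost


if __name__ == "__main__":
    # 100GB database; 100 I/O per second is an assumed average load.
    storage, io = monthly_ebs_cost(100, 100)
    print(f"100GB database: ${storage:.2f} storage, ${io:.2f} I/O")

    # RightScale's examples: only the I/O charge matters here.
    _, busy_db_io = monthly_ebs_cost(0, 17)
    _, monitoring_io = monthly_ebs_cost(0, 1000)
    print(f"17 I/O per second:   ${busy_db_io:.2f} per month")
    print(f"1000 I/O per second: ${monitoring_io:.2f} per month")
```

Under these assumptions, 17 requests per second comes to a little over $4.40 per month and 1000 requests per second to just under $260, which is consistent with RightScale's "around $4.40" and "over $250". The storage charge depends only on the allocated size, so a mostly empty volume costs the same as a full one.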

Finally, GigaOM provides a business analysis of the new offering, noting that traditional data centers should be worried.
