Amazon's SimpleDB and IBM's Blue Cloud Continue the Rise of Cloud Computing

The cloud computing segment of the software industry has been quite busy of late. Earlier this week Amazon introduced a beta of SimpleDB, a web service for running queries on structured data in real time. SimpleDB complements Amazon's other computing services, Amazon S3 and Amazon EC2. The announcement comes roughly a month after IBM announced its Blue Cloud initiative. Both may play a large role in shifting some software applications from private infrastructure to utility-like computing resources. Beyond availability, the education of both beginning and established software engineers is also powering the trend.

In October, IBM and Google announced a university initiative to address internet-scale computing challenges:

...The goal of this initiative is to improve computer science students’ knowledge of highly parallel computing practices to better address the emerging paradigm of large-scale distributed computing. IBM and Google are teaming up to provide hardware, software and services to augment university curricula and expand research horizons. With their combined resources, the companies hope to lower the financial and logistical barriers for the academic community to explore this emerging model of computing...

To simplify the development of massively parallel programs, Google and IBM have created the following resources:

  • A cluster of processors running an open source implementation of Google’s published computing infrastructure (MapReduce and GFS from Apache’s Hadoop project)
  • A Creative Commons licensed university curriculum developed by Google and the University of Washington focusing on massively parallel computing techniques available at: http://code.google.com/edu/content/parallel.html
  • Open source software designed by IBM to help students develop programs for clusters running Hadoop. The software works with Eclipse, an open source development platform. The plugin is currently available at: http://lucene.apache.org/hadoop/
  • Management, monitoring and dynamic resource provisioning of the cluster by IBM using IBM Tivoli systems management software
  • A website to encourage collaboration among universities in the program. This will be built on Web 2.0 technologies from IBM’s Innovation Factory.

Not to be outdone, Yahoo followed in November, announcing an open source program aimed at advancing the research and development of systems software for distributed computing. A key part of the announcement was that Yahoo would make available a Hadoop-enabled supercomputing data center named M45. The cluster has "approximately 4,000 processors, three terabytes of memory, 1.5 petabytes of disks, and a peak performance of more than 27 trillion calculations per second (27 teraflops), placing it among the top 50 fastest supercomputers in the world". In December, the Yahoo Developer Network also debuted a new blog centered on Hadoop and distributed computing.

Days afterward, IBM announced its Blue Cloud computing program, targeted for the first quarter of 2008:

...Blue Cloud – based on IBM’s Almaden Research Center cloud infrastructure -- will include Xen and PowerVM virtualized Linux operating system images and Hadoop parallel workload scheduling. Blue Cloud is supported by IBM Tivoli software that manages servers to ensure optimal performance based on demand. This includes software that is capable of instantly provisioning resources across multiple servers to provide users with a seamless experience that speeds performance and ensures reliability even under the most demanding situations. Tivoli monitoring checks the health of the provisioned servers and makes sure they meet service level agreements...

The first offering will be an IBM BladeCenter equipped with a suite of "cloud" software. While the suite is initially meant to be run internally, IBM will explore cloud computing services as it evolves.

Finally, Amazon introduced a limited beta of its SimpleDB service this week. SimpleDB is not a relational database; instead, it is based on a hash-table-like model. CRUD operations and a query language are provided. Pricing is similar to other Amazon cloud services in that it is based on the amount of storage and machine time consumed:

Machine Utilization - $0.14 per Amazon SimpleDB Machine Hour consumed

Amazon SimpleDB measures the machine utilization of each request and charges based on the amount of machine capacity used to complete the particular request (QUERY, GET, PUT, etc.), normalized to the hourly capacity of a circa 2007 1.7 GHz Xeon processor.

Data Transfer

  • $0.10 per GB - all data transfer in
  • $0.18 per GB - first 10 TB / month data transfer out
  • $0.16 per GB - next 40 TB / month data transfer out
  • $0.13 per GB - data transfer out / month over 50 TB

Data transfer "in" and "out" refers to transfer into and out of Amazon SimpleDB. Data transferred between Amazon SimpleDB and other Amazon Web Services is free of charge (i.e., $0.00 per GB).

Structured Data Storage - $1.50 per GB-month
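
As a rough illustration of how the rates above combine, the following sketch estimates a monthly bill. The function names are hypothetical, and the conversion of 1 TB to 1024 GB is an assumption, since the pricing excerpt does not say which definition applies.

```python
from decimal import Decimal

# Rates quoted in the pricing excerpt above (USD).
MACHINE_HOUR = Decimal("0.14")
TRANSFER_IN_PER_GB = Decimal("0.10")
STORAGE_PER_GB_MONTH = Decimal("1.50")

# Assumption: 1 TB = 1024 GB; the excerpt does not specify.
GB_PER_TB = 1024

# (tier size in GB, price per GB); None marks the open-ended last tier.
TRANSFER_OUT_TIERS = [
    (10 * GB_PER_TB, Decimal("0.18")),
    (40 * GB_PER_TB, Decimal("0.16")),
    (None, Decimal("0.13")),
]


def transfer_out_cost(gb_out):
    """Walk the tiers in order, billing each slice at its own rate."""
    remaining = Decimal(str(gb_out))
    total = Decimal("0")
    for size, rate in TRANSFER_OUT_TIERS:
        if remaining <= 0:
            break
        billed = remaining if size is None else min(remaining, Decimal(size))
        total += billed * rate
        remaining -= billed
    return total


def monthly_cost(machine_hours, gb_in, gb_out, gb_stored):
    """Sum machine time, inbound and outbound transfer, and storage."""
    return (
        Decimal(str(machine_hours)) * MACHINE_HOUR
        + Decimal(str(gb_in)) * TRANSFER_IN_PER_GB
        + transfer_out_cost(gb_out)
        + Decimal(str(gb_stored)) * STORAGE_PER_GB_MONTH
    )


if __name__ == "__main__":
    # 10 machine hours, 5 GB in, 20 GB out, 2 GB stored for the month.
    print(monthly_cost(10, 5, 20, 2))
```

Transfers between SimpleDB and other Amazon Web Services are free according to the excerpt, so only traffic crossing the AWS boundary should be passed in as `gb_in` and `gb_out`.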

Amazon has also detailed the differences in infrastructure between Amazon S3 and Amazon SimpleDB:

...Unlike Amazon S3, Amazon SimpleDB is not storing raw data. Rather, it takes your data as input and expands it to create indices across multiple dimensions, which enables you to quickly query that data. Additionally, Amazon S3 and Amazon SimpleDB use different types of physical storage. Amazon S3 uses dense storage drives that are optimized for storing larger objects inexpensively. Amazon SimpleDB stores smaller bits of data and uses less dense drives that are optimized for data access speed.

In order to optimize your costs across AWS services, large objects or files should be stored in Amazon S3, while smaller data elements or file pointers (possibly to Amazon S3 objects) are best saved in Amazon SimpleDB. Because of the close integration between services and the free data transfer within the AWS environment, developers can easily take advantage of both the speed and querying capabilities of Amazon SimpleDB as well as the low cost of storing data in Amazon S3, by integrating both services into their applications...
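
The division of labor Amazon describes can be sketched with two in-memory stand-ins. This is not the AWS API: the `MediaCatalog` class, its method names, and the `object_key` attribute are all hypothetical, and plain dictionaries play the roles of S3 and SimpleDB. The point is only the pattern of keeping the large object in one store and a small, queryable record with a pointer to it in the other.

```python
class MediaCatalog:
    """Toy model: `blobs` stands in for S3, `items` for a SimpleDB domain."""

    def __init__(self):
        self.blobs = {}  # object key -> raw bytes (large, cheap storage)
        self.items = {}  # item name -> {attribute: value} (small, queryable)

    def put(self, item_name, data, **attributes):
        """Store the bytes in the blob store and a pointer plus metadata as an item."""
        key = f"media/{item_name}"
        self.blobs[key] = data
        self.items[item_name] = {**attributes, "object_key": key}
        return key

    def query(self, attribute, value):
        """Return names of items whose attribute equals value, sorted."""
        return sorted(
            name for name, attrs in self.items.items()
            if attrs.get(attribute) == value
        )

    def fetch(self, item_name):
        """Follow the stored pointer back to the large object."""
        return self.blobs[self.items[item_name]["object_key"]]


if __name__ == "__main__":
    catalog = MediaCatalog()
    catalog.put("cat.jpg", b"...image bytes...", owner="alice", kind="photo")
    catalog.put("talk.mp4", b"...video bytes...", owner="alice", kind="video")
    for name in catalog.query("owner", "alice"):
        print(name, len(catalog.fetch(name)))
```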

Charles H. Ying reports that SimpleDB is built on Erlang. He goes on to note the following considerations:

  • Eventual Consistency - Data is not immediately propagated across all nodes… the latency is usually around a second, but for high data sets or loads, you may experience more latency. On the plus side, your data isn’t lost!
  • Queries are lexicographical - You’ll need to store data in lexicographically ordered form (zero-pad your integers, add positive offsets to negative integer sets, and convert dates into something like ISO 8601)
  • Search Indexes - You’ll need to construct your own indexes for text search - The SimpleDB query expressions don’t support text search, so you’ll have to construct inverted indexes to properly do “text search”. This is actually a really great lightweight way to do this and I’m sure many interesting indexing schemes will be possible.
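
The last two considerations can be sketched in a few lines. This is a minimal illustration, not SimpleDB client code: the function names, the default offset of 1,000,000, and the width of 7 digits are assumptions chosen for the example, and the inverted index is an ordinary in-memory dictionary rather than anything stored in SimpleDB.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone


def encode_int(value, offset=1_000_000, width=7):
    """Shift by an offset so negatives become non-negative, then zero-pad.

    The resulting strings sort in the same order as the original integers.
    """
    shifted = value + offset
    if shifted < 0 or len(str(shifted)) > width:
        raise ValueError("value out of range for the chosen offset and width")
    return str(shifted).zfill(width)


def encode_date(dt):
    """Render a timezone-aware datetime as a UTC ISO 8601 string.

    Fixed-width fields make the text sort chronologically.
    """
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each word to the set of item names whose text contains it."""
    index = defaultdict(set)
    for item_name, text in docs.items():
        for word in tokenize(text):
            index[word].add(item_name)
    return index


def search(index, query):
    """Return the items that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # As plain strings "9" sorts after "10"; the encoded forms sort correctly.
    print(encode_int(9), encode_int(10), encode_int(-5))
    print(encode_date(datetime(2007, 12, 14, 9, 30, tzinfo=timezone.utc)))
    index = build_inverted_index({"a": "Cloud computing with Hadoop", "b": "Hadoop cluster"})
    print(search(index, "hadoop cloud"))
```

In a real deployment each word-to-items mapping would itself have to be written to the data store, which is the extra bookkeeping the third bullet refers to.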

GigaOM's Nitin Borwankar finds SimpleDB to be highly disruptive and provides a brief comparison with existing relational databases and the object-relational mappings commonly used by ActiveRecord and Hibernate.
