InfoQ


Is the Relational Database Not an Option in Cloud Computing?

Posted by Jon Arild Tørresdal on Feb 20, 2009

Sections: Architecture & Design, Operations & Infrastructure
Topics: Architecture, Cloud Computing
Tags: Azure, Drizzle, Database

Recent focus on cloud computing has increased the use of key/value databases. The most commonly cited reason is scalability. Although scalability is a key factor, cloud computing has other advantages that make it attractive even to vendors that do not need to deliver highly scalable applications or services.

One can imagine usage scenarios where:

  • Startup companies would rather pay for the computing and storage they use than invest in local servers.
  • Companies want to port existing applications or services to the cloud, without re-architecting their data layer.
  • High computing power is needed for shorter periods.

Tony Bain recently published an article asking whether the relational database is doomed. He focuses on the differences between relational and key/value databases, and on the reasons for choosing one over the other. According to Tony, the relational database has some challenges when it comes to scalability:

As more and more applications are launched in environments that have massive workloads, such as web services, their scalability requirements can, first of all, change very quickly and, secondly, grow very large. The first scenario can be difficult to manage if you have a relational database sitting on a single in-house server. For example, if your load triples overnight, how quickly can you upgrade your hardware? The second scenario can be too difficult to manage with a relational database in general.

He lists four reasons for selecting a key/value database over a relational database:

  1. Your data is heavily document-oriented.
  2. Your development environment is heavily object-oriented.
  3. The data store is cheap and integrates easily with your vendor's web services platform.
  4. Your foremost concern is on-demand, high-end scalability.
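To make the first two reasons more concrete, here is a minimal sketch of the key/value access pattern. Each object is stored and fetched as a whole document under a single key, with no schema and no joins. The `KeyValueStore` class is hypothetical and in-memory. Hosted stores such as SimpleDB or Windows Azure storage have their own APIs, and this sketch does not imitate any of them.

```python
import json
from typing import Any, Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key/value store that keeps each value as a serialized document."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, document: Dict[str, Any]) -> None:
        # The whole document is serialized and stored under one key; there is no schema.
        self._data[key] = json.dumps(document)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        # A lookup is a single key access; there are no joins across tables.
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
# An order and its line items live together in one document instead of two related tables.
store.put("order:1001", {"customer": "alice", "items": [{"sku": "A1", "qty": 2}]})
order = store.get("order:1001")
print(order["items"][0]["qty"])  # prints 2
```

The trade-off is visible even at this size. Reading or writing a whole document is a single operation that is easy to spread across machines by key. A query across documents, such as "all orders containing SKU A1", has no index or join to lean on.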

David Chappell wrote a paper about the Azure Services Platform in which he briefly discusses this topic. He points out many reasons for using a key/value database in the cloud, but also says:

…Microsoft has announced plans to evolve SQL Data Services into a more relational technology. Recall that unlike Windows Azure storage, SQL Data Services is built on SQL Server, which makes this evolution more natural. Yet whatever model it provides, the technology’s goal remains the same: providing a scalable, reliable, and low-cost cloud database for all kinds of applications.

Depending on which relational features are offered, this might allow companies with the needs listed above to adopt the technology more easily and at a lower cost.

Databases like Drizzle, started by Brian Aker, aim to provide relational capabilities that can scale. Drizzle is based on the MySQL 6.0 source and is optimized for cloud and web applications. So far, the developers have removed a lot of functionality from the original source and added some new features:

  • a microkernel architecture, making Drizzle more modular than MySQL
  • more pluggable interfaces, such as for authentication and logging
  • multi-core optimization (an area where MySQL is potentially lacking)
  • fewer data types
  • fewer engines
  • less code, making for a smaller and potentially more maintainable codebase
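The idea behind a microkernel with pluggable interfaces can be illustrated with a small, hypothetical sketch. The core defines only an interface (here, for logging) and calls whatever plugins are registered, without knowing their concrete types. This is not Drizzle's actual plugin API. The names `LoggerPlugin`, `MemoryLogger` and `Kernel` are invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class LoggerPlugin(ABC):
    """Hypothetical plugin interface: the kernel depends only on this abstraction."""

    @abstractmethod
    def log_query(self, sql: str) -> None:
        ...


class MemoryLogger(LoggerPlugin):
    """Example plugin that keeps logged queries in memory."""

    def __init__(self) -> None:
        self.entries: List[str] = []

    def log_query(self, sql: str) -> None:
        self.entries.append(sql)


class Kernel:
    """Hypothetical minimal core that knows nothing about concrete logger implementations."""

    def __init__(self) -> None:
        self._loggers: List[LoggerPlugin] = []

    def register_logger(self, plugin: LoggerPlugin) -> None:
        self._loggers.append(plugin)

    def execute(self, sql: str) -> None:
        # Query execution itself is omitted; the kernel only dispatches to registered plugins.
        for logger in self._loggers:
            logger.log_query(sql)


kernel = Kernel()
memory_logger = MemoryLogger()
kernel.register_logger(memory_logger)
kernel.execute("SELECT 1")
print(memory_logger.entries)  # prints ['SELECT 1']
```

Because the kernel only sees the interface, a different logger (or none at all) can be swapped in without touching the core. This is the maintainability argument behind keeping the kernel small.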

Drizzle is in alpha and is developed on Unix-like operating systems; Windows is currently not supported.


    It depends

    by Billy Newport

    I think the market will split three ways.

    1) Non partitionable workloads
    These definitely exist and the best data store for these will be big SMP or mainframe based databases.

    2) Partitionable workloads
    These applications are highly scalable and typically use a cleanly partitionable data model. They are usually transactional, analytical, or both at the same time. There are several options here.

    Option a) If the system doesn't have to scale to a high level, then a traditional application server on top of a database might provide what's needed, without requiring developers who know how to build highly scalable applications.

    Option b) We need a lot of scaling, but even if every potential customer signed up, it's still a bounded system in terms of required throughput. If this limit is relatively low, it can be met by a single database box. If it's higher, the database can be fronted by a WebSphere eXtreme Scale type product, which acts as the data access service for reads and typically does write-behind for writes. This allows a system to be built that scales very well and still uses a database behind it. Eventually the database will saturate, but given the power of single boxes, this may still satisfy the requirements for the load levels expected.
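The write-behind pattern in option b can be sketched roughly as follows. Reads are served from an in-memory cache in front of the database. Writes update the cache immediately, are queued, and are flushed to the database later in batches. This is a single-process toy under stated assumptions, not how WebSphere eXtreme Scale implements it. `WriteBehindCache` is a hypothetical class, and SQLite stands in for the backing database.

```python
import sqlite3
from typing import Dict, Optional


class WriteBehindCache:
    """Hypothetical write-behind cache in front of a SQLite table kv(key, value)."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 3) -> None:
        self.conn = conn
        self.batch_size = batch_size
        self.cache: Dict[str, str] = {}
        self.dirty: Dict[str, str] = {}  # writes not yet flushed to the database
        conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")

    def get(self, key: str) -> Optional[str]:
        # Serve reads from the cache; fall back to the database on a miss.
        if key in self.cache:
            return self.cache[key]
        row = self.conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        if row is not None:
            self.cache[key] = row[0]
            return row[0]
        return None

    def put(self, key: str, value: str) -> None:
        # Update the cache immediately and defer the database write.
        self.cache[key] = value
        self.dirty[key] = value
        if len(self.dirty) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Write all pending changes to the database in a single transaction.
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                list(self.dirty.items()),
            )
        self.dirty.clear()


conn = sqlite3.connect(":memory:")
cache = WriteBehindCache(conn, batch_size=2)
cache.put("a", "1")  # cached only; not yet in the database
cache.put("b", "2")  # reaches batch_size, so both rows are flushed together
print(conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0])  # prints 2
```

Batching means the database sees one transaction per batch rather than one per write. This is why such a tier can absorb load well beyond what the database alone handles. The cost is that unflushed writes are lost if the cache process dies, which real products address with replication.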

    Option c) A product that uses a scale-out database on commodity boxes, like HBase, SimpleDB, CouchDB, etc. Typically the workload is then programmed using a map/reduce, Cascading or Jaql programming model. This might be a hybrid system with a tiered strategy. HBase or similar holds tens or hundreds of petabytes of data, and Jaql/Cascading flows do large-scale analytics. Alongside that, something like WXS holds a few terabytes of data, simultaneously processes transactions for a live snapshot of what's going on right now, and federates with the results from the large-scale jobs running on HBase. These systems may also federate data from conventional databases. They might preload it into HBase/HDFS before running the jobs, or use a WXS-style grid to front-end the database in a similar style to option b.
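The map/reduce programming model mentioned in option c can be illustrated in a single process using the usual word-count example. A map function emits key/value pairs for each input record. The pairs are grouped by key (the shuffle). A reduce function then combines each group. Frameworks such as Hadoop distribute these phases across many machines. This sketch only shows the shape of the computation, and its function names are illustrative rather than any framework's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key, as the framework would do between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["the cloud", "The database in the cloud"]))
# prints {'the': 3, 'cloud': 2, 'database': 1, 'in': 1}
```

Map and reduce each work on independent records or keys. That independence is what lets a framework run them in parallel over partitioned data.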

    I think the database will be around for a while even in the cloud. Clouds don't mean a million servers. Clouds are hosted scalable services and not all services need to scale to a million servers. The future is pretty exciting and a lot of technologies are going to be impacted by this kind of thinking over the next couple of years. It's a cool time to be building middleware.

    Cheers


    One more database type

    by Kurt Cagle

    I think you're also going to see the rise of XML databases, perhaps as a subclass of the name/value type of DB (most use an indexing scheme that actually maps fairly closely to n/v internally). This holds for semi-structured document-oriented data with a great deal of internal cohesiveness and "folding". These have the potential to power RESTful services in particular.
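The comment notes that XML databases often index documents in a way that maps closely to name/value pairs. One way to picture this is to flatten each element and attribute into a path/value pair. The sketch below is a hypothetical illustration using Python's standard `xml.etree.ElementTree`, not the indexing scheme of any particular product. Disambiguating repeated sibling elements with a positional index such as `item[2]` is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def flatten(xml_text: str) -> Dict[str, str]:
    """Map each element's text and each attribute to a path-like key (illustrative scheme)."""
    root = ET.fromstring(xml_text)
    pairs: Dict[str, str] = {}

    def walk(elem: ET.Element, path: str) -> None:
        # Attributes become keys like /order/@id.
        for name, value in elem.attrib.items():
            pairs[f"{path}/@{name}"] = value
        text = (elem.text or "").strip()
        if text:
            pairs[path] = text
        # Count children per tag so repeated siblings get distinct keys like item[2].
        seen: Dict[str, int] = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, "/" + root.tag)
    return pairs


doc = '<order id="7"><item sku="A1">2</item><item sku="B2">5</item></order>'
for key, value in flatten(doc).items():
    print(key, "=", value)
```

Once a document is reduced to path/value pairs, it can be stored and looked up with the same machinery as any name/value store. This is the overlap between the two database types that the comment points to.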


    Re: One more database type

    by H Runser

    Responding to Mr. Cagle's post: have you seen IBM's pureXML offering?


    Re: One more database type

    by Rob Tweed

    We'd agree, and have just launched such a technology. Read all about M/DB:X, an Open Source, lightweight, REST-interfaced XML Database, designed for cloud usage, at www.mgateway.com/mdbx.html
