Is the Relational Database Not an Option in Cloud Computing?
The recent focus on cloud computing has increased the use of key/value databases, and the reason most often given is scalability. Though scalability is a key factor, cloud computing has other advantages that make it attractive to vendors that do not need to deliver highly scalable applications or services.
One can imagine usage scenarios where:
- Startup companies don’t want to invest in local servers and would rather pay for the computing and storage they use.
- Companies want to port existing applications or services to the cloud, without re-architecting their data layer.
- High computing power is needed for shorter periods.
Tony Bain recently published an article asking if the relational database is doomed. He focuses on the difference between the relational database and the key/value database, and the reasons for selecting one over the other. According to Tony, the relational database has some challenges when it comes to scalability:
As more and more applications are launched in environments that have massive workloads, such as web services, their scalability requirements can, first of all, change very quickly and, secondly, grow very large. The first scenario can be difficult to manage if you have a relational database sitting on a single in-house server. For example, if your load triples overnight, how quickly can you upgrade your hardware? The second scenario can be too difficult to manage with a relational database in general.
He lists four reasons for selecting a key/value database over a relational database:
- Your data is heavily document-oriented.
- Your development environment is heavily object-oriented.
- The data store is cheap and integrates easily with your vendor's web services platform.
- Your foremost concern is on-demand, high-end scalability.
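To make the contrast behind these criteria concrete, here is a minimal Python sketch that stores the same record both ways. It is an illustration only: the `order` record, the table names, and both helper functions are hypothetical, a plain dictionary stands in for a key/value store, and an in-memory SQLite database stands in for a relational one.

```python
import json
import sqlite3

# Hypothetical order record, used only for illustration.
order = {
    "id": "order-1",
    "customer": "alice",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}


def key_value_roundtrip(record):
    """Key/value style: the whole document is serialized under a single key."""
    store = {}  # a dict stands in for the key/value database
    store[record["id"]] = json.dumps(record)
    return json.loads(store[record["id"]])


def relational_total_qty(record, customer):
    """Relational style: the record is normalized into two tables and queried with a join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT)")
    conn.execute(
        "CREATE TABLE order_items (order_id TEXT REFERENCES orders(id), sku TEXT, qty INTEGER)"
    )
    conn.execute("INSERT INTO orders VALUES (?, ?)", (record["id"], record["customer"]))
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        [(record["id"], item["sku"], item["qty"]) for item in record["items"]],
    )
    row = conn.execute(
        "SELECT SUM(qty) FROM order_items "
        "JOIN orders ON orders.id = order_items.order_id "
        "WHERE orders.customer = ?",
        (customer,),
    ).fetchone()
    conn.close()
    return row[0]
```

The key/value version maps naturally onto a document or an object and is trivial to spread across machines by key, while the relational version pays for its schema with the ability to answer ad hoc questions across records.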
…Microsoft has announced plans to evolve SQL Data Services into a more relational technology. Recall that unlike Windows Azure storage, SQL Data Services is built on SQL Server, which makes this evolution more natural. Yet whatever model it provides, the technology’s goal remains the same: providing a scalable, reliable, and low-cost cloud database for all kinds of applications.
Depending on which relational features are provided, this might allow companies with needs like those listed above to adopt the technology more easily and at a lower cost.
Databases like Drizzle, started by Brian Aker, aim for relational capabilities that can scale. Drizzle is based on the MySQL 6.0 source and optimized for cloud and net applications. For now they have removed a lot of functionality from the original source and added some new features:
- micro kernel architecture, making Drizzle more modular than MySQL
- more pluggable interfaces, such as for authentication and for logging
- multi-core optimization (compared to MySQL's potentially lacking multi-core optimization)
- fewer data types
- fewer engines
- less code, making for a smaller and potentially more maintainable codebase
Drizzle is in alpha and is developed on Unix-like operating systems; Windows is currently not supported.
1) Non-partitionable workloads
These definitely exist and the best data store for these will be big SMP or mainframe based databases.
2) Partitionable workloads
These are applications that are highly scalable and typically use a cleanly partitionable data model. These applications are usually either transactional or analytical or both at the same time. There are options here.
Option a) If the system doesn't have to scale to a high level, then a traditional application server on top of a database might provide what's needed and not require developers who get how to build highly scalable applications.
Option b) We need a lot of scaling, but even if every potential customer signed up, it's still a limited system in terms of possible required throughput. This can be met by a single database box if this limit is relatively low. If it's more, then it can be fronted by a WebSphere eXtreme Scale type product, which acts as the data access service for servicing reads and typically would do write-behind for writes. This allows a system to be built which scales very well and still uses a database behind it. Clearly, eventually the database will saturate, but given the power of single boxes, it may be that for the load levels they expect, this will satisfy their requirements.
Option c) A product that uses a scale-out database on commodity boxes like HBase, SimpleDB, CouchDB etc. Typically the workload is then programmed using a map/reduce/cascading/jaql programming model. This might be a hybrid system that uses a tiered strategy: HBase or similar for tens or hundreds of petabytes of data, with Jaql/Cascading flows for doing large-scale analytics, plus something like WXS holding a few terabytes of data, which simultaneously processes transactions for a live snapshot of what's going on right now and also federates with the results from the large-scale jobs running on HBase etc. These systems may also federate data from conventional databases; maybe they preload them into HBase/HDFS before running the jobs, or maybe they use a WXS-style grid to front-end the database in a similar style to option b.
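The write-behind arrangement described in option b can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how WebSphere eXtreme Scale or any other product works: the `WriteBehindCache` class, the `kv` table, and the `batch_size` parameter are all hypothetical, an in-memory SQLite database stands in for the backing database, and the flush runs synchronously once a batch fills up rather than on a background thread as a real grid typically would.

```python
import sqlite3


class WriteBehindCache:
    """Hypothetical in-memory front for a database.

    Reads are served from memory when possible; writes update memory
    immediately and reach the database later, in batches.
    """

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.cache = {}  # key -> latest known value
        self.dirty = {}  # key -> value not yet written to the database

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        self.cache[key] = row[0]
        return row[0]

    def put(self, key, value):
        self.cache[key] = value
        self.dirty[key] = value
        # Simplification: flush inline once the batch is full.
        if len(self.dirty) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.dirty:
            return
        with self.conn:  # one transaction per batch
            self.conn.executemany(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                list(self.dirty.items()),
            )
        self.dirty.clear()


def row_count(conn):
    """Number of rows that have actually reached the database."""
    return conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]


def make_store(batch_size):
    """Create the demo table and a cache in front of it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
    return conn, WriteBehindCache(conn, batch_size=batch_size)
```

Because repeated writes to the same key collapse into one entry before the flush, the database sees fewer and larger writes than the application issues, which is what lets a single box last longer. The trade-off is that unflushed writes are lost if the cache fails, which is why real products replicate the in-memory tier.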
I think the database will be around for a while even in the cloud. Clouds don't mean a million servers. Clouds are hosted scalable services and not all services need to scale to a million servers. The future is pretty exciting and a lot of technologies are going to be impacted by this kind of thinking over the next couple of years. It's a cool time to be building middleware.