SQL Makes a Comeback through NewSQL
New database developments point to a return to SQL, not by running traditional relational stores on bigger and better hardware, nor even on sharded architectures, but through NewSQL solutions.
After losing ground to NoSQL, initially read as “No more SQL” and later as “Not only SQL”, SQL is making a comeback. One of the advertised ways to scale relational stores has been sharding, but for some this is not enough. New approaches are emerging: some combine the two technologies, SQL and NoSQL, while others improve the performance and scalability of relational stores. Together they are known as NewSQL. Google, one of the first supporters of NoSQL, built F1, a distributed relational database combining the high availability and scalability of BigTable with the “consistency and usability” of SQL. Google describes F1 in the whitepaper F1: A Distributed SQL Database That Scales (PDF) as:
… a fault-tolerant globally-distributed OLTP and OLAP database built at Google as the new storage system for Google's AdWords system. It was designed to replace a sharded MySQL implementation that was not able to meet our growing scalability and reliability requirements.
One of these NewSQL solutions is MemSQL, a fully in-memory solution for real-time analytics of structured or semi-structured (JSON) data. It does not use columnar stores but “lock-free skip lists and lock-free hash tables” for faster access to data, and employs parallel processing on a shared-nothing architecture with no single point of failure.
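To make the skip-list part of that description concrete, here is a minimal, single-threaded skip list in Python. It shows why a skip list suits an in-memory index: unlike a hash table, which gives fast point lookups but no key order, it keeps keys sorted, so both point lookups and range scans locate their starting node in expected O(log n) steps. This is a sketch under stated assumptions, not MemSQL's implementation: it is not lock-free (no atomic compare-and-swap, not safe for concurrent use), and the names `SkipList`, `get` and `items_between` are hypothetical.

```python
import random
from typing import Any, Iterator, Optional


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key: Any, value: Any, level: int) -> None:
        self.key = key
        self.value = value
        self.forward: list[Optional["_Node"]] = [None] * level  # one next-pointer per level


class SkipList:
    """Sequential (NOT lock-free) skip list; hypothetical illustration only."""

    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node one level up

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_LEVEL)  # sentinel, holds no key
        self._level = 1  # number of levels currently in use

    def _random_level(self) -> int:
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def _descend(self, key: Any, update: Optional[list] = None) -> _Node:
        # Walk from the top level down, stopping just before the first key >= `key`.
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            if update is not None:
                update[i] = node  # last node visited on level i
        return node

    def insert(self, key: Any, value: Any) -> None:
        update = [self._head] * self.MAX_LEVEL
        nxt = self._descend(key, update).forward[0]
        if nxt is not None and nxt.key == key:
            nxt.value = value  # key exists: overwrite the value
            return
        level = self._random_level()
        self._level = max(self._level, level)  # unused upper levels start at head
        new = _Node(key, value, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def get(self, key: Any, default: Any = None) -> Any:
        node = self._descend(key).forward[0]
        return node.value if node is not None and node.key == key else default

    def items_between(self, lo: Any, hi: Any) -> Iterator[tuple[Any, Any]]:
        # Ordered range scan: yields (key, value) with lo <= key < hi.
        node = self._descend(lo).forward[0]
        while node is not None and node.key < hi:
            yield node.key, node.value
            node = node.forward[0]


sl = SkipList(seed=42)
for k in [5, 1, 3, 9, 7]:
    sl.insert(k, str(k))
print(sl.get(3))                      # 3
print(list(sl.items_between(3, 8)))   # [(3, '3'), (5, '5'), (7, '7')]
```

A lock-free variant keeps the same layout but links new nodes with atomic compare-and-swap operations level by level, which Python's standard library does not expose; that is why this sketch stays single-threaded.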
Another NewSQL variant is ClustrixDB, a peer-to-peer, shared-nothing distributed database for transaction processing and real-time analytics. According to Robin Purohit, Clustrix CEO, their database processes 4.4 billion transactions per day with an average latency of 5-10 ms on 21 nodes (8 cores and 48 GB RAM each) at Twoo.com because it is
built from scratch as a peer-to-peer distributed SQL database with no single coordinator (and therefore no single point of failure). ClustrixDB uses distributed transactions based on the Paxos consensus protocol. ClustrixDB also uses distributed two-phase locking for writes and distributed multi-version concurrency control to ensure reads and writes do not interfere. This guarantees, in a distributed environment, the strict ACID properties expected from a single-node database.
ClustrixDB uses a shared-nothing architecture, the only architecture known to scale linearly. ClustrixDB brings to the primary database the Massively Parallel Processing (MPP) for real-time analytics that has previously only been available in data warehousing.
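The multi-version concurrency control mentioned in the quote can be illustrated with a toy, single-node sketch in Python: every write appends a new version stamped with a logical commit timestamp, and a reader sees only the newest version committed at or before its snapshot timestamp, so a later write never changes what an earlier snapshot reads. The class `MVCCStore` and its methods are hypothetical; the sketch omits Paxos, distributed two-phase locking and cleanup of old versions, and is not ClustrixDB's implementation.

```python
from typing import Any, Optional


class MVCCStore:
    """Toy single-node multi-version store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # key -> list of (commit_ts, value), appended in ascending commit_ts order
        self._versions: dict[str, list[tuple[int, Any]]] = {}
        self._ts = 0  # logical clock: timestamp of the last committed write

    def snapshot(self) -> int:
        # A reader's snapshot is simply the last committed timestamp.
        return self._ts

    def write(self, key: str, value: Any) -> int:
        # Writers never overwrite in place; they append a new version.
        self._ts += 1
        self._versions.setdefault(key, []).append((self._ts, value))
        return self._ts

    def read(self, key: str, snapshot_ts: int) -> Optional[Any]:
        # Return the newest version committed at or before the snapshot.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("balance", 100)
snap = store.snapshot()      # a reader takes a snapshot here
store.write("balance", 50)   # a later write creates a new version
print(store.read("balance", snap))              # 100
print(store.read("balance", store.snapshot()))  # 50
```

Because readers only consult versions, they never need to block writers; in a distributed system the hard part the sketch leaves out is agreeing on commit timestamps and write locks across nodes.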
We asked Toon Coppens, CTO at Twoo.com, why their initial sharded MySQL solution did not work for them and why they opted for a NewSQL one:
As we learned with Netlog.com, which had hundreds of sharded MySQL boxes, the engineering overhead of rebalancing and managing the shards, not least the inflexibility to change queries or create new ones across all shards on the fly, made this route less favourable. We like to think of data as sitting in one queryable place.
While NoSQL would provide us with scalable possibilities, we didn't want to tie ourselves to the low-level data representation it often implies. We want(ed) full flexibility to change product and feature requirements on the go, without having to rework the data layer of a live and growing site. (Clustrix provides quick ALTERs while everything keeps running under heavy load, among other great features.)
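One part of the rebalancing overhead Coppens describes can be seen with a simple hash-modulo shard placement. The article does not say how Netlog placed its data, so the scheme, the function names `shard_for` and `moved_fraction`, the `user:<n>` keys and the shard counts below are all hypothetical. The sketch counts how many keys change shards when one shard is added.

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    # CRC32 is stable across processes, unlike Python's salted built-in hash().
    return zlib.crc32(key.encode("utf-8")) % num_shards


def moved_fraction(keys: list[str], old_shards: int, new_shards: int) -> float:
    # Fraction of keys whose shard changes when the shard count changes.
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]  # hypothetical key scheme
print(f"{moved_fraction(keys, 100, 101):.1%} of keys move when adding one shard")
```

Under this placement a key stays put only when its hash gives the same remainder modulo 100 and 101, which happens for roughly 1 key in 100, so nearly all data has to be copied. Schemes such as consistent hashing reduce the number of moved keys, but queries spanning shards still have to be fanned out and merged by the application, which is the inflexibility the quote refers to.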
While NoSQL solutions have been praised for their performance, scalability and availability, the development and data refactoring efforts seem to be higher than those associated with SQL datastores. This prompts some to turn to NewSQL, which combines the advantages of NoSQL with the power of SQL. What matters in the end is using the solution that satisfies one's needs.
NewSQL could curb NoSQL adoption.