MySQL Cluster 7.2 Released with 70x Increased Performance and NoSQL Features
Oracle announced the release of MySQL Cluster 7.2 on February 15th 2012. The release supports both SQL and NoSQL access. NoSQL access is available through a C++ API, the memcached wire protocol and a REST API, while SQL access goes through the usual MySQL access points (JDBC, etc.). Oracle claims performance gains of up to 70x for complex queries. Alongside MySQL Cluster 7.2, which is open source, Oracle released version 1.1.4 of MySQL Cluster Manager, which is commercial, to simplify the management of MySQL Cluster deployments.
This MySQL Cluster release improves auto-sharding of data and replication conflict resolution, and adds scalability and reliability improvements that allow replication across different data centers. There was also work on fixing issues with replicating user privileges and on making clusters easier to manage and administer. In addition, the cluster nodes have been certified for and integrated into the Oracle VM infrastructure to take advantage of increased elastic scalability for cloud deployments. MySQL Cluster 7.2 is now certified to run on Oracle Linux and Oracle Solaris.
"MySQL Cluster 7.2 demonstrates Oracle's investment in further strengthening MySQL's position as the leading Web database," said Tomas Ulin, vice president of MySQL Engineering, Oracle, in the press release. He went on to say, "The performance and flexibility enhancements in MySQL Cluster 7.2 provide users with a solid foundation for their mission-critical Web workloads, blending the best of SQL and NoSQL technologies to reduce risk, cost and complexity."
InfoQ caught up with Tomas Ulin earlier in the week to discuss this release.
Tomas discussed MySQL and its new NoSQL layer. Tables are sharded across the cluster's nodes automatically by hashing on the primary key. MySQL Cluster boasts 99.999% availability and follows a distributed, shared-nothing philosophy. Data is replicated across nodes to facilitate self-healing. This also allows you to upgrade your cluster while it is online. Upgrades and maintenance are included in the 99.999% availability figure because replication allows you to take boxes out of rotation for software and hardware upgrades.
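To make the idea of hashing on the primary key concrete, here is a minimal sketch in Python. It is not MySQL Cluster's actual partitioning code: the function names, the use of MD5, and the rule of placing each extra copy on the next node are all assumptions chosen for illustration.

```python
import hashlib


def partition_for(primary_key, num_partitions):
    """Map a primary key to a partition with a stable hash (MD5 is an assumption)."""
    digest = hashlib.md5(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replicas_for(partition, num_nodes, copies=2):
    """Hypothetical placement rule: the partition's node plus the following ones."""
    return [(partition + i) % num_nodes for i in range(copies)]


def place_rows(keys, num_nodes, copies=2):
    """Return {node: [keys stored on that node]} for the given primary keys."""
    placement = {node: [] for node in range(num_nodes)}
    for key in keys:
        partition = partition_for(key, num_nodes)
        for node in replicas_for(partition, num_nodes, copies):
            placement[node].append(key)
    return placement
```

Because every key lives on more than one node in this sketch, any single node can be taken out of rotation and each row is still reachable, which is the property the availability claim relies on.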
MySQL AB acquired the MySQL Cluster software from a telecom company (Alzato, a small venture company started by Ericsson). MySQL Cluster has been used in telecom for a while, even in-memory on dedicated devices. Tomas explained that about 1 billion people in the world are connected to mobile phone calls through MySQL Cluster every day ("if you use a cell phone the likelihood is that you have used MySQL cluster before").
Tomas went on to explain that this MySQL Cluster release is one in a series of MySQL releases from Oracle, and that since Oracle has been at the helm, MySQL engineering progress and new releases have moved at a "fast and furious" pace that exceeds anything done at Sun or MySQL AB. Oracle's commitment to MySQL is shown by the 2010 5.5 release, the 2011 5.6 release, the 2011 release of MySQL Cluster, and this release. Oracle is committing resources to MySQL across its stack, including new MySQL Windows installers, Oracle Fusion support, and more.
Tomas explained that the key benefits of MySQL Cluster include auto-sharding, online scaling (adding nodes, redistributing data gradually to avoid bursts of load, and removing the old copies once the data has been redistributed) and the ability to scale at either the SQL layer or the data layer (the NoSQL layer).
Oracle provides different levels of support and licenses for MySQL, including MySQL Standard Edition, MySQL Enterprise Edition (partitioning, backup, scalability, etc.), and the highest tier, MySQL Cluster Carrier Grade (which includes MySQL Cluster). You can also get MySQL Cluster 7.2 under the GPL license.
Regarding the 70x improvement in complex query speed ("mind blowing numbers"), Tomas mentioned that this was accomplished with what they call "push down joins" or "distributed joins", which sound a lot like a MapReduce-style approach to performing a distributed join on a heavily sharded relational database. The joins happen on the actual data nodes. They are done in parallel on all of the nodes, and then one node combines the results. The parallelism of the join is where the speed comes from. Tomas claims a billion queries per minute with an 8-node cluster and near-linear scalability in join processing as new nodes are added.
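The pattern described here (join locally on each node in parallel, then combine on one node) can be sketched in a few lines of Python. This is only an illustration of the general idea, not how MySQL Cluster implements pushed-down joins: the `customers`/`orders` data shape, the function names, and the assumption that matching rows are stored on the same shard are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def local_join(shard):
    """Join orders to customers using only the rows held by one shard."""
    customers, orders = shard
    by_id = {c["id"]: c for c in customers}
    return [
        {"customer": by_id[o["customer_id"]]["name"], "order": o["id"]}
        for o in orders
        if o["customer_id"] in by_id
    ]


def distributed_join(shards):
    """Run the local joins in parallel, then let one caller combine the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(local_join, shards)
        return [row for part in partials for row in part]
```

In this sketch each shard only touches its own rows, so adding shards adds workers; the single combining step at the end is the only serial part.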
If you are like the author of this article, you are likely wondering just how you map a relational SQL database so that it can be accessed via the Memcached wire protocol, which is designed for a key-value store where the value is a schema-less, typeless blob. The mapping is not 100% transparent and requires a bit of work. In some cases you have to insert metadata into tables that the MySQL Memcached driver uses to map database rows and columns to Memcached access. Oracle provided this link that explains the mapping to Memcached from MySQL Cluster.
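As a rough illustration of what such a mapping involves, the Python sketch below resolves a Memcached-style key to a table row by way of a small metadata table. Everything in it is hypothetical: the `PREFIX_MAP` layout, the `user:` prefix, the in-memory `TABLES` stand-in, and the `|` separator are assumptions made for this example and are not the driver's real configuration schema.

```python
# Hypothetical metadata: key prefix -> which table and columns back it.
PREFIX_MAP = {
    "user:": {
        "table": "users",
        "key_column": "id",
        "value_columns": ["name", "email"],
    },
}

# In-memory stand-in for the relational tables.
TABLES = {
    "users": [{"id": "42", "name": "Ann", "email": "ann@example.com"}],
}


def resolve(key):
    """Find the mapping for a key; the longest matching prefix wins."""
    for prefix in sorted(PREFIX_MAP, key=len, reverse=True):
        if key.startswith(prefix):
            return PREFIX_MAP[prefix], key[len(prefix):]
    raise KeyError(key)


def memcached_get(key):
    """Return the mapped columns of the matching row as one flat value, or None."""
    mapping, row_key = resolve(key)
    for row in TABLES[mapping["table"]]:
        if row[mapping["key_column"]] == row_key:
            return "|".join(str(row[c]) for c in mapping["value_columns"])
    return None
```

The point of the sketch is that the key-value client only ever sees a flat string, while the metadata decides which typed columns are packed into it.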
NoSQL and Joins
It will be interesting to see if a 70x join improvement changes people's minds.