Tokutek has announced new versions of its storage engine products. TokuDB for MySQL has reached version 7.5 and includes two significant performance features. TokuMX for MongoDB has reached version 2.0 and includes a mix of new features, including performance improvements of its own.
Prior versions of TokuDB supported bulk fetching of rows for simple SELECT statements. Rich Prohaska, an engineer at Tokutek, explains the motivation:
MySQL uses handler APIs to fetch one row at a time from TokuDB (and other storage engines). Unfortunately, a fractal tree search is too heavy to use for each handler call to get the next or previous row from a fractal tree. TokuDB uses a bulk fetch buffer in the handler that is filled by a single fractal tree search. When MySQL calls the next or previous TokuDB handler and the bulk fetch buffer is not empty, then a row is popped from the bulk fetch buffer and returned to MySQL. Otherwise, the bulk fetch buffer is refilled by a fractal tree search that returns multiple rows.
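The refill-on-empty pattern Prohaska describes can be illustrated with a small toy model. In this sketch a sorted Python list stands in for the fractal tree, and `range_search`, `SortedIndex`, `BulkFetchCursor` and `batch_size` are hypothetical names rather than TokuDB internals. The point is only that many one-row `next()` calls are served by a few expensive searches.

```python
import bisect
from collections import deque


class SortedIndex:
    """Hypothetical stand-in for a fractal tree: a sorted list of (key, row) pairs."""

    def __init__(self, items):
        self._items = sorted(items)
        self._keys = [k for k, _ in self._items]
        self.searches = 0  # how many (expensive) tree searches were performed

    def range_search(self, after_key, limit):
        """One search returning up to `limit` rows with key > after_key."""
        self.searches += 1
        start = 0 if after_key is None else bisect.bisect_right(self._keys, after_key)
        return self._items[start:start + limit]


class BulkFetchCursor:
    """Handler-style cursor: the caller asks for one row at a time."""

    def __init__(self, index, batch_size=4):
        self._index = index
        self._batch_size = batch_size
        self._buffer = deque()  # the bulk fetch buffer
        self._last_key = None
        self._exhausted = False

    def next(self):
        """Return the next (key, row) pair, or None when the scan is finished."""
        if not self._buffer and not self._exhausted:
            # Buffer is empty: refill it with a single search returning many rows.
            batch = self._index.range_search(self._last_key, self._batch_size)
            if len(batch) < self._batch_size:
                self._exhausted = True
            self._buffer.extend(batch)
        if not self._buffer:
            return None
        # Buffer is not empty: pop a row without touching the tree.
        key, row = self._buffer.popleft()
        self._last_key = key
        return key, row
```

With 10 rows and a batch size of 4, reading the whole range takes ten `next()` calls but only three searches, which is the saving the bulk fetch buffer is meant to provide.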
In TokuDB 7.5, bulk fetching has been extended to more complex statements that read rows through a SELECT, including “INSERT [IGNORE] INTO … SELECT”, “INSERT INTO … ON DUPLICATE KEY UPDATE”, “REPLACE INTO … SELECT”, and “CREATE TABLE … SELECT”.
In addition to the bulk fetching improvements, TokuDB 7.5 introduces read-free replication. Prohaska explains the feature:
When row based replication is used, an image of the row is written into the binary log for write, delete, and update operations. The slave can use the row image from the binary log to avoid reading the row from the table. This read free replication design can reduce the I/O load on the slave significantly.
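The difference between a conventional replica and a read-free one can be sketched with a toy model that counts row lookups. The event format (`RowEvent` with `before` and `after` images) and the `ReplicaTable` class are hypothetical simplifications of row-based binary log events. The model does not reflect how TokuDB implements the feature internally, nor the conditions Tokutek says must hold before it can be enabled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RowEvent:
    """Simplified row-based binlog event (hypothetical format)."""
    op: str                         # "write", "update" or "delete"
    key: int
    before: Optional[dict] = None   # row image before the change
    after: Optional[dict] = None    # row image after the change


@dataclass
class ReplicaTable:
    rows: dict = field(default_factory=dict)
    reads: int = 0  # row lookups performed against the table (proxy for I/O)

    def lookup(self, key):
        self.reads += 1
        return self.rows.get(key)

    def apply(self, event, read_free):
        if not read_free and event.op in ("update", "delete"):
            # Conventional path: read the existing row before changing it.
            current = self.lookup(event.key)
            if current != event.before:
                raise RuntimeError("row %d diverged from binlog image" % event.key)
        # Read-free path: rely solely on the images carried in the event.
        if event.op in ("write", "update"):
            self.rows[event.key] = event.after
        elif event.op == "delete":
            self.rows.pop(event.key, None)
```

Applying the same stream of events both ways yields the same table contents, but the read-free replica performs no lookups, which is where the I/O reduction on the slave comes from.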
Tokutek cites a benchmark claiming a 20x performance improvement when using read-free replication. However, Tokutek cautions that several conditions must be met before this feature can be enabled.
TokuMX 2.0 comes with its own set of welcome improvements. First and foremost is the use of the Ark consensus algorithm for replication. Tokutek created Ark to provide stronger replication guarantees than the default MongoDB algorithm. Specifically, Ark ensures that writes acknowledged with a majority write concern will never be rolled back by a later failover.
In addition to Ark, TokuMX 2.0 adds two other important features. First, partitioned collections can now be sharded. Partitioned collections allow efficient deletion of ranges of data, which is particularly important when aging out time series data. Second, TokuMX now supports all of the geospatial indexing and query features of MongoDB 2.4.
Finally, TokuMX 2.0 adds support for fast updates, which Tokutek claims can enable a 10x performance improvement. In previous versions of TokuMX, an update first performed a query to read the existing document and then made changes to the relevant indexes. With fast updates, if an update does not change any indexed fields, this preliminary query can be avoided entirely.
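The decision rule behind fast updates can be illustrated with a toy document store. In this sketch, which assumes updates are addressed by document id, an update that touches no indexed field is simply logged and applied lazily the next time the document is read, while an update to an indexed field must first read the old document so its stale index entries can be removed. `Collection`, `pending` and `reads` are hypothetical names, not TokuMX APIs.

```python
from collections import defaultdict


class Collection:
    """Toy document store with secondary indexes on selected fields (hypothetical)."""

    def __init__(self, indexed_fields):
        self.indexed_fields = set(indexed_fields)
        self.docs = {}
        self.indexes = {f: defaultdict(set) for f in self.indexed_fields}
        self.pending = defaultdict(list)  # deferred changes per document id
        self.reads = 0                    # number of document reads performed

    def insert(self, doc_id, doc):
        self.docs[doc_id] = dict(doc)
        for f in self.indexed_fields:
            if f in doc:
                self.indexes[f][doc[f]].add(doc_id)

    def _materialize(self, doc_id):
        """Read a document, applying any deferred changes first."""
        self.reads += 1
        doc = self.docs[doc_id]
        for changes in self.pending.pop(doc_id, []):
            doc.update(changes)
        return doc

    def update(self, doc_id, changes):
        if self.indexed_fields.isdisjoint(changes):
            # Fast path: no index is affected, so skip the read and log the change.
            self.pending[doc_id].append(dict(changes))
            return
        # Slow path: read the old document to remove its stale index entries.
        doc = self._materialize(doc_id)
        for f in self.indexed_fields.intersection(changes):
            if f in doc:
                self.indexes[f][doc[f]].discard(doc_id)
            self.indexes[f][changes[f]].add(doc_id)
        doc.update(changes)

    def find(self, doc_id):
        return dict(self._materialize(doc_id))
```

For example, incrementing an unindexed counter twice costs no reads, while changing an indexed `status` field forces one read before the index can be updated.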
TokuDB and TokuMX are based on Tokutek’s Fractal Tree index technology. TokuDB is available under the GPLv2 license and TokuMX is available under the AGPL license.