Querying Columnar Databases with Google Supersonic
Supersonic is a query engine library for columnar databases. It provides a set of data transformation primitives that Google advertises as “ultra-fast” thanks to “heavy use of cache-aware algorithms, SIMD instructions and vectorised execution, allowing it to exploit the capabilities and resources of modern, hyper pipelined CPUs.” Some of its main features are:
- Cache aware
- Instruction pipelining
- Using SIMD (Single Instruction Multiple Data)
- Custom data structures
- Failure handling
- Supporting standard columnar operations
- Specialized expressions
Supersonic supports a number of operations that can be applied to entire tables and can be composed into operation trees:
- Aggregation: SUM, MIN, MAX, COUNT, CONCAT, FIRST, LAST
- Compute – transforms an expression (more about expressions below) into an operation
- Filter – filters the rows of a columnar table
- Generate – creates a certain number of zero-column rows
- Limit – limits the number of rows resulting from a previous operation
- Sort – sorts the results of a previous operation
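To make the idea of an operation tree concrete, here is a minimal sketch in C++. It does not use Supersonic's actual API: the `Table`, `Operation`, `Scan`, `Filter` and `Limit` types are hypothetical stand-ins, written only to show how table-level operations can wrap one another so that each consumes the output of its child.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy columnar table: one vector per column, all of equal length.
// Hypothetical type for illustration; not Supersonic's table or view.
struct Table {
    std::vector<std::vector<int32_t>> columns;
    std::size_t row_count() const {
        return columns.empty() ? 0 : columns[0].size();
    }
};

// An operation produces a whole table, usually from a child operation.
struct Operation {
    virtual ~Operation() = default;
    virtual Table Run() const = 0;
};

// Leaf of the tree: hands back a stored table.
class Scan : public Operation {
public:
    explicit Scan(Table table) : table_(std::move(table)) {}
    Table Run() const override { return table_; }
private:
    Table table_;
};

// Keeps only the rows whose value in one column satisfies the predicate.
class Filter : public Operation {
public:
    Filter(std::size_t column, std::function<bool(int32_t)> predicate,
           std::unique_ptr<Operation> child)
        : column_(column), predicate_(std::move(predicate)),
          child_(std::move(child)) {}
    Table Run() const override {
        Table in = child_->Run();
        assert(column_ < in.columns.size());
        Table out;
        out.columns.resize(in.columns.size());
        for (std::size_t r = 0; r < in.row_count(); ++r) {
            if (!predicate_(in.columns[column_][r])) continue;
            for (std::size_t c = 0; c < in.columns.size(); ++c) {
                out.columns[c].push_back(in.columns[c][r]);
            }
        }
        return out;
    }
private:
    std::size_t column_;
    std::function<bool(int32_t)> predicate_;
    std::unique_ptr<Operation> child_;
};

// Truncates the child's result to at most max_rows rows.
class Limit : public Operation {
public:
    Limit(std::size_t max_rows, std::unique_ptr<Operation> child)
        : max_rows_(max_rows), child_(std::move(child)) {}
    Table Run() const override {
        Table t = child_->Run();
        for (auto& column : t.columns) {
            if (column.size() > max_rows_) column.resize(max_rows_);
        }
        return t;
    }
private:
    std::size_t max_rows_;
    std::unique_ptr<Operation> child_;
};
```

In this sketch a tree is built inside out, for example a `Limit` wrapping a `Filter` wrapping a `Scan`, and calling `Run()` on the root pulls data up through every node. A real engine would stream blocks of rows between nodes rather than copy whole tables as this toy does.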
Unlike operations, expressions apply at row level, performing the actual computations on individual column values. They can also be composed in an expression tree. Some of the expressions are:
- Terminal – leaf nodes containing primitive types such as ConstInt32, ConstBool, ConstDataType, RandInt32, etc.
- Arithmetic – Plus, Minus, Multiply, etc.
- Comparison – Equal, Less, Greater, IsOdd, etc.
- Date/Time – Now, Day, Month, Year, Hour, Minute, Second, AddDays, etc.
- Logical – And, Or, AndNot, Xor, Not
- Control Flow – If, IsNull, IfNull, Case
- Mathematical – Exp, Sin, Cos, Abs, Round, Floor, Trunc, Sqrt, Power, etc.
- String – ToString, Concat, Length, Trim, etc.
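An expression tree can be sketched in a similar way. The C++ below is a toy under stated assumptions, not Supersonic's API: `Expression`, `ColumnRef`, `Binary` and the helpers `Ref`, `Const`, `Plus` and `Multiply` are hypothetical names (a few echo the expression names listed above), and only 32-bit integer columns are handled. Each node produces one output value per input row, and evaluates a whole block of rows per call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Column = std::vector<int32_t>;

// A node in the expression tree; yields one value per input row.
// Hypothetical interface for illustration only.
struct Expression {
    virtual ~Expression() = default;
    virtual Column Evaluate(const std::vector<Column>& input,
                            std::size_t rows) const = 0;
};
using ExprPtr = std::unique_ptr<Expression>;

// Leaf: reads an input column by position.
class ColumnRef : public Expression {
public:
    explicit ColumnRef(std::size_t index) : index_(index) {}
    Column Evaluate(const std::vector<Column>& input,
                    std::size_t rows) const override {
        assert(index_ < input.size() && input[index_].size() == rows);
        return input[index_];
    }
private:
    std::size_t index_;
};

// Leaf: the same constant for every row.
class ConstInt32 : public Expression {
public:
    explicit ConstInt32(int32_t value) : value_(value) {}
    Column Evaluate(const std::vector<Column>&,
                    std::size_t rows) const override {
        return Column(rows, value_);
    }
private:
    int32_t value_;
};

// Inner node: applies a binary function to its children, row by row.
class Binary : public Expression {
public:
    using Fn = int32_t (*)(int32_t, int32_t);
    Binary(Fn fn, ExprPtr left, ExprPtr right)
        : fn_(fn), left_(std::move(left)), right_(std::move(right)) {}
    Column Evaluate(const std::vector<Column>& input,
                    std::size_t rows) const override {
        Column l = left_->Evaluate(input, rows);
        Column r = right_->Evaluate(input, rows);
        Column out(rows);
        // A tight loop over contiguous arrays, which compilers can vectorise.
        for (std::size_t i = 0; i < rows; ++i) out[i] = fn_(l[i], r[i]);
        return out;
    }
private:
    Fn fn_;
    ExprPtr left_;
    ExprPtr right_;
};

ExprPtr Ref(std::size_t index) { return std::make_unique<ColumnRef>(index); }
ExprPtr Const(int32_t value) { return std::make_unique<ConstInt32>(value); }
ExprPtr Plus(ExprPtr a, ExprPtr b) {
    return std::make_unique<Binary>(
        [](int32_t x, int32_t y) -> int32_t { return x + y; },
        std::move(a), std::move(b));
}
ExprPtr Multiply(ExprPtr a, ExprPtr b) {
    return std::make_unique<Binary>(
        [](int32_t x, int32_t y) -> int32_t { return x * y; },
        std::move(a), std::move(b));
}
```

With these helpers, `Multiply(Plus(Ref(0), Ref(1)), Const(2))` builds the tree for "(first column + second column) * 2", and `Evaluate` walks it bottom-up over all rows at once.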
Supersonic is written in C++ and does not have a built-in data storage format, but there is “strong intention” to create one. Data is currently kept in memory.
Supersonic Query Engine has been licensed under the Apache License 2.0. It can be downloaded from its Google Code site. The source includes a number of examples demonstrating the use of operations and expressions against columnar tables.
The included graphic, posted by the Supersonic team, shows an operation tree with benchmark results for processing a table with 1M rows: a view is obtained in 60 µs (16.7G rows/s), then filtered in 1.03 ms (1M rows/s), followed by a computation done in 25 µs (41.2M rows/s); the result is then joined with another filter, and the whole test takes 22.1 ms.
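Throughput figures of this kind are the number of rows processed divided by the elapsed time. The small C++ helper below is a hypothetical illustration, not part of Supersonic. Applied to the first step, 1M rows in 60 µs comes to roughly 16.7G rows/s.

```cpp
#include <cassert>

// Rows processed per second, given a row count and elapsed time in seconds.
// Hypothetical helper for illustration only.
double RowsPerSecond(double rows, double seconds) {
    assert(seconds > 0.0);
    return rows / seconds;
}
```

For example, `RowsPerSecond(1e6, 60e-6)` evaluates to about 1.67e10.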