Querying Columnar Databases with Google Supersonic

by Abel Avram on Oct 16, 2012

Supersonic is a query engine library for columnar databases. It provides a set of data transformation primitives that Google advertises as “ultra-fast” due to “heavy use of cache-aware algorithms, SIMD instructions and vectorised execution, allowing it to exploit the capabilities and resources of modern, hyper pipelined CPUs.” Some of its main features are:

  • Cache aware
  • Instruction pipelining
  • Using SIMD (Single Instruction Multiple Data)
  • Custom data structures
  • Failure handling
  • Supporting standard columnar operations
  • Specialized expressions

Supersonic supports a number of operations that can be applied to entire tables and can be composed into operation trees:

  • Aggregation – SUM, MIN, MAX, COUNT, CONCAT, FIRST, LAST
  • Compute – transforms an expression (more about expressions below) into an operation
  • Filter – filters the rows of a columnar table
  • Generate – creates a certain number of zero-column rows
  • Limit – limits the number of rows resulting from a previous operation
  • Sort – sorts the results of a previous operation
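
To make the idea of an operation tree more concrete, here is a minimal sketch in plain C++. It does not use the Supersonic API: the `Table`, `Operation`, `Filter`, `Limit`, `Compose` and `SumPrice` names are hypothetical and were chosen only to mirror the operations listed above. Each operation takes a whole column-oriented table and returns a new one, so operations can be nested.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical two-column table stored column by column.
// This is an illustration only, not a Supersonic type.
struct Table {
    std::vector<std::int32_t> id;
    std::vector<double> price;
    std::size_t rows() const { return id.size(); }
};

// An operation consumes a whole table and produces a whole table.
using Operation = std::function<Table(const Table&)>;

// Filter: keeps only the rows whose price satisfies the predicate.
Operation Filter(std::function<bool(double)> keep) {
    return [keep](const Table& in) {
        assert(in.id.size() == in.price.size());
        Table out;
        for (std::size_t i = 0; i < in.rows(); ++i) {
            if (keep(in.price[i])) {
                out.id.push_back(in.id[i]);
                out.price.push_back(in.price[i]);
            }
        }
        return out;
    };
}

// Limit: keeps at most n leading rows of its input.
Operation Limit(std::size_t n) {
    return [n](const Table& in) {
        const auto k = static_cast<std::ptrdiff_t>(std::min(n, in.rows()));
        Table out;
        out.id.assign(in.id.begin(), in.id.begin() + k);
        out.price.assign(in.price.begin(), in.price.begin() + k);
        return out;
    };
}

// Compose: builds a two-node operation tree, feeding the child into the parent.
Operation Compose(Operation parent, Operation child) {
    return [parent, child](const Table& in) { return parent(child(in)); };
}

// Aggregation (SUM) over the price column.
double SumPrice(const Table& in) {
    return std::accumulate(in.price.begin(), in.price.end(), 0.0);
}
```

With these helpers, `Compose(Limit(2), Filter(pred))` describes a small tree in which the filter runs first and the limit is applied to its output. A real engine would stream blocks of rows between operations rather than materialising a full table at each step.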

Unlike operations, expressions apply at row level, performing the actual computations on individual column values. They can also be composed into expression trees. Some of the expressions are:

  • Terminal – leaf nodes containing primitive types such as ConstInt32, ConstBool, ConstDataType, RandInt32, etc.
  • Arithmetic – Plus, Minus, Multiply, etc.
  • Comparison – Equal, Less, Greater, IsOdd, etc.
  • Date/Time – Now, Day, Month, Year, Hour, Minute, Second, AddDays, etc.
  • Logical – And, Or, AndNot, Xor, Not
  • Control Flow – If, IsNull, IfNull, Case
  • Mathematical – Exp, Sin, Cos, Abs, Round, Floor, Trunc, Sqrt, Power, etc.
  • String – ToString, Concat, Length, Trim, etc.
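
An expression tree of this kind can be sketched in a few lines of C++. The classes below (`Expr`, `Const`, `ColumnRef`, `Plus`, `Multiply`) are hypothetical stand-ins written for this illustration and are not Supersonic's own classes or signatures. Each node computes one result per row, but evaluates a whole column at a time, which is the general shape of vectorised execution.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Column = std::vector<double>;

// Hypothetical expression node (not a Supersonic class).
// Eval produces one output value per input row.
struct Expr {
    virtual ~Expr() = default;
    virtual Column Eval(const std::vector<Column>& cols, std::size_t rows) const = 0;
};
using ExprPtr = std::unique_ptr<Expr>;

// Terminal: the same constant for every row.
struct Const : Expr {
    double value;
    explicit Const(double v) : value(v) {}
    Column Eval(const std::vector<Column>&, std::size_t rows) const override {
        return Column(rows, value);
    }
};

// Terminal: reads an input column by position.
struct ColumnRef : Expr {
    std::size_t index;
    explicit ColumnRef(std::size_t i) : index(i) {}
    Column Eval(const std::vector<Column>& cols, std::size_t rows) const override {
        assert(index < cols.size() && cols[index].size() == rows);
        return cols[index];
    }
};

// Arithmetic: element-wise sum of two child expressions.
struct Plus : Expr {
    ExprPtr lhs, rhs;
    Plus(ExprPtr a, ExprPtr b) : lhs(std::move(a)), rhs(std::move(b)) {}
    Column Eval(const std::vector<Column>& cols, std::size_t rows) const override {
        Column out = lhs->Eval(cols, rows);
        const Column right = rhs->Eval(cols, rows);
        for (std::size_t i = 0; i < rows; ++i) out[i] += right[i];
        return out;
    }
};

// Arithmetic: element-wise product of two child expressions.
struct Multiply : Expr {
    ExprPtr lhs, rhs;
    Multiply(ExprPtr a, ExprPtr b) : lhs(std::move(a)), rhs(std::move(b)) {}
    Column Eval(const std::vector<Column>& cols, std::size_t rows) const override {
        Column out = lhs->Eval(cols, rows);
        const Column right = rhs->Eval(cols, rows);
        for (std::size_t i = 0; i < rows; ++i) out[i] *= right[i];
        return out;
    }
};
```

A tree such as `Plus(Multiply(ColumnRef(0), Const(2.0)), ColumnRef(1))` then computes `col0 * 2 + col1` for every row. The tight per-column loops are the kind of code a compiler can often turn into SIMD instructions, although this sketch makes no attempt at the cache-aware tuning the library advertises.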

Supersonic is written in C++. It does not have a built-in data storage format, although there is a “strong intention” to create one. Data is currently kept in memory.

Supersonic Query Engine has been licensed under the Apache License 2.0. It can be downloaded from its Google Code site. The source includes a number of examples demonstrating the use of operations and expressions against columnar tables.

The included graphic, posted by the Supersonic team, represents an operation tree with benchmark results for processing a table with 1M rows, as follows: a view is obtained in 60 µs or 16.7G rows/s, then filtered in 1.03 ms or 1M rows/s, followed by a computation done in 25 µs or 41.2M rows/s. The result is then joined with another filter, the whole test taking 22.1 ms.
