Apache Drill Included in Latest MapR Distribution Release

by Alex Giamas on Sep 30, 2014

MapR recently announced that Apache Drill is included in the latest release of the MapR distribution. Apache Drill is an open source project modeled on Google's Dremel, the infrastructure on which BigQuery is based. Drill offers a low-latency SQL-on-Hadoop interface. While this puts it in the same space as several other Hadoop technologies, Drill has some unique characteristics that set it apart from other SQL-on-Hadoop technologies.

For one, Drill is fully ANSI SQL compliant. More importantly, Drill is built around the "data exploration first" principle. Out of the box, Drill supports several SQL and NoSQL data sources, self-describing data such as Avro, Parquet or HBase tables, and even complex nested JSON data sources. And if the SQL interface is not enough, users can connect analytics software to a Drill data source through the ODBC connector it provides.

"Data exploration first" means that Drill can query complex, unstructured JSON without flattening it or converting it into a fixed schema. In contrast to SQL-on-Hadoop technologies such as Apache Hive, which require a schema to be defined before the data is read, Apache Drill lets users explore schema-less data. Drill's query engine can discover the schema of the data and prepare query plans that match the SQL queries applied to the dataset.
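To make the schema-on-read idea concrete, the following Python sketch shows a toy version of it: it walks newline-delimited JSON records, records the type of every nested field path it encounters, and projects a nested field from each record without requiring a predefined schema. This is only an illustration under those assumptions; Drill's actual schema discovery and query planning are far more involved, and the names discover_schema and select are hypothetical, not part of any Drill API.

```python
import json
from typing import Any, Dict, Iterable, List, Set


def walk(value: Any, prefix: str, schema: Dict[str, Set[str]]) -> None:
    """Record the type of every field path found in one JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            walk(child, f"{prefix}.{key}" if prefix else key, schema)
    elif isinstance(value, list):
        schema.setdefault(prefix, set()).add("array")
        for item in value:
            # Array elements are described under a "[]" suffix.
            walk(item, prefix + "[]", schema)
    else:
        type_name = "null" if value is None else type(value).__name__
        schema.setdefault(prefix, set()).add(type_name)


def discover_schema(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Hypothetical helper: build the schema while reading, not before."""
    schema: Dict[str, Set[str]] = {}
    for line in lines:
        walk(json.loads(line), "", schema)
    return schema


def select(lines: Iterable[str], path: str) -> List[Any]:
    """Hypothetical helper: project a dotted path from each record.

    Missing fields yield None instead of an error, since no fixed
    schema is enforced on the records.
    """
    results: List[Any] = []
    for line in lines:
        node: Any = json.loads(line)
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
        results.append(node)
    return results


# Records with differing, nested shapes (illustrative data only).
RECORDS = [
    '{"name": "alice", "address": {"city": "Paris", "zip": "75001"}}',
    '{"name": "bob", "address": {"city": "Berlin"}, "tags": ["admin", "dev"]}',
    '{"name": "carol", "age": 41}',
]

if __name__ == "__main__":
    for field, types in discover_schema(RECORDS).items():
        print(field, sorted(types))
    print(select(RECORDS, "address.city"))  # ['Paris', 'Berlin', None]
```

Here the third record has no address at all, yet the query still runs and simply returns None for it, which is the kind of behavior that makes exploring loosely structured data practical.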

Apache Drill can be used alongside MapReduce jobs, complementing rather than replacing them. When there is a need for quick data exploration and interactive analysis of unstructured data, Drill can definitely help, while Hadoop MapReduce remains the platform of choice for batch processing in the Big Data world.

MapR has offered a developer preview of Apache Drill in earlier versions of its Hadoop distribution, but Drill 0.5.0, shipped on September 12, is the first beta-quality release. Apart from Drill, there are several other SQL-on-Hadoop technologies, each with its own strengths and weaknesses. The MapR 4.0.1 release offers four other technologies in this space, namely Apache Hive, Apache Spark SQL, Cloudera Impala and Vertica integration.
