Apache Hive 1.0 Released, HiveServer2 Becomes Main Engine, Stable API Defined

Apache Hive released version 1.0 on February 6th, 2015. The release was originally planned as version 0.14.1, but the community voted to renumber it to 1.0.0 to reflect the maturity the project has reached. The next major release, originally planned as 0.15.0, will be named 1.1.0 instead. The two main changes in this release are the beginnings of a defined public API and the removal of HiveServer1 in favor of HiveServer2.

The class HiveMetaStoreClient provides Hive's core functionality at the client level. Beginning with version 1.0.0, HiveMetaStoreClient is designated as the public API, which means an effort is made to keep the interface of the class stable across releases. Java programs can use this class directly to perform administrative and monitoring tasks that are not available through the query interface.

The HiveServer2 engine was contributed by Cloudera two and a half years ago and supports JDBC and ODBC, the standard interfaces for connecting to SQL databases. These interfaces let programs be written without a fixed dependency on a specific database; instead, any database that supports the interface can be configured to work with the program. JDBC is part of the Java standard library.
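To illustrate that decoupling, here is a minimal Java sketch that queries HiveServer2 using only the `java.sql` interfaces. It assumes the Hive JDBC driver (class `org.apache.hive.jdbc.HiveDriver`, shipped in the hive-jdbc JAR) is on the classpath and that a HiveServer2 instance is listening on `localhost` at its default port 10000. The host, the `default` database, the `hive` user name, and the helper names `hiveServer2Url` and `printFirstColumn` are hypothetical choices for illustration, not part of any Hive API.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class HiveJdbcExample {

    // Hypothetical helper: builds a HiveServer2 JDBC URL of the form
    // jdbc:hive2://host:port/database.
    static String hiveServer2Url(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Hypothetical helper: runs a query and prints the first column of each row.
    // Only java.sql types appear here, so the URL alone decides which database
    // (and which driver on the classpath) is used.
    static void printFirstColumn(String url, String user, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // The URL comes from configuration (here: the first command-line argument),
        // defaulting to an assumed local HiveServer2 on port 10000.
        String url = args.length > 0 ? args[0] : hiveServer2Url("localhost", 10000, "default");

        // Loading the driver class lets it register itself with DriverManager;
        // this is unnecessary if the driver JAR registers itself via the
        // JDBC 4 service-loader mechanism.
        if (url.startsWith("jdbc:hive2:")) {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
        }

        printFirstColumn(url, "hive", "SHOW TABLES");
    }
}
```

Because the query method depends only on `java.sql`, pointing the program at another database means passing a different URL and putting that database's driver on the classpath; the SQL text may still need adjusting, since `SHOW TABLES` is not standard SQL. Beeline accepts a URL of the same `jdbc:hive2://` form through its `-u` option.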

Removing HiveServer1 means that Hive's original command line interface will also be removed. It is replaced by Beeline, which also ships with Hive but is built on the generic JDBC interface instead of the native Hive interfaces of HiveServer1.

Hive was originally started by the Facebook Data Infrastructure Team in 2009. It provides an SQL-like query language called HiveQL, which is close to SQL:2001 compliant. Version 0.13.0 added support for transactions, and version 0.14.0 extended it to row-level ACID inserts, updates, and deletes, but Hive still lacks general support for transaction blocks that would group multiple operations into one transaction.

The future roadmap for Hive, outlined in the Stinger initiative, includes faster queries with sub-second response times, full support for transactions and ACID compliance, SQL:2001 compliance, integration with machine learning and analytics, and support for other execution engines such as Apache Tez and Apache Spark.

Apache Hive is one of several projects that provide SQL-like query languages for working with massive amounts of data stored on Hadoop systems. Another example is Cloudera Impala, which can also query data managed by Hive. Other projects use an SQL-like language for formulating queries but are not databases per se. One example is Apache Drill, which uses columnar storage to quickly scan and index data on Hadoop.
