Apache HBase 1.3 Ships with Multiple Performance Improvements

by Alexandre Rodrigues on Jan 30, 2017. Estimated reading time: 2 minutes

Apache HBase 1.3.0 was released in mid-January 2017. It ships with support for date-based tiered compaction, improvements to the write-ahead log (WAL), and a new RPC scheduler, among other changes. The release includes almost 1,700 resolved issues in total.

HBase is often used in time series applications, either directly or through projects such as OpenTSDB. In these applications, data is usually written sequentially in order of arrival and queried over limited time ranges based on look-back windows, so recently written data is read more often than older data.

The date-based tiered compaction support shipped in HBase 1.3.0 benefits this set of use cases, where data is rarely deleted or updated and recent data is scanned more often than older data.

A record time-to-live (TTL) is easy to enforce with this new compaction strategy: records that have reached their expiry time are dropped when the existing store files are compacted into a single, bigger store file.
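The idea can be illustrated with a minimal sketch. It is not HBase's implementation: the function name `plan_compaction`, the fixed-size windows, and the use of a single maximum timestamp per store file are assumptions made for illustration, and a real tiered strategy may use windows that grow with the age of the data.

```python
from collections import defaultdict

def plan_compaction(files, now, window, ttl):
    """Group store files into time windows and flag the expired ones.

    files  -- list of (name, max_timestamp) pairs (hypothetical shape)
    now    -- current time, in the same unit as the timestamps
    window -- width of one time window (fixed here for simplicity)
    ttl    -- time-to-live; files whose newest record is older are expired
    """
    expired = []
    windows = defaultdict(list)
    for name, max_ts in files:
        if now - max_ts > ttl:
            # Every record in the file is past its TTL, so the file can go.
            expired.append(name)
        else:
            # Files in the same window are candidates to be compacted together.
            windows[max_ts // window].append(name)
    return expired, dict(windows)
```

Files that fall in the same window would be merged into one bigger file, while older windows are left alone so that scans over recent data touch fewer files.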

Modeled after Google's Bigtable, HBase is a column-oriented NoSQL store that divides data across multiple regions, each defined by a start and an end row in the key space. Each region server is responsible for a number of regions. When a region grows too big, it can be split in two and the resulting regions moved to other region servers, to distribute the load evenly across all nodes.
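As a rough sketch of how a row key maps to a region, the lookup can be done with a binary search over sorted start keys. The class `RegionMap`, its methods, and the server names are hypothetical; this does not reflect HBase's actual metadata handling.

```python
import bisect

class RegionMap:
    """Maps row keys to region servers via sorted region start keys."""

    def __init__(self, regions):
        # regions: list of (start_key, server) sorted by start_key.
        # The first start key is "" so that every row key has a region.
        self.starts = [start for start, _ in regions]
        self.servers = [server for _, server in regions]

    def locate(self, row):
        # The owning region is the last one whose start key is <= row.
        index = bisect.bisect_right(self.starts, row) - 1
        return self.servers[index]

    def split(self, split_key, new_server):
        # Split the region containing split_key; the upper half
        # [split_key, old_end) is assigned to new_server.
        if split_key in self.starts:
            raise ValueError("split key is already a region boundary")
        index = bisect.bisect_right(self.starts, split_key)
        self.starts.insert(index, split_key)
        self.servers.insert(index, new_server)
```

For example, with regions starting at `""`, `"g"`, and `"p"`, a row `"apple"` belongs to the first region, and splitting at `"k"` hands rows from `"k"` up to `"p"` to the new server.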

By default, each region server has one WAL, and all operations on that server's regions are written to that single WAL. The improved multiple WAL support allows higher write throughput, faster replication, and lower latency for synchronous writes. The multiple WAL feature provides three region grouping strategies for WAL allocation: “identity”, with one WAL for each region; “bounded”, where regions are mapped to WALs following a round-robin algorithm; and “namespace”, where regions of tables belonging to different namespaces are mapped to different WAL files. Performance tests report a 20% improvement in average latency when running on pure SATA disks, and 40% when WAL files are written to SATA-SSD disks.
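The three grouping strategies can be sketched as functions from a region to a WAL group. The names below (`identity_group`, `namespace_group`, `BoundedGrouping`, the `wal-N` labels) are hypothetical and only mirror the description above; they are not HBase's classes or configuration values.

```python
import itertools

def identity_group(region, namespace):
    # "identity": every region gets its own WAL.
    return region

def namespace_group(region, namespace):
    # "namespace": regions of tables in the same namespace share a WAL.
    return namespace

class BoundedGrouping:
    """'bounded': a fixed number of WALs, assigned round-robin."""

    def __init__(self, num_groups):
        self._next_group = itertools.cycle(range(num_groups))
        self._assigned = {}

    def group_for(self, region, namespace):
        # A region keeps the WAL it was first assigned to.
        if region not in self._assigned:
            self._assigned[region] = f"wal-{next(self._next_group)}"
        return self._assigned[region]
```

With two groups, a bounded grouping would place the first region in `wal-0`, the second in `wal-1`, and the third back in `wal-0`, which caps the number of open WAL files regardless of how many regions the server hosts.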

The new RPC scheduler is based on the CoDel (controlled delay) algorithm and is used to prevent long-standing call queues caused by a mismatch between the request rate and the available throughput, which is bounded by IO. The algorithm performs active queue management: it compares the minimum delay observed in the queue against a defined threshold. Whenever the minimum is over the threshold, calls are dropped in order to bring the minimum delay back to a more favourable level.
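The following is a simplified sketch of that idea, not the scheduler shipped in HBase and not the full CoDel control law. The class `CoDelQueue`, its parameters, and the rule of dropping any call that waited longer than the target while the queue is flagged as overloaded are assumptions chosen to keep the example short. Time is passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque

class CoDelQueue:
    """FIFO call queue that drops stale calls when the queue stays slow."""

    def __init__(self, target_delay, interval):
        self.target = target_delay    # acceptable minimum queueing delay
        self.interval = interval      # length of one observation interval
        self.queue = deque()
        self.min_delay = None         # minimum delay seen in this interval
        self.interval_end = None
        self.overloaded = False
        self.dropped = 0

    def put(self, call, now):
        self.queue.append((call, now))

    def take(self, now):
        """Return the next call worth serving, or None if none is left."""
        while self.queue:
            call, enqueued_at = self.queue.popleft()
            delay = now - enqueued_at
            self._observe(delay, now)
            if self.overloaded and delay > self.target:
                self.dropped += 1     # shed the call instead of serving it
                continue
            return call
        return None

    def _observe(self, delay, now):
        if self.interval_end is None:
            self.interval_end = now + self.interval
        if now >= self.interval_end:
            # Interval over: overloaded if even the best delay was too high.
            self.overloaded = (self.min_delay is not None
                               and self.min_delay > self.target)
            self.min_delay = delay
            self.interval_end = now + self.interval
        else:
            self.min_delay = (delay if self.min_delay is None
                              else min(self.min_delay, delay))
```

Using the minimum rather than the average delay is what distinguishes a persistently backed-up queue from a short burst: a burst still lets some calls through quickly, so the minimum stays low and nothing is dropped.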

Other improvements include a throughput controller for disk flushes, to avoid huge IO spikes. The enhancements also contribute to better performance in Apache Phoenix, OpenTSDB, and other software projects that rely on HBase for data persistence and fast lookups.
