Elasticsearch 1.0.0 released

Elasticsearch released version 1.0.0 of its self-titled, open-source analytics tool. Elasticsearch is a distributed search engine that allows for real-time data analysis in big-data environments. It is based on the Apache Lucene text search engine and exposes its functionality via a REST API. Besides direct access over HTTP, there are client libraries for various languages such as Java, JavaScript and Python. It can also be integrated into Apache Hadoop environments. Elasticsearch is already used by companies that deal with vast amounts of data, such as GitHub, Foursquare and SoundCloud.

The basic features of Elasticsearch center on scalability, high availability and real-time analysis. Data entered into the search engine is immediately indexed, replicated in the cluster and ready to be analyzed.

  • Scalability: Elasticsearch is designed to work in clustered environments. As soon as a node is started, it automatically looks for other nodes in the network and connects to them. Indices are organized in shards and distributed over the cluster. Searching an index is therefore a distributed operation that runs in parallel over all cluster nodes. If more performance is needed, additional nodes just have to be added to the cluster and the shards will reorganize automatically.
  • Availability: Database shards are not only used for horizontal scaling but also for availability. For each shard there is a replica shard stored on a different cluster node, so no data is lost if a node goes down. Malfunctioning nodes are detected by Elasticsearch and removed from the cluster. After a failing node is removed, the shards are redistributed to restore scalability and resilience.
    To support full cluster restarts, all meta-data needed by Elasticsearch can be persisted onto various storage-types. Data is stored with the help of so-called gateways which currently support local storage and shared file-systems.
  • Real-time: Elasticsearch is schema-free and allows for arbitrary JSON documents to be indexed. The structure of the documents is analyzed and even some data-types like timestamps are automatically detected. By default, all fields contained in the documents are indexed and searchable. Besides simple full text search, facets - analytical functions providing bucketing (date ranges, distances, histograms and others) and metrics (sum, average, stats and others) - can be applied to the index immediately.
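The sharding and replica behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Elasticsearch's actual allocation logic: the function names, the node names and the use of CRC32 as the hash are all hypothetical stand-ins. It shows the two ideas from the list: a document is routed to a shard by hashing its ID modulo the number of primary shards, and each replica is placed on a different node than its primary so that the loss of a single node never removes both copies.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard for a document by hashing its ID.

    CRC32 is an assumption made for this sketch; Elasticsearch uses its
    own hash function internally.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def place_shards(nodes, num_primary_shards):
    """Assign each shard a primary node and a replica on the next node.

    Round-robin placement is a simplification of real shard allocation.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to separate replicas")
    layout = {}
    for shard in range(num_primary_shards):
        layout[shard] = {
            "primary": nodes[shard % len(nodes)],
            "replica": nodes[(shard + 1) % len(nodes)],
        }
    return layout


def survives_loss_of(layout, failed_node):
    """Return True if every shard still has a copy on a surviving node."""
    return all(
        copies["primary"] != failed_node or copies["replica"] != failed_node
        for copies in layout.values()
    )


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
layout = place_shards(nodes, num_primary_shards=5)
for shard, copies in layout.items():
    print(shard, copies["primary"], copies["replica"])
print("doc-1 is routed to shard", shard_for("doc-1", 5))
```

Adding a fourth name to `nodes` and calling `place_shards` again gives a rough picture of how shards spread over a larger cluster, although a real cluster relocates existing shards rather than recomputing the layout from scratch.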

 

New Features in Elasticsearch 1.0.0

Version 1.0.0 comes with various functional enhancements and changes to the API to make Elasticsearch more intuitive and powerful to use. Amongst others, the functional enhancements include new ways to back up and restore indices, analyze data and make Elasticsearch more resilient:

  • Snapshot/Restore: The new version adds a simple API for taking snapshots of a complete cluster as backups. The state of an Elasticsearch cluster, including meta-data and indices, can be stored in a snapshot repository. Usually the repository resides on a shared file-system and can hold an arbitrary number of snapshots. In case of issues that can't be handled by the built-in fail-over and resilience mechanisms, the cluster can be restored to any snapshot state in the repository.
  • Aggregations: Aggregations provide more powerful ways to analyze data than the facets available in pre-1.0.0 versions. While facets only return the results of analytical functions (e.g. the number of stores within a certain distance), aggregations retain which documents were actually matched by a query and make it possible to use the resulting set of documents as input for a further analysis (e.g. the average sales volume by quarter for all stores within a certain distance).
  • Circuit Breaker: Circuit breakers are being introduced to prevent operating or runtime errors from causing serious harm to the search index. The first safeguard, added in Elasticsearch 1.0.0, monitors free memory and estimates the amount of memory required by search or analysis operations. If an operation would need more memory than is available, it is blocked before it can cause an out-of-memory error. More circuit breakers will be implemented in future releases.
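The store example from the aggregations item might look roughly like the request body below. This is a hedged sketch and not taken from the release notes: the field names `location`, `sold_at` and `sales_volume` are hypothetical, and the body follows the 1.x-era query DSL (a `filtered` query with a `geo_distance` filter, plus an `avg` metric nested inside a `date_histogram` bucket aggregation), which later versions have changed. The code only builds and prints the JSON; sending it to an index's `_search` endpoint is left out so that the example runs without a cluster.

```python
import json


def sales_by_quarter_near(lat, lon, distance="10km"):
    """Build a search body: average sales volume per quarter for nearby stores.

    All field names are assumptions made for this illustration.
    """
    return {
        "size": 0,  # only the aggregation results are wanted, no hits
        "query": {
            "filtered": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        },
        "aggs": {
            # Bucketing: one bucket per quarter of the matching documents.
            "sales_per_quarter": {
                "date_histogram": {"field": "sold_at", "interval": "quarter"},
                # Metric computed over the documents inside each bucket.
                "aggs": {
                    "average_volume": {"avg": {"field": "sales_volume"}}
                },
            }
        },
    }


body = sales_by_quarter_near(52.52, 13.40)
print(json.dumps(body, indent=2))
```

The nesting is the point of the example: the inner `aggs` block operates on the documents collected by the outer bucket, which is the kind of chaining a single facet could not express.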

 

Elasticsearch used the major-version change to tidy up the existing API, accepting that this breaks backwards compatibility. Before upgrading to version 1.0.0, users are asked to back up all data and read the list of breaking changes.

Elasticsearch also provides additional tools for data capture and analysis. Together with Logstash and Kibana, Elasticsearch forms the ELK stack, which parses log files and other sources of time-related data and then analyzes and visualizes them in various ways.
Professional support can also be purchased via Elasticsearch's commercial branch.
