ElasticSearch Gets Better Aggregation, Adds Groovy for Scripting

ElasticSearch 1.3.0 has been released. Based on Lucene 4.9, it comes with better aggregation features, security and scripting improvements, several indexing performance improvements and more.

Dynamic scripting was disabled by default in 1.2 for security reasons. It is now enabled by default again, but only for sandboxed languages. Groovy, which is now sandboxed as well, replaces MVEL as the preferred scripting language. MVEL is deprecated and will be removed in 1.4.

Other new scripting features:

  • The Lucene Expressions library is integrated into the core as an experimental feature. It compiles JavaScript-syntax expressions to bytecode, which gives very high execution speed; early benchmarks show a 4x-6x speedup over Groovy scripts. The speed comes with limitations: only numeric fields can be accessed, stored fields are not available, sparse fields (fields for which some documents don’t contain a value) default to 0, and expressions can be used only for search, not for document updates.
  • Scripts (including search templates) can now be stored in the special .scripts index instead of in the config directory of every node. This makes “user-defined” queries easier to support, since a user can update a script or template with a new query without editing files on each node.
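As a rough illustration of how an expression script might be used for scoring, the sketch below builds a function_score request body in Python. It assumes a hypothetical index with a text field "title" and a numeric field "popularity"; the "lang": "expression" setting and the doc['field'].value accessor reflect our understanding of the 1.3 scripting API, so treat this as a sketch and check the reference documentation before relying on it.

```python
import json


def expression_scored_query(text):
    """Build a function_score query whose score comes from a Lucene expression.

    Hypothetical fields: "title" (text) and "popularity" (numeric).
    Expressions can only read numeric fields; a document without a
    "popularity" value is treated as if the field were 0.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "script_score": {
                    # JavaScript-syntax expression, compiled to bytecode
                    "script": "_score * doc['popularity'].value",
                    "lang": "expression",
                },
                # Use the script result as the final score
                "boost_mode": "replace",
            }
        }
    }


body = expression_scored_query("elasticsearch")
print(json.dumps(body, indent=2))
```

Because expressions cannot update documents, a body like this would only be sent to the search endpoint, never to the update API.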

New aggregation features:

  • Field collapsing/combining (via the new top_hits aggregation) – useful for collapsing a group of values down to a single entry (or a fixed number of entries) and suppressing duplicate documents
  • Percentile ranks aggregation – this experimental feature shows the percentage of observed values that fall below a given value
  • Geo bounds aggregation – computes a bounding box that covers all the geo_point values of a field (for example, a sales region instead of individual sales cities)
  • Better performance of terms aggregation on high cardinality fields
  • collect_mode lets you choose whether parent-level buckets are pruned before their child aggregations are computed (breadth_first). The default, depth_first, expands every branch of the aggregation tree in a single pass and prunes only afterwards; this works well for most queries, but for fields with many unique terms where only a small number of results is required, breadth_first is more efficient.
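To show how these aggregations might fit together, here is a sketch of a single search request body built in Python. It assumes a hypothetical index of sales documents with fields "product" (many unique terms), "city", "date", "price" and "location" (a geo_point); all field and aggregation names are made up for illustration, and the parameter names follow our reading of the 1.3 aggregation DSL rather than a tested request.

```python
import json

search_body = {
    "size": 0,  # only aggregation results are needed, not the hits
    "aggs": {
        "top_products": {
            "terms": {
                "field": "product",
                "size": 10,
                # Prune to the top 10 products first, then compute the
                # child aggregation only for those surviving buckets.
                "collect_mode": "breadth_first",
            },
            "aggs": {"top_cities": {"terms": {"field": "city", "size": 3}}},
        },
        "latest_per_city": {
            "terms": {"field": "city"},
            "aggs": {
                # Field collapsing: keep only the most recent sale per city
                "latest_sale": {
                    "top_hits": {"size": 1, "sort": [{"date": {"order": "desc"}}]}
                }
            },
        },
        "price_ranks": {
            # Percentage of observed prices below 50 and below 100
            "percentile_ranks": {"field": "price", "values": [50, 100]}
        },
        "sales_region": {
            # Bounding box enclosing every geo_point in "location"
            "geo_bounds": {"field": "location", "wrap_longitude": True}
        },
    },
}

print(json.dumps(search_body, indent=2))
```

The breadth_first setting is applied only to the high-cardinality "product" terms aggregation, which is the case the release describes; the other aggregations keep the depth_first default.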

There are also several performance improvements in both indexing and I/O. Notably, Lucene 4.9 brings better compression, which reduces both disk and memory usage. There are several resiliency improvements as well.

The release also has a few breaking changes – for example, JSONP is now disabled by default for security reasons. Read the release notes for the full list of changes. 