ElasticSearch Gets Better Aggregation, Adds Groovy for Scripting

ElasticSearch 1.3.0 has been released. Based on Lucene 4.9, it comes with better aggregation features, some security and scripting improvements, several index performance improvements and more.

Dynamic Scripting was disabled by default in 1.2 for security reasons. It is now enabled by default for sandboxed languages. Groovy, which also gets sandboxing, now replaces MVEL as the language of choice for Scripting. MVEL is deprecated and will be removed in 1.4.

Other new scripting features:

The Lucene Expressions library is integrated into the core as an experimental feature. It compiles JavaScript expressions to bytecode, allowing for very high execution speed; early benchmarks show 4x-6x speed improvements over Groovy scripting. The speed comes with some limitations: only numeric fields can be accessed, stored fields are not available, and sparse fields (fields for which some documents don’t contain a value) default to 0. Expressions can also be used only for search, not for document updates.
Scripts (including search templates) can now be stored in the special .scripts index instead of in the config directory on every node. This makes “user-defined” queries easier to support, since a user can update the script or template with a new query.
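
As a rough illustration of the two scripting features above, the following Python sketch builds the JSON bodies such requests might carry. It sends nothing over the network. The field names (title, popularity), the script id and the helper function names are hypothetical, and the request shapes (the "lang": "expression" setting inside script_score, and the /_scripts/{lang}/{id} path) follow the 1.x REST conventions as best recalled, so they should be checked against the release documentation before use.

```python
import json


def expression_score_query(text, field="popularity"):
    """Build a function_score search body scored by a Lucene expression.

    The expression only reads a numeric field, in line with the limitation
    that expressions can access numeric fields only.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},  # "title" is a hypothetical field
                "script_score": {
                    "lang": "expression",
                    "script": "_score * doc['%s'].value" % field,
                },
            }
        }
    }


def indexed_script_request(script_id, source, lang="groovy"):
    """Return (method, path, body) for storing a script in the .scripts index.

    The path layout is an assumption based on the 1.x indexed-scripts API.
    """
    return ("POST", "/_scripts/%s/%s" % (lang, script_id), {"script": source})


if __name__ == "__main__":
    print(json.dumps(expression_score_query("elasticsearch"), indent=2))
    print(indexed_script_request("boost_by_popularity",
                                 "_score * doc['popularity'].value"))
```

Because a sparse field evaluates to 0 in an expression, a multiplicative score like the one sketched here would drop to 0 for documents that lack the field, which is worth keeping in mind when choosing the formula.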

New aggregation features:

Field Collapsing/Combining – this is useful for collapsing a group of values down to a single entry (or a fixed number of entries) and suppressing duplicate documents
Percentile Ranks Aggregation – this experimental feature shows the percentage of observed values that fall below a given value
Geo bounds aggregation on geo_point values for a field – this provides a bounding box covering all the values (for example, a sales region instead of individual sales cities)
Better performance of terms aggregation on high cardinality fields
collect_mode lets you choose whether parent-level buckets are pruned before child aggregations are calculated (breadth_first) or only after all branches have been fully expanded in a single pass (depth_first, the default). For most queries the depth-first default works better, but for fields with many unique terms and a small number of required results, breadth_first is more efficient.
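
To make the aggregation features above more concrete, here is a small Python sketch. The first two functions compute locally, on plain lists, what a percentile rank and a geo bounding box mean; Elasticsearch itself works on indexed data and returns an approximation for percentile ranks. The third function builds a request body using the percentile_ranks, geo_bounds and terms aggregations with collect_mode. The field names (load_time, location, actors) and the aggregation names are hypothetical, and counting values "at or below" the threshold is an assumption about boundary handling.

```python
def percentile_rank(values, threshold):
    """Percentage of observed values at or below threshold (exact, local)."""
    if not values:
        return 0.0
    return 100.0 * sum(1 for v in values if v <= threshold) / len(values)


def geo_bounds(points):
    """Bounding box for (lat, lon) pairs; does not handle date-line wrapping."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


def aggregation_body():
    """Search body combining the three aggregations; field names are made up."""
    return {
        "size": 0,  # only aggregation results are wanted, no hits
        "aggs": {
            "load_time_ranks": {
                "percentile_ranks": {"field": "load_time", "values": [15, 30]}
            },
            "sales_region": {"geo_bounds": {"field": "location"}},
            "top_actors": {
                # Many unique terms, few required results: prune parents first.
                "terms": {
                    "field": "actors",
                    "size": 10,
                    "collect_mode": "breadth_first",
                },
                "aggs": {"costars": {"terms": {"field": "actors", "size": 5}}},
            },
        },
    }


if __name__ == "__main__":
    print(percentile_rank([10, 20, 30, 40], 30))
    print(geo_bounds([(48.85, 2.35), (40.71, -74.0)]))
```

The nested terms aggregation is the kind of case where breadth_first is described as more efficient: only the top 10 parent buckets are kept before the child aggregation is computed for them.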

There are also several performance improvements both in indexing and I/O. Notably, Lucene 4.9 brings with it better compression, which improves both disk and memory usage. There are several resiliency improvements as well.

The release also has a few breaking changes – for example, JSONP is now disabled by default for security reasons. Read the release notes for the full list of changes.
