ElasticSearch Gets Better Aggregation, Adds Groovy for Scripting

by Roopesh Shenoy on Jul 29, 2014

ElasticSearch 1.3.0 has been released. Based on Lucene 4.9, it comes with better aggregation features, some security and scripting improvements, several index performance improvements and more.

Dynamic Scripting was disabled by default in 1.2 for security reasons. It is now enabled by default for sandboxed languages. Groovy, which also gets sandboxing, now replaces MVEL as the language of choice for Scripting. MVEL is deprecated and will be removed in 1.4.
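As a rough illustration of what a dynamically scripted request could look like, the sketch below builds a search body that rescores hits with an inline Groovy script. The field name `popularity`, the parameter `factor` and the helper name are hypothetical, and the `function_score` / `script_score` layout is an assumption about the 1.x query DSL rather than something stated in the release announcement.

```python
import json


def groovy_script_score_query(field, factor):
    """Build a search body that rescores hits with an inline Groovy script.

    `field` and `factor` are illustrative; the body layout is an assumed
    1.x-style function_score query, not taken from the release notes.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "script_score": {
                    "lang": "groovy",
                    # The multiplier is passed as a script parameter instead
                    # of being pasted into the script text.
                    "script": "_score * doc['%s'].value * factor" % field,
                    "params": {"factor": factor},
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the body of a _search request.
    print(json.dumps(groovy_script_score_query("popularity", 1.5), indent=2))
```

Passing values through `params` keeps the script text constant across requests, which generally lets the compiled script be reused.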

Other new scripting features:

  • The Lucene Expressions library has been integrated into the core as an experimental feature. It provides a mechanism to compile JavaScript search expressions to bytecode, allowing for very high execution speed; early benchmarks show 4x-6x speed improvements over Groovy scripting. The speed comes with some limitations: only numeric fields can be accessed, stored fields are not available, and sparse fields (fields for which some documents don’t contain a value) get a default value of 0. Expressions can also be used only for search, not for document updates.
  • Scripts (including search templates) can now be saved in the special .scripts index instead of in the config directory on every node. This makes “user-defined” queries easier to support, since a user can update the script or template with a new query.
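The two scripting features above can be sketched as request builders. This is a minimal sketch under stated assumptions: the `/_scripts/{lang}/{id}` path, the `script_id` key and the `expression` language name are recalled from the 1.x documentation and should be checked against the release notes; the script id `boost_by_rating` and the field `rating` are hypothetical.

```python
import json


def store_script_request(lang, script_id, source):
    """Return (method, path, body) for saving a script in the .scripts index.

    The path layout is an assumption about the 1.x REST API.
    """
    return ("POST", "/_scripts/%s/%s" % (lang, script_id), {"script": source})


def indexed_script_score(script_id, params):
    """Search body that refers to a stored Groovy script by id (assumed key: script_id)."""
    return {
        "query": {
            "function_score": {
                "script_score": {
                    "script_id": script_id,
                    "lang": "groovy",
                    "params": params,
                }
            }
        }
    }


def expression_script_score(numeric_field, boost):
    """Search body using a Lucene expression; only numeric fields are usable."""
    return {
        "query": {
            "function_score": {
                "script_score": {
                    "lang": "expression",
                    "script": "_score * doc['%s'].value * boost" % numeric_field,
                    "params": {"boost": boost},
                }
            }
        }
    }


if __name__ == "__main__":
    method, path, body = store_script_request(
        "groovy", "boost_by_rating", "_score * doc['rating'].value * boost"
    )
    print(method, path, json.dumps(body))
    print(json.dumps(indexed_script_score("boost_by_rating", {"boost": 2}), indent=2))
    print(json.dumps(expression_script_score("rating", 2), indent=2))
```

Because the stored script lives in an index rather than in each node's config directory, updating it is a single request instead of a file change on every node.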

New aggregation features:

  • Field Collapsing/Combining – this is useful for collapsing a group of values down to a single entry (or a fixed number of entries) and for suppressing duplicate documents
  • Percentile Ranks Aggregation – this experimental feature shows the percentage of observed values that fall below a certain value
  • Geo bounds aggregation on geo_point values for a field – this provides a bounding box covering all the values (for example, a sales region instead of individual sales cities)
  • Better performance of terms aggregation on high cardinality fields
  • collect_mode lets you define whether parent-level aggregation buckets are pruned before the child aggregations are calculated (breadth_first) or whether all branches are expanded in a single pass and pruned afterwards (depth_first, the default). For most queries the depth-first default gives better results, but for fields with many unique terms and a small number of required results, breadth_first is more efficient.
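The aggregation features listed above could be requested along the following lines. This is a sketch, not the official examples: the field names (`load_time`, `location`, `tags`, `actors`) and aggregation labels are hypothetical, and it assumes that field collapsing is exposed through a `top_hits` sub-aggregation, which the announcement does not name explicitly.

```python
import json


def percentile_ranks_agg(field, thresholds):
    """Percentage of observed values falling below each threshold."""
    return {"aggs": {"ranks": {"percentile_ranks": {"field": field, "values": thresholds}}}}


def geo_bounds_agg(field):
    """Bounding box that covers every geo_point value of the field."""
    return {"aggs": {"viewport": {"geo_bounds": {"field": field}}}}


def collapsed_by_term_agg(field, hits_per_bucket):
    """Field collapsing: keep only the top hits for each term (assumed: top_hits)."""
    return {
        "aggs": {
            "by_term": {
                "terms": {"field": field},
                "aggs": {"top": {"top_hits": {"size": hits_per_bucket}}},
            }
        }
    }


def breadth_first_terms_agg(field, size):
    """Prune the parent terms buckets before computing the child aggregation."""
    return {
        "aggs": {
            "top_terms": {
                "terms": {"field": field, "size": size, "collect_mode": "breadth_first"},
                "aggs": {"co_terms": {"terms": {"field": field, "size": 5}}},
            }
        }
    }


if __name__ == "__main__":
    for body in (
        percentile_ranks_agg("load_time", [15, 30]),
        geo_bounds_agg("location"),
        collapsed_by_term_agg("tags", 1),
        breadth_first_terms_agg("actors", 10),
    ):
        print(json.dumps(body))
```

In the last builder, a high-cardinality field such as actors, combined with a small `size`, is the kind of case where breadth_first is described as the more efficient choice.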

There are also several performance improvements both in indexing and I/O. Notably, Lucene 4.9 brings with it better compression, which improves both disk and memory usage. There are several resiliency improvements as well.

The release also has a few breaking changes – for example, JSONP is now disabled by default for security reasons. Read the release notes for the full list of changes. 

InfoQ.com and all content copyright © 2006-2014 C4Media Inc.