Lucid Imagination Releases Performance Monitoring Utility for Apache Lucene

Lucid Imagination, a commercial company working with the Apache Lucene and Solr search engine libraries, has introduced a new monitoring product called LucidGaze. The product is a fully instrumented version of Lucene for developers. Performance data can be printed to a log file, stored in a round-robin database, or made available through a Java API. If the round-robin database option is used, the RRD4j library provides a standalone Swing application for reading and processing the database.
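For readers unfamiliar with the term, a round-robin database keeps a fixed number of samples and overwrites the oldest once it is full, so its size never grows. The sketch below illustrates only that idea in plain Java. It does not use RRD4j or LucidGaze, and the class name `RoundRobinSketch` and its methods are hypothetical names chosen for this illustration.

```java
/** Illustrative fixed-size sample store; not the RRD4j or LucidGaze API. */
public class RoundRobinSketch {
    private final double[] samples;
    private int next;   // slot the next sample will be written to
    private int size;   // number of valid samples, never more than samples.length

    public RoundRobinSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        samples = new double[capacity];
    }

    /** Stores a sample, overwriting the oldest one once the store is full. */
    public void add(double value) {
        samples[next] = value;
        next = (next + 1) % samples.length;
        if (size < samples.length) {
            size++;
        }
    }

    public int size() {
        return size;
    }

    /** Average of the samples currently retained, or NaN if there are none. */
    public double average() {
        if (size == 0) {
            return Double.NaN;
        }
        double sum = 0;
        for (int i = 0; i < size; i++) {
            sum += samples[i];
        }
        return sum / size;
    }

    public static void main(String[] args) {
        RoundRobinSketch store = new RoundRobinSketch(3);
        // Four hypothetical timings in milliseconds; the first is overwritten.
        for (double ms : new double[] {10, 20, 30, 40}) {
            store.add(ms);
        }
        System.out.println(store.size() + " samples, average " + store.average());
    }
}
```

The trade-off is that older detail is lost in exchange for a bounded footprint, which suits long-running monitoring.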

Installation is straightforward. The software is supplied as a .jar file which acts as a drop-in replacement for the Lucene .jar. To install it, a developer simply swaps lucene-core.2.4.1.jar for lucene-core-gaze.2.4.1.jar on the application's classpath. Developers therefore need make no changes to their application's source code, and could potentially also use the product where the source of the application to be monitored is unavailable.

LucidGaze offers developers a range of analytics for looking at how well searches are transformed into document retrieval operations, how effectively user input is analyzed and decomposed for processing by the index, and how text is processed and indexed. The tool uses five monitors to collect statistics:

  1. AnalysisStats: Analyzers, TokenFilters, TokenStreams, and Tokenizers, and which Analyzers were used to produce the TokenStream for a particular field
  2. DocumentStats: Total number of documents indexed, as well as fields in the index
  3. IndexStats: Activities and behaviour of IndexReaders and IndexWriters, such as visibility into each instance, tracking calls for each of their relevant methods, buffer and memory usage, as well as average add and commit times
  4. SearchStats: Query operations, searcher performance and parsing times, method call statistics, and most commonly executed queries
  5. StoreStats: Lucene Storage directory instances

The overhead of running with full monitoring is considerable. In conversation with InfoQ, Grant Ingersoll, a member of Lucid Imagination's technical team, suggested a figure of around 10-15%. It is, however, possible to reduce the overhead by configuring which statistics are collected and whether they should be persisted.

InfoQ also talked to Ingersoll about some typical applications for LucidGaze. One he highlighted was a common developer error when working with Lucene: an apparent memory leak caused by the developer failing to close an IndexReader. LucidGaze collects data on the number of currently open IndexReaders, the total number of IndexReader#reopen() calls and which of these resulted in a new IndexReader instance, along with the total estimated RAM that all IndexReaders in use in the JVM are consuming. These stats could be very useful when tracking down such a leak: at its most basic, if you expect to be using two IndexReaders and you have ten in memory, then you have a leak somewhere.

A second common use case is evaluating re-indexing strategies during volume testing for a high-volume site (i.e. one with lots of document creates and deletes). Lucene's index database is composed of a number of separate "segments", each stored in an individual file. When you add documents to the index, new segments may be created. You can compact the database and reduce the number of segments, thereby speeding up query times, but doing so carries an overhead, and working out the best strategy tends to involve a lot of trial and error. LucidGaze offers stats for the number of new index segments created, as well as the number of segment merges that have occurred and the average time they took, helping developers tune their implementation.

The tool can also be used to look at specific issues encountered during volume testing, such as isolating long-running queries that are consuming an unfair share of resources, or pinpointing specific fields or documents that are causing a processing bottleneck.
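The IndexReader leak that Ingersoll describes comes down to comparing the number of open readers with the number you expect. The following is a minimal sketch of that check in plain Java, assuming a hypothetical `TrackedReader` class that stands in for Lucene's IndexReader. It is not LucidGaze's implementation or API, and every name in it is invented for illustration.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

public class ReaderLeakSketch {

    /** Hypothetical stand-in for an IndexReader; not part of Lucene or LucidGaze. */
    static final class TrackedReader implements Closeable {
        // Readers opened and not yet closed, across the whole JVM.
        private static final AtomicInteger OPEN = new AtomicInteger();
        private boolean closed;

        TrackedReader() {
            OPEN.incrementAndGet();
        }

        @Override
        public synchronized void close() {
            if (!closed) {   // idempotent, so a double close cannot skew the count
                closed = true;
                OPEN.decrementAndGet();
            }
        }

        static int openCount() {
            return OPEN.get();
        }
    }

    /** The mistake: a reader is opened per search and never closed. */
    static void leakySearch() {
        TrackedReader reader = new TrackedReader();
        // ... run the query with reader, then forget to close it ...
    }

    /** The fix: close the reader in a finally block. */
    static void safeSearch() {
        TrackedReader reader = new TrackedReader();
        try {
            // ... run the query with reader ...
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            safeSearch();
        }
        System.out.println("open after safe searches: " + TrackedReader.openCount());
        for (int i = 0; i < 10; i++) {
            leakySearch();
        }
        System.out.println("open after leaky searches: " + TrackedReader.openCount());
    }
}
```

In a real application the count would come from the monitoring tool rather than a hand-rolled counter. The point is only that an open-reader count which keeps rising under steady load indicates a missing close() call.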

The product is offered as a free, though closed-source, download from the Lucid Imagination web site. At present it supports Lucene 2.4.1 only, though Lucid Imagination have stated that they may make other versions available if there is sufficient demand.
