Compass: Simplifying and Extending Lucene to Provide Google-like Search

The Compass project recently released the second milestone of the 1.1 cycle. Compass is an open source Java search engine framework that brings search engine semantics to your application stack declaratively. Compass builds on the popular Lucene indexing engine and integrates with popular development frameworks such as Hibernate and Spring. The Compass 1.1 M2 release includes:
  • Support Unmarshall: Allows disabling Compass's support for unmarshalling objects from the search engine. With unmarshalling disabled, Compass does not need to store extra information in the index and uses much less memory.
  • Dynamic Meta Data: Allows defining synthetic indexable content using scripting/expression languages such as OGNL, Groovy, Velocity, Commons EL and Commons JEXL.
  • Simplified API usage when working within a transaction, including @CompassContext support in Spring.
  • Support for the latest Hibernate and Spring versions.
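
To make the dynamic meta data idea concrete, here is a minimal Java sketch under stated assumptions. The `Recipe` class and the `DynamicMetaData` helper are hypothetical and not part of Compass; plain Java lambdas stand in for the OGNL, Groovy, Velocity, Commons EL or Commons JEXL expressions Compass would evaluate. Each expression's result becomes an extra index field that does not correspond to any single domain property.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical domain object used only for illustration.
class Recipe {
    final String title;
    final String cuisine;
    final int minutes;

    Recipe(String title, String cuisine, int minutes) {
        this.title = title;
        this.cuisine = cuisine;
        this.minutes = minutes;
    }
}

// Hypothetical stand-in for dynamic meta data: each entry names an index
// field and an expression evaluated against the object being indexed.
class DynamicMetaData<T> {
    private final Map<String, Function<T, String>> expressions = new LinkedHashMap<>();

    DynamicMetaData<T> add(String field, Function<T, String> expression) {
        expressions.put(field, expression);
        return this;
    }

    // Evaluates every expression and returns the synthetic fields, in the
    // order they were added, that would be written to the index.
    Map<String, String> evaluate(T object) {
        Map<String, String> fields = new LinkedHashMap<>();
        expressions.forEach((field, expr) -> fields.put(field, expr.apply(object)));
        return fields;
    }
}

class DynamicMetaDataDemo {
    public static void main(String[] args) {
        DynamicMetaData<Recipe> meta = new DynamicMetaData<Recipe>()
                .add("summary", r -> r.title + " (" + r.cuisine + ")")
                .add("quick", r -> r.minutes <= 30 ? "yes" : "no");
        // prints {summary=Shakshuka (Israeli), quick=yes}
        System.out.println(meta.evaluate(new Recipe("Shakshuka", "Israeli", 25)));
    }
}
```

Lambdas keep the sketch dependency-free; the appeal of a scripting or expression language is that the same expression can live in configuration rather than in compiled code.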

InfoQ caught up with Shay Banon, Compass founder, to talk about the 1.1 version. On how Compass got started:

It all started when I made the horrible mistake of promising my wife I would build her recipe management software (or, more precisely, a culinary knowledge-based program) that I (creatively) named iCook. I also wanted to play with the new projects out there, especially Hibernate and Spring, so I chose them as the baseline for the application.

One of the major goals for the application was usability, and I considered a Google-like search box, able to search across all of the application data (recipes, ingredients, articles, lecture notes, ...) and return relevant results, one of its most important aspects. So I went looking for a search engine implementation, and the first one I found was Apache Lucene. While trying to implement the search feature, and after getting used to the simpler development model of Spring and Hibernate, I was surprised to find that I had to write a lot of boilerplate code to integrate Lucene with the application.

So I dumped iCook for a while and started working on a project aimed at simplifying the implementation of search-related features within an application. The project turned out to be much bigger than I initially expected, and I thought that other developers would benefit from what I had done, so I released it as an open source project and called it Compass.

And by the way, if you are wondering about iCook, my wife is still waiting for it :).

The conversation then moved on to Compass's use of Lucene, which Banon called an amazing library. He explained that the features Compass adds on top of plain Lucene can be divided into two parts:

The first is the set of enhancements Compass makes within Lucene. Lucene is not transactional, and Compass adds transactional support to Lucene, allowing for ACID search engine operations. Compass comes with a local transaction manager and also integrates with other transaction managers such as JTA and Spring. The second enhancement, which is closely related to the transactional support, is what I call "fast updates". Updating already existing identifiable data within Lucene is a complex operation, and actually uses two different interfaces. Compass both simplifies the update operations and executes them much faster than the equivalent typical Lucene-based code. Another enhancement is automatic caching and cache invalidation of different Lucene objects, allowing for much faster searches and relieving the developer from worrying about it.
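
The transactional behaviour and the "fast updates" idea can be illustrated with a toy model. This is neither Compass nor Lucene code: `ToyIndex` and `IndexTransaction` are hypothetical classes. Operations are queued per transaction and reach the shared map only on commit, so a rollback leaves the index untouched, and `update` is expressed as a delete followed by an add, which is roughly the two-step pattern plain Lucene update code of that era had to follow (deletes and adds going through different interfaces). Durability and isolation between concurrent transactions are not attempted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shared "index": document id -> stored fields.
class ToyIndex {
    final Map<String, Map<String, String>> docs = new HashMap<>();
}

// Hypothetical transaction over ToyIndex: operations are queued and only
// touch the shared index when commit() runs; rollback() discards them.
class IndexTransaction {
    private final ToyIndex index;
    private final List<Runnable> pending = new ArrayList<>();
    private boolean finished;

    IndexTransaction(ToyIndex index) {
        this.index = index;
    }

    void create(String id, Map<String, String> fields) {
        ensureOpen();
        Map<String, String> copy = new HashMap<>(fields); // snapshot the caller's fields
        pending.add(() -> index.docs.put(id, copy));
    }

    void delete(String id) {
        ensureOpen();
        pending.add(() -> index.docs.remove(id));
    }

    // One call for the caller, two steps underneath: remove the old
    // document, then add the new version under the same id.
    void update(String id, Map<String, String> fields) {
        delete(id);
        create(id, fields);
    }

    void commit() {
        ensureOpen();
        pending.forEach(Runnable::run); // apply queued operations in order
        finished = true;
    }

    void rollback() {
        ensureOpen();
        pending.clear(); // nothing was applied, so there is nothing to undo
        finished = true;
    }

    private void ensureOpen() {
        if (finished) {
            throw new IllegalStateException("transaction already finished");
        }
    }
}
```

A real implementation would also need to apply the queued operations atomically; in this sketch an exception halfway through `commit()` would leave the index partially updated.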

The second part is features built on top of Lucene, and there are many of them. Let me try to focus on some of the major ones:

  • Compass introduces a mapping layer that maps Resources (similar to a Lucene Document), XML, and POJOs into the search engine declaratively (using XML or annotations). This removes the need for the usual boilerplate code for mapping your domain model into Lucene.

  • Compass exposes a simple API, similar to a typical ORM framework API. This lets users already familiar with ORM tools feel right at home when using Compass.

  • Moving to Compass from a Lucene-based project, or for people who already have Lucene knowledge, is very simple. First, Compass allows for low-level operations using Resources, Properties and RSEM (Resource mapping). Also, when using Compass, people do not lose any of their existing Lucene investments: Analyzers, custom Queries and so on can be used or configured directly within Compass. Usually, using them is much simpler; for example, Analyzers can be configured and applied declaratively within the mapping definitions.

  • Compass Gps has several integration modules, mostly with ORM tools. These allow Compass to automatically index content stored in the database by utilizing existing ORM mappings and Compass mappings. Where possible, they also automatically mirror changes made through the ORM framework API into the search engine. Some of the supported ORM tools are JPA (with simplified configuration for Hibernate and TopLink Essentials), Hibernate, OJB, JDO, and iBATIS.

  • Compass Spring is Compass's integration with the Spring Framework. It was written to mimic Spring's own support for existing ORM tools, and even to expand on it. It comes with a LocalCompassBean factory, AOP support classes, support for new Spring 2 features such as schema-based configuration, @CompassContext injection (similar to the JPA one), and transaction management integration.
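
The mapping layer and ORM-style API described in the list above can be sketched in plain Java. `@IndexId`, `@IndexField`, `AnnotationMapper` and `ToySession` are hypothetical names, not Compass's own annotations or classes: the mapper reads the annotations through reflection and turns an object into a Resource-like map of field names to values, and the session offers `save`/`load`/`delete` calls in the style of an ORM session. Compass's actual mappings cover much more (XML, Resources, analyzers configured in the mapping, and so on).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical marker for the identifying field.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IndexId {}

// Hypothetical marker for fields that should be indexed.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IndexField {
    String name() default ""; // empty means "use the Java field name"
}

// Hypothetical domain class mapped declaratively.
class Ingredient {
    @IndexId String id;
    @IndexField(name = "name") String title;
    @IndexField String origin;
    String supplierNotes; // not annotated, so never indexed

    Ingredient(String id, String title, String origin, String supplierNotes) {
        this.id = id;
        this.title = title;
        this.origin = origin;
        this.supplierNotes = supplierNotes;
    }
}

// Generic mapper: replaces per-class, hand-written conversion code.
final class AnnotationMapper {
    private AnnotationMapper() {}

    static String idOf(Object obj) {
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(IndexId.class)) {
                return String.valueOf(read(f, obj));
            }
        }
        throw new IllegalArgumentException("no @IndexId field on " + obj.getClass());
    }

    // Builds a Resource-like map from every non-null @IndexField field.
    static Map<String, String> toResource(Object obj) {
        Map<String, String> resource = new HashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            IndexField mapping = f.getAnnotation(IndexField.class);
            if (mapping == null) continue;
            Object value = read(f, obj);
            if (value == null) continue;
            String name = mapping.name().isEmpty() ? f.getName() : mapping.name();
            resource.put(name, value.toString());
        }
        return resource;
    }

    private static Object read(Field f, Object obj) {
        try {
            f.setAccessible(true);
            return f.get(obj);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical ORM-style session over an in-memory store of resources.
class ToySession {
    private final Map<String, Map<String, String>> resources = new HashMap<>();

    void save(Object obj) {
        resources.put(AnnotationMapper.idOf(obj), AnnotationMapper.toResource(obj));
    }

    Map<String, String> load(String id) {
        return resources.get(id);
    }

    void delete(String id) {
        resources.remove(id);
    }
}
```

Because the mapping is declared once on the class, indexing a new property means annotating it rather than editing conversion code, which is the boilerplate the mapping layer is meant to remove.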

In terms of the driving forces behind 1.1, Banon noted that many are a result of experience gained with Compass 1.0. Users often wanted synthetic indexed content built from their domain model without necessarily mapping individual domain model properties. As a result, Compass now includes support for dynamic meta data that uses scripting/expression languages (OGNL, Groovy, Velocity, Commons JEXL, Commons EL) to build indexable content. Banon continued:

Another important feature for users was using Compass only to index objects, and using Resources when displaying the results. This means that Compass does not need to store extra data within the index, which reduces Compass's memory consumption.

The release also introduces simplified API usage, where Compass can automatically join existing ongoing transactions, so your typical begin ... rollback/commit operations are not required. The feature also enables injecting Compass sessions into a Spring application using the @CompassContext annotation.
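
The "join an ongoing transaction" behaviour can be sketched with a thread-bound transaction, similar in spirit to Spring's default "required" propagation. `Tx` and `TxSupport` are hypothetical names and this is not Compass's implementation: if a transaction is already bound to the current thread, the work simply runs inside it; otherwise a new one is started, marked committed on success and rolled back on a runtime exception. The `committed`/`rolledBack` flags stand in for real commit and rollback calls.

```java
import java.util.function.Supplier;

// Hypothetical transaction handle; the flags stand in for real commit/rollback.
final class Tx {
    boolean committed;
    boolean rolledBack;
}

final class TxSupport {
    private static final ThreadLocal<Tx> CURRENT = new ThreadLocal<>();
    static int begun; // counts real transaction starts; illustration only, not thread-safe

    private TxSupport() {}

    static Tx current() {
        return CURRENT.get();
    }

    // Joins the thread's ongoing transaction if there is one; otherwise
    // begins a new one and completes it when the work finishes.
    static <T> T execute(Supplier<T> work) {
        if (CURRENT.get() != null) {
            // Join: the outermost call decides commit or rollback.
            return work.get();
        }
        Tx tx = new Tx();
        begun++;
        CURRENT.set(tx);
        try {
            T result = work.get();
            tx.committed = true; // stand-in for a real commit
            return result;
        } catch (RuntimeException e) {
            tx.rolledBack = true; // stand-in for a real rollback
            throw e;
        } finally {
            CURRENT.remove(); // unbind so later calls start fresh
        }
    }
}
```

Nested calls therefore share one transaction and only the outermost call decides its outcome, which is what lets callers drop the explicit begin ... commit/rollback code.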

XA support and possibly JCA support will be added in 1.1. As usual, there are also bug fixes, better documentation, and catching up with new versions of the many libraries Compass integrates with.

We then asked Banon about innovative apps he's seen built with Compass:

Actually, most projects that use Compass are considered innovative by the "business"/users. I never stop being amazed by the joy on users' faces when you explain to them that they can use Google-like queries within their application. Most applications do not have a search-engine-like search box, usually because of the complexity involved in implementing one, or because people try to implement it using SQL. When users find out that they can enter a query and get back the relevant domain model objects, with their own specialized actions (for example, editing a customer's info, or investing in a financial product), they immediately consider the application innovative. Often they go all the way and ask for the application to have a Google-like front page with a single search box (and maybe a logo that changes its shape once in a while :) ). These kinds of applications are very innovative in terms of usability, and they remove the complex navigation model typical of current applications.

Finally, Banon indicated that there are still lots of areas left to explore with Compass after 1.1, including more enterprise support. The team is also considering adding other search-related features such as a crawler, clustering, and an administration console.
