InfoQ

Lucene 2.2: Payloads, Function queries, and more speed

Posted by Ryan Slobojan on Jul 06, 2007

Lucene Java 2.2 is now available. Lucene is a high-performance, full-featured text search engine library written entirely in Java. There are several new features in this release, including:
  • Payloads - Allows you to associate arbitrary binary data with any term in the index
  • Function queries - Gives more control over how document scores are calculated (incorporated from Solr)
  • "Point-in-time" searching over NFS - Lets searchers on an NFS-mounted index keep a consistent, snapshot-like view of the index while it is being updated
  • New pre-analyzed field API - Lets you handle pre-analyzed Document fields without dummy analyzer code
  • Public Maven releases - Releases of all Lucene modules are now available through the Maven repository

InfoQ spoke with Grant Ingersoll, a committer and Project Management Committee (PMC) member for the Lucene project, to learn more about this release. During the discussion, he asked InfoQ to make it clear that his views and comments are his alone, and are not the official views of the Lucene PMC.

InfoQ learned that the 2.2 release of Lucene marks a shift towards a shorter, quarterly release cycle. Ingersoll believes that more frequent releases will bring several benefits, including making bug fixes and new features available to the community more rapidly. The release process has also been streamlined, and Maven support has been improved so that future releases will reach Maven users more quickly.

InfoQ asked Ingersoll to describe the Payloads feature in more detail, and he said:

Payloads are a new feature to allow the storage of information in the index on a term-by-term basis. For instance, when indexing web pages, it may be useful to store extra information about a particular word, such as an associated URL or a weighting factor based on some analysis of the text. In more advanced applications, it might be useful to store the part of speech of a word in order to score nouns as more important than other parts of speech. My talk at ApacheCon Europe this year has a few slides on payloads for those who are interested.

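To make the part-of-speech idea concrete, here is a minimal, self-contained sketch in plain Java. It deliberately does not use Lucene's own classes (in Lucene, payloads are attached to tokens during analysis); instead, the hypothetical PayloadIndex class models an inverted index where every posting carries an arbitrary byte[] payload. The weighting rule (nouns count double) is an assumption chosen purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of term payloads; this is NOT Lucene's API.
final class PayloadIndex {
    // One occurrence of a term in a document, carrying arbitrary payload bytes.
    record Posting(int docId, int position, byte[] payload) {}

    private final Map<String, List<Posting>> postings = new HashMap<>();

    // Encode a tag such as a part-of-speech label as payload bytes.
    static byte[] tag(String label) {
        return label.getBytes(StandardCharsets.UTF_8);
    }

    // Record that `term` occurs in `docId` at `position`, with a payload attached.
    void add(int docId, int position, String term, byte[] payload) {
        postings.computeIfAbsent(term, t -> new ArrayList<>())
                .add(new Posting(docId, position, payload.clone()));
    }

    // Sum per-occurrence weights; the rule "nouns count double" is an assumption.
    double score(String term, int docId) {
        double score = 0.0;
        for (Posting p : postings.getOrDefault(term, List.of())) {
            if (p.docId() != docId) {
                continue;
            }
            String partOfSpeech = new String(p.payload(), StandardCharsets.UTF_8);
            score += "NOUN".equals(partOfSpeech) ? 2.0 : 1.0;
        }
        return score;
    }
}
```

For example, indexing "fly" tagged as NOUN in document 1 and as VERB in document 2 gives score("fly", 1) = 2.0 and score("fly", 2) = 1.0: the same term is weighted differently depending on the data stored alongside each occurrence.
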
He also described the new function queries, which originated in Solr:
The new search function package (org.apache.lucene.search.function) allows developers to use the actual content of a field in scoring a document. For instance, if you stored latitude and longitude information in fields on a document, you could then use the information in these fields to affect the ranking of a document. That is, if you were doing a search for Starbucks, you could rank those locations nearer to the user (assuming you know their location) higher in the results than those farther away. Another example might be to use price or margin information to affect the ranking (e.g., score products that have bigger margins for your company higher. Not saying I agree with this ethically, but it can be done.)
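
The Starbucks example can be sketched without Lucene to show the underlying idea: take a document's ordinary text-relevance score and combine it with a value computed from its stored fields. The hypothetical DistanceBoost class below is plain Java and is not the org.apache.lucene.search.function API (which provides its own classes, such as CustomScoreQuery). The combining formula textScore / (1 + distanceKm) is an arbitrary assumption; any monotonically decreasing function of distance would serve.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of field-based scoring; NOT Lucene's function query API.
final class DistanceBoost {
    // A search hit: its text-relevance score plus latitude/longitude fields.
    record Hit(String name, double textScore, double lat, double lon) {}

    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance in kilometres using the haversine formula.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Assumed combining rule: divide the text score by (1 + distance to the user).
    static double customScore(Hit h, double userLat, double userLon) {
        return h.textScore() / (1.0 + distanceKm(h.lat(), h.lon(), userLat, userLon));
    }

    // Return a copy of the hits ordered by combined score, best first.
    static List<Hit> rank(List<Hit> hits, double userLat, double userLon) {
        List<Hit> ranked = new ArrayList<>(hits);
        ranked.sort(Comparator.comparingDouble((Hit h) -> customScore(h, userLat, userLon)).reversed());
        return ranked;
    }
}
```

With a user in downtown Seattle, a nearby location with a lower text score still outranks a Portland location with twice the text score, because roughly 230 km of distance shrinks the latter's combined score far more.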

Ingersoll was then asked what users could expect from the next release of Lucene. He indicated that there will be significant improvements in indexing performance as a result of some new memory management techniques led by Michael McCandless. He also mentioned that the recent releases of Lucene have added a number of performance enhancements, and that users will want to try them out for themselves. Finally, Ingersoll noted that Java 5 support and more flexibility in the indexing process are potential future features of Lucene.

A full changelog is available, listing all of the bug fixes, features and optimizations in this release. As with previous releases, Lucene 2.2 can read and convert indexes created by earlier versions of Lucene; however, once converted, an index can no longer be read by those earlier versions (e.g., 2.1).
