Apache Lucene and Lucene.Net – Full Text Search Servers - InfoQ

Ten years ago, relying on open source projects was unimaginable in most Windows shops. These days, .NET programmers are awakening to the world of enterprise-class software developed and proven on the Java platform. Today we look at two popular full-text search engines, Apache Lucene and Lucene.Net.

Apache Lucene and its port, Lucene.Net, are battle-tested products used to provide search capabilities for big-name sites such as Wikipedia, CNET, and Monster.com. With references like that, their capabilities and future are not in doubt.

Lucene is not a crawling search engine, nor does it automatically index content. The text of each document has to be extracted before it is loaded into a Lucene index. The standard pattern for doing this is to instantiate an Analyzer, open an IndexWriter, and then add the documents one by one. Once that is done, the index can optionally be optimized before it is closed and the changes are committed. This process is probably more hands-on than developers are used to, but it does give you a lot of flexibility over what data is indexed.
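The pattern above can be sketched with a toy model. This is not Lucene's API: `analyze`, `ToyIndexWriter`, and its methods are hypothetical names invented for illustration, and the "index" is just an in-memory dictionary. The sketch only shows the shape of the workflow: the caller supplies extracted text, an analyzer turns it into terms, documents are added one at a time, and nothing is searchable until the writer is closed.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndexWriter:
    """Hypothetical stand-in for an index writer; not Lucene's API."""

    def __init__(self, index, analyzer):
        self.index = index        # committed postings: term -> set of doc ids
        self.analyzer = analyzer
        self.pending = []         # documents added but not yet committed

    def add_document(self, doc_id, text):
        # The caller passes already-extracted text; nothing is crawled.
        self.pending.append((doc_id, self.analyzer(text)))

    def close(self):
        # Commit: fold the pending documents into the searchable index.
        for doc_id, tokens in self.pending:
            for token in tokens:
                self.index[token].add(doc_id)
        self.pending.clear()


index = defaultdict(set)
writer = ToyIndexWriter(index, analyze)
writer.add_document(1, "Lucene is a full text search library")
writer.add_document(2, "Lucene.Net is a port of Lucene")
assert "lucene" not in index      # nothing is visible before the commit
writer.close()

print(sorted(index["lucene"]))
print(sorted(index["port"]))
```

A real index also stores positions, term frequencies, and stored fields, and writes segments to disk; the toy keeps only the term-to-document mapping so the add-then-commit sequence stays visible.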

Searching can be done via an object model, with the query built up term by term. Alternatively, a plain-text search string, perhaps entered by an end user, can be parsed and executed. .NET developers using .NET 3.5 and later also have a third option, LINQ to Lucene. Its project page has a nice map between Lucene's search syntax and the corresponding LINQ to Lucene syntax.
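The difference between the first two search styles can be illustrated with another toy model. Everything here is an assumption made for the example: `INDEX`, `TermQuery`, `AndQuery`, and `parse` are hypothetical names rather than Lucene classes, and the parser simply requires every word to match, which is a simplification and not Lucene's query syntax.

```python
import re

# Hypothetical, already-built index: term -> ids of documents containing it.
INDEX = {
    "lucene": {1, 2, 3},
    "search": {1, 3},
    "port": {2},
}


class TermQuery:
    """Matches documents that contain a single term."""

    def __init__(self, term):
        self.term = term

    def run(self, index):
        return set(index.get(self.term, set()))


class AndQuery:
    """Matches documents that satisfy every clause."""

    def __init__(self, *clauses):
        self.clauses = clauses

    def run(self, index):
        results = [clause.run(index) for clause in self.clauses]
        return set.intersection(*results) if results else set()


def parse(query_string):
    """Toy parser: every word in the string becomes a required term."""
    terms = re.findall(r"[a-z0-9]+", query_string.lower())
    return AndQuery(*(TermQuery(term) for term in terms))


# Style 1: build the query object term by term.
built = AndQuery(TermQuery("lucene"), TermQuery("search"))

# Style 2: parse a plain-text string, such as one typed by an end user.
parsed = parse("Lucene search")

print(sorted(built.run(INDEX)))
print(sorted(parsed.run(INDEX)))
```

Both routes end in the same query object, which is the point of the comparison: the object model gives the program precise control, while the parser is a convenience layer over it for free-form input.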

If you want to try it out, Andrew Smith has an Introduction to Lucene.NET. And regardless of whether you choose the .NET or Java version, also take a look at Erik Hatcher's Lucene Intro.
