Apache Lucene and Lucene.Net – Full Text Search Servers
Ten years ago, relying on open source projects was unimaginable in most Windows shops. These days, .NET programmers are awakening to the world of enterprise-class software developed and proven on the Java platform. Today we look at the popular full-text search engines Apache Lucene and Lucene.Net.
Apache Lucene and its port, Lucene.Net, are battle-tested products used to provide search capabilities for big-name sites such as Wikipedia, CNET, and Monster.com. With references like that, their capabilities and future are not in doubt.
Lucene is not a crawling search engine, nor does it automatically index content. The text of each document to be indexed has to be extracted before it is loaded into a Lucene index. The standard pattern for doing this is to instantiate an Analyzer, open an IndexWriter, and then add the documents one by one. Once that is done, the index can optionally be optimized before the changes are committed and the index is closed. This process is probably more hands-on than developers are used to, but it gives you a lot of flexibility over what data is indexed.
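To make the shape of that workflow concrete, here is a minimal sketch using only the Java standard library. It is not Lucene's API: `ToyIndexDemo`, `ToyWriter`, `analyze`, and `postings` are hypothetical names invented for illustration, and the "analyzer" is just lowercasing plus splitting on non-alphanumeric characters. It only shows the idea of analyzing already-extracted text, adding documents one at a time, and making them searchable on commit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

public class ToyIndexDemo {

    /** Stand-in for an analyzer: lowercases and splits on anything that is not a-z or 0-9. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Stand-in for an index writer: additions are buffered until commit() publishes them. */
    static class ToyWriter {
        // term -> sorted ids of the committed documents that contain it
        private final Map<String, TreeSet<Integer>> committed = new HashMap<>();
        private final List<String> pending = new ArrayList<>();
        private int nextDocId = 0;

        /** The caller supplies text it has already extracted; nothing is crawled or fetched. */
        void addDocument(String extractedText) {
            pending.add(extractedText);
        }

        /** Analyzes every buffered document and merges its terms into the visible index. */
        void commit() {
            for (String text : pending) {
                int docId = nextDocId++;
                for (String term : analyze(text)) {
                    committed.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
                }
            }
            pending.clear();
        }

        /** Returns the ids of committed documents containing the term (empty if none). */
        TreeSet<Integer> postings(String term) {
            return committed.getOrDefault(term, new TreeSet<>());
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.addDocument("Lucene is a full-text search library.");
        writer.addDocument("Lucene.Net is the .NET port of Lucene.");
        System.out.println(writer.postings("lucene")); // empty: nothing committed yet
        writer.commit();
        System.out.println(writer.postings("lucene")); // both documents
        System.out.println(writer.postings("port"));   // only the second document
    }
}
```

A real analyzer and writer do far more (stemming, stop words, on-disk segments, merging), so treat this purely as a picture of the division of labor: your code decides what text goes in, and the index only sees what you hand it.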
Searching can be done via an object model, with the query built up term by term. Alternatively, a plain-text search string, perhaps entered by an end user, can be parsed and executed. Developers using .NET 3.5 or later also have a third option, LINQ to Lucene. The LINQ to Lucene project page has a nice map between Lucene's search syntax and the corresponding LINQ to Lucene syntax.
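The difference between the first two styles can be illustrated with another small standard-library sketch. Again, this is not Lucene's query API: `ToyQueryDemo`, `Query`, `term`, `and`, and `parse` are hypothetical, the postings map is hand-built, and the "parser" understands only terms joined by `AND`. The point is simply that a parsed string ends up as the same kind of object tree you could have assembled by hand.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ToyQueryDemo {

    /** A query selects matching document ids from a term -> document ids map. */
    interface Query {
        Set<Integer> match(Map<String, Set<Integer>> postings);
    }

    /** Matches documents that contain a single term. */
    static Query term(String t) {
        return postings ->
            new TreeSet<Integer>(postings.getOrDefault(t, Collections.<Integer>emptySet()));
    }

    /** Matches documents that satisfy both sub-queries. */
    static Query and(Query left, Query right) {
        return postings -> {
            Set<Integer> result = new TreeSet<Integer>(left.match(postings));
            result.retainAll(right.match(postings));
            return result;
        };
    }

    /** Parses "a AND b AND c" into the same object model; a deliberately tiny grammar. */
    static Query parse(String text) {
        Query query = null;
        for (String part : text.trim().split("\\s+AND\\s+")) {
            Query next = term(part.toLowerCase(Locale.ROOT));
            query = (query == null) ? next : and(query, next);
        }
        return query;
    }

    public static void main(String[] args) {
        // Hand-built postings: which document ids contain which term.
        Map<String, Set<Integer>> postings = new HashMap<>();
        postings.put("lucene", new TreeSet<>(Arrays.asList(0, 1)));
        postings.put("port", new TreeSet<>(Arrays.asList(1)));
        postings.put("search", new TreeSet<>(Arrays.asList(0)));

        Query built = and(term("lucene"), term("port")); // assembled term by term
        Query parsed = parse("Lucene AND port");         // parsed from user-style text

        System.out.println(built.match(postings));
        System.out.println(parsed.match(postings));
    }
}
```

Both queries select the same document here. Building the tree in code suits queries your application constructs itself, while parsing suits text typed into a search box, where you do not control the input.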