InfoQ

Apache Lucene and Lucene.Net – Full Text Search Servers

Posted by Jonathan Allen on Nov 06, 2008


Ten years ago, relying on open source projects was unimaginable in most Windows shops. These days, .NET programmers are awakening to the world of enterprise-class software developed and proven on the Java platform. Today we look at two popular full-text search engines, Apache Lucene and Lucene.Net.

Apache Lucene and its port, Lucene.Net, are battle-tested products used to provide search capabilities for big-name sites such as Wikipedia, CNET, and Monster.com. With references like that, their capabilities and future are not in doubt.

Lucene is not a crawling search engine, nor does it automatically index content. The text of the documents to be indexed has to be extracted before it is loaded into a Lucene index. The standard pattern for doing this is to instantiate an Analyzer, open an IndexWriter, and then add the documents one by one. Once that is done, the index can optionally be optimized before it is closed and the changes are committed. This process is probably more hands-on than developers are used to, but it gives you a lot of flexibility over which data is indexed.
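To make that workflow concrete, here is a minimal sketch in plain Java of the same analyze, add, and commit pattern. It is a toy in-memory inverted index, not Lucene's API: the names `ToyIndexWriter`, `analyze`, `addDocument`, `commit`, and `docsContaining` are hypothetical and were chosen only to mirror the steps described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy illustration only: NOT Lucene's API. All names here are hypothetical.
class ToyIndexWriter {
    // Committed index: term -> sorted ids of the documents containing it.
    private final Map<String, SortedSet<Integer>> index = new HashMap<>();
    // Documents added since the last commit; not yet searchable.
    private final List<String> pending = new ArrayList<>();
    private int nextDocId = 0;

    // Stand-in for an analyzer: lower-case the text and split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // The caller supplies already-extracted text; nothing is crawled or fetched.
    void addDocument(String extractedText) {
        pending.add(extractedText);
    }

    // Analyze the pending documents and make them visible to searches.
    void commit() {
        for (String text : pending) {
            int docId = nextDocId++;
            for (String term : analyze(text)) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        pending.clear();
    }

    // Ids of committed documents that contain the given term.
    List<Integer> docsContaining(String term) {
        return new ArrayList<>(
            index.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>()));
    }

    public static void main(String[] args) {
        ToyIndexWriter writer = new ToyIndexWriter();
        writer.addDocument("Lucene is a full-text search library");
        writer.addDocument("Lucene.Net is a port of Lucene");
        System.out.println(writer.docsContaining("port")); // prints [] (not committed yet)
        writer.commit();
        System.out.println(writer.docsContaining("port")); // prints [1]
    }
}
```

The point of the sketch is the division of labor: the application decides what text goes in and when it becomes searchable, while the index only tokenizes and stores it. A real Lucene index adds much more on top (fields, scoring, on-disk segments), none of which is modeled here.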

Searching can be done via an object model, with the query built up term by term. Alternatively, a plain text search string, perhaps entered by an end user, can be parsed and executed. .NET developers using .NET 3.5 and later also have a third option, LINQ to Lucene. The project page has a nice map between Lucene's search syntax and the corresponding LINQ to Lucene syntax.
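The difference between the first two approaches can be sketched in a few lines of plain Java. This is a toy model, not Lucene's query classes or its query parser: `ToyQueries`, `term`, `allOf`, `parse`, and `search` are hypothetical names, and the parser here simply ANDs every word together, which is an assumption made for brevity rather than a description of Lucene's search syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy illustration only: NOT Lucene's API. All names here are hypothetical.
class ToyQueries {
    // A query decides whether a document's text matches.
    interface Query {
        boolean matches(String docText);
    }

    // Lower-case the text and split it on non-alphanumeric characters.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    // Matches documents that contain the given term as a whole token.
    static Query term(String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        return docText -> tokens(docText).contains(wanted);
    }

    // Matches documents accepted by every clause.
    static Query allOf(List<Query> clauses) {
        return docText -> {
            for (Query clause : clauses) {
                if (!clause.matches(docText)) {
                    return false;
                }
            }
            return true;
        };
    }

    // Turns a plain text string into the same kind of query object.
    static Query parse(String userInput) {
        List<Query> clauses = new ArrayList<>();
        for (String word : tokens(userInput)) {
            clauses.add(term(word));
        }
        return allOf(clauses);
    }

    // Returns the documents matched by the query, in their original order.
    static List<String> search(List<String> docs, Query query) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (query.matches(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "Lucene is a search library",
            "Lucene.Net is a port of Lucene",
            "A search server can wrap a library");

        // Approach 1: build the query term by term in code.
        Query built = allOf(List.of(term("lucene"), term("search")));
        System.out.println(search(docs, built)); // prints [Lucene is a search library]

        // Approach 2: parse a string such as one typed by an end user.
        Query parsed = parse("port lucene");
        System.out.println(search(docs, parsed)); // prints [Lucene.Net is a port of Lucene]
    }
}
```

Both paths end in the same kind of query object; parsing is just a convenient front end for building it. That is why an application can mix the two, for example by parsing user input and then adding its own restrictions in code.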

If you want to try it out, Andrew Smith has an Introduction to Lucene.NET. And regardless of whether you choose the .NET or the Java version, also take a look at Erik Hatcher's Lucene Intro.
