Apache Solr: Lucene-Based Server Provides Highly Scalable Enterprise Search

by James Kao on Jun 11, 2007

Apache Solr is a Lucene-based enterprise search server that delivers out-of-the-box indexing and query capabilities in a portable WAR file. Users interact with Solr via an HTTP interface, submitting content for indexing as XML documents and making queries with HTTP GET parameters. Solr also provides a master-slave index replication mechanism that allows query load to be distributed across servers in a large-scale deployment.
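
To make the HTTP interaction concrete, here is a minimal Python sketch that only builds the request bodies and URLs; it does not contact a server. It assumes the example server's default base URL (`http://localhost:8983/solr`), and the helper names `build_add_xml` and `build_query_url` are hypothetical. The `<add><doc><field name="...">` message shape and the `q`, `rows`, `fl`, and `wt` query parameters are Solr's own.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumption: default base URL of the bundled example server (port 8983, /solr path).
SOLR_BASE = "http://localhost:8983/solr"


def build_add_xml(docs):
    """Serialize dicts as a Solr XML update message: <add><doc><field name="...">...</field></doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > when serializing
    return ET.tostring(add, encoding="unicode")


def build_query_url(query, rows=10, fields=("id",), wt=None):
    """Build a GET URL for the /select handler from common query parameters."""
    params = {"q": query, "rows": rows, "fl": ",".join(fields)}
    if wt is not None:
        params["wt"] = wt  # response format such as "json"; omitted means the server default (XML)
    return SOLR_BASE + "/select?" + urllib.parse.urlencode(params)
```

In use, the XML from `build_add_xml` would be POSTed (Content-Type `text/xml`) to the `/update` handler, followed by a `<commit/>` message so the new documents become searchable; the URL from `build_query_url` is fetched with a plain GET.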

Solr was initially developed at CNET Networks and was donated to the Apache Software Foundation in 2006. It currently powers search on several high-traffic public websites. Community feedback has been positive, with users reporting good performance on indices of several million documents.

Solr's feature set is broken down into several subsystems:

Schema
  • Defines the field types and fields of documents
  • Dynamic Fields enables on-the-fly addition of new fields
  • Explicit types eliminate the need to guess field types
  • External file-based configuration of stopword lists, synonym lists, and protected word lists
  • Many additional text analysis components including word splitting, regex and sounds-like filters
Query
  • HTTP interface with configurable response formats (XML/XSLT, JSON, Python, Ruby)
  • Sort by any number of fields
  • Highlighted context snippets
  • Constant scoring range and prefix queries - no idf, coord, or lengthNorm factors, and no restriction on the number of terms the query matches.
  • Function Query - influence the score by a function of a field's numeric value or ordinal
  • Date Math - specify dates relative to "NOW" in queries and updates
Core
  • Pluggable query handlers and extensible XML data format
  • Document uniqueness enforcement based on unique key field
  • Batches updates and deletes for high performance
  • User-configurable commands triggered on index changes
  • Correct handling of numeric types for both sorting and range queries
Caching
  • Pluggable Cache implementations
  • Autowarming of caches in the background (the most recently accessed items in the current searcher's caches are re-populated in the new searcher, enabling high cache hit rates across index/searcher changes)
  • Fast/small filter implementation
  • User level caching with autowarming support
Replication
  • Efficient distribution of index parts that have changed via rsync transport
  • Pull strategy allows for easy addition of searchers
  • Configurable distribution interval allows tradeoff between timeliness and cache utilization
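
The autowarming item above is the least obvious piece of the caching design, so here is a minimal Python sketch of the idea. It assumes a searcher can be modeled as a function from a query key to its result; `LRUCache`, `autowarm`, and `autowarm_count` are hypothetical names, and this is not Solr's implementation (Solr exposes the equivalent knob as a per-cache `autowarmCount` setting in solrconfig.xml).

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache: the most recently used entry sits at the end of the OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def most_recent_keys(self, n):
        """Return up to n keys, most recently used first."""
        return list(reversed(self._data))[:n]


def autowarm(old_cache, new_searcher, autowarm_count):
    """Create the cache for a newly opened searcher, pre-populated by re-running the
    old cache's most recently used keys against the new index view."""
    new_cache = LRUCache(old_cache.capacity)
    # Insert oldest-first so the new cache keeps the same recency order.
    for key in reversed(old_cache.most_recent_keys(autowarm_count)):
        new_cache.put(key, new_searcher(key))
    return new_cache
```

Values are recomputed rather than copied because cached results from the old searcher may be stale once the index has changed; only the knowledge of which keys are hot carries over.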
Admin Interface
  • Comprehensive statistics on cache utilization, updates, and queries
  • Text analysis debugger, showing result of every stage in an analyzer
  • Web Query Interface with debugging output
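
The text analysis debugger shows the token stream after each stage of an analyzer chain. Real Solr analyzers are Lucene Java components declared in schema.xml; the following Python sketch only mimics that per-stage view with hypothetical, simplified stages (whitespace tokenizer, lowercase, stopword, and synonym filters) and ignores the token positions and offsets that real Lucene tokens carry.

```python
def whitespace_tokenizer(text):
    """Split on runs of whitespace (a stand-in for a real tokenizer)."""
    return text.split()


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def make_stop_filter(stopwords):
    """Return a filter that drops tokens found in the stopword list."""
    stop = {w.lower() for w in stopwords}
    return lambda tokens: [t for t in tokens if t not in stop]


def make_synonym_filter(synonyms):
    """Return a filter that emits each token followed by its configured synonyms."""
    def synonym_filter(tokens):
        out = []
        for t in tokens:
            out.append(t)
            out.extend(synonyms.get(t, []))
        return out
    return synonym_filter


def analyze_with_trace(text, tokenizer, filters):
    """Run the tokenizer, then each (name, filter) pair in order, recording tokens after every stage."""
    stages = [("tokenizer", tokenizer(text))]
    for name, token_filter in filters:
        previous = stages[-1][1]
        stages.append((name, token_filter(previous)))
    return stages
```

Feeding "The Quick Fox" through lowercase, a stop filter for "the", and a synonym map of "quick" to "fast" yields one entry per stage, ending with `["quick", "fast", "fox"]`, which is the kind of stage-by-stage view that makes misconfigured stopword or synonym files easy to spot.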
Version 1.2 was released last week. It is the first release since Solr graduated from the Incubator and brings many new features, including CSV/delimited-text data loading, time-based autocommit, faster faceting, negative filters, a spell-check handler, sounds-like word filters, regex text filters, and more flexible plugins.
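
Time-based autocommit, as we understand it, commits buffered updates once either a pending-document count or the age of the oldest uncommitted update crosses a threshold, whichever comes first. Solr configures this through an `<autoCommit>` block with `maxDocs` and `maxTime` settings in solrconfig.xml; the Python class below is a hypothetical model of that decision rule, not Solr's code. A real server would fire the time-based commit from a timer, whereas here the caller polls `maybe_commit`.

```python
import time


class AutoCommitPolicy:
    """Hypothetical autocommit rule: commit when max_docs updates are pending or the
    oldest pending update is at least max_time_ms old, whichever happens first."""

    def __init__(self, max_docs=None, max_time_ms=None, clock=time.monotonic):
        self.max_docs = max_docs
        self.max_time_ms = max_time_ms
        self.clock = clock  # injectable so the rule can be tested deterministically
        self.pending = 0
        self.first_pending_at = None
        self.commits = 0

    def add(self):
        """Record one buffered update, then check whether a commit is due."""
        if self.pending == 0:
            self.first_pending_at = self.clock()
        self.pending += 1
        self.maybe_commit()

    def maybe_commit(self):
        """Commit if either threshold has been reached; return True if a commit happened."""
        if self.pending == 0:
            return False
        due_by_count = self.max_docs is not None and self.pending >= self.max_docs
        age_ms = (self.clock() - self.first_pending_at) * 1000
        due_by_time = self.max_time_ms is not None and age_ms >= self.max_time_ms
        if due_by_count or due_by_time:
            self.commit()
            return True
        return False

    def commit(self):
        # A real server would flush the index and open a new searcher here.
        self.commits += 1
        self.pending = 0
        self.first_pending_at = None
```

The time threshold bounds how stale search results can get under a trickle of updates, while the count threshold bounds the work done per commit under bulk loading.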
A two-part series of articles, recently published on developerWorks, walks through installing, configuring, using, and tuning Solr in more detail.

Clustered Indexes by Orion Letizi

There's been some talk recently about clustering Solr indexes with Terracotta. We've gotten Lucene clustered and we have a config module for it. It may also work with Solr. It's on our list of cool things to try.
