Apache Solr: Lucene Based Server Provides Highly Scalable Enterprise Search

Apache Solr is a Lucene-based enterprise search server that delivers out-of-the-box indexing and query capabilities, packaged as a portable WAR file. Users interact with Solr over an HTTP interface: content is submitted for indexing as XML documents, and queries are made with HTTP GET parameters. Solr also provides a master-slave index replication mechanism so that query load can be distributed across servers in a large-scale environment.
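
To make the HTTP interface concrete, here is a minimal Python sketch that builds the two kinds of messages described above: an XML update document and a GET query URL. It only constructs the payloads and sends nothing. The base URL, the field names (id, title), and the helper function names are assumptions for illustration; the actual fields depend on the schema of a given installation.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Assumed base URL for a local Solr instance; adjust for a real deployment.
SOLR_BASE = "http://localhost:8983/solr"


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> update message from a dict of fields."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_query_url(query, rows=10, fmt="json"):
    """Build a GET query URL; 'wt' selects the response format."""
    params = urlencode({"q": query, "rows": rows, "wt": fmt})
    return f"{SOLR_BASE}/select?{params}"


# Hypothetical document and query, for illustration only.
add_xml = build_add_xml({"id": "doc-1", "title": "Enterprise search"})
query_url = build_query_url("title:search")
print(add_xml)
print(query_url)
```

In a real setup the XML string would be sent in an HTTP POST to the update URL, followed by a commit message, and the query URL would be fetched with a plain GET.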

Solr was initially developed at CNET Networks and was donated to the Apache Software Foundation in 2006. It is currently used for search applications on several high-traffic public websites. Community feedback has been positive, with users reporting that indices of several million documents perform well.

Solr's feature set is broken down into several subsystems:

Schema
  • Defines the field types and fields of documents
  • Dynamic fields enable on-the-fly addition of new fields
  • Explicit types eliminate the need for guessing the types of fields
  • External file-based configuration of stopword lists, synonym lists, and protected word lists
  • Many additional text analysis components, including word splitting, regex, and sounds-like filters
Query
  • HTTP interface with configurable response formats (XML/XSLT, JSON, Python, Ruby)
  • Sort by any number of fields
  • Highlighted context snippets
  • Constant scoring range and prefix queries - no idf, coord, or lengthNorm factors, and no restriction on the number of terms the query matches
  • Function Query - influence the score by a function of a field's numeric value or ordinal
  • Date Math - specify dates relative to "NOW" in queries and updates
Core
  • Pluggable query handlers and extensible XML data format
  • Document uniqueness enforcement based on a unique key field
  • Batches updates and deletes for high performance
  • User-configurable commands triggered on index changes
  • Correct handling of numeric types for both sorting and range queries
Caching
  • Pluggable cache implementations
  • Autowarming of caches in the background (the most recently accessed items in the caches of the current searcher are re-populated in the new searcher, enabling high cache hit rates across index/searcher changes)
  • Fast/small filter implementation
  • User-level caching with autowarming support
Replication
  • Efficient distribution of index parts that have changed via rsync transport
  • Pull strategy allows for easy addition of searchers
  • Configurable distribution interval allows a tradeoff between timeliness and cache utilization
Admin Interface
  • Comprehensive statistics on cache utilization, updates, and queries
  • Text analysis debugger, showing result of every stage in an analyzer
  • Web query interface with debugging output
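
The cache autowarming item in the list above can be illustrated with a small Python sketch. This is not Solr's implementation, only a toy model of the idea: when a new searcher replaces the current one, the most recently used keys of the old cache are recomputed and placed into the new cache. The class LRUCache, the function autowarm, and the recompute callback are all hypothetical names introduced for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache; the most recently used key is last."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

    def most_recent_keys(self, count):
        """Return up to `count` keys, most recently used first."""
        if count <= 0:
            return []
        return list(self._items)[-count:][::-1]


def autowarm(old_cache, new_cache, count, recompute):
    """Re-populate new_cache with the `count` hottest keys of old_cache.

    `recompute` stands in for re-running the cached work against the new
    index, since values cached for the old index may no longer be valid.
    """
    # Insert the least recent of the selected keys first so that the
    # recency order of the old cache is preserved in the new one.
    for key in reversed(old_cache.most_recent_keys(count)):
        new_cache.put(key, recompute(key))


old = LRUCache(capacity=3)
for k in ("a", "b", "c"):
    old.put(k, k)
old.get("a")  # "a" becomes the most recently used key

new = LRUCache(capacity=3)
autowarm(old, new, count=2, recompute=str.upper)
```

After the call, the new cache holds fresh values for "a" and "c", while the colder key "b" is left to be computed on demand. The count argument plays the role of a tunable warming size: a larger value raises the initial hit rate at the cost of a longer warm-up.
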
Version 1.2 was released last week, adding several new features:

This is the first release since Solr graduated from the Incubator, bringing many new features, including CSV/delimited-text data loading, time-based autocommit, faster faceting, negative filters, a spell-check handler, sounds-like word filters, regex text filters, and more flexible plugins.

A two-part series of articles recently published on developerWorks also walks through the process of installing, configuring, using, and tuning Solr in more detail.