LinkedIn Open Sources IndexTank, a Customizable Indexing Engine

by Abel Avram on Dec 29, 2011

LinkedIn has open sourced IndexTank, a document indexing engine that runs in the cloud and lets users customize the indexing process and tune how results are ranked.

IndexTank launched about a year ago, was acquired by LinkedIn in October, and has now been open sourced. It is a cloud service similar to Google Custom Search that runs on top of Amazon Web Services and lets websites index their own content so that visitors can search it. IndexTank claims its users have complete control over what is indexed, when it is indexed, and how the results are sorted. A website can therefore promote the documents it prefers to the top of the search results instead of relying on Google’s search algorithm.

Unlike crawler-based search engines, IndexTank does not crawl web pages in order to index them; instead, websites send the data to be indexed to the indexing engine. As a result, a document can be indexed right after its creation, providing live results. The service is also ad-free.
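The push model and site-controlled ordering described above can be illustrated with a small in-memory sketch. This is not IndexTank code: the `ToyIndex` class, its method names, and the `boost` parameter are hypothetical and exist only to show the idea that a pushed document is searchable immediately and that the site, not the search provider, decides the ranking.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical in-memory stand-in for a push-based index."""
    docs: dict = field(default_factory=dict)    # doc_id -> plain text
    boosts: dict = field(default_factory=dict)  # doc_id -> site-chosen boost

    def add_document(self, doc_id, text, boost=0.0):
        # The site pushes the document itself; nothing is crawled.
        # The document is searchable as soon as this call returns.
        self.docs[doc_id] = text
        self.boosts[doc_id] = boost

    def search(self, query):
        terms = query.lower().split()
        hits = []
        for doc_id, text in self.docs.items():
            words = text.lower().split()
            matches = sum(words.count(term) for term in terms)
            if matches:
                # Site-controlled ordering: the boost is added to the match count.
                hits.append((matches + self.boosts[doc_id], doc_id))
        # Highest score first; ties are broken by document id.
        hits.sort(key=lambda hit: (-hit[0], hit[1]))
        return [doc_id for _, doc_id in hits]


index = ToyIndex()
index.add_document("post-1", "cloud search for cloud apps")
index.add_document("post-2", "search tips", boost=5.0)
results = index.search("search")
print(results)
```

Here the site gives `post-2` a boost, so it is listed ahead of `post-1` even though both match the query equally well.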

IndexTank has three main components:

  • Index Engine – the engine indexes only plain text. PDF, MS Word, and other document types must be converted to text before they can be indexed.
  • API – a RESTful interface accessed via Java, Python, .NET, Ruby and PHP clients.
  • Nebulizer – a multitenant framework hosting an unlimited number of indexing engines running on an IaaS infrastructure.
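To make the "RESTful interface" item more concrete, the sketch below builds (but does not send) an HTTP request that pushes one plain-text document to an index. Everything specific in it is an assumption for illustration: the host `search.example.com`, the `/indexes/<name>/docs` path, the `PUT` verb, and the JSON field names are hypothetical and are not taken from the IndexTank API documentation.

```python
import json
import urllib.request


def build_index_request(base_url, index_name, doc_id, text):
    """Build an HTTP request that would push one document to an index.

    The URL layout and payload shape are hypothetical placeholders.
    """
    payload = json.dumps({"docid": doc_id, "fields": {"text": text}}).encode("utf-8")
    url = f"{base_url}/indexes/{index_name}/docs"
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Text extracted from a PDF or Word file would be passed in as a plain string.
req = build_index_request(
    "https://search.example.com", "blog", "post-1", "plain text extracted from a PDF"
)
print(req.get_method(), req.full_url)
```

The request object is only constructed here; sending it with `urllib.request.urlopen(req)` would require a real endpoint and credentials.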

IndexTank joins Zoie, a real-time search engine built on Apache Lucene that LinkedIn open sourced in 2008.

IndexTank claims to have attracted thousands of customers in one year, the most notable being Reddit, but the company was not yet profitable when LinkedIn acquired it.

The source code of IndexTank is available on GitHub: Index Engine, and API plus Nebulizer.
