BT

Your opinion matters! Please fill in the InfoQ Survey!

LinkedIn Engineering Releases SenseiDB 1.0.0

| by Kostis Kapelonis Follow 0 Followers on Mar 19, 2012. Estimated reading time: 1 minute |

A note to our readers: As per your request we have developed a set of features that allow you to reduce the noise, while not losing sight of anything that is important. Get email and web notifications by choosing the topics you are interested in.

LinkedIn Engineering has released as open source SenseiDB, a distributed, semi-structured database. SenseiDB is the technology behind the search infrastructure in LinkedIn and powers the LinkedIn homepage, LinkedIn Signal and other search related features (e.g. people/company search). SenseiDB was developed in-house for the needs of the company and is now released as open source under the Search, Network, Analytics project umbrella.

SenseiDB is a NoSQL database focused on high update rates and complex semi-structured search queries. Users familiar with Lucene and Solr will recognize a lot of concepts behind SenseiDB. SenseiDB is deployed in clusters of multiple nodes, where each node can contain N shards of data. Nodes are managed via Apache Zookeeper which keeps the current configuration and transmits any changes (e.g. topology modifications) to the whole group of nodes. A SenseiDB cluster also requires a schema which defines the data model that will be used.

Getting data into the SenseiDB cluster happens only via Gateways (there is no "INSERT" method). Each cluster is connected to a single gateway. This is one of the critical points to understand, since SenseiDB does not handle itself Atomicity and Isolation. Those should be enforced externally at the gateway level. The gateway must make sure that the data stream behaves in an expected manner. Built-in gateways are:

Custom Gateways can also be implemented by the application developer. An example gateway is provided which gets its data from Twitter updates.

With the input datastream in place feeding data into the cluster, SenseiDB allows for faceted querying according to the defined schema. A REST API is offered for this purpose that can be accessed by any HTTP client. This API is inspired by ElasticSearch's Query DSL. SenseiDB also comes with wrappers for this API in Java and Python, with a Ruby version to follow soon.

Finally SenseiDB offers BQL (or Browse Query Language) as an alternative method of querying. BQL is an SQL-like language (containing only SELECT statements at the moment) which can be used to query a SenseiDB in a more convenient way. A graphical web console is provided as part of the cluster installation for inspecting and debugging BQL queries.

For more extensive information, see the documentation, the Javadocs and the Wiki. The source code is hosted on GitHub.

Rate this Article

Adoption Stage
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Old? by Louis Goddard

Didn't this happen back in January?

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

1 Discuss

Login to InfoQ to interact with what matters most to you.


Recover your password...

Follow

Follow your favorite topics and editors

Quick overview of most important highlights in the industry and on the site.

Like

More signal, less noise

Build your own feed by choosing topics you want to read about and editors you want to hear from.

Notifications

Stay up-to-date

Set up your notifications and don't miss out on content that matters to you

BT