
LinkedIn Engineering Releases SenseiDB 1.0.0

by Kostis Kapelonis on Mar 19, 2012

LinkedIn Engineering has released SenseiDB, a distributed, semi-structured database, as open source. SenseiDB is the technology behind LinkedIn's search infrastructure and powers the LinkedIn homepage, LinkedIn Signal and other search-related features (e.g. people/company search). It was developed in-house for the company's own needs and is now released as open source under the Search, Network, Analytics project umbrella.

SenseiDB is a NoSQL database focused on high update rates and complex semi-structured search queries. Users familiar with Lucene and Solr will recognize many of the concepts behind SenseiDB. SenseiDB is deployed in clusters of multiple nodes, where each node can contain N shards of data. Nodes are managed via Apache ZooKeeper, which keeps the current configuration and transmits any changes (e.g. topology modifications) to the whole group of nodes. A SenseiDB cluster also requires a schema that defines the data model to be used.
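To make the node/shard layout concrete, here is a minimal sketch of how documents could be spread across shards and then mapped to the nodes hosting them. The modulo routing rule, the `TOPOLOGY` table and all function names are assumptions for illustration; the article does not describe SenseiDB's actual sharding strategy.

```python
from collections import defaultdict

# Hypothetical topology: node id -> ids of the shards that node hosts.
TOPOLOGY = {1: [0, 1], 2: [2, 3]}
NUM_SHARDS = 4


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Assumed routing rule: document id modulo the shard count."""
    return doc_id % num_shards


def nodes_for_shard(shard, topology):
    """Return every node that hosts the given shard, in sorted order."""
    return sorted(node for node, shards in topology.items() if shard in shards)


def route(doc_ids, topology=TOPOLOGY):
    """Group document ids by the node(s) that would index them."""
    plan = defaultdict(list)
    for doc_id in doc_ids:
        for node in nodes_for_shard(shard_for(doc_id), topology):
            plan[node].append(doc_id)
    return dict(plan)
```

In a real cluster the topology table would come from the coordination service rather than a constant, so that every node sees the same mapping after a change.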

Data enters a SenseiDB cluster only through gateways (there is no "INSERT" method), and each cluster is connected to a single gateway. This is one of the critical points to understand, since SenseiDB does not itself handle Atomicity and Isolation. Those must be enforced externally, at the gateway level: the gateway must make sure that the data stream behaves in the expected manner. SenseiDB ships with built-in gateways.

Custom gateways can also be implemented by the application developer. An example gateway that gets its data from Twitter updates is provided.
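The idea that the gateway, not the database, is responsible for a well-behaved stream can be sketched as follows. This is not SenseiDB's gateway interface; `Event`, its `version` field and `ordered_gateway` are hypothetical names, and "well-behaved" is assumed here to mean strictly increasing versions with no duplicates.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    version: int  # position in the source stream (hypothetical field)
    doc: dict     # the document to be indexed


def ordered_gateway(source: Iterable[Event], start_version: int = 0) -> Iterator[Event]:
    """Yield events in strictly increasing version order.

    Events at or below the last delivered version are dropped, so a
    replayed or duplicated event never reaches the cluster twice.
    """
    last = start_version
    for event in source:
        if event.version <= last:
            continue  # stale or duplicate event: skip it
        last = event.version
        yield event
```

Dropping stale events is only one possible policy; a gateway could equally buffer and reorder them, depending on what the upstream source guarantees.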

With the input data stream feeding the cluster, SenseiDB supports faceted querying according to the defined schema. A REST API, accessible from any HTTP client, is offered for this purpose. This API is inspired by ElasticSearch's Query DSL. SenseiDB also comes with wrappers for this API in Java and Python, with a Ruby version to follow soon.
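As a rough illustration of what a faceted query over HTTP might look like, the sketch below builds a JSON request with the Python standard library without sending it. The endpoint URL and the JSON field names (`query_string`, `facets`, `from`, `size`) are assumptions modeled on the Query DSL style the article mentions; the SenseiDB documentation is the authority on the real request format.

```python
import json
import urllib.request

# Assumed endpoint; replace with the address of a real SenseiDB deployment.
SENSEI_URL = "http://localhost:8080/sensei"


def build_request(text, facet_field, max_facets=10, size=10, url=SENSEI_URL):
    """Build (but do not send) an HTTP POST carrying a JSON faceted query."""
    body = {
        "query": {"query_string": {"query": text}},    # free-text part
        "facets": {facet_field: {"max": max_facets}},  # facet counts to return
        "from": 0,                                     # paging offset
        "size": size,                                  # hits per page
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out so the sketch can be read and run without a live cluster.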

Finally, SenseiDB offers BQL (Browse Query Language) as an alternative method of querying. BQL is an SQL-like language (containing only SELECT statements at the moment) which can be used to query a SenseiDB cluster in a more convenient way. A graphical web console is provided as part of the cluster installation for inspecting and debugging BQL queries.
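To give a feel for an SQL-like SELECT, here is a small helper that composes a query string. The clause set (WHERE with equality tests, LIMIT), the `bql_select` helper and the `cars` index are assumptions based only on the "SQL-like" description above; the exact BQL grammar should be checked against the SenseiDB documentation.

```python
def bql_select(columns, index, equals=None, limit=None):
    """Compose a SELECT statement in an assumed BQL-style syntax.

    `equals` maps column names to required values. String values are
    single-quoted; values containing a quote are rejected rather than
    escaped, because the escaping rule is not known here.
    """
    parts = ["SELECT " + ", ".join(columns), "FROM " + index]
    if equals:
        conditions = []
        for column, value in equals.items():
            if isinstance(value, str):
                if "'" in value:
                    raise ValueError("quotes in string values are not supported")
                value = "'" + value + "'"
            conditions.append(f"{column} = {value}")
        parts.append("WHERE " + " AND ".join(conditions))
    if limit is not None:
        parts.append(f"LIMIT {limit}")
    return " ".join(parts)
```

For example, `bql_select(["color", "year"], "cars", {"color": "red"}, 10)` produces a statement that could be pasted into a query console for inspection.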

For more extensive information, see the documentation, the Javadocs and the Wiki. The source code is hosted on GitHub.


Old? by Louis Goddard

Didn't this happen back in January?

InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.