LinkedIn Engineering Releases SenseiDB 1.0.0
LinkedIn Engineering has released SenseiDB, a distributed, semi-structured database, as open source. SenseiDB is the technology behind LinkedIn's search infrastructure; it powers the LinkedIn homepage, LinkedIn Signal and other search-related features (e.g. people and company search). SenseiDB was developed in-house for the company's needs and is now released as open source under the Search, Network, Analytics project umbrella.
SenseiDB is a NoSQL database focused on high update rates and complex semi-structured search queries. Users familiar with Lucene and Solr will recognize many of the concepts behind SenseiDB. SenseiDB is deployed in clusters of multiple nodes, where each node can hold N shards of data. Nodes are managed via Apache ZooKeeper, which keeps the current configuration and propagates any changes (e.g. topology modifications) to all nodes in the cluster. A SenseiDB cluster also requires a schema that defines the data model it uses.
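The node and shard layout can be pictured with a small routing sketch. It assumes that documents are routed by taking their numeric UID modulo the number of shards and that shards are spread round-robin over the nodes; the function names are hypothetical and not SenseiDB code. In a real cluster this mapping would be part of the configuration that ZooKeeper distributes rather than something each client computes ad hoc.

```python
from collections import defaultdict


def shard_for(uid: int, num_shards: int) -> int:
    """Route a document to a shard by its UID (assumed modulo strategy)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return uid % num_shards


def assign_shards(nodes: list[str], num_shards: int) -> dict[str, list[int]]:
    """Place shards on nodes round-robin, so each node holds about num_shards / len(nodes)."""
    if not nodes:
        raise ValueError("need at least one node")
    layout: dict[str, list[int]] = defaultdict(list)
    for shard in range(num_shards):
        layout[nodes[shard % len(nodes)]].append(shard)
    return dict(layout)


def nodes_for(uid: int, layout: dict[str, list[int]], num_shards: int) -> list[str]:
    """List the nodes that hold the shard a given UID belongs to."""
    shard = shard_for(uid, num_shards)
    return [node for node, shards in layout.items() if shard in shards]
```

For example, `assign_shards(["node-a", "node-b"], 4)` yields `{"node-a": [0, 2], "node-b": [1, 3]}`, and `nodes_for(10, layout, 4)` returns `["node-a"]` because 10 % 4 == 2.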
Data gets into a SenseiDB cluster only via gateways (there is no "INSERT" method). Each cluster is connected to a single gateway. This is one of the critical points to understand, since SenseiDB does not itself handle atomicity and isolation. These must be enforced externally, at the gateway level: the gateway must make sure that the data stream behaves as expected. Built-in gateways are:
Application developers can also implement custom gateways. An example gateway that takes its data from Twitter updates is provided.
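Because SenseiDB leaves atomicity and isolation to the gateway, a custom gateway is both an adapter for its data source and the place where delivery guarantees live. The sketch below assumes a hypothetical `Gateway` interface with a single `events()` method, uses canned tweet-like dicts in place of a live Twitter feed, and adds a buffer that forwards events strictly in version order while dropping duplicates. None of these names come from SenseiDB (its gateways are Java components); the code only illustrates the responsibilities described above.

```python
import heapq
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Iterable, Iterator


@dataclass(order=True)
class Event:
    version: int                                # hypothetical contiguous sequence number
    doc: dict[str, Any] = field(compare=False)  # document to index; ignored when ordering


class Gateway(ABC):
    """Hypothetical gateway interface: adapts an external source into index events."""

    @abstractmethod
    def events(self) -> Iterator[Event]:
        ...


class TweetGateway(Gateway):
    """Turns tweet-like dicts into events; field names are assumptions, not Twitter's API."""

    def __init__(self, tweets: Iterable[dict[str, Any]]) -> None:
        self._tweets = tweets

    def events(self) -> Iterator[Event]:
        version = 0
        for tweet in self._tweets:
            if not tweet.get("text"):
                continue  # skip empty or deleted tweets instead of indexing them
            version += 1  # versions stay contiguous even when tweets are skipped
            yield Event(version, {"id": int(tweet["id"]), "contents": tweet["text"]})


class OrderedBuffer:
    """Forwards events in contiguous version order and drops duplicate or stale versions."""

    def __init__(self, next_version: int = 1) -> None:
        self._next = next_version         # version the index expects next
        self._pending: list[Event] = []   # min-heap of events that arrived early
        self._buffered: set[int] = set()  # versions currently waiting in the heap

    def offer(self, event: Event) -> list[Event]:
        """Accept one event and return every event that can now be forwarded, in order."""
        if event.version < self._next or event.version in self._buffered:
            return []  # already forwarded or already waiting
        heapq.heappush(self._pending, event)
        self._buffered.add(event.version)
        ready: list[Event] = []
        while self._pending and self._pending[0].version == self._next:
            head = heapq.heappop(self._pending)
            self._buffered.discard(head.version)
            ready.append(head)
            self._next += 1
        return ready
```

If transport reorders or retries events, offering version 2 before version 1 returns nothing; offering version 1 then releases both, and a repeated version 1 is silently dropped.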
Once an input data stream is feeding the cluster, SenseiDB supports faceted queries according to the defined schema. A REST API, inspired by ElasticSearch's Query DSL, is offered for this purpose and can be accessed by any HTTP client. SenseiDB also comes with wrappers for this API in Java and Python, with a Ruby version to follow soon.
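Since the REST API is JSON over HTTP, a faceted query is just a request body. The sketch below deliberately avoids SenseiDB's own Python wrapper and only builds the request with the standard library. The endpoint URL (`http://localhost:8080/sensei`) and the body fields (`query`, `selections`, `facets`, `from`, `size`, `max`, `minCount`) are assumptions modeled loosely on an ElasticSearch-style DSL; check them against the SenseiDB REST documentation before use.

```python
import json
import urllib.request
from typing import Any


def build_facet_query(text: str, selections: dict[str, list[str]], facets: list[str],
                      offset: int = 0, size: int = 10) -> dict[str, Any]:
    """Assemble an assumed JSON body: full-text query, facet filters and facets to count."""
    return {
        "query": {"query_string": {"query": text}},
        # one filter per facet field, restricting hits to the selected values
        "selections": [{"terms": {name: {"values": values}}} for name, values in selections.items()],
        # facets whose value counts should come back with the hits (option names assumed)
        "facets": {name: {"max": 10, "minCount": 1} for name in facets},
        "from": offset,
        "size": size,
    }


def make_request(body: dict[str, Any],
                 url: str = "http://localhost:8080/sensei") -> urllib.request.Request:
    """Wrap the body in a JSON POST request; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running cluster, `urllib.request.urlopen(make_request(body))` would send the query; any other HTTP client could send the same JSON.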
Finally, SenseiDB offers BQL (Browse Query Language) as an alternative way of querying. BQL is an SQL-like language (currently limited to SELECT statements) that makes querying a SenseiDB cluster more convenient. A graphical web console, provided as part of the cluster installation, can be used to inspect and debug BQL queries.
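To illustrate what a SELECT-only, SQL-like language buys over hand-written JSON, here is a toy translator for a tiny subset of the form `SELECT * FROM t WHERE field = 'value' [AND ...]`. The grammar is a deliberate simplification and not real BQL, which supports much more; the `parse_select` and `run` helpers are hypothetical and evaluate the filters against in-memory rows as a stand-in for the cluster.

```python
import re
from typing import Any

# Toy grammar (an assumption, far smaller than real BQL):
#   SELECT * FROM <table> [WHERE <field> = '<value>' [AND <field> = '<value>']...] [;]
_QUERY = re.compile(r"^\s*SELECT\s+\*\s+FROM\s+(\w+)(?:\s+WHERE\s+(.+?))?\s*;?\s*$", re.IGNORECASE)
_COND = re.compile(r"^\s*(\w+)\s*=\s*'([^']*)'\s*$")


def parse_select(bql: str) -> tuple[str, dict[str, str]]:
    """Return (table, {field: value}) for the toy SELECT subset, or raise ValueError."""
    m = _QUERY.match(bql)
    if not m:
        raise ValueError(f"unsupported statement: {bql!r}")
    table, where = m.group(1), m.group(2)
    filters: dict[str, str] = {}
    if where:
        for part in re.split(r"\s+AND\s+", where, flags=re.IGNORECASE):
            cond = _COND.match(part)
            if not cond:
                raise ValueError(f"unsupported condition: {part!r}")
            filters[cond.group(1)] = cond.group(2)
    return table, filters


def run(bql: str, rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep rows whose fields equal every WHERE value (compared as strings)."""
    _, filters = parse_select(bql)
    return [row for row in rows if all(str(row.get(k)) == v for k, v in filters.items())]
```

Anything outside the subset, such as a `DELETE` statement, is rejected with `ValueError`, mirroring BQL's current restriction to SELECT.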
Martin Thompson Jul 27, 2014