Facebook Announces Apollo, a New NoSQL Database for On-line Low Latency Storage

by Charles Humble on Jun 13, 2014

Speaking at QCon New York on Wednesday, Jeff Johnson of the core data group at Facebook announced Apollo, Facebook’s Paxos-like NoSQL database. Written in C++11 on top of the Apache Thrift 2 RPC framework, Apollo is a hierarchical storage system in which all the data is split into shards, very much analogous to region servers in HBase. Its sweet spot, Johnson explained, is on-line low latency storage, in particular Flash and in-memory.
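
The talk did not describe how Apollo maps keys to shards. As a rough illustration of the HBase analogy only, the sketch below routes a key to a shard by contiguous key range, the way HBase assigns row ranges to regions. The class name, split keys and shard names are all hypothetical.

```python
import bisect


class RangeShardMap:
    """Hypothetical key-range router; not Apollo's actual scheme."""

    def __init__(self, split_keys, shard_names):
        # N sorted split keys divide the key space into N + 1 ranges.
        if len(shard_names) != len(split_keys) + 1:
            raise ValueError("need exactly one more shard than split keys")
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        self.split_keys = list(split_keys)
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # A key equal to a split key belongs to the range that starts there.
        return self.shard_names[bisect.bisect_right(self.split_keys, key)]


shard_map = RangeShardMap(["g", "p"], ["shard-0", "shard-1", "shard-2"])
print(shard_map.shard_for("apple"))   # falls before "g"
print(shard_map.shard_for("zebra"))   # falls after "p"
```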

As distinct from a document-oriented or key-value store, Apollo is about modifications to data structures, allowing you to represent maps, queues, trees and so on, as well as key-value pairs. Individual pieces of data are quite small, between 1 byte and 1MB, while the total data size can be anywhere from 1MB to 10+PB. A deployment can range from a minimum of three servers to thousands.

Each shard has four components. The first is a quorum consensus protocol based on Raft, a strong-leader consensus protocol from Stanford. Johnson explained that one of the things his team really like about Raft is that leader failure recovery is really well defined, as is the quorum view change. That said, he suggested, it isn’t really simpler to work with than Multi-Paxos:

We’ve had to do tons and tons and stuff - everything from allowing you to asynchronous write and read from disk to trying to deal with situations when followers are getting behind because there’s other stuff going on on the server or the disk is slow, corruption detection, and so on
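
To make the quorum idea concrete, here is a minimal sketch of how a Raft-style leader can decide which log entries are safe to commit when some followers lag behind. It illustrates the general Raft rule rather than Apollo's code: the function names are made up, and the additional Raft requirement that the entry belongs to the leader's current term is deliberately left out.

```python
def majority(cluster_size):
    """Smallest number of servers that forms a quorum."""
    return cluster_size // 2 + 1


def commit_index(match_indexes):
    """Highest log index stored on a majority of servers.

    match_indexes holds, for every server including the leader, the
    highest log index known to be stored on that server.
    """
    ordered = sorted(match_indexes, reverse=True)
    return ordered[majority(len(ordered)) - 1]


# Three servers; one follower has fallen behind at index 3.
print(commit_index([7, 5, 3]))
```

In this simplified model a single slow follower does not hold back the commit point, because only a majority has to store an entry.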

The second component is storage. At the time of writing the primary storage is based on RocksDB, a key/value store that builds on top of Google’s LevelDB. Although RocksDB is a key/value store, Facebook are using it to emulate other data structures. Apollo is designed to be storage agnostic, and the team are also working on adding support for MySQL as an alternative storage engine.
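
The talk did not detail how the richer data structures are laid out in RocksDB. As one possible illustration of the idea, the sketch below emulates a deque on a flat key/value store. A plain Python dict stands in for the store, and the key layout (`name/meta/...`, `name/item/N`) is an assumption made for this example.

```python
class KVDeque:
    """Hypothetical deque emulated on a dict-like key/value store."""

    def __init__(self, store, name):
        self.store = store
        self.name = name
        # head is the index of the front item; tail is one past the back item.
        self.store.setdefault(self._meta("head"), 0)
        self.store.setdefault(self._meta("tail"), 0)

    def _meta(self, field):
        return f"{self.name}/meta/{field}"

    def _item(self, index):
        return f"{self.name}/item/{index}"

    def __len__(self):
        return self.store[self._meta("tail")] - self.store[self._meta("head")]

    def push_back(self, value):
        tail = self.store[self._meta("tail")]
        self.store[self._item(tail)] = value
        self.store[self._meta("tail")] = tail + 1

    def push_front(self, value):
        head = self.store[self._meta("head")] - 1
        self.store[self._item(head)] = value
        self.store[self._meta("head")] = head

    def back(self):
        if len(self) == 0:
            raise IndexError("deque is empty")
        return self.store[self._item(self.store[self._meta("tail")] - 1)]

    def pop_front(self):
        if len(self) == 0:
            raise IndexError("deque is empty")
        head = self.store[self._meta("head")]
        value = self.store.pop(self._item(head))
        self.store[self._meta("head")] = head + 1
        return value


store = {}
d2 = KVDeque(store, "d2")
d2.push_back("a")
d2.push_back("b")
d2.push_front("z")
print(d2.back(), len(d2))
```

Every deque operation touches only a handful of keys, which is what makes this kind of emulation practical on a key/value engine.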

The third component is a client API with read() and write() methods. Every operation that Apollo performs at the shard level is atomic: you express pre-conditions and, if these are satisfied, the requested reads or writes are carried out. For example, this code:

read(conditions : {map(m1).contains(x)}, 
reads : {deque(d2).back()})

says “If the map m1 contains the value x, then return the value that is on the back of the d2 deque.”

You can combine together any number of conditions and any number of reads.
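
The semantics of the conditional read can be sketched in a few lines of Python. This is a single-process stand-in, not Apollo's client API: `ShardSketch`, the use of lambdas for conditions and reads, and returning `None` when a condition fails are all assumptions made for illustration.

```python
import threading
from collections import deque


class ShardSketch:
    """Hypothetical in-memory stand-in for one shard."""

    def __init__(self):
        self._lock = threading.Lock()
        self.maps = {}     # map name -> dict
        self.deques = {}   # deque name -> deque

    def read(self, conditions, reads):
        # Conditions and reads are evaluated under one lock, so nothing
        # can change the data between the check and the reads.
        with self._lock:
            if not all(condition(self) for condition in conditions):
                return None  # assumption: a failed condition yields no result
            return [read(self) for read in reads]


shard = ShardSketch()
shard.maps["m1"] = {"x": 1}
shard.deques["d2"] = deque(["a", "b", "c"])

# Equivalent of: read(conditions : {map(m1).contains(x)},
#                     reads : {deque(d2).back()})
result = shard.read(
    conditions=[lambda s: "x" in s.maps["m1"]],
    reads=[lambda s: s.deques["d2"][-1]],
)
print(result)
```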

Writes are very similar and again allow you to express conditions:

write(conditions : {ver(k1) == v}, reads : {}, 
writes : {val(k1) := x})
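
Read as "if the version of key k1 is v, set the value of k1 to x", the write above is a versioned compare-and-set. The sketch below shows that pattern under stated assumptions: the class name is hypothetical, a key that has never been written is treated as version 0, and every successful write bumps the version by one. None of these details were confirmed in the talk.

```python
import threading


class VersionedShardSketch:
    """Hypothetical versioned key/value shard with conditional writes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def ver(self, key):
        return self._data.get(key, (0, None))[0]

    def val(self, key):
        return self._data.get(key, (0, None))[1]

    def write(self, key, expected_version, new_value):
        # The version check and the update happen under one lock, so two
        # writers that both saw the same version cannot both succeed.
        with self._lock:
            if self.ver(key) != expected_version:
                return False
            self._data[key] = (expected_version + 1, new_value)
            return True


s = VersionedShardSketch()
first = s.write("k1", 0, "x")    # succeeds: k1 is still at version 0
stale = s.write("k1", 0, "y")    # fails: k1 has moved on to version 1
print(first, stale, s.val("k1"), s.ver("k1"))
```

A client whose write is rejected would typically re-read the key, pick up the new version, and retry.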

The last of the four shard components is the Fault Tolerant State Machine (FTSM). FTSMs are used primarily by the system code but can also be used for user code. Each FTSM is owned by a shard so that, for example, in a shard of three machines all of them will be executing the same code at the same time. They are able to access the persistent storage that is local to each machine. Most importantly, if one node dies the code continues to execute in a well-defined order that all the nodes agree on.

Amongst other things the state machines are used for load balancing, data migration, shard creation and destruction, and co-ordinating cross-shard transactions. State machines can have external side effects, for example they can send RPC requests to remote machines, but whenever they make a change to persistent state they have to submit it to Raft to get all the servers to agree.
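
The core idea behind a replicated state machine is that deterministic code fed the same agreed sequence of commands ends up in the same state on every node. The sketch below illustrates this with a toy counter. The list `agreed_log` stands in for the sequence Raft would produce, and the class and command names are hypothetical rather than taken from Apollo.

```python
class CounterStateMachine:
    """Deterministic: the same commands in the same order give the same state."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # index of the next log entry to apply

    def apply(self, command):
        op, key, amount = command
        if op == "add":
            self.state[key] = self.state.get(key, 0) + amount
        elif op == "reset":
            self.state[key] = 0
        else:
            raise ValueError(f"unknown operation: {op}")
        self.applied += 1

    def catch_up(self, log):
        # Apply any agreed entries this replica has not seen yet, in order.
        while self.applied < len(log):
            self.apply(log[self.applied])


# Stand-in for the command sequence the servers agreed on through consensus.
agreed_log = [
    ("add", "shard_load", 5),
    ("add", "shard_load", 3),
    ("reset", "shard_load", 0),
    ("add", "shard_load", 2),
]

replicas = [CounterStateMachine() for _ in range(3)]

# The third replica "dies" after applying only the first entry.
replicas[2].catch_up(agreed_log[:1])

# The two survivors keep executing and stay in step with each other.
for replica in replicas[:2]:
    replica.catch_up(agreed_log)

# When the third replica returns, replaying the log brings it back in line.
replicas[2].catch_up(agreed_log)
print([replica.state for replica in replicas])
```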

Apollo isn't currently being used in production at Facebook, but the firm is looking at using it to replace some memcached use cases, and Johnson made clear that Facebook makes very significant use of memcached. "More generally," Johnson told InfoQ, "we’re looking into various in-memory storage use cases at Facebook, either new ones or replacing some existing ones, by comparing side-by-side with existing systems."

The company is also looking at using Apollo as a reliable queuing system for outgoing Facebook messages to iOS, Android and carriers via SMS, and also potentially for faster analytics.

Apollo is still in development and hasn’t been open-sourced, though Johnson did state that doing so was something Facebook were looking into and would like to do. Johnson's presentation is currently available to QCon New York attendees and will be published to everyone via InfoQ in due course.
