Causal Consistency for Large Neo4j Clusters
Jim Webber, chief scientist at Neo4j Technology, explored how Neo4j implements causal consistency at QCon London 2017. The presentation included a high-level overview of Neo4j’s clustering architecture, its implementation of consensus using Raft, and a pattern called bookmarking used to achieve read-after-write consistency.
Webber explained that, to divide and conquer the problems of clustering, Neo4j assigns each node one of two roles: core or read. The core nodes in the cluster are used for writes and guarantee durability. The read nodes are read-only asynchronous replicas of the core cluster and are used for scaling under heavy read load.
Webber explained that the core nodes implement the Raft consensus algorithm in order to meet their durability guarantees. Whenever a transaction is written, Raft appends it to a log and then replicates it to the other core nodes in the cluster. However, rather than waiting for the transaction to be replicated to every node, Raft waits only until a majority of the core nodes have acknowledged it, which is enough to guarantee the write.
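The majority rule described above can be illustrated with a small in-memory sketch. This is not Neo4j's implementation and it leaves out most of Raft (leader election, terms, log repair); the names CoreNode, majority and replicate are hypothetical and exist only for this illustration. A real leader would also send the entry to its followers in parallel and commit as soon as enough acknowledgements arrive.

```python
from dataclasses import dataclass, field


@dataclass
class CoreNode:
    """Hypothetical stand-in for a core node; holds only a replicated log."""
    name: str
    reachable: bool = True
    log: list = field(default_factory=list)

    def append(self, entry) -> bool:
        """Append the entry locally and return True as an acknowledgement."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def replicate(leader: CoreNode, followers: list, entry) -> bool:
    """Return True if a majority of core nodes (leader included) hold the entry."""
    acks = 1 if leader.append(entry) else 0
    for follower in followers:
        if follower.append(entry):
            acks += 1
    return acks >= majority(len(followers) + 1)


# Three core nodes, one of them unreachable: two acknowledgements out of
# three are still a majority, so the write is considered committed.
leader = CoreNode("core-1")
followers = [CoreNode("core-2"), CoreNode("core-3", reachable=False)]
committed = replicate(leader, followers, {"tx": 1})
```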
Webber also explained the performance and resilience benefits of Raft. In terms of performance, waiting only for majority replication reduces commit latency, because the write blocks for a shorter period of time. From a resilience point of view, the core cluster can keep functioning even if some nodes fail, as long as a majority are still able to vote.
He compared Raft to Paxos and explained that Raft was chosen because it is simpler and easier to implement. He believes this reduces the likelihood of bugs and improves the maintainability of the application.
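As a worked example of the resilience point, the arithmetic below shows how many core-node failures a majority-based cluster can absorb. This is general quorum arithmetic rather than anything specific to the talk, and the function name tolerated_failures is hypothetical.

```python
def tolerated_failures(core_nodes: int) -> int:
    """Number of core nodes that can fail while a majority can still vote."""
    return (core_nodes - 1) // 2


for n in (3, 4, 5, 7):
    quorum = n // 2 + 1
    print(f"{n} core nodes: majority = {quorum}, tolerated failures = {tolerated_failures(n)}")
```

Under this rule an even-sized cluster tolerates no more failures than the odd-sized cluster one node smaller (four nodes tolerate one failure, the same as three), which is why odd cluster sizes are commonly preferred for consensus groups.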
Webber also explained that a graph database is typically a read-heavy database and that even during a write operation the graph must be read and traversed. This is why a Neo4j cluster typically has more read nodes than core nodes. Because the read nodes are not involved in consensus commits, they are suitable for auto-scaling and can easily be disposed of or provisioned on demand.
Because transactions are replicated to the read nodes asynchronously, Webber was able to demonstrate a simple problem. Even with the durability guarantees of a write, if a user were to create data and then immediately read it, they might not be able to find it. This is because the read nodes are only eventually consistent, and the data may not yet have been replicated to the node being queried.
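The stale read can be reproduced with a toy model of asynchronous replication. This is a minimal sketch, not Neo4j code: Core, ReadReplica, write, pull and read are hypothetical names, and replication is modelled as an explicit pull call so that the delay is visible.

```python
class Core:
    """Hypothetical core cluster: accepts writes and keeps an ordered transaction log."""

    def __init__(self):
        self.tx_log = []  # list of (key, value); position + 1 is the transaction id

    def write(self, key, value) -> int:
        self.tx_log.append((key, value))
        return len(self.tx_log)  # id of the transaction just committed


class ReadReplica:
    """Hypothetical read node: applies the core's log some time after the commit."""

    def __init__(self, core: Core):
        self.core = core
        self.data = {}
        self.applied_tx = 0  # id of the last transaction applied locally

    def pull(self):
        """Catch up with the core; in a real cluster this happens in the background."""
        for key, value in self.core.tx_log[self.applied_tx:]:
            self.data[key] = value
        self.applied_tx = len(self.core.tx_log)

    def read(self, key):
        return self.data.get(key)


core = Core()
replica = ReadReplica(core)

core.write("user:42", "Alice")
stale = replica.read("user:42")   # None: the write is durable but not yet replicated
replica.pull()
fresh = replica.read("user:42")   # "Alice" once the replica has caught up
```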
In order to address this problem of read-after-write consistency, Webber demonstrated a causal consistency pattern in Neo4j, which has been named bookmarking.
The first stage of bookmarking involves a write: upon completion, the corresponding transaction id is returned to the client. The second stage is a read, where the client sends that transaction id along with its query. Using this id, the node being read from can block until the transaction is present locally before serving the read.
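The two stages can be sketched with a read node that blocks until it has applied the bookmarked transaction. This is an illustrative model under stated assumptions, not the Neo4j driver API: ReadReplica, apply, read and the bookmark and timeout parameters are hypothetical, and the timeout is an addition of this sketch so that a read cannot wait forever.

```python
import threading


class ReadReplica:
    """Hypothetical read node that can wait for a given transaction id."""

    def __init__(self):
        self._data = {}
        self._applied_tx = 0
        self._cond = threading.Condition()

    def apply(self, tx_id, key, value):
        """Called by the (simulated) asynchronous replication stream."""
        with self._cond:
            self._data[key] = value
            self._applied_tx = tx_id
            self._cond.notify_all()  # wake any reads waiting on a bookmark

    def read(self, key, bookmark=0, timeout=1.0):
        """Block until the bookmarked transaction has been applied, then read."""
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self._applied_tx >= bookmark, timeout=timeout
            )
            if not caught_up:
                raise TimeoutError(f"transaction {bookmark} not replicated in time")
            return self._data.get(key)


replica = ReadReplica()

# Stage 1: the write completes on the core cluster and the client receives
# the transaction id (assumed to be 1 here) as its bookmark.
bookmark = 1

# Replication to the read node arrives slightly later.
threading.Timer(0.05, replica.apply, args=(1, "user:42", "Alice")).start()

# Stage 2: the client passes the bookmark with its read, so the read node
# waits for transaction 1 instead of returning stale data.
result = replica.read("user:42", bookmark=bookmark)
```

A read issued without a bookmark returns immediately with whatever the node currently holds, so only clients that need read-after-write behaviour pay the extra wait.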
Webber also demonstrated bookmarking with a code example, and emphasized how simple he believed it was to implement. In it, the client received an id which it then passed into the next query.