Interview With Nick Lavezzo, Co-Founder of FoundationDB
FoundationDB is a database that provides ACID guarantees along with high performance and availability normally associated with NoSQL databases. In an InfoQ exclusive interview, we learn more about the project from one of the founders, Nick Lavezzo.
InfoQ: We read that FoundationDB uses two different kinds of nodes for reads and writes to get around the limitation claimed by CAP theorem to achieve ACID along with scalability - is that correct? Could you explain more about this architecture?
Nick: FoundationDB doesn’t get around the CAP theorem; it just makes the unconventional choice (for NoSQL) to maintain consistency during a network partition. But building a distributed, fully consistent database is hard, so, to make things simpler, we recruit different servers for different roles. Two of the most important roles are transaction servers and storage servers. Transaction servers are responsible for the conflict checking that occurs in order to maintain ACID guarantees. The storage servers store chunks of ordered key-value pairs and service reads and writes that have been approved by the transaction servers. Of course, on a single physical machine or a smaller cluster these roles can overlap so that one computer is doing more than one task. For more information, check out the more detailed explanation on our website.
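The division of labor Nick describes can be sketched in a few lines. This is a toy, single-process model written for this article, not FoundationDB code: the class names `StorageServer` and `TransactionServer`, the version counter, and the conflict rule (reject a commit if any key it read was overwritten after the transaction began) are all assumptions chosen to illustrate the idea of conflict checking in front of an ordered key-value store.

```python
import bisect


class StorageServer:
    """Toy storage role: keeps key-value pairs ordered by key."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def read(self, key):
        return self._values.get(key)

    def apply(self, writes):
        # Only called with writes the transaction role has approved.
        for key, value in writes.items():
            if key not in self._values:
                bisect.insort(self._keys, key)
            self._values[key] = value


class TransactionServer:
    """Toy transaction role: assigns versions and rejects conflicting commits."""

    def __init__(self, storage):
        self._storage = storage
        self._version = 0
        self._last_write = {}  # key -> version of the most recent committed write

    def begin(self):
        # A transaction reads as of the current committed version.
        return self._version

    def commit(self, read_version, read_keys, writes):
        # Conflict: a key this transaction read was overwritten after it began.
        for key in read_keys:
            if self._last_write.get(key, 0) > read_version:
                return False
        self._version += 1
        for key in writes:
            self._last_write[key] = self._version
        self._storage.apply(writes)
        return True


storage = StorageServer()
tx = TransactionServer(storage)

# Two transactions start from the same snapshot and both read "balance".
v1 = tx.begin()
v2 = tx.begin()
first = tx.commit(v1, {"balance"}, {"balance": 100})
second = tx.commit(v2, {"balance"}, {"balance": 200})
print(first, second, storage.read("balance"))  # True False 100
```

In this sketch the second transaction is rejected because the value it read changed before it committed; a real client would typically retry it against a fresh snapshot.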
InfoQ: You mention FoundationDB and an ecosystem of "layers" will be available both as open source and commercial variants - are the sources already available somewhere for enthusiasts to see?
Nick: We have been focusing on building the core database, and haven't cleaned up and documented our internal layers enough to release them publicly yet (although we are providing access to them as requested by our alpha testers). Public repositories for the layers are something we expect to start as we head towards beta. These layers will serve both as excellent tools for exposing higher-level data models, and as excellent demonstrations of how easily one can build something powerful on top of FoundationDB's ordered key-value API. People who are interested in seeing layer/application code right now, though, can check out some examples here.
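To give a feel for what a "layer" over an ordered key-value API might look like, here is a minimal sketch. It does not use FoundationDB's client library: `OrderedKV` is a hypothetical in-memory stand-in for an ordered store, and `TableLayer` is an invented example layer that maps rows onto `(table, row_id, column)` keys so that a whole row can be fetched with one range read.

```python
import bisect


class OrderedKV:
    """Toy in-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []
        self._data = {}

    def set(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get_range(self, begin, end):
        # Return (key, value) pairs with begin <= key < end, in key order.
        lo = bisect.bisect_left(self._keys, begin)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


class TableLayer:
    """Hypothetical layer: stores rows as (table, row_id, column) keys."""

    def __init__(self, kv, table):
        self._kv = kv
        self._table = table

    def put_row(self, row_id, row):
        for column, value in row.items():
            self._kv.set((self._table, row_id, column), value)

    def get_row(self, row_id):
        # All columns of a row are adjacent in key order, so one range read
        # covers them. The end key uses the highest code point as a sentinel.
        begin = (self._table, row_id, "")
        end = (self._table, row_id, "\U0010ffff")
        return {key[2]: value for key, value in self._kv.get_range(begin, end)}


users = TableLayer(OrderedKV(), "users")
users.put_row(1, {"name": "Ada", "email": "ada@example.com"})
users.put_row(2, {"name": "Grace"})
print(users.get_row(1))
```

The point of the sketch is the key design: because keys are ordered, choosing a key layout is enough to turn a flat store into a higher-level data model.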
InfoQ: You mention a new language called Flow which is built on top of C++ and provides tooling - could you share some details about it?
Nick: We have introduced a new section on our website which explains Flow and the purpose behind it.
InfoQ: Has someone started using FoundationDB for building any applications?
Nick: Our testers have built applications on top of FoundationDB, but I don't think that's what you're asking. I'll assume you mean applications in production, i.e. people running their businesses on it. FoundationDB is currently in alpha. Until we rolled out our snapshot backup feature a few weeks ago, we advised against anyone using it in production. It doesn't matter how fault tolerant a system is: if someone accidentally (or intentionally) deletes the data in the database, a production application needs an external backup to recover from. Now that we have that functionality, we're working with some of our alpha testers on projects that are aimed at production use.
InfoQ: Does/Will FoundationDB have capabilities to allow global distribution of database nodes, similar to how Google Spanner works? If yes, how will it achieve this?
Nick: Yes. FoundationDB is designed to work with both local and cross-datacenter clusters. FoundationDB is aware of the topology of the network when run in a multi-datacenter configuration and will make intelligent decisions about, for example, storing replicas of data in different data centers. Like Google Spanner, FoundationDB is designed not so much to use all the globe’s data centers to create a single, global database as to create a database that runs efficiently in several nearby data centers. (Automatically moving data around the globe for fast access is possible for reading data but is a more difficult and application-specific job for writes. Neither Spanner nor FoundationDB tackles that job.)
How do we achieve it? I think the simplest way to explain it is that in any system like FoundationDB or Spanner, there is a complex layering of guarantees that together keep the system correct. Spanner helps achieve this by building a single “global variable”, time, that it can make strong inferences about across all nodes in the cluster. For example, if computer A updates a value and computer B reads it, Spanner’s TrueTime can guarantee that B will see the changed value if B reads at a later time than A wrote. FoundationDB uses a different strategy: storing a small amount of information using an algorithm called Paxos (our “global variable”) and building up other guarantees and inferences from there, never relying on clocks.
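The idea of deriving a read-after-write guarantee from a small piece of shared state rather than from clocks can be illustrated with a toy model. The sketch below is not FoundationDB's protocol and does not implement Paxos: it assumes a single hypothetical `Sequencer` object stands in for the agreed-upon state, and `VersionedStore` is an invented helper. It only shows that if every commit gets a version from that shared counter, a reader that starts after a write will see it, with no wall clock consulted anywhere.

```python
class Sequencer:
    """Hypothetical stand-in for a small piece of state every node agrees on."""

    def __init__(self):
        self._version = 0

    def next_commit_version(self):
        self._version += 1
        return self._version

    def current_version(self):
        return self._version


class VersionedStore:
    """Keeps every committed value tagged with its commit version."""

    def __init__(self):
        self._history = {}  # key -> list of (version, value), in commit order

    def write(self, key, value, version):
        self._history.setdefault(key, []).append((version, value))

    def read(self, key, read_version):
        # Return the newest value committed at or before read_version.
        result = None
        for version, value in self._history.get(key, []):
            if version <= read_version:
                result = value
        return result


sequencer = Sequencer()
store = VersionedStore()

store.write("x", "old", sequencer.next_commit_version())  # commit version 1
store.write("x", "new", sequencer.next_commit_version())  # computer A's write, version 2

# Computer B begins reading after A's commit, so its read version is >= 2.
read_version = sequencer.current_version()
print(store.read("x", read_version))  # new
```

A reader pinned to an earlier version (for example, version 1) would still see "old", which is how the same counter also gives each transaction a consistent snapshot in this toy model.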