James Phillips on Moving from Relational to NoSQL Databases
James Phillips, co-founder of Couchbase, recently gave a presentation on the differences between distributed document-oriented and relational data models, and on what database developers need to know to move from a relational to a NoSQL database. InfoQ caught up with James to talk about data persistence patterns and the advantages and limitations of document-oriented NoSQL databases.
InfoQ: You talked about the "Big Data" and "Big User" problems in the data persistence and data management context. Can you explain how these concepts are different and when we should use one solution over the other?
James Phillips: Big Data: Need to collect and store very large amounts of information, then analyze the information to learn something. Requires high sustained throughput on writes (when collecting information) and reads (when analyzing it). Data locality model should optimize for keeping "related data" in physical proximity to ensure analysis is efficient. Usually will have a small number of simultaneous writers adding data to the database and a small number of simultaneous readers doing analysis.
Big User: Need to service random reads and writes, with a very large number of concurrent readers and writers. Data locality model should optimize for keeping "related data" spread out as much as possible, to spread reads and writes across the maximum number of servers and spindles.
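To illustrate the "Big User" locality goal of spreading data across as many servers as possible, here is a minimal Python sketch of hash-based key placement. The server names, the key format, and the `server_for_key` helper are all hypothetical, and this is a generic illustration of hash partitioning rather than the scheme any particular database uses.

```python
import hashlib
from collections import Counter

def server_for_key(key, servers):
    """Pick a server for a key by hashing the key (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 4 bytes of the hash as an integer, then wrap around
    # the number of servers so every key maps to exactly one of them.
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

# Assumed cluster of four nodes; the names are made up for this example.
servers = ["node-a", "node-b", "node-c", "node-d"]

# Place 1,000 hypothetical user documents.
placement = {f"user::{i}": server_for_key(f"user::{i}", servers)
             for i in range(1000)}

# Count how many keys landed on each server.
load = Counter(placement.values())
print(load)
```

Because the placement depends only on the key, any client can compute where a document lives without consulting a central directory, and random reads and writes from many concurrent users end up spread across all of the nodes.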
InfoQ: What are the main differences in the areas of data modeling and application development when working on a NoSQL database compared to a Relational database?
James: The relational data model (and therefore the data modeling focus) concentrates on the data normalization process - breaking "records" into many tables with inter-table relationships, reducing duplication of data. This made the most sense in the days when the efficiency of every byte was crucial, since systems were so limited. Storing data in a document-oriented way more naturally matches the world around us. The downside is that in some cases data is duplicated and the query model is arguably more complex. But "modeling" the data is far simpler, because the real world just isn't always normalized, and app developers can be much more productive with fewer constraints.
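The contrast between the two models can be sketched in a few lines of Python. The "tables", field names, and the `order_as_document` helper below are invented for illustration and do not come from the interview: the same order is held first as normalized rows linked by IDs, then folded into a single self-contained document.

```python
import json

# Normalized form: three "tables" linked by IDs (all data is made up).
customers = {1: {"name": "Ada Example", "email": "ada@example.com"}}
orders = {100: {"customer_id": 1, "placed_on": "2012-06-01"}}
order_items = [
    {"order_id": 100, "sku": "BOOK-1", "quantity": 2},
    {"order_id": 100, "sku": "PEN-7", "quantity": 1},
]

def order_as_document(order_id):
    """Fold the rows that describe one order into a single document."""
    order = orders[order_id]
    customer = customers[order["customer_id"]]
    return {
        "order_id": order_id,
        "placed_on": order["placed_on"],
        # The customer details are copied into the document; this is the
        # duplication trade-off described above.
        "customer": dict(customer),
        "items": [
            {"sku": item["sku"], "quantity": item["quantity"]}
            for item in order_items
            if item["order_id"] == order_id
        ],
    }

document = order_as_document(100)
print(json.dumps(document, indent=2))
```

Reading the order back is then a single lookup of one document instead of a multi-table join, at the cost of the customer details being stored again in every order document.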
InfoQ: You spoke about the document-oriented databases in your presentation. What are some use cases for using a document-oriented database over the relational one?
James: Document-oriented databases excel in use cases requiring data model flexibility (no schema to update when data management requirements change); low-latency random read and write performance with high sustained throughput; and where data and I/O can be easily spread across commodity servers or virtual machines to precisely match infrastructure costs to changing application performance requirements.
InfoQ: What data persistence and data management architecture patterns does a document-oriented database support?
James: Most NoSQL databases, document-oriented ones included, support a variety of persistence modes: from fully synchronous (e.g. only acknowledge that a write has succeeded once it is stored on disk or some other durable medium) to a variety of asynchronous storage strategies (e.g. simply accept a write and indicate it succeeded before actually writing to disk, or indicate success only after ensuring copies of the data have been made).
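The difference between acknowledging a write after it is durable and acknowledging it before can be sketched with a toy store in Python. `ToyStore`, its `synchronous` flag, and the log-file layout are assumptions made for this example and are not the API of any real database; replication-based acknowledgement is left out.

```python
import json
import os
import tempfile

class ToyStore:
    """Hypothetical key-value store with two persistence modes."""

    def __init__(self, path):
        self.path = path
        self.memory = {}    # serves reads immediately
        self.pending = []   # accepted writes not yet on disk

    def _append_to_disk(self, records):
        with open(self.path, "a", encoding="utf-8") as f:
            for key, value in records:
                f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device

    def write(self, key, value, synchronous):
        self.memory[key] = value
        if synchronous:
            # Synchronous mode: persist first, acknowledge afterwards.
            self._append_to_disk([(key, value)])
        else:
            # Asynchronous mode: acknowledge now, persist on a later flush.
            self.pending.append((key, value))
        return True  # the acknowledgement returned to the caller

    def flush(self):
        if self.pending:
            self._append_to_disk(self.pending)
            self.pending = []

def records_on_disk(path):
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

path = os.path.join(tempfile.mkdtemp(), "toy.log")
store = ToyStore(path)
store.write("a", 1, synchronous=True)    # durable before the ack
store.write("b", 2, synchronous=False)   # acked, but only in memory so far
on_disk_before = len(records_on_disk(path))
store.flush()
on_disk_after = len(records_on_disk(path))
print(on_disk_before, on_disk_after)
```

In this sketch, a crash between the asynchronous write and the flush would lose key "b" even though the caller was told the write succeeded. That is the latency-versus-durability trade-off the persistence modes expose.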
InfoQ: What are some limitations of document-oriented databases? What should the application architects and developers consider when using such a database?
James: Document-oriented databases today do not have built-in support for joins, and transaction support is usually at the individual document level, possibly with the aforementioned durability flexibility. This leads to data duplication and the need, in some cases, to update the same information in multiple places when it changes. As a result, the application is frequently much more involved in managing interdependent changes on a document database. In the future, we could see the use of external transaction monitors to close the transaction gap. Likewise, there are approaches at the application level that implement something similar to join queries.
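Both consequences, joining in the application and updating duplicated data in several places, can be sketched in Python. The in-memory `documents` dictionary stands in for a document database, and the key format, field names, and helper functions are hypothetical.

```python
# A stand-in for a document database: key -> document (all data made up).
documents = {
    "author::1": {"type": "author", "name": "Ada"},
    "post::1": {"type": "post", "title": "Hello",
                "author_id": "author::1", "author_name": "Ada"},
    "post::2": {"type": "post", "title": "Again",
                "author_id": "author::1", "author_name": "Ada"},
}

def post_with_author(post_id):
    """Application-level 'join': two lookups stitched together in code."""
    post = documents[post_id]
    author = documents[post["author_id"]]
    return {**post, "author": author}

def rename_author(author_id, new_name):
    """Update the author and every post that duplicates the name."""
    documents[author_id]["name"] = new_name
    updated = 0
    for doc in documents.values():
        if doc.get("type") == "post" and doc.get("author_id") == author_id:
            # Each document is changed on its own; nothing here makes the
            # whole set of changes atomic.
            doc["author_name"] = new_name
            updated += 1
    return updated

joined = post_with_author("post::1")
posts_updated = rename_author("author::1", "Ada L.")
print(joined["author"]["name"], posts_updated)
```

If the process failed partway through `rename_author`, some posts would carry the old name and others the new one. That is the kind of interdependent change the application has to manage when transactions stop at the document boundary.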