
NoSQL For Mere Mortals Review and Author Q&A

Posted by Sergio De Simone on Jun 16, 2015. Estimated reading time: 11 minutes

 

Addison-Wesley Professional's NoSQL for Mere Mortals provides an introduction to NoSQL databases, spanning the major types of databases that fall under the NoSQL umbrella and explaining the advantages and shortcomings of each type. InfoQ has spoken with the book’s author, Dan Sullivan.

The book starts with a general discussion of the ACID (Atomic, Consistent, Isolated, Durable) and BASE (Basically Available, Soft state, Eventually consistent) paradigms for databases. This provides an overarching analytical frame in which the comparison between NoSQL and SQL databases can be situated. The BASE properties are explained in detail, with a particularly compelling explanation of the different choices that NoSQL databases have made to ensure eventual consistency.

The book then goes on to describe each of the four main types of NoSQL database: key-value databases, document databases, column family databases, and graph databases. For each type, the book provides a concise introduction to its main features and to what differentiates it from other NoSQL databases, then devotes a full chapter to design considerations for a database of that type. This covers material such as:

  • how to correctly define the key and how to represent structured information for Key-Value databases;
  • how to handle the normalization/denormalization balance and most effectively use joins for Document databases;
  • how to design tables, denormalize them, handle indexing, etc. for Column family databases;
  • how to query a graph, use indexes, define graph edges etc., for Graph databases.

The final part of the book deals with choosing the right database for an application by providing a set of guidelines that can help in the decision process.

Overall, the book stands out for its clarity and highly structured approach to presenting NoSQL databases. The book also provides a few case studies of NoSQL databases applied to a business application, which make the whole discussion more concrete and hands-on. InfoQ has taken the chance to speak to Dan Sullivan, author of the book.

InfoQ: What has motivated you to write this book? What makes it unique among other NoSQL books?

Dan: This is an active time of research and experimentation in the database field. For decades, relational databases dominated the field. Advances focused on improving the performance of relational databases, which led to important developments such as columnar storage. NoSQL databases emerged because demanding use cases in companies like Google, Yahoo and Facebook could not be met with relational databases. The idea of writing a book about the recent advances in database technology was too tempting to pass up. This book is different from other NoSQL books in that it does not focus on a particular NoSQL database but describes each of the four major types: key value, document, column family and graph databases. In addition, this book delves into the technical details that someone implementing a NoSQL database will need to know. Some NoSQL books focus on explaining high level principles for business users and these are valuable resources. NoSQL for Mere Mortals takes a different approach and addresses the needs of someone who must select a NoSQL platform, design an application using a NoSQL database, or tune and maintain NoSQL databases as part of their work.

InfoQ: What should a reader expect from reading “NoSQL for Mere Mortals”?

Dan: Readers should expect a quick introduction to why NoSQL databases have emerged and how they fit into the broader database landscape. The bulk of the book is organized into four sections, one on each of the four main types of NoSQL databases. Each section introduces a NoSQL model, such as document databases, provides a detailed description of key concepts and terminologies, and concludes with design guidelines and tips. The final chapter of the book offers help on choosing the best database for your application needs.

Another point that is emphasized throughout the book is that NoSQL databases complement relational databases; they do not replace them. Relational databases have successfully served the needs of an expanding array of application requirements since the 1970s and that will not change. NoSQL databases add to our tool chest; they don’t displace other tools.

InfoQ: The relational data model has served programmers for a long time. What made programmers start looking for alternative models?

Dan: Relational databases were designed to address limitations found in earlier data management systems, such as hierarchical databases and block storage systems. The rules of normalization in relational database design help us avoid data anomalies. This is essential in many business applications, especially financial systems. As developers created new types of applications driven by the Web, such as search engines, they confronted scalability issues. While relational databases are well suited to handle thousands of enterprise users, it is more difficult to make them scale to meet the needs of hundreds of thousands or millions of Web users. While scalability was a key concern, other factors such as immediate consistency and ACID transactions were less important. This gave NoSQL designers the latitude they needed to experiment with new database models that could meet a new set of requirements based on the Web and not enterprise applications.

InfoQ: What are, on the other hand, the advantages that developers can expect from using a NoSQL database?

Dan: One of the most obvious advantages of using NoSQL databases is that developers can choose a data model that aligns with their problem domain. Someone modeling a transportation system or social network might opt for a graph database. Someone requiring high performance writes and reads based only on keys might turn to a key value database. Document databases are well suited to applications requiring flexible schemas and support for semi-structured data. Column family databases are especially well suited for big data applications requiring flexible schemas.

InfoQ: What kind of drawbacks can one expect from abandoning the safe harbour of a theoretically well grounded model such as the relational model?

Dan: There are a number of drawbacks. NoSQL databases typically do not support the level of constraint enforcement found in RDBMSs. Also, attributes can be added to a document database dynamically by an application without the need to have a data modeler update a schema and deploy those changes to production. Data modelers might cringe at this. After all, if there is no master schema, who is responsible for managing it? The answer is: we all are. We need to develop new procedures for tracking changes in flexible schema and no schema data models.

InfoQ: What is the main challenge when trying to make the decision to move away from the relational model to NoSQL?

Dan: The first question to address is, do you really need to move to a NoSQL database? If you have a team that is experienced with relational databases and you have procedures in place for maintaining your relational database you should ask yourself, what are the benefits of moving to NoSQL?

  • If you find yourself working around the relational model, for example, heavily de-normalizing for query performance, then it makes sense to consider a NoSQL database.
  • If you are confronted with the prospect of having to upgrade your database server at significant cost, then it’s time to consider NoSQL databases. Most of the NoSQL databases are designed to scale horizontally on commodity hardware.
  • If you find it difficult to efficiently implement some queries in a relational database, such as recursively searching a network, then consider a NoSQL database.
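
To make the last point concrete, here is a minimal Python sketch of "recursively searching a network": finding everything reachable from one node. The `network` data and the `reachable` helper are hypothetical illustrations, not taken from the book or from any particular graph database.

```python
from collections import deque

# Hypothetical network: each key is a node, each value lists its direct neighbors.
network = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
    "F": ["A"],
}

def reachable(graph, start):
    """Return every node reachable from start by following edges (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)  # report only the other nodes, not the starting point
    return seen

print(sorted(reachable(network, "A")))
```

A graph database exposes this kind of traversal as a first-class operation, whereas a purely table-based design has to emulate it with repeated self-joins or recursive queries.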

InfoQ: Could you broadly sketch the main characteristics of the various flavours of NoSQL databases that the book covers?

Dan: There are four main types of NoSQL databases covered in the book: key value, document, column family and graph databases.

Key value databases employ the simplest data model. Data is stored and referenced by an index or key. This kind of database provides dictionary-like lookups but lacks SQL-like query support. Key value databases, such as Redis and Riak, are good fits for applications that need high performance reads and writes of relatively atomic data structures. Riak has added search capabilities to provide more flexible retrieval options.
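
The dictionary-like access described above can be sketched in a few lines of Python. This is only an in-memory stand-in, not the API of Redis or Riak; the `entity:id:attribute` key naming scheme and the helper names are assumptions made for illustration.

```python
# Hypothetical in-memory stand-in for a key-value database: a plain dict.
store = {}

def make_key(entity, entity_id, attribute):
    """Build a key such as 'customer:1982:name' (the naming scheme is an assumption)."""
    return f"{entity}:{entity_id}:{attribute}"

def put(entity, entity_id, attribute, value):
    """Store a single value under its composed key."""
    store[make_key(entity, entity_id, attribute)] = value

def get(entity, entity_id, attribute, default=None):
    """Look a value up by key; return default when the key is absent."""
    return store.get(make_key(entity, entity_id, attribute), default)

put("customer", 1982, "name", "Jane Doe")
put("customer", 1982, "city", "Portland")

print(get("customer", 1982, "name"))

# Without a query language, "all customers in Portland" means scanning every key.
portland = [k for k, v in store.items() if k.endswith(":city") and v == "Portland"]
print(portland)
```

Reads and writes by key are single lookups, while anything resembling a query degrades into a full scan, which is why key design matters so much in this model.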

Document databases are probably the most popular of the NoSQL databases. A document is a collection of key value pairs. The set of key value pairs is roughly analogous to a row in a table but the keys, or attributes, can vary from one document to another. Documents also support non-atomic structures such as arrays and embedded documents. Document databases are good choices when you need a flexible schema and more advanced query support than provided in key value databases. Joins are generally avoided in document databases but if you need them, you should plan to implement them yourself in the application layer.
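
As a rough illustration of documents with varying attributes and of a join done in the application layer, consider the following Python sketch. The collections, field names, and the `join_orders_to_customers` helper are hypothetical and do not reflect any specific document database's API.

```python
# Hypothetical collections, modeled as lists of dicts.
# The two customer documents deliberately carry different attributes.
customers = [
    {"_id": 1, "name": "Jane Doe", "emails": ["jane@example.com"]},
    {"_id": 2, "name": "Raj Patel", "address": {"city": "Austin", "zip": "78701"}},
]
orders = [
    {"_id": 100, "customer_id": 1, "total": 25.0},
    {"_id": 101, "customer_id": 2, "total": 40.0},
    {"_id": 102, "customer_id": 1, "total": 15.5},
]

def join_orders_to_customers(customers, orders):
    """Application-layer join: index customers by id, then look each order up."""
    by_id = {c["_id"]: c for c in customers}
    return [
        {
            "order_id": o["_id"],
            "total": o["total"],
            "customer_name": by_id[o["customer_id"]]["name"],
        }
        for o in orders
        if o["customer_id"] in by_id  # skip orders whose customer is missing
    ]

joined = join_orders_to_customers(customers, orders)
for row in joined:
    print(row)
```

The join logic lives in application code here, so the application, not the database, is responsible for handling missing or inconsistent references.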

Column family databases, such as HBase and Cassandra, are based on Google’s BigTable. Column family databases are similar to relational databases on the surface but employ fundamentally different implementations. Column family databases use map data structures to implement sparse matrix-like structures. Column families are designed to support millions of columns and billions of rows. Needless to say, this introduces the opportunity for new design techniques not available in relational databases.
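
The idea of using maps to represent a sparse, matrix-like table can be sketched as nested dictionaries. This is a simplified mental model under stated assumptions (row key, then column family, then column name), not how HBase or Cassandra are actually implemented; all names below are hypothetical.

```python
from collections import defaultdict

# Assumed layout: row key -> column family -> column name -> value.
table = defaultdict(lambda: defaultdict(dict))

def put(row_key, family, column, value):
    """Write one cell; rows and columns come into existence on first write."""
    table[row_key][family][column] = value

def get(row_key, family, column, default=None):
    """Read one cell without creating empty entries for missing rows or families."""
    row = table.get(row_key)
    if row is None:
        return default
    columns = row.get(family)
    if columns is None:
        return default
    return columns.get(column, default)

put("user:1", "profile", "name", "Jane")
put("user:1", "activity", "2015-06-16", "login")
put("user:2", "profile", "name", "Raj")
put("user:2", "profile", "twitter", "@raj")

# Only cells that were written occupy space; absent columns cost nothing.
stored_cells = sum(len(cols) for fams in table.values() for cols in fams.values())
print(stored_cells)
print(get("user:1", "profile", "twitter"))
```

Because each row stores only the columns it actually has, different rows can carry entirely different column sets, which is what makes very wide, sparse tables practical.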

InfoQ: When it comes to the design process, how would you describe any differences between designing an RDBMS-based system and a NoSQL-based system?

Dan: One of the primary differences is the questions we ask when we design a data model. In relational modeling, we ask how are entities related and what are their attributes? When designing NoSQL models we typically start with the queries we will run against the database. This may not sound like a big difference at first but the implications are important.

In relational modeling we start by understanding, as much as we can, the relation between entities because once we have that we can answer any queries about the entities. This approach lends itself to normalizing the data which reduces the risk of data anomalies.

In NoSQL design, there are no mathematical models akin to relational algebra from which we derive design principles. Instead, we consider use cases and query patterns and design structures that balance performance with data redundancy. For example, when modeling a master-detail relation in a document database, such as MongoDB, you could embed a reference to a detail document or the details document itself. Embedding references is similar to using foreign key references in relational databases while embedding details is a de-normalizing approach. Which is better? That depends on the query access patterns. If you query the master attributes frequently but rarely need attributes of the detail record then embedding references is a better choice. In cases in which you need both master and detail attributes in frequently run queries, embedding detail documents may be a better option.
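
The two master-detail options described above can be sketched with plain Python dictionaries. The order and line-item documents, their field names, and the helper functions are hypothetical; they only illustrate the trade-off and are not MongoDB API calls.

```python
# Separate detail collection, keyed by detail id (used by option 1).
details = {
    "d1": {"sku": "A-100", "qty": 2},
    "d2": {"sku": "B-200", "qty": 1},
}

# Option 1: the master document embeds references to detail documents.
order_with_refs = {"_id": "o1", "customer": "Jane Doe", "detail_ids": ["d1", "d2"]}

# Option 2: the master document embeds the detail documents themselves.
order_embedded = {
    "_id": "o1",
    "customer": "Jane Doe",
    "details": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

def total_qty_refs(order, detail_collection):
    """Needs one extra lookup per reference to reach the detail attributes."""
    return sum(detail_collection[d]["qty"] for d in order["detail_ids"])

def total_qty_embedded(order):
    """Everything needed is already inside the master document."""
    return sum(d["qty"] for d in order["details"])

print(total_qty_refs(order_with_refs, details))
print(total_qty_embedded(order_embedded))
```

Reading only the `customer` attribute is cheaper with references, since the master document stays small; queries that need detail attributes every time avoid the extra lookups when details are embedded.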

InfoQ: Could you share your view about the future of NoSQL databases?

Dan: NoSQL databases are rapidly advancing and incorporating additional features. The advances are coming on three fronts.

NoSQL databases are incorporating features of relational databases, such as support for transactions. Some NoSQL databases have at least limited support for transactions now but the trend will be to incorporate more robust transaction control. This may introduce performance hits, but application designers will have the option of using transactions when they need them and avoiding the performance cost of transactions when they don’t.

NoSQL databases are becoming multi-modal, supporting more than one type of NoSQL data model. Amazon’s DynamoDB started as a key value data store but now supports documents as well. OrientDB is a document database with support for graph databases as well. DataStax, a commercial venture supporting the Cassandra column family database, recently acquired the development team behind Titan, a highly scalable graph database. Eventually, we will have the option of storing our data in a single database while accessing the data as graphs, documents, etc. depending on which makes the most sense for the particular application.

The cloud will lower the barriers to adoption. Google has recently made BigTable, the original column family database, generally available. Amazon offers DynamoDB and Microsoft offers DocumentDB. Using a database as a service (DBaaS) platform allows developers to get started with NoSQL databases without having to take on all the database administration and tuning tasks associated with a self-managed platform.

InfoQ: Do you think there could be a trend leading to using an RDBMS to store schemaless data?

Dan: Yes, I think we are already seeing that. This pattern of relational databases adopting non-relational models is not new. Object databases were something of an alternative to relational databases in the 1990s but did not catch on as much as NoSQL is today. Instead, databases like Oracle incorporated object database features into the relational model.

The limiting factor will be how well relational storage models support the performance demands of NoSQL databases. Relational databases may be used to store key value, documents or graphs right now, but is that the best option? Column family databases do not use the row or columnar storage models commonly used in relational databases. Instead, they use maps to efficiently store sparse data in tables with potentially millions of attributes and billions of rows.

There are some use cases where storing NoSQL data structures in a relational database will make sense, especially when dealing with relatively small volumes of data. Once you start pushing the limits of your relational database server and need to scale up, it is time to consider using a database with a storage model that is a more natural fit with your data model.

About the Book Author

Dan Sullivan is a data architect and data scientist with more than 20 years of experience in business intelligence, machine learning, data mining, text mining, Big Data, data modeling, and application design. His most recent work has focused on NoSQL database modeling, data analysis, cloud computing, text mining and data integration in life sciences. Dan has extensive experience in relational database design and works regularly with NoSQL databases.


Errata by Shashikanth Channagiri

Page 7
At the time flat files were commonly used
I think author meant disk here

Errata by Shashikanth Channagiri

Page 61.
name of the item.
This is confusing: is it the name of the item, the name of the customer, or did the author mean the number of items?
Listing 2.11 doesn't list address
