Interview and Book Review: NoSQL Distilled
NoSQL Distilled, by Pramod Sadalage and Martin Fowler, covers NoSQL databases and the concept of Polyglot Persistence, which is gaining popularity as more NoSQL data stores emerge. The authors describe the data models these stores support, starting with the aggregate data models: Key-Value, Document, and Column-Family Stores. They also discuss relationships and the Graph Databases that model relationships between data entities.
The chapters on the popular NoSQL data stores follow a consistent format. Each of the chapters on Key-Value, Document, Column-Family, and Graph Databases includes sections on Features, Suitable Use Cases, and When Not to Use.
They also discuss how to manage schema changes in a relational database (RDBMS) versus a NoSQL data store. They describe polyglot persistence, which addresses disparate data storage needs, and discuss using services rather than accessing data stores directly.
In the last chapter, the authors discuss what to consider when choosing a database for your data storage requirements.
InfoQ spoke with Pramod and Martin about their book, the NoSQL database space, and the emerging trends in NoSQL.
InfoQ: What design considerations (like consistency, availability and concurrency control) should database architects and developers take into account when using NoSQL databases rather than traditional relational databases?
Pramod: Much of this depends on the requirements of the application being developed. Some applications may need high availability but may not have such high requirements for consistent data – systems such as log aggregation or content display systems may fit this model. Other systems may need highly consistent data but not high availability. So developers get to make the decisions about options to choose in the CAP theorem. In traditional relational databases some of these decisions have been made for you by the database itself.
Martin: Also remember that within a single application, different use cases may have different needs for availability, response time, and consistency. To make the right decisions you have to understand the underlying business needs.
InfoQ: NoSQL databases are schemaless. How does this feature of NoSQL data stores impact the data governance and database management efforts of data architects and DBAs supporting enterprise applications?
Pramod: Data storage is schemaless, but it does not mean that there is no schema or there is no schema design. Schema is defined by the application that writes to the database. Data governance and data architecture can be applied and influenced at the application level. This shift means that the DBA and Data Architecture teams need to understand some of these NoSQL technologies and service-based integration instead of always relying on database-based integration.
Martin: We've heard people argue that a schemaless database means you don't have to worry about database migration. This is a serious misapprehension. Certainly being schemaless gives you a few more options, but often you have to do similar techniques for migration that you use for relational databases.
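To illustrate Martin's point, here is a minimal sketch of one migration technique for a schemaless store: upgrading documents as the application reads them. It is not taken from the interview. The `schema_version` field, the `name` to `first_name`/`last_name` change, and all function names are hypothetical, chosen only for the example.

```python
CURRENT_VERSION = 2  # hypothetical: the document shape the application expects


def _v1_to_v2(doc):
    """Split the single 'name' field into 'first_name' and 'last_name'."""
    upgraded = dict(doc)  # copy so the stored document is not mutated
    first, _, last = upgraded.pop("name", "").partition(" ")
    upgraded["first_name"] = first
    upgraded["last_name"] = last
    upgraded["schema_version"] = 2
    return upgraded


# One upgrade step per old version, applied in order.
MIGRATIONS = {1: _v1_to_v2}


def upgrade(doc):
    """Return a copy of doc upgraded to CURRENT_VERSION.

    Documents without a 'schema_version' field are assumed to be version 1.
    """
    doc = dict(doc)
    version = doc.get("schema_version", 1)
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version = doc["schema_version"]
    return doc
```

An application would call `upgrade` on each document it reads and write the result back on the next save, so old and new shapes can coexist while the data is migrated gradually. The implicit schema still changes, and the application still has to manage that change.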
InfoQ: Can you talk more about the techniques like sharding and replication that are getting more popular with the emergence of NoSQL databases? What are the advantages and limitations of these techniques?
Martin: One of the original drivers for NoSQL databases was to work better on clusters, with large amounts of replication and/or sharding. This is particularly a feature of those databases that we classify as aggregate-oriented, because the aggregates make a natural unit for distributing the data.
Pramod: Replication has always been available, even in traditional relational databases. Sharding is the ability of the database to move data between different nodes based on some key, known as the shard-key. Sharding allows for horizontal scaling and is a very powerful technique to scale an application. Aggregating data across all the shards becomes difficult, however, because the data is not all on a single node. Databases such as Riak and Cassandra have ring configurations where the data is partitioned across all the nodes of the cluster, and these databases have algorithms to partition the data. Sharding can also be thought of in similar terms to partitioning in relational databases, with the added benefit that the partitions are on different nodes instead of on the same node as in relational databases.
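The ring idea can be sketched with a generic consistent-hashing scheme. This is an illustration only, not the partitioning algorithm of Riak or Cassandra. The `HashRing` class, the node names, and the choice of MD5 and 64 virtual nodes per node are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 is stable across runs)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Assigns shard keys to nodes by position on a hash ring."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes  # ring positions per node, to even out the load
        self._ring = []        # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, shard_key):
        """Return the node owning shard_key: the first ring position at or
        after the key's hash, wrapping around at the end of the ring."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_left(self._ring, (_hash(shard_key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

With a ring built as `HashRing(["node-a", "node-b", "node-c"])`, calling `node_for("customer:42")` always returns the same node for that key. Adding a fourth node moves only the keys that now fall on the new node's positions, while the rest stay where they were. A query that aggregates over all customers still has to visit every node, which is the difficulty Pramod describes.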
InfoQ: This question would have sounded kind of strange a few years ago but with the explosive growth of NoSQL databases, I think it’s relevant to ask now. What is the future of relational databases and what will be their role in the emerging NoSQL and BigData landscape?
Pramod: The emergence of NoSQL databases has given a choice. This choice is really helpful in designing systems that meet the specific needs of the application. We think this choice, which is called Polyglot Persistence, is really a blessing and the IT industry should embrace these technologies and understand how to use them. The traditional database professional should be open to learning some of these technologies and be able to choose the right database technology for the application or enterprise requirements. Relational and non-relational databases will co-exist in this multi-choice technology landscape.
Martin: We think that relational databases will still be used in the majority of cases, at least for the next several years. After all, the products are mature, there are rich supporting tools, and they are relatively well-understood. But we've been using NoSQL databases on selected projects for a couple of years now at ThoughtWorks, and have been happy with many of them - which is why we're confident that many of them are ready for adoption in the enterprise.
InfoQ: In-memory Data Grids (IMDG) have also been gaining adoption recently. Can you talk about this new type of NoSQL data store?
Pramod: In-memory data grids are gaining traction for their ability to have RAM based data storage accessible by a cluster of machines. Coherence and Gigaspaces are some of the products in this space.
InfoQ: You covered the topic of Polyglot Persistence in the book. Can you talk about how this new persistence approach influences the other layers of application architecture, especially the data access, domain and service layers? Are there any design practices or gotchas that we need to consider for polyglot persistence?
Martin: The most important consequence is the shift from treating databases as integration points to realizing that your applications need to encapsulate their data storage and communicate through higher-level services.
Other parts of the picture are murkier. One big issue is to what extent can we encapsulate the data storage from other parts of the same application. These databases have different data models, and part of the reason for using them is that different data models map more cleanly to a suitable application. So this raises questions about how much you want to encapsulate that…and it's too early to really see the answers there.
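One way to picture the encapsulation Martin describes is a service that owns its storage and exposes only higher-level operations. The sketch below is an assumption-laden illustration, not a design from the book: `CartStore`, `InMemoryCartStore`, and `CartService` are hypothetical names, and the in-memory dictionary stands in for whatever key-value store a real application would use.

```python
from typing import Dict, List, Protocol


class CartStore(Protocol):
    """What the service needs from storage; any store with these methods fits."""

    def load(self, session_id: str) -> List[str]: ...

    def save(self, session_id: str, items: List[str]) -> None: ...


class InMemoryCartStore:
    """Stand-in for a key-value store; a real one would call a database client."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def load(self, session_id: str) -> List[str]:
        return list(self._data.get(session_id, []))

    def save(self, session_id: str, items: List[str]) -> None:
        self._data[session_id] = list(items)


class CartService:
    """The only entry point other code uses; it never exposes the store."""

    def __init__(self, store: CartStore) -> None:
        self._store = store

    def add_item(self, session_id: str, sku: str) -> List[str]:
        items = self._store.load(session_id)
        items.append(sku)
        self._store.save(session_id, items)
        return items

    def items(self, session_id: str) -> List[str]:
        return self._store.load(session_id)
```

Other applications would talk to `CartService` (for example over HTTP) rather than reading the cart data store directly, so the store behind it can be replaced without changing its callers. How far to hide the store's data model inside the same application is the open question Martin raises.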
InfoQ: Sentiment analysis is another trend that's getting attention with the wide usage of NoSQL databases. Can you talk about this and data analytics in general in the context of NoSQL databases?
Pramod: Analyzing large amounts of data was made possible by NoSQL databases and data processing frameworks like Hadoop, Pig, Hive etc. The ability to write large amounts of data and then query it back allows people to perform trend, sentiment, or flight status analysis, such as that provided by Flightcaster.
InfoQ: You dedicated the last chapter of the book to the topic of choosing a database, where you discuss considerations like developer productivity and data access performance. Can you talk about this and tell our readers how they can get the best of the NoSQL database world?
Martin: What this boils down to is that we see two main reasons why people are interested in adopting NoSQL databases:
- Accessing large amounts of data rapidly at a decent cost leads many people to look at running on large clusters, which was one of the main reasons that originally drove the interest in NoSQL.
- For many situations, the relational data model isn't a terribly good fit, and you can get better productivity by choosing a NoSQL database whose data model is a better fit. Highly connected data leads you to a graph database, aggregate structures lead you to an aggregate-oriented approach.
But whatever you do, the key thing is to try a database out. Only by using a database and prototyping through some key scenarios can you really judge its suitability. This is where the open-source nature of most of these databases is a significant advantage.
InfoQ: What are the emerging trends in the NoSQL database space?
Martin: As I see it, it's all currently about adding the tools and maturity that make it easier to use these databases well. Much of this comes from the experience of early adopters like us, and it's good to watch the evolution happen.
Pramod: More adoption of these NoSQL technologies means there is a lot more work being done on tools, drivers, monitoring capabilities and many other features. We also see new technologies such as Datomic, which provides database as a service in the cloud, or VoltDB, which tries to use the SQL paradigm but scales the solutions, giving rise to the term "NewSQL". It’s a great time for data.
About the Authors
Pramod J. Sadalage, Principal Consultant at ThoughtWorks, enjoys the rare role of bridging the divide between database professionals and application developers. He regularly consults with clients who have particularly challenging data needs requiring new technologies and techniques. He developed pioneering techniques that allowed relational databases to be designed in an evolutionary manner based on version-controlled schema migrations. With Scott Ambler, he coauthored Refactoring Databases (Addison-Wesley, 2006).
Martin Fowler, Chief Scientist at ThoughtWorks, focuses on better ways to design software systems and improve developer productivity. His books include Patterns of Enterprise Application Architecture; UML Distilled, Third Edition; Domain-Specific Languages (with Rebecca Parsons); and Refactoring: Improving the Design of Existing Code (with Kent Beck, John Brant, and William Opdyke). All are published by Addison-Wesley.
“NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence”, authored by Pramod Sadalage and Martin Fowler, is published by Pearson/Addison-Wesley Professional, Aug. 2012, ISBN 0321826620, Copyright 2013 Pearson Education, Inc. For more info please visit the publisher website.