Peter Bell shares insights on the latest trends in NoSQL, a rapidly evolving category of database storage that covers a wide variety of solutions. Peter is a trainer with Pragmatic Learning (a company he founded), a contract member of the GitHub training team, and the founder of Speak Geek, a company that trains business people to hire and manage developers. He trains and presents regularly on a range of NoSQL data stores, including MongoDB, Neo4j and Redis, and is a MongoDB master.
InfoQ: The past few years have rapidly taken NoSQL from a relatively fringe solution, suitable only for those comfortable living on the cutting edge, to a standard part of many application technology stacks. What do you think is driving this rapid adoption?
There are three main drivers of NoSQL adoption. The first is demand. Given the rise in internet and mobile traffic over the last few years, large numbers of companies are now dealing with scales that were almost unheard of a few years ago. Traditional relational databases were never designed for easy cross-node scaling, so NoSQL stores are popular among companies that need to be able to scale quickly, easily and cost-effectively.
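Peter doesn't describe a specific mechanism here, but the core idea behind cross-node scaling in many NoSQL stores is partitioning (sharding): each key is deterministically assigned to one of several nodes, so both data and load spread out. The sketch below is a minimal illustration under that assumption; `shard_for` and `partition` are hypothetical helpers, and plain hash-modulo placement is a deliberate simplification rather than how any particular store works.

```python
import hashlib


def shard_for(key: str, num_nodes: int) -> int:
    """Deterministically map a key to a node index in [0, num_nodes).

    MD5 is used only as a stable hash (not for security): unlike Python's
    built-in hash() for strings, it gives the same result in every process.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(keys, num_nodes):
    """Group keys by the (hypothetical) node responsible for storing them."""
    nodes = {n: [] for n in range(num_nodes)}
    for key in keys:
        nodes[shard_for(key, num_nodes)].append(key)
    return nodes


if __name__ == "__main__":
    users = [f"user:{i}" for i in range(10)]
    for node, assigned in partition(users, 3).items():
        print(node, assigned)
```

With modulo placement, changing `num_nodes` reassigns most keys, which is why production systems generally use schemes such as consistent hashing or range partitioning that move only a fraction of the data when a node is added.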
The second driver is availability. Open source software has matured considerably over the last few years, and a number of well-established open source NoSQL stores are now available, allowing companies to easily pick a data store that meets their needs.
Finally, NoSQL is trendy! I think there are absolutely applications being built today with NoSQL stores where a relational database would be more appropriate. However, as NoSQL databases move from novel to mainstream and dull, hopefully technologists will do a better job of choosing the appropriate solutions for their use cases.
InfoQ: Recently, we've seen about a dozen relatively new vendors enter an arena that some are referring to as NewSQL (though the name does not yet seem to be widely used by the vendors themselves). Can you explain what NewSQL is? Do you think it will increase the acceptance of non-relational databases among enterprise clients who require the guarantees of transactional databases?
NewSQL refers to modern databases like NuoDB that combine easy cross-node scalability with support for SQL queries. For example, if your load is more than a single server can handle but you don’t want to retrain your development team to program against databases that don’t implement SQL, NewSQL databases could be well worth a look. I think we’re just at the start of the NewSQL revolution, but there is no question that while NoSQL stores provide better abstractions than relational algebra for some use cases, for others a scalable database with a familiar programming model will be a better solution.
InfoQ: Graph databases, like Neo4j, are based on graph theory and model relationships between nodes. This sounds complicated, but can you explain what benefits it offers? Do you think this category has the opportunity to gain more mainstream acceptance?
The world is a graph. Whether I’m trying to manage file permissions for a user who is a member of multiple groups with different rules, figure out which of my friends can recommend a restaurant in Delhi, or calculate the best way to route a package from PVG to LHR, graphs are often a really natural way to model our domains. From e-commerce to content management and from bioinformatics to recommendations, graphs can allow us to get more value from the data we already have. Imagine trying to calculate the “six degrees of Kevin Bacon” for any actor using a SQL query, which needs either a self-join for every degree of separation or a recursive query. In Cypher (a declarative, SQL-like graph query language provided by Neo4j), it’s trivial.
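To make the Kevin Bacon example concrete: degrees of separation is a shortest-path search over an unweighted graph, which a breadth-first search (BFS) answers. In Neo4j, a Cypher `shortestPath` pattern does this for you; the sketch below hand-rolls the same idea in Python over a small adjacency map. The `degrees_of_separation` function and the actor names are hypothetical, and the co-star data is made up for illustration.

```python
from collections import deque


def degrees_of_separation(graph, start, target):
    """Breadth-first search for the shortest path from start to target.

    graph maps each node to an iterable of neighbours. Returns the path as a
    list of nodes, or None if target cannot be reached from start.
    """
    if start == target:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already discovered via a path at least as short
            previous[neighbour] = node
            if neighbour == target:
                # Walk back through predecessors to rebuild the path.
                path = [neighbour]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical "appeared in a film together" edges, listed in both directions.
co_stars = {
    "Actor A": ["Actor B"],
    "Actor B": ["Actor A", "Actor C"],
    "Actor C": ["Actor B", "Actor D"],
    "Actor D": ["Actor C"],
    "Actor E": [],
}

path = degrees_of_separation(co_stars, "Actor A", "Actor D")
print(path, len(path) - 1)  # ['Actor A', 'Actor B', 'Actor C', 'Actor D'] 3
```

The number of degrees is the path length minus one. A graph database stores these adjacencies directly, so the traversal follows relationships instead of repeatedly joining tables.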
Facebook has launched Graph Search, and Google uses its Knowledge Graph to improve its search results. I think graph databases are going to be one of the most interesting sources of innovation, for both startups and enterprises, over the next few years. When I first started working with graph databases, I thought that they would only be useful in niches like social applications (Glassdoor uses Neo4j) and finding the cheapest or quickest routes for people or packages. The more I work with Neo4j, the more I realize that there are few domains that couldn’t benefit from a graph-based model. I certainly don’t think graphs will become the primary modeling paradigm for storage solutions, but I think they can be useful in a pretty wide range of domains.
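Finding the cheapest or quickest route for people or packages is a weighted shortest-path problem. A minimal sketch using Dijkstra's algorithm follows, assuming non-negative leg costs; the hub names `HUB_A` and `HUB_B` and all costs are invented, and `cheapest_route` is a hypothetical helper, not a Neo4j API.

```python
import heapq


def cheapest_route(graph, source, destination):
    """Dijkstra's algorithm over a directed graph with non-negative costs.

    graph maps node -> list of (neighbour, cost). Returns (total_cost, path),
    or (float("inf"), []) if destination is unreachable.
    """
    best = {source: 0}   # cheapest known cost to each node
    previous = {}        # node -> predecessor on the cheapest known path
    heap = [(0, source)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry; a cheaper one was already processed
        visited.add(node)
        if node == destination:
            path = [node]
            while path[-1] != source:
                path.append(previous[path[-1]])
            return cost, path[::-1]
        for neighbour, edge_cost in graph.get(node, ()):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    return float("inf"), []


# Hypothetical shipping legs with made-up costs.
legs = {
    "PVG": [("HUB_A", 4), ("HUB_B", 2)],
    "HUB_A": [("LHR", 3)],
    "HUB_B": [("HUB_A", 1), ("LHR", 7)],
}

print(cheapest_route(legs, "PVG", "LHR"))  # (6, ['PVG', 'HUB_B', 'HUB_A', 'LHR'])
```

Note that the cheapest route here has more legs than the direct-looking alternatives, which is exactly the kind of answer that is awkward to express as a fixed set of joins.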
InfoQ: Redis describes itself as an "advanced key-value" store, but I've heard you say that it is more than just a key-value store. Can you explain what you mean?
While Redis is a key-value store (as opposed to a document, graph or columnar data store), it is much more than just keys and values. Redis provides lists, sets, sorted sets and pub/sub functionality that allow you to solve a really wide range of interesting problems very efficiently. It’s also an in-memory solution (with the attendant performance) that can snapshot or log to disk. Redis isn’t a general-purpose “go to” data store, but if you make decisions about persistence solutions, it’s well worth learning about, as it has some terrific capabilities.
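As an illustration of why sorted sets are useful, a common Redis pattern is a leaderboard: members are kept ranked by score, and commands such as ZADD, ZINCRBY and ZREVRANGE update and read the ranking. The following is a pure-Python model of that behaviour, not Redis client code, so it runs without a server. The method names echo the Redis commands, but the `ToySortedSet` class, its argument order and its return shapes are my own simplification (Redis's ZADD, for example, takes the score before the member).

```python
class ToySortedSet:
    """In-memory model of a Redis-style sorted set: unique members, each with a score."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, member, score):
        """Insert member, or overwrite its score if it already exists."""
        self._scores[member] = score

    def zincrby(self, member, amount):
        """Add amount to member's score (a missing member starts at 0); return the new score."""
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def zrevrange(self, start, stop):
        """Return (member, score) pairs ranked highest score first.

        Indices are inclusive and negative indices count from the end;
        ties are broken by member name in descending order.
        """
        ranked = sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        n = len(ranked)
        if start < 0:
            start += n
        if stop < 0:
            stop += n
        return ranked[max(start, 0):max(stop + 1, 0)]


board = ToySortedSet()
board.zadd("alice", 10)
board.zadd("bob", 15)
board.zincrby("alice", 7)     # alice now has 17
board.zincrby("carol", 5)     # carol is created with score 5
print(board.zrevrange(0, 1))  # [('alice', 17), ('bob', 15)]
```

Redis itself keeps sorted sets ordered as they are written, so reading the top N does not re-sort the whole set the way this toy does.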
InfoQ: Are there any other trends you see that you believe readers should know about?
The next big trend to look out for, after NoSQL and in roughly the same timeframe as NewSQL, is immutable data stores. There has been a lot of discussion over the last few years about the value of functional programming for scaling processing effectively across multiple servers. By minimizing shared mutable state, a functional programming model avoids the deadlock issues that plague object-oriented programs scaled across a large number of computers.
But if we agree that shared mutable state is a problem for scaling, why do we allow our databases to be mutable? If you think about it, a database is just a large, shared mutable store (a little like a collection of global variables being shared between all of your servers). A number of companies (including Twitter) are now looking into the properties of immutable data stores: databases that can accept new facts but where existing data cannot generally be modified or deleted. A good starting point for examining this trend is Datomic, the data store built by Rich Hickey, the author of Clojure.
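To illustrate the immutable-store idea, here is a minimal append-only fact log: every change is recorded as a new fact stamped with a transaction number, a retraction is itself a new fact, and you can ask what was true as of any earlier transaction. It is loosely modeled on the entity/attribute/value/transaction facts that Datomic is built around, but it is not Datomic's API; `FactLog`, `transact`, `value_as_of` and `history` are hypothetical names, and each attribute is assumed to hold a single value.

```python
from typing import Any, List, NamedTuple, Optional


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: Any
    tx: int       # transaction number that recorded this fact
    added: bool   # True for an assertion, False for a retraction


class FactLog:
    """Append-only store: new facts can be added, but recorded facts are never changed or deleted."""

    def __init__(self):
        self._facts: List[Fact] = []
        self._tx = 0

    def transact(self, entity, attribute, value, added=True):
        """Record one new fact in its own transaction and return the transaction number."""
        self._tx += 1
        self._facts.append(Fact(entity, attribute, value, self._tx, added))
        return self._tx

    def value_as_of(self, entity, attribute, tx: Optional[int] = None):
        """Return the value visible at transaction tx (default: latest), or None if absent or retracted."""
        current = None
        for fact in self._facts:  # facts are stored in transaction order
            if tx is not None and fact.tx > tx:
                break
            if fact.entity == entity and fact.attribute == attribute:
                current = fact.value if fact.added else None
        return current

    def history(self, entity, attribute):
        """Every fact ever recorded for this entity/attribute, oldest first."""
        return [f for f in self._facts if f.entity == entity and f.attribute == attribute]


log = FactLog()
t1 = log.transact("user:1", "email", "old@example.com")
t2 = log.transact("user:1", "email", "new@example.com")
print(log.value_as_of("user:1", "email"))      # new@example.com
print(log.value_as_of("user:1", "email", t1))  # old@example.com
```

Because nothing is overwritten, the full history stays queryable, and a reader working against a fixed transaction number sees a stable snapshot even as new facts are appended.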