Transitioning from RDBMS to NoSQL. Interview with Couchbase’s Dipti Borkar
While relational databases have been used for decades to store data, and they still represent a viable solution for many use cases, NoSQL is being chosen today especially for scalability and performance reasons. This article contains an interview with Dipti Borkar, Director of Product Management at Couchbase, on the challenges, benefits and the process of migrating from RDBMS to NoSQL.
InfoQ: When is it the time to dump SQL for a NoSQL solution?
Dipti Borkar: OK, that title sounds a little harsh – and in truth, in most cases, it's not a matter of dumping SQL for a NoSQL solution, but rather, it’s about making a transition from one to the other, where application and use case dictate the need for a change. In general, this transition will be spurred by the need for flexibility – both in the scaling model and the data model - when building modern web and mobile applications.
Typical web applications are built with a three-tier architecture. To scale out the application, more commodity web servers are simply added behind a load balancer to support more users. The ability to scale out is a core tenet of the increasingly important cloud-computing model, in which virtual machine instances can be easily added or removed to match demand.
However, when it comes to the data layer, relational database (RDBMS) technology does not scale out and does not provide a flexible data model, presenting a number of challenges. Handling more users means adding a bigger server, and big servers are highly complex, proprietary, and disproportionately expensive, unlike the low-cost, commodity hardware in web- and cloud-based architectures. So, as organizations start seeing performance problems with their relational database for existing or new applications, particularly as the number of users grows, they realize the need for a faster, elastic database tier. This is the time to start evaluating NoSQL and adopting it as the database tier in their interactive web applications.
InfoQ: What would be the main steps required to transition from SQL to NoSQL?
Dipti Borkar: Organizations and projects vary greatly in what they are looking for in a NoSQL database, so much of the transition will depend on your use case. Below are general guidelines for transitioning:
#1 Understand the key requirements for your application:
Some of the requirements that match the need for NoSQL are:
- Rapid application development
– Changing market needs
– Changing data needs
– Unknown user demand
– Need for constantly growing throughput to access, add and update data
- Consistent performance
– Low response time for better user experience
– High throughput to handle viral growth
- Operational reliability
– High-availability to handle failures gracefully with minimal impact to the application
– Built-in monitoring APIs for easy ongoing maintenance
#2 Understand the various types of NoSQL offerings:
There is a common myth that all NoSQL databases are created equal; this is not true. Cassandra, for instance may be a solution you use for analytical use cases given its columnar data model. Neo4j, a graph database, for example, may be the database you use for applications that need access to relationships between entities.
I'll focus specifically on distributed document-oriented NoSQL database technology – with Couchbase and MongoDB being the two most visible and widely adopted examples.
#3 Execute a proof of concept
Once you have narrowed down the potential choices for the database tier, plan a proof of concept that integrates the key characteristics of your application. Look at response time, throughput, and the ability to scale out easily.
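As a rough illustration of what a proof of concept might measure, the sketch below times a batch of operations and reports throughput and latency figures. The `measure` helper and the in-memory `fake_db` stand-in are hypothetical names invented for this example; in a real test the operation would call the client library of the database under evaluation.

```python
import time


def measure(operation, n_requests=1000):
    """Run operation(i) n_requests times and summarize latency and throughput."""
    latencies = []
    start = time.perf_counter()
    for i in range(n_requests):
        t0 = time.perf_counter()
        operation(i)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "throughput_ops_per_s": n_requests / elapsed,
        # Upper median and 95th percentile taken from the sorted samples.
        "median_latency_s": latencies[len(latencies) // 2],
        "p95_latency_s": latencies[int(0.95 * (len(latencies) - 1))],
    }


# Stand-in for a real database client (assumption: a plain dict).
fake_db = {}


def fake_write(i):
    fake_db[f"doc::{i}"] = {"n": i}


report = measure(fake_write, n_requests=1000)
for name, value in report.items():
    print(name, value)
```

Running the same harness at several cluster sizes gives a first, rough view of whether throughput grows as servers are added.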
#4 Document modeling and development
For document databases, spend some time remodeling your data from fixed tabular schemas into flexible document objects.
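To make this step concrete, here is a minimal sketch that folds rows from two relational tables into one document per user. The `users` and `orders` tables, their fields, and the `to_documents` helper are all invented for this example; embedding orders inside the user document is only one possible modeling choice.

```python
import json

# Hypothetical rows as they might come out of two relational tables.
users = [
    {"user_id": 1, "name": "Ann"},
    {"user_id": 2, "name": "Bob"},
]
orders = [
    {"order_id": 10, "user_id": 1, "item": "book", "qty": 2},
    {"order_id": 11, "user_id": 1, "item": "pen", "qty": 5},
    {"order_id": 12, "user_id": 2, "item": "lamp", "qty": 1},
]


def to_documents(users, orders):
    """Embed each user's orders inside that user's document."""
    docs = {}
    for u in users:
        docs[u["user_id"]] = {"type": "user", "name": u["name"], "orders": []}
    for o in orders:
        docs[o["user_id"]]["orders"].append(
            {"order_id": o["order_id"], "item": o["item"], "qty": o["qty"]}
        )
    return docs


documents = to_documents(users, orders)
print(json.dumps(documents[1], indent=2))
```

Whether to embed related data or keep it in separate documents depends on how the application reads it; data that is always fetched together is the usual candidate for embedding.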
#5 Deploying to staging and production
Operational stability is a very important aspect for interactive web applications. Test and stage your application rollout as you would for applications that use traditional RDBMS systems. Ensure your selected database supports monitoring across the cluster, easy online scaling for adding capacity if needed and other database administrative tools.
#6 Stay up to date on newest trends
There is a plethora of quality, free, hands-on NoSQL training courses throughout the US. The best way to ensure a successful NoSQL implementation is to have an educated developer team that is up to date on the latest server releases and vendor offerings.
InfoQ: What are the main difficulties migrating from SQL to NoSQL?
Dipti Borkar: The main difficulty basically boils down to understanding the differences between the traditional RDBMS systems and document databases. The most important difference is the data model:
As shown above, each record in a relational database conforms to a schema – with a fixed number of fields (columns), each having a specified purpose and data type. Every record has the same structure. Data is normalized across multiple tables. The upside is that there is less duplicated data in the database. The downside is that a change in the schema means performing several expensive “alter table” statements that require locking down many tables simultaneously to ensure the change doesn’t leave the database in an inconsistent state.
With document databases on the other hand, each document can have a completely different structure from other documents. No additional management is required on the database to handle changes to document schemas.
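The contrast can be sketched in a few lines. The example below uses SQLite (from the Python standard library) as a stand-in relational database and a plain dict as a stand-in document store; the table, column, and key names are invented. Note that SQLite adds a column cheaply, so this shows only the need for a schema change, not its cost on a large production system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ann')")

# Relational side: a new field is rejected until the schema is altered.
try:
    conn.execute("INSERT INTO users (id, name, twitter) VALUES (2, 'Bob', '@bob')")
    insert_failed = False
except sqlite3.OperationalError:
    insert_failed = True  # the "twitter" column does not exist yet

conn.execute("ALTER TABLE users ADD COLUMN twitter TEXT")
conn.execute("INSERT INTO users (id, name, twitter) VALUES (2, 'Bob', '@bob')")
rows = conn.execute("SELECT id, name, twitter FROM users ORDER BY id").fetchall()

# Document side (assumption: a dict keyed by document ID).
# Each document carries its own structure; no schema change is needed.
store = {}
store["user::1"] = {"name": "Ann"}
store["user::2"] = {"name": "Bob", "twitter": "@bob", "tags": ["beta"]}

print(rows)
print(store)
```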
InfoQ: What are the benefits of NoSQL document databases?
Dipti Borkar: The main benefits of document databases are:
- Flexible data model
Data can be inserted without a defined schema, and the format of the data being inserted can change at any time—providing extreme flexibility in the application, which ultimately delivers substantial business agility.
- Easy scalability
Some NoSQL databases automatically spread data across servers, requiring no participation from the applications. Servers can be added and removed from the data layer without application downtime, with data and I/O spread across servers.
- Consistent, high performance
Advanced NoSQL database technologies cache data in system memory, so frequently accessed data can be served without going to disk. This caching is completely transparent to the developer and the operations team.
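To illustrate the scalability point, the following is a simplified sketch of one way keys can be spread across servers: each key hashes to a fixed logical partition, and a partition map assigns partitions to servers. The partition count, server names, and helper functions are assumptions made for this example and do not describe any particular product's implementation.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 64  # assumed fixed number of logical partitions


def partition_for(key):
    """Hash a document key to a logical partition."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def initial_map(servers):
    """Assign partitions to servers round-robin."""
    return {p: servers[p % len(servers)] for p in range(NUM_PARTITIONS)}


def add_server(partition_map, new_server):
    """Give the new server its fair share, leaving other partitions in place."""
    counts = Counter(partition_map.values())
    target = NUM_PARTITIONS // (len(counts) + 1)
    new_map = dict(partition_map)
    moved = 0
    for p in range(NUM_PARTITIONS):
        if moved == target:
            break
        owner = new_map[p]
        if counts[owner] > target:
            new_map[p] = new_server
            counts[owner] -= 1
            moved += 1
    return new_map


def server_for(key, partition_map):
    """Route a key to the server that owns its partition."""
    return partition_map[partition_for(key)]


before = initial_map(["s1", "s2", "s3"])
after = add_server(before, "s4")
moved_partitions = [p for p in range(NUM_PARTITIONS) if before[p] != after[p]]

print("partitions moved:", len(moved_partitions))
for key in ["user::1", "user::2", "order::10"]:
    print(key, "->", server_for(key, after))
```

Because the application only ever asks for a key, the map can change underneath it. In this sketch only the partitions handed to the new server move, and keys in every other partition stay where they were.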
InfoQ: How do developers react when you tell them about adopting NoSQL?
Dipti Borkar: Developers are extremely excited about NoSQL technologies, particularly because of the ease of development some databases bring. Document databases have extremely flexible schemas and are easy to work with.
Developers can iterate over application changes faster without the need to change the schema of the underlying database. This is particularly useful when developers are building applications with sparse data or data that’s constantly changing or data from third-party providers they do not have control over.
InfoQ: Is it OK to work with existing developers and have them learn new skills or should you look for new ones that master NoSQL?
Dipti Borkar: Application developers will find it easy to adopt some NoSQL technologies, particularly those that support JSON as the document format. More and more developers are using JSON to model objects in their applications. Therefore storing the data directly as JSON in the database reduces the impedance mismatch across the stack.
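As a small sketch of what the reduced impedance mismatch can look like in practice, the example below converts an application object to JSON and back without a separate row-to-object mapping layer. The `Profile` class and its fields are hypothetical, and the `stored` string stands in for whatever a document database would hold.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Profile:
    name: str
    email: str
    interests: List[str] = field(default_factory=list)


profile = Profile(name="Ann", email="ann@example.com", interests=["hiking"])

# Application object -> JSON text that could be stored as a document.
stored = json.dumps(asdict(profile))

# JSON text -> application object again.
loaded = Profile(**json.loads(stored))

print(stored)
print(loaded == profile)
```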
Developers who heavily use SQL may need to adapt and learn about document modeling approaches. Rethinking how data can be structured in a logical way using documents rather than normalizing the data into a fixed database schema becomes an important aspect.
InfoQ: Have you had or heard of unsuccessful attempts to switch to NoSQL? If yes, what went wrong?
Dipti Borkar: Architects and developers should ensure that their key requirements are satisfied by the solution or database selected. For example, choosing a database that’s more suited towards analytical applications may not satisfy your latency and throughput needs for interactive applications. Projects that make a quick choice without investigating all requirements may find that they have slower response times for data access leading to a poor user experience. Users need to plan up front for scalability. Here’s a more drastic example of things going south. In some situations an app has gone viral but the database that was selected couldn’t keep up and scale out.
At the same time, using a database that is more suited towards an OLTP-like use case may not perform well for advanced analysis jobs or complex processing. A big data solution may be more suitable.
InfoQ: What are the key lessons from migrating to NoSQL?
Dipti Borkar: There are a lot of benefits developers will see when moving to NoSQL. A more flexible data model and freedom from rigid schemas is a big one. You may also see significantly improved performance and the ability to horizontally scale out the data layer.
But most NoSQL products are in the early stages of the product cycle. While functionality like complex joins or multi-document transactions can be simulated in the app, developers who depend on such features may be better off using a traditional RDBMS. And for some projects, a hybrid approach might be the best choice.
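For a sense of what simulating a join in the app involves, here is a minimal sketch that combines two sets of documents in application code. The document keys, fields, and the `join_orders_with_users` helper are invented for illustration, and plain dicts stand in for the database.

```python
# Stand-ins for two kinds of documents fetched from a document store.
orders = {
    "order::10": {"user_id": "user::1", "total": 30},
    "order::11": {"user_id": "user::2", "total": 12},
    "order::12": {"user_id": "user::9", "total": 5},  # user document missing
}
users = {
    "user::1": {"name": "Ann"},
    "user::2": {"name": "Bob"},
}


def join_orders_with_users(orders, users):
    """Application-side join: look up each order's user document by key."""
    joined = []
    for order_id, order in sorted(orders.items()):
        user = users.get(order["user_id"])  # one extra lookup per order
        joined.append(
            {
                "order_id": order_id,
                "total": order["total"],
                "user_name": user["name"] if user else None,
            }
        )
    return joined


joined = join_orders_with_users(orders, users)
for row in joined:
    print(row)
```

The application now owns the work a relational engine would do, including the handling of missing references. That is part of why join-heavy workloads may fit an RDBMS better.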
About the Interviewee
Dipti Borkar is the Director of Product Management at Couchbase, where she is responsible for the product roadmap of Couchbase Server, a NoSQL database, and works with customers and users to understand emerging requirements for low-latency, scalable data stores. Dipti has deep technical experience in the database industry, having worked at IBM as a software engineer and Development Manager for the DB2 server team and then at MarkLogic as a Senior Product Manager. Dipti holds a Master's degree in Computer Science from the University of California, San Diego, with a specialization in databases, and an MBA from the Haas School of Business at the University of California, Berkeley.
"...relational database (RDBMS) technology does not scale out and does not provide a flexible data model..."
Consider Teradata: at eBay, for example, they have petabytes of data on Teradata, which is more than many of the readers will ever have, and it still has the capability to scale out much further. Other solutions would be IBM PureScale and VoltDB, for example.
As for "...flexible data model..." in the NoSQL world the pattern is to build a table for every query result set. I would hardly call this flexible. And the "sparsity" argument holds no water since you can build exactly the same model in an RDBMS either as rows with Key, Name, Value and Timestamp or with Key and JSON (or XML).
As for cost - let's look at VoltDB which has an open source model and runs on commodity hardware.
And once you get into the analytic world, the power of columnar databases will significantly reduce the number of servers you will need.
In general, you overgeneralize the RDBMS. I suspect you are thinking only of Oracle, MySQL and the like?
"Cassandra, for instance may be a solution you use for analytical use cases given its columnar data model. Neo4j, a graph database, for example, may be the database you use for applications that need access to relationships between entities."
Cassandra has a row data model, not a columnar data model: a Column Family is a row. Vertica, Vectorwise, MonetDB and ParAccel are columnar models.
Neo4j ...access to relationships? Neo4j is about graph navigation. Relationships are stored well in relational databases, Cassandra and others.
Perhaps you are being a little loose with your terminology?
...migrating SQL to NoSQL
"...relational database...Data is denormalized across multiple tables."
Nothing can be further from the truth with correct data modeling. The whole idea of the relational model is the data IS normalized!
And as for document databases, you must remember that "someone" needs to know what is in the document, so if the database does not, then it needs to be documented somewhere. I hope you do not suggest "read the code"? Also, it is now up to the code (written by a variety of coders) to ensure the validity of the data: for example, is the date a real date? The various applications must handle this instead of the database. (NoSQL databases can do this validation of course; the point is that the database is typically the place where you should validate.)
Re: ...migrating SQL to NoSQL
There was a typo in the text. The text now reads: "Data is normalized...". Thank you for pointing this out.