CouchDB versus Couchbase: What are the differences, and what happened to Membase?
In February 2011, CouchOne and Membase merged. The combined company is called Couchbase. Membase's product, also called Membase, was a persistent, scalable key/value store that used the memcached wire protocol. CouchOne supported CouchDB, a document database with a peer-to-peer replication approach that works well for mobile devices and geographically separated data centers. Couchbase created a new product combining parts of Membase and parts of CouchDB, and the new product is also called Couchbase.
Recently Couchbase published a comparison of Couchbase and CouchDB that outlines the differences and similarities between the two. This document addresses a common question: "What is the difference between CouchDB and Couchbase?"
The reality is that Couchbase and CouchDB are closely related. The Couchbase product contains a copy of CouchDB, and adds caching, clustering and more on top of it. InfoQ caught up with one of the founders of Couchbase, James Phillips, to discuss the comparison and the merger of the two products, Membase and CouchDB.
InfoQ: Membase seems to be a very solid brand, why did you change the product name to Couchbase?
In early 2011, the company that was Membase merged with a company called CouchOne. The combined entity took a portion of each company's name, giving us Couchbase. Ultimately the name change better reflects the technology we offer: Couchbase is a document-oriented database (with technology inherited from the Apache CouchDB project) that can scale horizontally and which provides very low-latency access to data for both reads and writes (due to Membase technology).
InfoQ: Before selecting CouchDB as the persistence and query engine, what did Membase use?
SQLite was the embedded storage engine used in Membase that was replaced by Apache CouchDB technology in Couchbase Server.
InfoQ: How important was the memcached wire protocol to Membase adoption?
Memcached compatibility has been very important to the adoption of Membase and now Couchbase Server (which supports the same wire protocol). Every language and application development framework natively supports memcached, and most developers have used memcached previously, so it is easy to pick up and begin using.
InfoQ: Membase seemed like a very useful solution as it was, and certainly had some really big name customers and case studies like Zynga. What did you get by using CouchDB as the persistence/query layer that Membase customers were clamoring for?
Couchbase is typically used as the system of record for interactive software systems – replacing the role previously played by relational database technology like MySQL or Oracle. The key-value operations that were supported by Membase certainly allowed useful systems to be built, but a simple key-value store can’t answer even simple questions such as “which users currently have a sheep on their farm?” In order to answer that question on a pure key-value store, the application must read the entire database, key by key, then “look inside” the value part of the key-value pair to see if there is a sheep inside. By embedding CouchDB, the database can now do that work on behalf of the application and without the need for a full database scan (because CouchDB can maintain an index that speeds that kind of query).
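The difference between scanning a key-value store and consulting an index can be sketched in a few lines of Python. This is only an illustration of the idea in the answer above, not Couchbase or CouchDB code: the `store` dictionary, the `animals` field, and the function names are all hypothetical, and the index here is built once rather than maintained incrementally as CouchDB would do.

```python
import json

# Hypothetical key-value store: the values are opaque JSON strings,
# so the store itself knows nothing about what is inside them.
store = {
    "user:1": json.dumps({"name": "ann", "animals": ["sheep", "cow"]}),
    "user:2": json.dumps({"name": "bob", "animals": ["pig"]}),
    "user:3": json.dumps({"name": "cy", "animals": ["sheep"]}),
}


def users_with_scan(store, animal):
    """Pure key-value approach: read every value and look inside it."""
    return sorted(
        key
        for key, raw in store.items()
        if animal in json.loads(raw).get("animals", [])
    )


def build_index(store):
    """Build a secondary index mapping each animal to the keys that own it."""
    index = {}
    for key, raw in store.items():
        for animal in json.loads(raw).get("animals", []):
            index.setdefault(animal, set()).add(key)
    return index


def users_with_index(index, animal):
    """Indexed approach: one lookup, no scan of the stored values."""
    return sorted(index.get(animal, ()))


index = build_index(store)
print(users_with_scan(store, "sheep"))
print(users_with_index(index, "sheep"))
```

Both calls return the same users, but the scan costs one read per stored key on every query, while the indexed lookup pays that cost when the index is built and updated.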
InfoQ: Who is your closest competitor in the NoSQL, distributed data space?
InfoQ: Couchbase and MongoDB are document-oriented and are quite successful. What advantages does a document-oriented approach have over a column-oriented approach like Cassandra (BigTable/Dynamo hybrid)?
With a document oriented database an application can insert records (“documents”) without regard to their structure, as long as they adhere to some standard formatting rules (e.g. XML, JSON). Queries can then be executed regardless of whether certain columns have been defined, or column families or super columns or any of the other structures that a column-oriented database requires one to maintain. The document-oriented model provides a more flexible, general-purpose approach to transactional data management without limiting the kinds of queries that can be run.
InfoQ: The CouchDB/Couchbase comparison mentions that Couchbase adds autosharding capabilities to CouchDB. Does Couchbase add any additional support for replication for high availability above and beyond what core CouchDB offers?
Couchbase Server actually includes two kinds of “replication” technology. For intra-datacenter deployments (a cluster), Membase-style replication (which favors immediate consistency in the face of a network partition) is used, as it provides the most natural development model and the likelihood of a split-brain network partition can be engineered to be statistically less probable than an asteroid collision with the data center. For inter-datacenter deployments (where clusters are geographically distributed) the likelihood of a split-brain network partition is very high, since application servers AND database servers live on both sides of a (relatively) fragile WAN connection. CouchDB-style replication is used in cross-datacenter deployments because it supports conflict detection and resolution, and conflicts are more likely in this scenario.
InfoQ: Couchbase, like Membase before it, is a drop-in replacement for Memcached, so applications using Memcached can use Couchbase right out of the gate. But how do client drivers not written with autosharding in mind utilize Couchbase's autosharding features?
There is a proxy layer (called moxi), built into Couchbase Server or deployable on the application server, that bridges the gap between the consistent hashing approach used by “off the shelf” memcached clients and the two-level indirection employed by Couchbase Server (hashing to find a virtual server, then a lookup to map the virtual server to a real server).
InfoQ: How does Couchbase address applications that need reliable persistence? Is there a journaling option? Is there an option where the data has to be replicated to more nodes? How do you balance write speed with reliable persistence? Do you need to have at least two servers for some guarantee of durability?
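The two-level indirection described above can be sketched as follows. This is a minimal illustration rather than moxi's or Couchbase Server's actual code: the CRC32 hash, the bucket count of 8, the round-robin initial assignment, and the server addresses are all assumptions chosen for the example.

```python
import zlib


def vbucket_for(key, num_vbuckets):
    """Level 1: hash the key to a virtual server (virtual bucket) number.

    CRC32 is an assumption made for this sketch.
    """
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_map(servers, num_vbuckets):
    """Assign each virtual bucket to a real server (round-robin, for illustration)."""
    return [servers[i % len(servers)] for i in range(num_vbuckets)]


def server_for(key, vbucket_map):
    """Level 2: look up which real server currently owns the key's virtual bucket."""
    return vbucket_map[vbucket_for(key, len(vbucket_map))]


# Hypothetical two-node cluster with 8 virtual buckets.
vbucket_map = build_map(["node-a:11210", "node-b:11210"], 8)
key = "user:1"
print(vbucket_for(key, 8), server_for(key, vbucket_map))

# Moving a virtual bucket to a new node only changes the map entry;
# the key still hashes to the same virtual bucket.
vbucket_map[vbucket_for(key, 8)] = "node-c:11210"
print(vbucket_for(key, 8), server_for(key, vbucket_map))
```

The point of the extra level is that the key-to-virtual-bucket hash stays fixed while the cluster changes; only the lookup table needs to be updated and distributed, which a proxy can do on behalf of clients that know nothing about it.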
Couchbase can be configured (on a per operation basis) to acknowledge writes immediately (with writing done asynchronously) or only after data has been replicated or written to durable media. Users get to make their own durability and performance trade-offs.
Background on Membase, Couchbase and NorthScale
Membase (the product) was announced in October 2010, and was developed by Zynga, NorthScale, and NHN. NorthScale became Membase Inc., which then became Couchbase Inc. after merging with CouchOne Inc. in 2011. Membase is used by Zynga for its popular social games, namely Farmville, Mafia Wars, and Cafe World. Membase was optimized for storing web application data like Farmville's. These online social games store a lot of data. "It's a mind-boggling amount of data. It's a new sort of data, and it warranted development of a new sort of database management system (Membase)" according to Audrey Watters of ReadWrite Cloud. Zynga was already using Memcached, so the transition to Membase was a natural one. There was an InfoQ interview with Dustin Sallings, a Couchbase engineer, who discussed changes to the Memcached wire protocol to support Membase-like products.
In a related InfoQ story, Damien Katz, creator of CouchDB and another co-founder of Couchbase, announced he was going to focus on Couchbase, as this was an opportunity to start over with CouchDB: throw out what did not work, strengthen what was working, and include the scalability, speed, clustering, and caching features of Membase in the combined Couchbase product. Damien lamented the pace of progress in a consensus-based Apache project, and pointed to the need for a successful commercial product to move quickly. His take on merging the products was to create a combined product that played to both of their strengths. In a follow-up blog post, Damien went on to say that the Membase product is very fast and scalable, but has no reporting capability or cross-datacenter replication capability. The CouchDB product has more features, like advanced replication and reporting, but is not fast and can’t keep up with high loads. The combination of the two should be very successful, and so Couchbase was born.