
Membase, a new and heavyweight NoSQL family member

by Michael Hunger on Jun 23, 2010

On March 23, North Scale Solutions publicly announced the availability of the Membase NoSQL database solution. The release drew a lot of coverage (MarketWire, The Register, GigaOM).

Membase was developed by members of the memcached core team at North Scale, together with engineers from the two major contributors Zynga and NHN, both big players in the online game and social network space.
Other early adopters are mig33 (mobile applications) and Red Aril (advertising).

The open source project at Membase.org makes the source code available under the Apache 2.0 license. It is hosted at GitHub. Source tarballs and Linux binaries are available for download as a public beta.

 

Commercial support is provided by North Scale with their dedicated server software that complements the existing support for memcached server.

Beyond the press releases, few technical facts about the database are available. The best insights may be gained by looking at the source code.

The main, hard objectives in developing Membase were: "Simple, Fast, Elastic".

The simplicity is provided by the key-value store. There is no additional query capability (yet). Extensions are possible through a plug-in architecture (hooks through a filtered TAP interface), which can be used for full-text search, backup or data warehouse dumps. Other planned extension points are the data bucket engine API for specialized container types and the future "NodeCode".

Ease of installation, operation and extension from single nodes to clusters, as well as the ability to act as a drop-in replacement for memcached (wire-protocol compatibility), offer a low threshold for developer and operations buy-in. Memcached is already widely used as a caching solution in many different types of applications (especially high-throughput web apps). Parts of memcached's codebase are used directly in the front end of the Membase server.

Through this compatibility, bindings for many programming languages and frameworks can be reused for Membase. For managing Membase installations, graphical and programmatic interfaces and configurable alerting are available.
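
To make the wire-protocol compatibility concrete, here is a minimal sketch of the two basic commands of the memcached text protocol, which a compatible server would be expected to accept unchanged. The helper names build_set and build_get are hypothetical and only build the request bytes; nothing here is taken from the Membase source.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached text-protocol 'set' request.

    Layout: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    """
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Build a memcached text-protocol 'get' request for a single key."""
    return f"get {key}\r\n".encode("ascii")


set_request = build_set("greeting", b"hello")
get_request = build_get("greeting")
```

Because existing memcached client libraries already produce requests of this shape, pointing such a client at a compatible server should require only a change of host and port.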

Membase is designed to scale out linearly: it consists of uniform nodes that can be duplicated to increase cluster capacity. After nodes are added, it is still necessary to initiate a redistribution of the stored data.

One interesting attribute of this NoSQL solution is the promised predictable performance and quasi-deterministic latency and throughput. This is to be achieved by:

  • Auto-migration of hot data to lowest latency storage technology (RAM, SSD, Disk)
  • Selectable write behavior – asynchronous, synchronous (on replication, persistence)
  • Back-channel rebalancing [FUTURE]
  • Multi-threaded with low lock contention
  • Asynchronous handling wherever possible
  • Automatic write de-duplication
  • Dynamic rebalancing of a live cluster
  • Providing high availability by copying data to multiple cluster members and supporting rapid fail-over
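
The announcement does not explain how the automatic write de-duplication works. One plausible reading, sketched below purely as an assumption, is that repeated updates to the same key are coalesced in memory so that only the latest value reaches persistent storage. The class DedupWriteQueue and its methods are hypothetical names, not Membase APIs.

```python
class DedupWriteQueue:
    """Coalesce pending writes per key (illustrative assumption, not Membase code)."""

    def __init__(self):
        self._pending = {}  # key -> latest value that has not been persisted yet

    def set(self, key, value):
        # A later write to the same key overwrites the pending one.
        self._pending[key] = value

    def flush(self, persist):
        """Persist each dirty key exactly once and return the number of writes."""
        batch, self._pending = self._pending, {}
        for key, value in batch.items():
            persist(key, value)
        return len(batch)


disk = []  # stand-in for the persistence layer
queue = DedupWriteQueue()
for score in (10, 20, 30):
    queue.set("player:1:score", score)
queue.set("player:2:score", 5)
written = queue.flush(lambda key, value: disk.append((key, value)))
```

In this sketch four client writes result in two writes to the stand-in disk, which is the kind of saving a write-heavy workload with frequently updated keys could see.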

Two of the more technical slides of the North Scale presentation:
 

Membase Key Translation

Membase Replication

Alex Popescu pointed out the lack of technical information and referred to the Gear6 Memcached solutions, which were recently acquired by Violin Memory, a company providing large amounts of server-side flash memory infrastructure.


The idea of dynamically distributed key-value store by Andrei Sedoi

I just wonder how can a key-value store be dynamically distributed based on a key. I mean the scenario when we add a new machine. In other words how is address of the machine "calculated" based on a key? It's obvious that key space cannot be redistributed in real time.

vBuckets are the key in membase by James Phillips

Hi Andrei. Below is a cut and paste from one of the membase functional specs. We'll be hanging those out on the project early next week. You are correct in pointing out that memcached uses a hashing function to directly map a key to a server in the list (this list can vary in size). The vBucket structure is one of the key mechanics allowing our ability to scale a cluster up or down elastically. If you send me your email address (I'm james at northscale dot com), I'll send you a few specs.

Same for you Michael - thanks for the mention!

james.

vBuckets defined

A vBucket is defined as the “owner” of a subset of the key space of a membase cluster.

Every key “belongs” in a vBucket. A mapping function is used to calculate the vBucket in which a given key belongs. In membase 1.6, that mapping function is a hashing function that takes a key as input and outputs a vBucket number. The vBucket number is used as an index into a table (the “vBucket Map”) which is consulted to determine which server is acting as Master Server for that vBucket. The table contains one row per vBucket, pairing the vBucket with its assigned Master Server. A server appearing in this table can be (and usually is) responsible for multiple vBuckets.

The hashing function used by membase to map keys to vBuckets is configurable – both the hashing algorithm and the output space (i.e. the total number of vBuckets output by the function). Naturally, if the number of vBuckets in the output space of the hash function is changed, then the table which maps vBuckets to Servers must be resized.
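
The lookup described in the spec excerpt above can be sketched in a few lines. This is an illustration under stated assumptions: CRC32 stands in for the configurable hashing function, 1024 is an arbitrary size for the output space, the server names are made up, and the round-robin assignment is just one simple way to fill the vBucket Map.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed output space; the spec says this is configurable


def vbucket_for_key(key: bytes, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Hash a key to a vBucket number (CRC32 is an assumed stand-in hash)."""
    return zlib.crc32(key) % num_vbuckets


def build_vbucket_map(servers, num_vbuckets: int = NUM_VBUCKETS):
    """One row per vBucket, each paired with a master server (round-robin)."""
    return [servers[i % len(servers)] for i in range(num_vbuckets)]


def master_for_key(key: bytes, vbucket_map) -> str:
    """Use the vBucket number as an index into the vBucket Map."""
    return vbucket_map[vbucket_for_key(key, len(vbucket_map))]


servers = ["server-a:11211", "server-b:11211", "server-c:11211"]  # hypothetical
vbucket_map = build_vbucket_map(servers)
owner = master_for_key(b"user:42", vbucket_map)
```

The key never maps to a server directly. Changing which server owns a vBucket only means rewriting rows of the table, while the hash of every key stays the same.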

Re: The idea of dynamically distributed key-value store by Cameron Purdy

I just wonder how can a key-value store be dynamically distributed based on a key. I mean the scenario when we add a new machine. In other words how is address of the machine "calculated" based on a key? It's obvious that key space cannot be redistributed in real time.


This is exactly what Oracle Coherence does, and has been doing since early 2002 :-)

Generally, you want to avoid mapping keys directly to machines, as that causes your memory overhead to increase in linear relationship to the number of keys. Instead, keys are mapped to a relatively small number of partitions, and partitions are mapped to machines. Failover, backup, and life-cycle are then managed at a partition level.

Peace,

Cameron Purdy | Oracle Coherence
coherence.oracle.com/
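
The two-level mapping described in this reply (keys to partitions, partitions to machines) also answers the original question about adding a machine: only partition ownership changes. The sketch below is a simplified illustration with hypothetical names, not the algorithm of Coherence or Membase. It hands the new machine roughly its fair share of partitions and leaves all other assignments untouched.

```python
from collections import Counter


def add_machine(partition_map, new_machine):
    """Return a new partition -> machine list with new_machine owning a fair share.

    Assumes new_machine is not yet present in partition_map.
    """
    new_map = list(partition_map)
    machines = set(new_map) | {new_machine}
    target = len(new_map) // len(machines)  # fair share per machine
    load = Counter(new_map)
    moved = 0
    for partition, owner in enumerate(new_map):
        if moved >= target:
            break
        if load[owner] > target:
            # Take a partition away from an overloaded machine.
            new_map[partition] = new_machine
            load[owner] -= 1
            moved += 1
    return new_map


old_map = [f"machine-{i % 3}" for i in range(12)]  # 12 partitions, 3 machines
new_map = add_machine(old_map, "machine-3")
moved_partitions = [p for p in range(len(old_map)) if old_map[p] != new_map[p]]
```

With 12 partitions on 3 machines, adding a fourth moves 3 partitions and leaves the other 9 where they were. The data in the moved partitions still has to be transferred, but the key-to-partition mapping is unaffected.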

Re: vBuckets are the key in membase by peter lin

It's good to see open source evolve and learn from commercial products. Does Northscale plan to copy all/some of the features found in Coherence, Extreme scale and gigaspaces? Features like write behind, failover, etc?

Clustering over WAN by Atif Khan

Does the clustering and replication work over WAN as well? Is Membase being built to support cloud deployment with global footprint?

Re: vBuckets are the key in membase by Atif Khan

Peter,
I think there is a slight difference between products like Coherence, Extreme Scale, Gigaspaces and something like Membase. The products you mentioned are targeted towards providing a distributed grid for data/object storage, mainly in-memory storage. Some of these do enable a persistence store as well.
Membase on the other hand is a persistence engine that implements some of the concepts of in-memory grid.

Re: vBuckets are the key in membase by James Phillips

Hi Peter -
Membase has both write behind and failover...but so have most data management systems built over the last 40 or so years. ;P While the initial release of membase is obviously playing table stakes catch-up with existing systems, our plans certainly include some substantial innovation beyond the obvious. Happy to share more if you are interested.
james.

Re: The idea of dynamically distributed key-value store by James Phillips

Nice to see you weigh in Cameron. Love coherence and everything you've done there. Certainly there are similarities between systems...but there will be divergence over time I believe.

Re: Clustering over WAN by James Phillips

A large social gaming company is using membase on ec2 across 2 data centers. They are using master-slave replication, primarily (currently) for disaster recovery. There is much work planned in this area though. Current capabilities are basic.

James Phillips
www.northscale.com

InfoQ.com and all content copyright © 2006-2014 C4Media Inc.