InfoQ


Membase, a new and heavyweight NoSQL family member

Posted by Michael Hunger on Jun 23, 2010

Sections
Architecture & Design,
Development,
Operations & Infrastructure
Topics
NoSQL ,
Performance & Scalability ,
Clustering & Caching ,
Persistence ,
Architecture

On March 23, North Scale Solutions publicly announced the availability of the Membase NoSQL database. The release drew wide coverage (MarketWire, The Register, GigaOM).

It was developed by members of the memcached core team at North Scale, together with engineers from two major contributors, Zynga and NHN, both big players in online gaming and social networking.
Other early adopters include mig33 (mobile applications) and Red Aril (advertising).

The open source project at Membase.org makes the source code available under the Apache 2.0 license. It is hosted on GitHub. Source tarballs and Linux binaries are available for download as a public beta.

North Scale provides commercial support through its dedicated server software, which complements its existing support for memcached server.

Beyond the press releases, few technical details about the database are available. The source code is the best place to look for deeper insight.

The core objectives in developing Membase were "Simple, Fast, Elastic".

Simplicity comes from the key-value data model. There is no additional query capability (yet). Extensions are possible through a plug-in architecture (hooks via a filtered TAP interface), which can be used for full-text search, backup, or data warehouse dumps. Other (planned) extension points are the data bucket engine API for specialized container types and the future "NodeCode".

Easy installation and operation, the ability to grow from a single node to a cluster, and drop-in replacement of memcached (wire-protocol compatibility) all lower the barrier to adoption for developers and operations teams. Memcached is already widely used as a caching solution in many kinds of applications, especially high-throughput web apps. Parts of memcached's codebase are used directly in the front end of the Membase server.

Thanks to this compatibility, existing memcached client bindings for many programming languages and frameworks can be reused with Membase. For managing Membase installations, graphical and programmatic interfaces are available, along with configurable alerting.
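Wire-protocol compatibility means an ordinary memcached client should be able to talk to Membase unchanged. To show what that protocol looks like, here is a minimal Python sketch of memcached's ASCII `set`/`get` commands over a raw socket. It assumes the node accepts the memcached text protocol at a host and port you supply (11211 is memcached's conventional port). All function names are hypothetical helpers, not part of any Membase client library.

```python
import socket
from typing import Optional

def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode 'set <key> <flags> <exptime> <bytes>\\r\\n<data>\\r\\n' (key must not contain whitespace)."""
    return f"set {key} {flags} {exptime} {len(value)}\r\n".encode() + value + b"\r\n"

def encode_get(key: str) -> bytes:
    """Encode a single-key 'get' command."""
    return f"get {key}\r\n".encode()

def read_get_response(stream) -> Optional[bytes]:
    """Read one 'get' reply from a binary stream; returns None on a cache miss."""
    line = stream.readline()              # b"VALUE <key> <flags> <bytes>\r\n" or b"END\r\n"
    if line == b"END\r\n":
        return None
    if not line.startswith(b"VALUE "):
        raise ValueError(f"unexpected reply: {line!r}")
    length = int(line.split()[3])
    data = stream.read(length + 2)[:-2]   # payload is followed by \r\n; read by length, not by line
    if stream.readline() != b"END\r\n":
        raise ValueError("missing END terminator")
    return data

def set_then_get(host: str, port: int, key: str, value: bytes) -> Optional[bytes]:
    """Store a value and read it back over one connection (assumes a reachable server)."""
    with socket.create_connection((host, port), timeout=2) as sock, sock.makefile("rb") as reader:
        sock.sendall(encode_set(key, value))
        if reader.readline() != b"STORED\r\n":
            raise RuntimeError("set was not stored")
        sock.sendall(encode_get(key))
        return read_get_response(reader)
```

Against a running node, `set_then_get("localhost", 11211, "user:42", b"hello")` would be expected to return `b"hello"`. Any existing memcached library does the same thing with far more robustness, which is exactly why protocol compatibility lowers the adoption threshold.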

Membase is designed to scale out linearly: a cluster consists of uniform nodes, so capacity grows by adding more of them. Redistribution of the stored data across the nodes still has to be initiated explicitly.
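To make "redistribution" concrete, assume a vBucket-style table that records an owning server for each partition of the key space, as in the spec excerpt quoted in the comments below. Adding a node then means moving some table rows to it. The greedy sketch below releases partitions from servers that hold more than their fair share and hands them to servers that hold less. It is a hypothetical illustration, not Membase's actual rebalancing algorithm: it only plans the moves and copies no data.

```python
from collections import Counter

def rebalance(vbucket_map: list[str], servers: list[str]) -> tuple[list[str], list[tuple[int, str, str]]]:
    """Greedy rebalance of a vBucket -> server table across `servers` (hypothetical sketch).

    Returns the new table and a list of (vbucket, old_owner, new_owner) moves.
    """
    load = Counter(vbucket_map)
    base, extra = divmod(len(vbucket_map), len(servers))
    # Give the spare "+1" slots to the currently busiest servers to avoid needless moves.
    ranked = sorted(servers, key=lambda s: load[s], reverse=True)
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(ranked)}

    # Release vBuckets from servers above quota (or no longer in the cluster at all).
    freed = []
    for vb, owner in enumerate(vbucket_map):
        if load[owner] > quota.get(owner, 0):
            load[owner] -= 1
            freed.append(vb)

    # Hand each released vBucket to the first server still below its quota.
    new_map = list(vbucket_map)
    moves = []
    for vb in freed:
        target = next(s for s in servers if load[s] < quota[s])
        moves.append((vb, new_map[vb], target))
        new_map[vb] = target
        load[target] += 1
    return new_map, moves
```

For example, `rebalance(["A", "B"] * 4, ["A", "B", "C"])` moves just two of the eight vBuckets to the new server C, leaving the other six where they were. Keeping moves few matters because each move implies copying that partition's data over the network.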

One interesting attribute of this NoSQL solution is its promise of predictable performance, with quasi-deterministic latency and throughput. This is to be achieved by:

  • Auto-migration of hot data to the lowest-latency storage technology (RAM, SSD, disk)
  • Selectable write behavior – asynchronous, synchronous (on replication, persistence)
  • Back-channel rebalancing [FUTURE]
  • Multi-threaded with low lock contention
  • Asynchronous handling wherever possible
  • Automatic write de-duplication
  • Dynamic rebalancing of a live cluster
  • Providing high availability by copying data to multiple cluster members and supporting rapid fail-over
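The list does not explain "automatic write de-duplication". One common interpretation, assumed here, is a write-behind queue in which a key updated several times before its pending write reaches disk is persisted only once, with its latest value. The class below is hypothetical and implies nothing about how Membase actually implements this.

```python
from collections import OrderedDict
from typing import Callable

class DedupWriteQueue:
    """Hypothetical write-behind queue: pending writes to the same key collapse into one."""

    def __init__(self) -> None:
        self._pending: "OrderedDict[str, bytes]" = OrderedDict()

    def enqueue(self, key: str, value: bytes) -> None:
        # Overwrite any not-yet-persisted value; the key keeps its original queue position.
        self._pending[key] = value

    def __len__(self) -> int:
        return len(self._pending)

    def flush(self, persist: Callable[[str, bytes], None]) -> int:
        """Drain in first-enqueued order, persisting only the latest value per key."""
        written = 0
        while self._pending:
            key, value = self._pending.popitem(last=False)
            persist(key, value)
            written += 1
        return written
```

Four updates to keys `a`, `b`, `a`, `a` would cost two disk writes instead of four. Under bursts of updates to hot keys, that kind of saving keeps persistence from falling behind and helps keep latency predictable.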

Two of the more technical slides from the North Scale presentation:

Membase Key Translation

Membase Replication

Alex Popescu pointed out the lack of technical information and referred to Gear6's memcached solutions. Gear6 was recently acquired by Violin Memory, a company that provides large-scale server-side flash memory infrastructure.

  • This article is part of a featured topic series on NoSQL



    The idea of dynamically distributed key-value store

    by Andrei Sedoi

    I wonder how a key-value store can be dynamically distributed based on a key, specifically in the scenario where we add a new machine. In other words, how is the address of the machine "calculated" from a key? It seems obvious that the key space cannot be redistributed in real time.


    vBuckets are the key in membase

    by James Phillips

    Hi Andrei. Below is a cut-and-paste from one of the membase functional specs; we'll be hanging those out on the project early next week. You are correct in pointing out that memcached uses a hashing function to map a key directly to a server in the list (and this list can vary in size). The vBucket structure is one of the key mechanisms that allow us to scale a cluster up or down elastically. If you send me your email address (I'm james at northscale dot com), I'll send you a few specs.

    Same for you Michael - thanks for the mention!

    james.

    vBuckets defined

    A vBucket is defined as the “owner” of a subset of the key space of a membase cluster.

    Every key “belongs” in a vBucket. A mapping function is used to calculate the vbucket in which a given key belongs. In membase 1.6, that mapping function is a hashing function that takes a key as input and outputs a vBucket number. The vBucket number is used as an index into a table (the “vBucket Map”) which is consulted to determine which server is acting as Master Server for that vBucket. The table contains one row per vBucket, pairing the vBucket with its assigned Master Server. A server appearing in this table can be (and usually is) responsible for multiple vBuckets.

    The hashing function used by membase to map keys to vBuckets is configurable – both the hashing algorithm and the output space (i.e. the total number of vBuckets output by the function). Naturally, if the number of vBuckets in the output space of the hash function is changed, then the table which maps vBuckets to Servers must be resized.
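The two-step lookup described in this spec excerpt can be sketched in a few lines of Python. A key is hashed to a vBucket number, and the vBucket number indexes the vBucket Map to find the Master Server. Because the spec says both the hash and the number of vBuckets are configurable, CRC32 from `zlib` and a default of 1024 vBuckets are assumptions chosen purely for illustration. `VBucketMap` is a hypothetical name.

```python
import zlib

class VBucketMap:
    """Illustrative key -> vBucket -> master server lookup (hash and sizes are assumptions)."""

    def __init__(self, servers: list[str], num_vbuckets: int = 1024) -> None:
        self.num_vbuckets = num_vbuckets
        # One row per vBucket; round-robin assignment, so each server owns many vBuckets.
        self.table = [servers[i % len(servers)] for i in range(num_vbuckets)]

    def vbucket_for(self, key: str) -> int:
        # Mapping function: hash the key into the fixed vBucket output space.
        return zlib.crc32(key.encode()) % self.num_vbuckets

    def master_for(self, key: str) -> str:
        # Consult the table row for the key's vBucket.
        return self.table[self.vbucket_for(key)]

    def reassign(self, vbucket: int, server: str) -> None:
        # Moving a vBucket edits one table row; the key -> vBucket mapping is untouched.
        self.table[vbucket] = server
```

Reassigning a vBucket to a new server changes where its keys live without rehashing any key. That is why the number of vBuckets stays fixed while the set of servers grows or shrinks. Changing the vBucket count, by contrast, changes the hash's output space and forces the table to be resized, as the spec notes.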


    Re: The idea of dynamically distributed key-value store

    by Cameron Purdy

    Andrei Sedoi wrote: "I just wonder how can a key-value store be dynamically distributed based on a key. I mean the scenario when we add a new machine. In other words how is address of the machine "calculated" based on a key? It's obvious that key space cannot be redistributed in real time."


    This is exactly what Oracle Coherence does, and has been doing since early 2002 :-)

    Generally, you want to avoid mapping keys directly to machines, as that causes your memory overhead to increase in linear relationship to the number of keys. Instead, keys are mapped to a relatively small number of partitions, and partitions are mapped to machines. Failover, backup, and life-cycle are then managed at a partition level.

    Peace,

    Cameron Purdy | Oracle Coherence
    coherence.oracle.com/
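Cameron's point about mapping keys to partitions rather than directly to machines can be made concrete. The same goes for James's note that memcached hashes a key straight to a server. The sketch below counts how many keys change owner when a fourth node joins a three-node cluster under each scheme. It assumes an MD5-derived 32-bit hash and 256 partitions; both choices are arbitrary, and neither reflects Coherence or Membase internals.

```python
import hashlib

def h(key: str) -> int:
    # Stable 32-bit hash from an MD5 prefix; any well-mixed hash would do here.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

def moved_with_modulo(keys: list[str], n_before: int, n_after: int) -> float:
    """Fraction of keys whose node changes when node = hash(key) % n_nodes."""
    return sum(h(k) % n_before != h(k) % n_after for k in keys) / len(keys)

def moved_with_partitions(keys: list[str], owners_before: list[int], owners_after: list[int]) -> float:
    """Fraction of keys whose node changes when key -> partition is fixed and only ownership moves."""
    n_parts = len(owners_before)
    return sum(owners_before[h(k) % n_parts] != owners_after[h(k) % n_parts] for k in keys) / len(keys)

if __name__ == "__main__":
    P = 256
    before = [p % 3 for p in range(P)]                          # 3 nodes, round-robin ownership
    after = [3 if p % 4 == 0 else before[p] for p in range(P)]  # node 3 takes every 4th partition
    keys = [f"key-{i}" for i in range(10_000)]
    print(f"modulo:     {moved_with_modulo(keys, 3, 4):.2f}")
    print(f"partitions: {moved_with_partitions(keys, before, after):.2f}")
```

With modulo placement, a key stays put only when `hash % 3 == hash % 4`, which holds for 3 of every 12 residues, so about three quarters of the keys move. With fixed partitions, only the 64 of 256 partitions handed to the new node change owner, so about a quarter of the keys move, and each of the four nodes ends up with 64 partitions. The table of partition owners also stays small no matter how many keys are stored.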


    Re: vBuckets are the key in membase

    by peter lin

    It's good to see open source evolve and learn from commercial products. Does Northscale plan to copy all or some of the features found in Coherence, eXtreme Scale and GigaSpaces? Features like write-behind, failover, etc.?


    Clustering over WAN

    by Atif Khan

    Does the clustering and replication work over WAN as well? Is Membase being built to support cloud deployment with global footprint?


    Re: vBuckets are the key in membase

    by Atif Khan

    Peter,
    I think there is a slight difference between products like Coherence, eXtreme Scale and GigaSpaces and something like Membase. The products you mentioned are aimed at providing a distributed grid for data/object storage, mainly in memory. Some of them do support a persistent store as well.
    Membase on the other hand is a persistence engine that implements some of the concepts of in-memory grid.


    Re: vBuckets are the key in membase

    by James Phillips

    Hi Peter -
    Membase has both write-behind and failover... but so do most data management systems built over the last 40 or so years. ;P While the initial release of membase is obviously playing table-stakes catch-up with existing systems, our plans certainly include some substantial innovation beyond the obvious. Happy to share more if you are interested.
    james.


    Re: The idea of dynamically distributed key-value store

    by James Phillips

    Nice to see you weigh in Cameron. Love coherence and everything you've done there. Certainly there are similarities between systems...but there will be divergence over time I believe.


    Re: Clustering over WAN

    by James Phillips

    A large social gaming company is using membase on EC2 across 2 data centers. They are using master-slave replication, currently primarily for disaster recovery. There is much work planned in this area, though; current capabilities are basic.

    James Phillips
    www.northscale.com
