Transactions without Transactions
Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.
Posted by Michael Hunger on Jun 23, 2010
On March 23, North Scale Solutions publicly announced the availability of the Membase NoSQL database solution. The release received a lot of coverage (MarketWire, TheRegister, GigaOM).
It was developed by members of the memcached core team at North Scale, together with engineers from the two major contributors, Zynga and NHN, both big players in the online game and social network space.
Other early adopters are mig33 (mobile applications) and Red Aril (advertising).

The open source project at Membase.org makes the source code available under the Apache 2.0 license. It is hosted at GitHub. Source tarballs and Linux binaries are available for download as a public beta.
Commercial support is provided by North Scale with their dedicated server software that complements the existing support for memcached server.
Apart from the press releases, few technical facts about the database are available. The best insights may be gained by looking at the source code.
The main, hard objectives in developing Membase were: "Simple, Fast, Elastic".
The simplicity is provided by the key-value store. There is no additional query capability (yet). Extensions are possible through a plug-in architecture (hooks through a filtered TAP interface), which can be used for full-text search, backup or data warehouse dumps. Other (planned) extension points are the Data bucket – engine API for specialized container types and the future "NodeCode".
Ease of installation, operation and extension from single nodes to clusters, as well as the drop-in replacement for memcached (wire-protocol compatibility), offer a low threshold for developer and operations buy-in. Memcached is already widely used as a caching solution in many different types of applications (especially high-throughput web apps). Parts of memcached's codebase are used directly in the front end of the Membase server.
Through this compatibility, existing memcached bindings for many programming languages and frameworks can be reused for Membase. For managing Membase installations, graphical and programmatic interfaces and configurable alerting are available.
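To make the wire-protocol compatibility concrete, here is a minimal sketch of what a client puts on the wire when it speaks the memcached text protocol. The `set` and `get` command formats are those of the memcached text protocol. The function names and the sample key are illustrative assumptions, and the sketch deliberately does not open a connection to any server.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a memcached text-protocol 'set' command.

    Wire format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    """
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Encode a memcached text-protocol 'get' command for a single key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(data: bytes):
    """Decode the reply to a single-key 'get'.

    Hit:  VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
    Miss: END\r\n
    Returns the stored bytes, or None on a miss.
    """
    if data == b"END\r\n":
        return None
    header, rest = data.split(b"\r\n", 1)
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected response header: %r" % header)
    length = int(parts[3])
    return rest[:length]


if __name__ == "__main__":
    # Hypothetical key and value, for illustration only.
    print(build_set("user:1", b"alice"))
    print(build_get("user:1"))
    print(parse_get_response(b"VALUE user:1 0 5\r\nalice\r\nEND\r\n"))
```

Because a compatible server accepts these same bytes, an application that already uses a memcached client library should only need to point it at a different host.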
Membase is designed to scale out linearly: it consists of uniform nodes that can be duplicated to increase cluster capacity. After nodes are added, it is still necessary to initiate a redistribution of the stored data.
One interesting attribute of this NoSQL solution is the promised predictable performance and quasi-deterministic latency and throughput. This should be achieved by:
Two of the more technical slides of the North Scale presentation:


Alex Popescu pointed out the lack of technical information and referred to Gear6's memcached solutions. Gear6 was recently acquired by Violin Memory, a company providing large amounts of server-side flash memory infrastructure.
I just wonder how can a key-value store be dynamically distributed based on a key. I mean the scenario when we add a new machine. In other words how is address of the machine "calculated" based on a key? It's obvious that key space cannot be redistributed in real time.
Hi Andrei. Below is a cut and paste from one of the membase functional specs. We'll be hanging those out on the project early next week. You are correct in pointing out that memcached uses a hashing function to directly map a key to a server in the list (this list can vary in size). The vBucket structure is one of the key mechanics allowing our ability to scale a cluster up or down elastically. If you send me your email address (I'm james at northscale dot com), I'll send you a few specs.
Same for you Michael - thanks for the mention!
james.
vBuckets defined
A vBucket is defined as the “owner” of a subset of the key space of a membase cluster.
Every key “belongs” in a vBucket. A mapping function is used to calculate the vBucket in which a given key belongs. In membase 1.6, that mapping function is a hashing function that takes a key as input and outputs a vBucket number. The vBucket number is used as an index into a table (the “vBucket Map”) which is consulted to determine which server is acting as Master Server for that vBucket. The table contains one row per vBucket, pairing the vBucket with its assigned Master Server. A server appearing in this table can be (and usually is) responsible for multiple vBuckets.
The hashing function used by membase to map keys to vBuckets is configurable – both the hashing algorithm and the output space (i.e. the total number of vBuckets output by the function). Naturally, if the number of vBuckets in the output space of the hash function is changed, then the table which maps vBuckets to Servers must be resized.
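As an illustration of the lookup described in the spec excerpt above (not taken from the Membase source), the following sketch hashes a key to a vBucket number and uses that number as an index into a vBucket Map. The choice of CRC32, the count of 16 vBuckets, the round-robin assignment and the server names are all assumptions made for the example; the spec only says that the algorithm and the output space are configurable.

```python
import zlib

NUM_VBUCKETS = 16  # illustrative size of the hash output space (assumption)


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a key to a vBucket number. CRC32 is an assumed hash for this sketch."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(servers, num_vbuckets: int = NUM_VBUCKETS):
    """Build a table with one row per vBucket, holding its master server.

    Round-robin assignment is an assumption; it simply shows that one
    server is usually master for several vBuckets.
    """
    return [servers[i % len(servers)] for i in range(num_vbuckets)]


def master_for_key(key: str, vbucket_map) -> str:
    """Two-step lookup: key -> vBucket number -> master server."""
    return vbucket_map[vbucket_for_key(key, len(vbucket_map))]


if __name__ == "__main__":
    servers = ["server-a", "server-b", "server-c"]  # hypothetical names
    vbucket_map = build_vbucket_map(servers)
    for key in ("user:1", "user:2", "session:abc"):
        print(key, vbucket_for_key(key), master_for_key(key, vbucket_map))
```

Changing the number of vBuckets changes the range of `vbucket_for_key`, so the table has to be rebuilt with the same number of rows, which is the resizing the spec mentions.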
I just wonder how can a key-value store be dynamically distributed based on a key. I mean the scenario when we add a new machine. In other words how is address of the machine "calculated" based on a key? It's obvious that key space cannot be redistributed in real time.
This is exactly what Oracle Coherence does, and has been doing since early 2002 :-)
Generally, you want to avoid mapping keys directly to machines, as that causes your memory overhead to increase in linear relationship to the number of keys. Instead, keys are mapped to a relatively small number of partitions, and partitions are mapped to machines. Failover, backup, and life-cycle are then managed at a partition level.
Peace,
Cameron Purdy | Oracle Coherence
coherence.oracle.com/
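The indirection described in the comment above (keys map to partitions, partitions map to machines) can be sketched as follows. This is a hypothetical illustration, not Coherence's or Membase's actual rebalancing algorithm: when a machine is added, only rows of the partition table are reassigned, while the key-to-partition function stays untouched. The function name, the greedy strategy and the machine names are assumptions.

```python
from collections import Counter
from typing import List


def add_machine(partition_map: List[str], new_machine: str) -> List[int]:
    """Hand an even share of partitions to new_machine, in place.

    partition_map[i] is the machine that owns partition i. Partitions are
    taken greedily from machines that own more than the new fair share.
    Returns the indices of the partitions that moved.
    """
    machines = set(partition_map) | {new_machine}
    target = len(partition_map) // len(machines)  # fair share per machine
    load = Counter(partition_map)
    moved = []
    for idx, owner in enumerate(partition_map):
        if len(moved) >= target:
            break
        if load[owner] > target:
            load[owner] -= 1
            partition_map[idx] = new_machine
            moved.append(idx)
    return moved


if __name__ == "__main__":
    # 12 partitions spread over two hypothetical machines.
    partition_map = ["machine-a", "machine-b"] * 6
    moved = add_machine(partition_map, "machine-c")
    print("moved partitions:", moved)
    print("partitions per machine:", dict(Counter(partition_map)))
```

The bookkeeping grows with the number of partitions rather than the number of keys, and only the data in the moved partitions has to be transferred to the new machine.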
It's good to see open source evolve and learn from commercial products. Does Northscale plan to copy all or some of the features found in Coherence, Extreme Scale and Gigaspaces? Features like write behind, failover, etc.?
Does the clustering and replication work over WAN as well? Is Membase being built to support cloud deployment with global footprint?
Peter,
I think there is a slight difference between products like Coherence, Extreme Scale and Gigaspaces and something like Membase. The products you mentioned are aimed at providing a distributed grid for data/object storage, mainly in-memory storage. Some of them do enable a persistence store as well.
Membase, on the other hand, is a persistence engine that implements some of the concepts of an in-memory grid.
Hi Peter -
Membase has both write behind and failover...but so have most data management systems built over the last 40 or so years. ;P While the initial release of membase is obviously playing table stakes catch-up with existing systems, our plans certainly include some substantial innovation beyond the obvious. Happy to share more if you are interested.
james.
Nice to see you weigh in Cameron. Love coherence and everything you've done there. Certainly there are similarities between systems...but there will be divergence over time I believe.
A large social gaming company is using membase on ec2 across 2 data centers. They are using master-slave replication and primarily (currently) for disaster recovery. There is much work planned in this area though. Current capabilities are basic.
James Phillips
www.northscale.com