InfoQ

News

The RDBMS is not enough.

Posted by Sebastien Auvray on Nov 26, 2007

Community: Architecture, Ruby
Topics: Data Access, Database Design, Performance & Scalability
Tags: Relational Databases, S3, Database Management, CouchDB, Distributed Document Oriented Database, RDDB, Database, Scalability
While relational databases fit a client-server model, a world of services calls for new solutions. RDBMSs are subject to scalability issues: how do you create redundancy and parallelism?
"[Relational databases] become a single point of failure. In particular, replication is not trivial. To understand why, consider the problem of having two database servers that need to have identical data. Having both servers for reading and writing data makes it difficult to synchronize changes. Having one master server and another slave is bad too, because the master has to take all the heat when users are writing information."
In addition, Assaf Arkin believes that write consistency is the reason RDBMSs are imploding under their own weight.
"Features like referential integrity, constraints and atomic updates are really important in the client-server world, but irrelevant in a world of services."
These are typical of the issues that document-oriented distributed databases are trying to address.
Damien Katz, a software engineer at MySQL, introduced the four pillars of data management:
  • Save: Data saving should be secure (i.e. ACID), permanent and efficient.
  • See: Data should be available for easy retrieval, integrate simple reporting methods and provide (full-text) search.
  • Secure: Compartmentalize data, allow SSL connections, and assign users, groups and roles to data.
  • Share: Be distributed, online and offline.
With CouchDB, Damien is implementing those four pillars.

What CouchDB is
  • A document database server, accessible via a RESTful JSON API.
  • Ad-hoc and schema-free with a flat address space.
  • Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.
  • Query-able and index-able, featuring a table-oriented reporting engine that uses JavaScript as a query language.
What CouchDB is not
  • A relational database.
  • A replacement for relational databases.
  • An object-oriented database. More specifically, CouchDB is not meant to function as a seamless persistence layer for an OO programming language.
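To make "bi-directional conflict detection" more concrete, here is a minimal Ruby sketch of the general idea. It is not CouchDB's replication protocol or API: the `Replica` class, its methods and the revision-id format are all assumptions made up for illustration, and unlike incremental replication it simply pushes every document on each run. Each document carries its revision history; a replica accepts an incoming version that descends from its own, and records a conflict when the two histories have diverged.

```ruby
require "digest/md5"

# Hypothetical sketch: names and revision format are assumptions,
# not CouchDB's actual replication protocol or API.
class Replica
  attr_reader :docs, :conflicts

  def initialize
    @docs = {}       # id => { revs: [oldest, ..., newest], body: Hash }
    @conflicts = {}  # id => versions that diverged from the local copy
  end

  # Store a new version of a document and return its revision id.
  def put(id, body)
    revs = @docs.key?(id) ? @docs[id][:revs] : []
    rev = "#{revs.length + 1}-#{Digest::MD5.hexdigest(body.inspect)[0, 8]}"
    @docs[id] = { revs: revs + [rev], body: body }
    rev
  end

  # Push every local document to another replica (not incremental).
  def replicate_to(target)
    @docs.each { |id, version| target.receive(id, version) }
  end

  def receive(id, incoming)
    local = @docs[id]
    if local.nil? || incoming[:revs].include?(local[:revs].last)
      # The incoming version descends from (or equals) ours: take it.
      @docs[id] = incoming
    elsif !local[:revs].include?(incoming[:revs].last)
      # Neither history contains the other's head: the edits diverged,
      # so keep our copy and record the conflict instead of overwriting.
      seen = (@conflicts[id] ||= [])
      seen << incoming unless seen.include?(incoming)
    end
    # Otherwise we are already ahead of the sender; nothing to do.
  end
end
```

Running `a.replicate_to(b)` followed by `b.replicate_to(a)` covers both directions: if both replicas edited the same document independently, each one keeps its own version and lists the other's under `conflicts`, leaving the resolution to the application.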
Inspired by CouchDB and the notion that you insert documents into the database and then define views for querying, Anthony Eden started to write his own Document-Oriented Database: RDDB. An exhaustive review is already available.

The features of RDDB at the moment are:
  • Documents are simply collections of name/value pairs.
  • Views can be defined with Ruby code.
  • A reduce block can be defined to reduce the initial mapped data from a view.
  • Views can be materialized to improve query performance.
  • Datastores/Viewstores/Materialization stores are pluggable. Current implementations are RAM, partitioned files/file system and Amazon S3.
  • Distributed materialization may work, but it's going to be rewritten.
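The first four features can be illustrated with a small, self-contained Ruby sketch. This is not RDDB's actual API: `TinyDocStore`, `View`, `create_view`, `reduce_with` and `query` are hypothetical names chosen for the example, and "materialization" is reduced here to caching a view's result in memory until the next write.

```ruby
# Hypothetical sketch of the concepts listed above; the class and
# method names are assumptions and do NOT reflect RDDB's real API.
class TinyDocStore
  def initialize
    @documents = []  # each document is a Hash of name/value pairs
    @views = {}
  end

  # Add a document and mark every stored view result as stale.
  def <<(document)
    @documents << document
    @views.each_value(&:invalidate)
    self
  end

  # Define a view with a Ruby block that maps one document to a value.
  def create_view(name, &map)
    @views[name] = View.new(&map)
  end

  def query(name)
    @views.fetch(name).run(@documents)
  end
end

class View
  def initialize(&map)
    @map = map
    @reduce = nil
    @materialized = nil
  end

  # Optional block that reduces the mapped values to a single result.
  def reduce_with(&block)
    @reduce = block
    invalidate
    self
  end

  def invalidate
    @materialized = nil
  end

  # Compute the view once and reuse the stored result until invalidated.
  def run(documents)
    @materialized ||= begin
      mapped = documents.map { |doc| @map.call(doc) }.compact
      @reduce ? @reduce.call(mapped) : mapped
    end
  end
end

store = TinyDocStore.new
store << { name: "Bob", income: 40_000 } << { name: "Jim", income: 35_000 }
store.create_view(:total_income) { |doc| doc[:income] }
     .reduce_with { |incomes| incomes.sum }
puts store.query(:total_income)  # prints 75000
```

Documents without the mapped field simply drop out of the view (the map block returns `nil`), which is one way a schema-free store can tolerate heterogeneous documents.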
InfoQ had the chance to catch up with Anthony and talk about RDDB, CouchDB and RDBMS.

First, what led you to start working on RDDB? At RejectConf you were talking about a research project.

I consider RDDB to be a personal research project. Over the last year I've been heavily involved in analytical systems, developing data warehouses and the like. I've also been using Amazon's web services. RDDB will hopefully allow me to put the two together at some point so I can have an analytical database that runs on EC2 and S3. That's my primary goal and the driving factor behind the creation of RDDB.

In your daily job you're exposed to data integration issues. Do you think document-oriented distributed databases are underused today, and that they will be adopted more and more?

I'm not sure yet. There is a lot of history behind relational databases and they've had a lot of time to mature. On the one hand this makes them the obvious choice for operational systems since they can be trusted. On the other hand relational databases aren't necessarily the best choice for all types of data storage and lookup, so there are opportunities for new data stores. I'm just not sure if document-oriented databases are going to be it. I think it will largely depend on their scalability and their ability to deal with massive amounts of documents without degradation of performance.

Is there still a place for the RDBMS in a world-of-services model? While referential integrity, atomic updates and constraints make sense in a client-server world, are they still relevant in a world of services?

It's still the standard by which others are judged, so I don't see relational databases going anywhere anytime soon. Ultimately I think we may be able to move past needing atomic updates if we have databases that are temporal in nature, thus removing the need for updates of any sort. Referential integrity can also probably go if we move to an environment where everything that is absolutely necessary is included in a resource and systems become more tolerant to missing links. Constraints will probably always be useful and perhaps they may become even richer with the ability to define logic for constraints.

How do you compare RDDB to CouchDB? (I know you're at a very early stage of development, and so is CouchDB.) What would be the advantage of using RDDB over the CouchDB Ruby binding?

I think I can answer these two together. Since CouchDB is written in Erlang and RDDB is written in Ruby, for a Ruby developer RDDB will be much more hackable. CouchDB uses the language features of Erlang for interprocess communication during distributed processing whereas Ruby relies on libraries, such as Rinda and the Ruby SQS library. For a Ruby developer the cost of getting RDDB up and running is going to be significantly less than CouchDB since all you have to do with RDDB is use RubyGems to install. The views in RDDB are written in Ruby whereas the CouchDB views are JSON (at least for now). I think that at the moment RDDB is more pluggable with different implementations for document store, view store and materialization store (all of which support RAM, file system, S3 storage). RDDB also has different implementations for materialization (such as local, Rinda and EC2), as well as threaded and non-threaded materializers.

We wrote an article about ActiveWarehouse a while back. How is the project going? Is it used in the enterprise?

ActiveWarehouse has been quiet lately. I believe that most of the work and usage is on the ETL side with the ActiveWarehouse ETL library. My goal is to release the 1.0 version of ActiveWarehouse ETL in the near future. As for the Rails plugin, it definitely needs more work on the display side before it can get raised up to a 1.0 version. Some people have expressed interest in revising the user interface code so we'll see where that leads.

10 comments

Copyright notice on the slides by Jan Lehnardt Posted Nov 26, 2007 12:25 PM
Re: Copyright notice on the slides by Werner Schuster Posted Nov 26, 2007 1:12 PM
Re: Copyright notice on the slides by Jan Lehnardt Posted Nov 26, 2007 1:26 PM
Re: Copyright notice on the slides by Anthony Eden Posted Nov 26, 2007 5:00 PM
Re: Copyright notice on the slides by Jan Lehnardt Posted Nov 27, 2007 2:35 AM
RDBMS and Service Layer by Alex Popescu Posted Nov 27, 2007 3:33 AM
Re: RDBMS and Service Layer by Jan Lehnardt Posted Nov 27, 2007 5:48 AM
Re: RDBMS and Service Layer by Alex Popescu Posted Nov 27, 2007 9:14 AM
Re: RDBMS and Service Layer by Jan Lehnardt Posted Nov 29, 2007 3:04 AM
There's another cool solution to this problem: Xcalia by Matthew Adams Posted Nov 28, 2007 11:34 AM
  1. Copyright notice on the slides

    Nov 26, 2007 12:25 PM by Jan Lehnardt

    Heya,
    nice article & interview! I'd like to point out that the image above and the linked slides are subject to the Creative Commons license (specifically creativecommons.org/licenses/by-nc-nd/2.0/). See jan.prima.de/~jan/plok/archives/105-Slides-From... for the original publication and extensive commentary.

    And a comment on Anthony's take on CouchDB: although he doesn't state it explicitly, I'd like to point out (and I think he implies it) that CouchDB is not as Ruby-hacker-friendly as RDDB, but it might have advantages when scaling up and out over multiple machines and locations.

    Cheers,
    Jan
    --

  2. Re: Copyright notice on the slides

    Nov 26, 2007 1:12 PM by Werner Schuster

    @Jan:
    Thanks for the link to the slides and blog entry. I removed the image - it's better to see it in the context of your slides anyway.

  3. Re: Copyright notice on the slides

    Nov 26, 2007 1:26 PM by Jan Lehnardt

    Oh, a copyright note would have been enough, you can sure use the image! :-) Still, thanks for being quick about it.

  4. Re: Copyright notice on the slides

    Nov 26, 2007 5:00 PM by Anthony Eden

    Jan,

    CouchDB might indeed have advantages when scaling up and out over multiple machines and locations at this point in time, especially given the support for interprocess communication, both locally and over the net, that is built into Erlang. However, I would argue that the same thing can be accomplished in Ruby through the use of libraries now. The approach in RDDB will be to use EC2 and S3, or possibly Rinda, or maybe some other sort of messaging system. I also think it will be interesting to see what Matz does in the next couple of years, since he brought up his interest in this subject at RubyConf this year.

    -Anthony

  5. Re: Copyright notice on the slides

    Nov 27, 2007 2:35 AM by Jan Lehnardt

    Heya Anthony,
    thanks for chiming in! There are advantages to both systems, and there's room for both, just as CouchDB does not want to replace the RDBMS. I'll definitely have a closer look at RDDB now ;-)

    Cheers,
    Jan
    --

  6. RDBMS and Service Layer

    Nov 27, 2007 3:33 AM by Alex Popescu

    IMO what document-oriented distributed databases are evangelizing is just a set of services that can work with current RDBMSs. For example, in the Java world most of these services have been standardized through a JSR (JSR-170) on top of physical storage.

    > See: Data should be available for easy retrieval, integrate simple reporting methods and provide a (fulltext) search.

    I would say that the relational model is a very good fit for reporting. And most current RDBMSs already provide support for full-text indexing.

    > Secure: Compartmentalization of data [...]

    I assume this means vertical/horizontal partitioning, which is another feature that current RDBMSs are providing or at least starting to consider.

    In conclusion, I wouldn't say that RDBMSs are having problems in some of these directions. Indeed, as the requirements of today's apps are very high, we are waiting for the RDBMS providers to catch up with the latest requirements on their side.

    ./alex
    --
    .w( the_mindstorm )p.
    Alexandru Popescu
    Senior Software Eng.
    InfoQ Techlead/Co-founder

  7. Re: RDBMS and Service Layer

    Nov 27, 2007 5:48 AM by Jan Lehnardt

    > I assume this means vertical/horizontal partitioning, which is another feature that current RDBMS are providing or at least starting to consider.

    No, that means that data you put in in the name of user X is not readable by requests from user Y, for example.

    Also, I am by no means saying that traditional RDBM systems are not good at what's mentioned in the four pillars. The pillars are, in fact, a guide to help compare different data storage systems. When it comes to sharing though, most traditional systems fall flat on their face, sorry :-)

    Again, CouchDB is not here to replace anything, it is just another tool worth considering for certain types of problems.

    Cheers,
    Jan
    --

  8. Re: RDBMS and Service Layer

    Nov 27, 2007 9:14 AM by Alex Popescu

    Thanks for the clarification, Jan. So "data compartmentalization" is in fact value-level ACL. That's an interesting concept. Until now I had been thinking about column-level ACL, and I wonder whether such fine-grained ACL would (if used extensively) produce weird reporting results. Let's say you are querying for some data (in the name of user X, who might have hidden values): how do you tell the difference between a NULL value and a forbidden value? (Will you introduce a new NULL-like value?) On a different level, this feature kind of ruins the possibilities for caching things. ... Well, I think it is too fine-grained for me :-).

    ./alex
    --
    .w( the_mindstorm )p.
    Alexandru Popescu
    Senior Software Eng.
    InfoQ Techlead/Co-founder

  9. There's another cool solution to this problem: Xcalia

    Nov 28, 2007 11:34 AM by Matthew Adams

    You should check out XIC from Xcalia (www.xcalia.com). It's another solution to the problem of heterogeneous data access, only it uses transparent persistence APIs (JDO, JPA, SDO) to solve the problem. Of course, that does mean that you're talking about a Java (or Groovy) solution, but that serves a lot of applications out there.

    You can think of XIC as an extension of object-relational mapping to object-service mapping, only with XIC, there is no explicit mapping. Check out the slides at
    www.xcalia.com/technology/circumvent-SOA-design...
    for more info. Very much worth a look-see.

    -matthew

  10. Re: RDBMS and Service Layer

    Nov 29, 2007 3:04 AM by Jan Lehnardt

    > Thanks for the clarification Jan. So, "data compartmentalization" is in fact value-level ACL.

    Well, not quite! :-) It is document-level ACL if you will. A document is roughly a data-record. They will have ownership and permissions and all that.

    Cheers,
    Jan
    --
