Martin Fowler Sees a Thaw in Frozen Thinking about Data Storage

In a recent blog post, Martin Fowler, a renowned software thought leader, reports observing at last week's QCon that the deep freeze in thinking about databases in application architectures is thawing. The world has been stuck using an RDBMS for every application use case, but the time has come to also consider alternatives such as a RISC RDBMS or distributed document-oriented databases. QCon had a keynote by Tim Bray about the changing storage spectrum and how it affects application architectures, as well as a whole track on distributed document-oriented databases.

After noting the failure of ODBMS, Martin gave his opinion on why RDBMS succeeded: “their [RDBMS] dominance is due less to their role in data management than their role in integration”. He continues:

For many organizations today, the primary pattern for integration is Shared Database Integration - where multiple applications are integrated by all using a common database. When you have these IntegrationDatabases, it's important that all these applications can easily get at this shared data - hence the all important role of SQL. The role of SQL as mostly-standard query language has been central to the dominance of databases.
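To make the Shared Database Integration pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table and the two "applications" (`billing_total` and `shipping_queue`) are hypothetical names invented for illustration; nothing here comes from Martin's post. Both functions reach the same data through SQL, which is the integration role the quote describes.

```python
import sqlite3


def create_shared_db():
    """Build the hypothetical shared database every application integrates through."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, amount REAL, shipped INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, amount, shipped) VALUES (?, ?, ?)",
        [("alice", 30.0, 0), ("bob", 12.5, 1), ("alice", 7.5, 0)],
    )
    conn.commit()
    return conn


def billing_total(conn, customer):
    """'Billing' application: sums a customer's orders with its own SQL query."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
        (customer,),
    ).fetchone()
    return row[0]


def shipping_queue(conn):
    """'Shipping' application: a different query against the same table."""
    rows = conn.execute(
        "SELECT id FROM orders WHERE shipped = 0 ORDER BY id"
    ).fetchall()
    return [r[0] for r in rows]
```

Both functions depend directly on the table and column names, so a schema change made for one application has to be coordinated with every other application that queries the table.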

The Internet is changing the landscape by offering new integration solutions:

The heating of the database space comes from the presence of alternatives to integration - in particular the rise of web services. Under various banners there's a growing movement for applications to talk to each other by passing text (mostly XML) documents over HTTP. The web, both in internet and intranet forms, has made this integration mode even more prevalent than SQL. This is a good thing, I've never liked the approach of multiple applications tightly coupled through a common database - you can't get bigger breach of encapsulation than that.

HTTP will affect the way databases are used, according to Martin:

If you switch your integration protocol from SQL to HTTP, it now means you can change databases from being IntegrationDatabases to ApplicationDatabases. This change is profound. In the first step it supports a much simpler approach to object-relational mapping - such as the approach taken by Ruby on Rails. But furthermore it breaks the vice-like grip of the relational data model. If you integrate through HTTP it no longer matters how an application stores its own data, which in turn means an application can choose a data model that makes sense for its own needs.
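The idea of an ApplicationDatabase can be sketched roughly as follows. This is an illustration under stated assumptions, not Martin's design: `DictStore`, `ListStore`, `handle_get`, and the `/customers/<id>` path are all hypothetical, and the actual HTTP server wiring is left out so that the handler is just a function returning a status code and a JSON body. The point it tries to show is that callers see only the document, so the application is free to pick whatever storage layout suits it.

```python
import json


class DictStore:
    """Keeps customers in a plain dict keyed by id."""

    def __init__(self):
        self._data = {}

    def save(self, customer_id, name):
        self._data[customer_id] = name

    def load(self, customer_id):
        return self._data.get(customer_id)


class ListStore:
    """Keeps customers as (id, name) tuples: a deliberately different layout."""

    def __init__(self):
        self._rows = []

    def save(self, customer_id, name):
        # Drop any existing row for this id, then append the new one.
        self._rows = [r for r in self._rows if r[0] != customer_id]
        self._rows.append((customer_id, name))

    def load(self, customer_id):
        for cid, name in self._rows:
            if cid == customer_id:
                return name
        return None


def handle_get(store, path):
    """Hypothetical handler for GET /customers/<id>; returns (status, body)."""
    prefix = "/customers/"
    if not path.startswith(prefix):
        return 404, json.dumps({"error": "not found"})
    customer_id = path[len(prefix):]
    name = store.load(customer_id)
    if name is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps({"id": customer_id, "name": name})
```

Swapping `DictStore` for `ListStore` changes nothing that a client of `handle_get` can observe, which is the freedom the quote attributes to integrating over HTTP rather than over a shared schema.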

While Martin does not think RDBMS will disappear any time soon, he points out a number of possible alternatives that Tim Bray had mentioned:

  • Drizzle is a form of relational database, but one that eschews much of the machinery of modern relational products. I think of it as a RISC RDBMS - supporting only the bare bones of the relational feature set.
  • Couch DB is one of many forays into a distributed key-value pair model. Although a sharply simple data-model (nothing more than a hashmap really) this kind of approach has become quite popular in high-volume websites.
  • Gemstone was one of the object database crowd, and I found the Gemstone-Smalltalk combination a very powerful development environment (superior to most of its successors). Gemstone is still around as a niche player, but may gain more traction through Maglev - a project to bring its approach (essentially a fusion of database and virtual machine) to the Ruby world.
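The "nothing more than a hashmap really" remark about the key-value model can be illustrated with a small sketch. `TinyKeyValueStore` is a hypothetical, single-process toy written for this example; it is not the API of Couch DB or of any other product named above, and it leaves out distribution entirely. It only shows the shape of the data model: opaque documents looked up by key, with no joins and no schema.

```python
import json


class TinyKeyValueStore:
    """Toy in-memory key-value store; values are stored as serialized JSON."""

    def __init__(self):
        self._items = {}

    def put(self, key, document):
        # Serialize so the store holds an opaque value it never inspects.
        self._items[key] = json.dumps(document)

    def get(self, key, default=None):
        raw = self._items.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        return self._items.pop(key, None) is not None
```

Every operation addresses exactly one key, which hints at why this kind of model is comparatively easy to partition across machines for high-volume websites.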

Martin is careful to conclude that RDBMS are not going away and are "the right choice for many situations." His blog post does suggest, however, that given the increase in options these days, "application developers should think about what the right option is for their needs. As non-relational projects grow in popularity and maturity, more and more will go for other options." What do you think?

Community comments

  • Depends on the situation

    by Peter Veentjer,

    I think it really depends on the situation.

    For example: If an RDBMS is configured correctly, it is great for doing batch processing. The concurrency mechanisms are well documented in most cases. I don't see the need for a different database mechanism in these cases.

    But distributed memory (often ACID with the D) is a lot easier to scale than a database. So if you have key/value-based searches, I would have a look at Terracotta/Coherence and use the database purely as a backup mechanism (the D).

  • jMaglev

    by ARI ZILKA,

    Check out jMaglev before Maglev, IMO:

    fabiokung.com/2008/11/22/play-with-jmaglev-your...

  • And what about TRDBMS

    by Francisco Jose Peredo Noguez,

    I am really disappointed to read this. I do not think that RDBMS are going to be used less; on the contrary, someday people are going to realize that we have not even started to use them. That day we will drop SQL and its many flaws and use a truly relational language, an industrial D, as proposed in The Third Manifesto.

  • Can a non-relational database avoid the problem of Shared Database Integration?

    by Zhang Joey,

    Integrating applications directly through a common database may break the encapsulation of each application, but I don't think that is a problem only of relational databases. Even with a new non-relational database, if we adopt a database integration architecture, we still face the problem that the way one app stores its data may affect the other integrated apps.

  • data semantics

    by Techno Modus,

    I think managing data semantics is currently one of the most important issues in data modeling. It is as important as semantics in the Semantic Web. Recently I have found interesting emerging approaches that could solve some problems in data semantics, such as the associative model of data and the concept-oriented model: "Informal Introduction into the Concept-Oriented Data Model" and "Informal Introduction into the Concept-Oriented Programming". There is also an interesting paper by Michael Stonebraker on this topic: "One Size Fits All: An Idea Whose Time Has Come and Gone".

  • HTTP cannot replace SQL

    by Frank Silbermann,

    When another application uses your data, it probably uses it in a manner you did not foresee. It's not necessary to foresee all the ways your data will be used, because an RDBMS provides a flexible query language, SQL. One reason for the failure of object-oriented DBMS was the lack of a standard, flexible and efficient ODBMS query language.

    We will not be able to replace integration databases with application databases until someone invents and implements an equally flexible and efficient application query language. HTTP that provides access to a small, canned application API will not suffice.

    HTTP is not analogous to SQL; rather, it is analogous to the code that implements a networked database driver; the driver would be useless without the ability to run arbitrary SQL commands when the request arrives at the database machine.

    Using the RDBMS as the integration point results in a star-shaped topology -- with the RDBMS at the center. Too much reliance on web services for application integration can easily result in a spaghetti-shaped topology.

  • Re: Depends on the situation

    by Mac Noodle,

    "For example: If an RDBMS is configured correctly, it is great for doing batch processing. The concurrency mechanisms are well documented in most cases. I don't see the need for a different database mechanism in these cases."

    Actually, IMS is probably better for batch processing. Either way, getting rid of or minimizing the batch processing should be the primary objective.

  • Re: HTTP cannot replace SQL

    by Mac Noodle,

    Using the RDBMS at the center creates bottlenecks and dependence on database vendors. There are solutions other than web services, such as transmitting the required information to other systems (aka loosely coupled systems).

  • Re: HTTP cannot replace SQL

    by Francisco Jose Peredo Noguez,

    And what language will you use to specify what information you want transmitted? Most likely something based on relational algebra... The relational model is the best tool for the job, but SQL is a bad implementation of it, with lots of flaws. What we need is a D.

  • Re: Depends on the situation

    by Francisco Jose Peredo Noguez,

    And how would you do that? (How would you get rid of or minimize the batch processing?)

  • Re: HTTP cannot replace SQL

    by Francisco Jose Peredo Noguez,

    A more flexible and efficient query language has already been invented, and it is of course also based on the relational model. You can read about it in The Third Manifesto; it is called D, and it is what SQL should have been.

    I agree with you that HTTP providing access to a small, canned application API will not suffice. You really understand my point: HTTP is only transport. It has no ability to deal with queries, and you need them to actually manipulate data.

    The spaghetti-shaped topology of web services can be avoided with an ESB, but that still does not solve the need for a language based on the relational model to manipulate data.

  • What about XML databases?

    by Miguel Vitorino,

    We see more and more data being transmitted over the wire in XML formats (and, yeah, the protocol is mostly HTTP...but we can't query anything with HTTP alone).

    Databases like eXist and Mark Logic support both structured and semi-structured data and minimize the typically necessary data transformations (relational <-> object <-> XML/JSON...).
    They can store documents, hierarchical data and very strong typed data.
    We can more easily support schema versioning.
    The query languages (XQuery and XPath) and schema languages (XSD/DTD) are standard.
    Replication and clustering are performed more naturally.
    They discourage monolithic database designs.
    They scale better to a web world because of their native integration with HTTP and greater record granularity...

    Any thoughts?

  • XML Databases

    by Miguel Vitorino,

    For those of you who may be curious, check out an instance of the Mark Logic database at markmail.org.

    Also, I would like to make clear that I have no affiliation with either eXist or MarkLogic. I'm merely interested in following their progress.

  • Re: What about XML databases?

    by Francisco Jose Peredo Noguez,

    And we can't query anything with XML alone either. You need a program, written in something else. You need a query language, preferably one with strong relational theory supporting it.

    Data transformations can be built easily as a thin layer that exposes the output of your relational queries as XML/JSON... I do not see the big deal here...

    What is "very" strong typed data? What is the difference from "plain" strong typed data?

    Why do they more easily support schema versioning?

    And XQuery and XPath are immune to SQL's flaws?

    Why are replication and clustering performed more naturally? What does XML have to do with it?

    Why do they discourage monolithic database designs?

    I don't see how their native integration with HTTP and greater record granularity make them scale better... can you explain?

  • Re: What about XML databases?

    by Miguel Vitorino,

    Do we absolutely need a strong relational theory behind a database? Is that the _only_ way to have nice performance? Or do you admit there may be other options? Do you see any relational theory behind Google's BigTable or Amazon's Simple DB?

    Schema versioning is easier for the simple reason that a schema is not required at all in an XML database. You can choose to use a schema, and have several options for that, or you can store raw data and still be able to query that data. Can you do that in a relational database?

    Replication and clustering are done more easily because you store related data logically and physically closer. With XML you work at an aggregate level, not at the record/tuple level.

    If access to XML databases is designed to be inherently RESTful, it will naturally scale better.
    The easiest way to do this is to embrace the benefits of HTTP, which many of them already do. Of course this will only work with greater record granularity (you don't wanna repeat the same mistakes from CORBA...).

    That "thin" layer you refer to only performs format conversions. I believe that is different from data transformations. And no matter how "thin" that layer is, it always has to be there, and usually done by hand, if you have an underlying relational data source that does not quite have the same data representation capabilities your destination formats do.
