InfoQ

Martin Fowler Sees a Thaw in Frozen Thinking about Data Storage

Posted by Abel Avram on Nov 25, 2008

Sections: Architecture & Design, Development, Operations & Infrastructure
Topics: Data Access, Architecture, .NET, Java, Ruby
Tags: GemStone, MagLev, Relational Databases, Object Databases, CouchDB

In a recent blog post, Martin Fowler, a renowned software thought leader, wrote that at last week's QCon he observed that the deep freeze in thinking about databases in application architectures is thawing. The world has been stuck using an RDBMS for every application use case, but the time has come to also consider alternatives such as a RISC RDBMS or distributed document-oriented databases. QCon had a keynote by Tim Bray about the changing storage spectrum and how it affects application architectures, as well as a whole track on distributed document-oriented databases.

After noting the failure of object databases (ODBMS), Martin gave his opinion on why the RDBMS succeeded: “their [RDBMS] dominance is due less to their role in data management than their role in integration”. He continues:

For many organizations today, the primary pattern for integration is Shared Database Integration - where multiple applications are integrated by all using a common database. When you have these IntegrationDatabases, it's important that all these applications can easily get at this shared data - hence the all important role of SQL. The role of SQL as mostly-standard query language has been central to the dominance of databases.
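As a rough illustration of the Shared Database Integration pattern described in the quote, the sketch below has two applications talk to one database with their own SQL. The application names (`billing`, `shipping`), the `customers` table and its columns are assumptions made up for this example and do not come from Fowler's post; Python's built-in `sqlite3` module stands in for a shared RDBMS.

```python
import sqlite3

# One in-memory database stands in for the shared integration database.
shared_db = sqlite3.connect(":memory:")
shared_db.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)


def billing_add_customer(db, name, city):
    """Hypothetical 'billing' application: writes directly to the shared table."""
    cur = db.execute(
        "INSERT INTO customers (name, city) VALUES (?, ?)", (name, city)
    )
    db.commit()
    return cur.lastrowid


def shipping_cities(db):
    """Hypothetical 'shipping' application: reads the same table with its own SQL."""
    rows = db.execute(
        "SELECT DISTINCT city FROM customers ORDER BY city"
    ).fetchall()
    return [city for (city,) in rows]


billing_add_customer(shared_db, "Ada", "London")
billing_add_customer(shared_db, "Linus", "Helsinki")
print(shipping_cities(shared_db))  # ['Helsinki', 'London']
```

Both functions depend on the same table and column names, so renaming `city` for the benefit of one application would break the other. That shared dependency is the coupling the pattern implies.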

The Internet is changing the landscape by offering new integration solutions:

The heating of the database space comes from the presence of alternatives to integration - in particular the rise of web services. Under various banners there's a growing movement for applications to talk to each other by passing text (mostly XML) documents over HTTP. The web, both in internet and intranet forms, has made this integration mode even more prevalent than SQL. This is a good thing, I've never liked the approach of multiple applications tightly coupled through a common database - you can't get bigger breach of encapsulation than that.

HTTP will affect the way databases are used, according to Martin:

If you switch your integration protocol from SQL to HTTP, it now means you can change databases from being IntegrationDatabases to ApplicationDatabases. This change is profound. In the first step it supports a much simpler approach to object-relational mapping - such as the approach taken by Ruby on Rails. But furthermore it breaks the vice-like grip of the relational data model. If you integrate through HTTP it no longer matters how an application stores its own data, which in turn means an application can choose a data model that makes sense for its own needs.
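One way to picture the shift from an IntegrationDatabase to an ApplicationDatabase is the sketch below. It is only an illustration under stated assumptions: the `CustomerApp` class, the `/customers/<id>` path and the JSON document shape are hypothetical, a plain function call stands in for a real HTTP request, and JSON is used where the post mentions mostly XML.

```python
import json


class CustomerApp:
    """Hypothetical application that owns its data; storage is a private detail."""

    def __init__(self):
        # Any store would do here (relational, document, in-memory);
        # a dict keeps the sketch self-contained.
        self._store = {}

    def add(self, customer_id, name, city):
        self._store[str(customer_id)] = {"name": name, "city": city}

    def handle_get(self, path):
        """Stand-in for an HTTP GET handler: returns (status, body)."""
        prefix = "/customers/"
        if not path.startswith(prefix):
            return 404, json.dumps({"error": "not found"})
        record = self._store.get(path[len(prefix):])
        if record is None:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(record)


def shipping_city_for(app, customer_id):
    """Hypothetical consumer: sees only the document, never the storage."""
    status, body = app.handle_get(f"/customers/{customer_id}")
    return json.loads(body)["city"] if status == 200 else None


app = CustomerApp()
app.add(42, "Ada", "London")
print(shipping_city_for(app, 42))  # London
```

The consumer depends only on the path and the document format, so `CustomerApp` could replace the dict with any data model without the consumer noticing.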

While Martin does not think RDBMS will disappear any time soon, he points out a number of possible alternatives that Tim Bray had mentioned:

  • Drizzle is a form of relational database, but one that eschews much of the machinery of modern relational products. I think of it as a RISC RDBMS - supporting only the bare bones of the relational feature set.
  • Couch DB is one of many forays into a distributed key-value pair model. Although a sharply simple data-model (nothing more than a hashmap really) this kind of approach has become quite popular in high-volume websites.
  • Gemstone was one of the object database crowd, and I found the Gemstone-Smalltalk combination a very powerful development environment (superior to most of its successors). Gemstone is still around as a niche player, but may gain more traction through Maglev - a project to bring its approach (essentially a fusion of database and virtual machine) to the Ruby world.
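The "nothing more than a hashmap" remark about the distributed key-value model can be sketched roughly as follows. This is not a description of how CouchDB or any other product works; `ShardedKeyValueStore` is a hypothetical name, the nodes are ordinary dicts in one process, and hashing the key to pick a node is just one simple placement scheme.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_count):
        # Each dict stands in for a separate machine.
        self._nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key decides which node owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._nodes)
        return self._nodes[index]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


store = ShardedKeyValueStore(node_count=3)
store.put("user:1", {"name": "Ada"})
store.put("user:2", {"name": "Linus"})
print(store.get("user:1"))  # {'name': 'Ada'}
```

Because every read and write touches exactly one node, adding nodes spreads the load. The price is that the only query available is a lookup by key.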

Martin is careful to conclude that RDBMS are not going away and are "the right choice for many situations." His blog post does suggest, however, that given the increase in options these days, "application developers should think about what the right option is for their needs. As non-relational projects grow in popularity and maturity, more and more will go for other options." What do you think?

15 comments

  1. Depends on the situation

    by Peter Veentjer

    I think it really depends on the situation.

    For example: If an RDBMS is configured correctly, it is great for doing batch processing. The concurrency mechanisms are well documented in most cases. I don't see the need for a different database mechanism in these cases.

    But distributed memory (often ACID with the D) is a lot easier to scale than a database. So if you have key/value based searches, I would have a look at Terracotta/Coherence and use the database purely as a backup mechanism (the D).

  2. jMaglev

    by ARI ZILKA

    Check out jMaglev before Maglev, IMO:

    fabiokung.com/2008/11/22/play-with-jmaglev-your...

  3. And what about TRDBMS

    by Francisco Jose Peredo Noguez

    I am really disappointed to read this. I do not think that RDBMS are going to be used less; on the contrary, someday people are going to realize that we have not even started to use them. That day we will drop SQL and its many flaws and use a truly relational language, an industrial D, as proposed in The Third Manifesto.

  4. Can non-relation database avoid the problem of Shared Database Integration?

    by Zhang Joey

    Integrating applications directly through a common database may break the encapsulation of each application, but I don't think this is a problem of relational databases only. Even with a new non-relational database, if we adopt a database integration architecture, we still face the problem that the way one app stores its data may affect the other apps integrated with it.

  5. data semantics

    by Techno Modus

    I think managing data semantics is currently one of the most important issues in data modeling. It is as important as semantics in the Semantic Web. Recently I have found interesting emerging approaches that could solve some problems in data semantics, such as the associative model of data and the concept-oriented model: "Informal Introduction into the Concept-Oriented Data Model" and "Informal Introduction into the Concept-Oriented Programming". There is also an interesting paper by Michael Stonebraker on this topic: "One Size Fits All: An Idea Whose Time Has Come and Gone".

  6. HTTP cannot replace SQL

    by Frank Silbermann

    When another application uses your data, it probably uses it in a manner you did not foresee. It's not necessary to foresee all the ways your data will be used, because an RDBMS provides a flexible query language, SQL. One reason for the failure of object-oriented DBMS was the lack of a standard, flexible and efficient ODBMS query language.

    We will not be able to replace integration databases with application databases until someone invents and implements an equally flexible and efficient application query language. HTTP that provides access to a small, canned application API will not suffice.

    HTTP is not analogous to SQL; rather, it is analogous to the code that implements a networked database driver; the driver would be useless without the ability to run arbitrary SQL commands when the request arrives at the database machine.

    Using the RDBMS as the integration point results in a star-shaped topology -- with the RDBMS at the center. Too much reliance on web services for application integration can easily result in a spaghetti-shaped topology.

  7. Re: Depends on the situation

    by Mark N

    "For example: If an RDBMS is configured correctly, it is great for doing batch processing. The concurrency mechanisms are well documented in most cases. I don't see the need for a different database mechanism in these cases."

    Actually, IMS is probably better for batch processing. Either way, getting rid of or minimizing the batch processing should be the primary objective.

  8. Re: HTTP cannot replace SQL

    by Mark N

    Using the RDBMS at the center creates bottlenecks and dependence on db vendors. There are solutions other than web services, such as transmitting the required information to other systems (aka Loosely Coupled systems).

  9. Re: HTTP cannot replace SQL

    by Francisco Jose Peredo Noguez

    And what language will you use to specify what information you want to be transmitted? Most likely something based on relational algebra... The relational model is the best tool for the job, but SQL is a bad implementation of it, with lots of flaws. What we need is a D.

  10. Re: Depends on the situation

    by Francisco Jose Peredo Noguez

    And how would you do that? (How would you get rid of or minimize the batch processing?)

  11. Re: HTTP cannot replace SQL

    by Francisco Jose Peredo Noguez

    A more flexible and efficient query language has already been invented, and it is of course also based on the relational model. You can read about it in The Third Manifesto; it is called D and it is what SQL should have been.

    I agree with you that HTTP that provides access to a small, canned application API will not suffice. You really understand my point: HTTP is only transport, it has no ability to deal with queries, and you need them to actually manipulate data.

    The spaghetti-shaped topology of web services can be avoided with an ESB, but that still does not solve the need for a language based on the relational model to manipulate data.

  12. What about XML databases?

    by Miguel Vitorino

    We see more and more data being transmitted over the wire in XML formats (and, yeah, the protocol is mostly HTTP...but we can't query anything with HTTP alone).

    Databases like eXist and Mark Logic support both structured and semi-structured data and minimize the typically necessary data transformations (relational <-> object <-> XML/JSON...).
    They can store documents, hierarchical data and very strong typed data.
    We can more easily support schema versioning.
    The query languages (XQuery and XPath) and schema languages (XSD/DTD) are standard.
    Replication and clustering are performed more naturally.
    They discourage monolithic database designs.
    They scale better to a web world because of their native integration with HTTP and greater record granularity...

    Any thoughts?

  13. XML Databases

    by Miguel Vitorino

    For those of you who may be curious, check out an instance of the Mark Logic database at markmail.org.

    Also, I would like to make clear that I have no affiliation with either eXist or MarkLogic. I'm merely interested in following their progress.

  14. Re: What about XML databases?

    by Francisco Jose Peredo Noguez

    And we can't query anything with XML alone either. You need a program, written in something else. You need a query language, preferably one with strong relational theory supporting it.

    Data transformations can be built easily as a thin layer that exposes the output of your relational queries as XML/JSON... I do not see the big deal here...

    What is "very" strong typed data? What is the difference from "plain" strong typed data?

    Why do they more easily support schema versioning?

    And XQuery and XPath are immune to SQL flaws?

    Why are replication and clustering performed more naturally? What does XML have to do with it?

    Why do they discourage monolithic database designs?

    I don't see how their native integration with HTTP and greater record granularity make them scale better... can you explain?

  15. Re: What about XML databases?

    by Miguel Vitorino

    Do we absolutely need a strong relational theory behind a database? Is that the _only_ way to have nice performance? Or do you admit there may be other options? Do you see any relational theory behind Google's BigTable or Amazon's Simple DB?

    Schema versioning is easier for the simple reason that a schema is not required at all in an XML database. You can choose to use a schema, and have several options for that, or you can store raw data and still be able to query that data. Can you do that in a relational database?

    Replication and clustering are done more easily because you store related data logically and physically closer. With XML you work at an aggregate level, not at the record/tuple level.

    If access to XML databases is designed to be inherently RESTful, it will naturally scale better.
    The easiest way to do this is to embrace the benefits of HTTP - which many of them already do. Of course this will only work with greater record granularity (you don't wanna repeat the same mistakes from CORBA...).

    That "thin" layer you refer to only performs format conversions. I believe that is different from data transformations. And no matter how "thin" that layer is, it always has to be there, and usually done by hand, if you have an underlying relational data source that does not quite have the same data representation capabilities your destination formats do.
