Martin Fowler Sees a Thaw in Frozen Thinking about Data Storage

by Abel Avram on Nov 25, 2008

In a recent blog post, Martin Fowler, a renowned software thought leader, wrote that what he saw at last week's QCon suggests the deep freeze in thinking about databases in application architectures is thawing. The world has been stuck using an RDBMS for every application use case, but the time has come to also consider RISC RDBMS or distributed document-oriented databases. QCon had a keynote by Tim Bray about the changing storage spectrum and how it affects application architectures, as well as a whole track on distributed document-oriented databases.

After noting the failure of ODBMS databases, Martin expressed his opinion on why RDBMS succeeded: “their [RDBMS] dominance is due less to their role in data management than their role in integration”.  Continuing on:

For many organizations today, the primary pattern for integration is Shared Database Integration - where multiple applications are integrated by all using a common database. When you have these IntegrationDatabases, it's important that all these applications can easily get at this shared data - hence the all important role of SQL. The role of SQL as mostly-standard query language has been central to the dominance of databases.

The Internet is changing the landscape by offering new integration solutions:

The heating of the database space comes from the presence of alternatives to integration - in particular the rise of web services. Under various banners there's a growing movement for applications to talk to each other by passing text (mostly XML) documents over HTTP. The web, both in internet and intranet forms, has made this integration mode even more prevalent than SQL. This is a good thing, I've never liked the approach of multiple applications tightly coupled through a common database - you can't get bigger breach of encapsulation than that.

HTTP will affect the way databases are used, according to Martin:

If you switch your integration protocol from SQL to HTTP, it now means you can change databases from being IntegrationDatabases to ApplicationDatabases. This change is profound. In the first step it supports a much simpler approach to object-relational mapping - such as the approach taken by Ruby on Rails. But furthermore it breaks the vice-like grip of the relational data model. If you integrate through HTTP it no longer matters how an application stores its own data, which in turn means an application can choose a data model that makes sense for its own needs.
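
The shift from an IntegrationDatabase to an ApplicationDatabase described above can be illustrated with a minimal sketch. Everything named here (`SqlCustomerStore`, `DictCustomerStore`, `get_customer_document`, and the `GET /customers/<id>` route mentioned in the docstring) is hypothetical and not taken from Fowler's post. No real HTTP server is started; the function only builds the status code and JSON body that such an endpoint might return. The point is that other applications depend on the document, not on how the owning application stores it.

```python
import json
import sqlite3


class SqlCustomerStore:
    """Hypothetical store that keeps customers in a relational table (in-memory SQLite)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

    def save(self, customer_id, name):
        self.conn.execute(
            "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
            (customer_id, name),
        )

    def load(self, customer_id):
        row = self.conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return None if row is None else {"id": row[0], "name": row[1]}


class DictCustomerStore:
    """Hypothetical store that keeps customers in a plain in-process dictionary."""

    def __init__(self):
        self.data = {}

    def save(self, customer_id, name):
        self.data[customer_id] = {"id": customer_id, "name": name}

    def load(self, customer_id):
        return self.data.get(customer_id)


def get_customer_document(store, customer_id):
    """Build what a hypothetical GET /customers/<id> handler would send back."""
    customer = store.load(customer_id)
    if customer is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(customer, sort_keys=True)


sql_store, dict_store = SqlCustomerStore(), DictCustomerStore()
for store in (sql_store, dict_store):
    store.save(1, "Ada")

# Consumers only ever see the document, so both stores look identical to them.
sql_response = get_customer_document(sql_store, 1)
dict_response = get_customer_document(dict_store, 1)
```

In this sketch, swapping one store for the other changes nothing for a consumer of the document. Under shared database integration, by contrast, every consumer would be written against the `customer` table itself.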

While Martin does not think RDBMS will disappear any time soon, he points out a number of possible alternatives that Tim Bray had mentioned:

  • Drizzle is a form of relational database, but one that eschews much of the machinery of modern relational products. I think of it as a RISC RDBMS - supporting only the bare bones of the relational feature set.
  • Couch DB is one of many forays into a distributed key-value pair model. Although a sharply simple data-model (nothing more than a hashmap really) this kind of approach has become quite popular in high-volume websites.
  • Gemstone was one of the object database crowd, and I found the Gemstone-Smalltalk combination a very powerful development environment (superior to most of its successors). Gemstone is still around as a niche player, but may gain more traction through Maglev - a project to bring its approach (essentially a fusion of database and virtual machine) to the Ruby world.
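
To make the "nothing more than a hashmap" remark in the list above concrete, here is a rough sketch of a key-value store whose keys are spread across several nodes. The class `PartitionedKeyValueStore` and its hash-modulo placement are assumptions made for illustration; this is not how any of the products named above is implemented, and the "nodes" are just dictionaries in one process.

```python
import hashlib


class PartitionedKeyValueStore:
    """Hypothetical key-value store; each 'node' is a dictionary in this process."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key so that the same key always lands on the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


store = PartitionedKeyValueStore(node_count=3)
store.put("user:42", {"name": "Ada"})
store.put("user:43", {"name": "Grace"})
```

The whole interface is `put` and `get` by key. There are no joins and no ad hoc queries, which is what makes the model easy to spread over many machines and also what limits it.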

Martin is careful to conclude that RDBMS are not going away and are "the right choice for many situations." His blog does suggest, however, that given the increase in options these days, "application developers should think about what the right option is for their needs. As non-relational projects grow in popularity and maturity, more and more will go for other options." What do you think?

Depends on the situation by Peter Veentjer

I think it really depends on the situation.

For example: If an RDBMS is configured correctly, it is great for doing batch processing. The concurrency mechanisms are well documented in most cases. I don't see the need for a different database mechanism in these cases.

But distributed memory (often ACID with the D) is a lot easier to scale than a database. So if you have key/value based searches, I would have a look at Terracotta/Coherence and use the database purely as a backup mechanism (the D).

jMaglev by ARI ZILKA

Check out jMaglev before Maglev, IMO:

fabiokung.com/2008/11/22/play-with-jmaglev-your...

And what about TRDBMS by Francisco Jose Peredo Noguez

I am really disappointed to read this. I do not think that RDBMS are going to be used less; on the contrary, someday people are going to realize that we have not even started to use them. That day we will drop SQL and its many flaws and use a really relational language, an industrial D, as proposed in the Third Manifesto.

Can non-relational databases avoid the problem of Shared Database Integration? by Zhang Joey

Integrating applications directly through a common database may break the encapsulation of each application, but I don't think that is a problem of relational databases only. Even with the new non-relational databases, if we adopt a database integration architecture, we still face the problem that the way one app stores its data may affect the other apps integrated with it.

data semantics by Techno Modus

I think managing data semantics is currently one of the most important issues in data modeling. It is as important as semantics in Semantic Web. Recently I have found new interesting emerging approaches which could solve some problems in data semantics like associative model of data and concept-oriented model: Informal Introduction into the Concept-Oriented Data Model, Informal Introduction into the Concept-Oriented Programming. There is also an interesting paper by Michael Stonebraker on this topic: One Size Fits All: An Idea Whose Time Has Come and Gone

HTTP cannot replace SQL by Frank Silbermann

When another application uses your data, it probably uses it in a manner you did not foresee. It's not necessary to foresee all the ways your data will be used, because an RDBMS provides a flexible query language, SQL. One reason for the failure of object-oriented DBMS was the lack of a standard, flexible and efficient ODBMS query language.

We will not be able to replace integration databases with application databases until someone invents and implements an equally flexible and efficient application query language. HTTP that provides access to a small, canned application API will not suffice.

HTTP is not analogous to SQL; rather, it is analogous to the code that implements a networked database driver; the driver would be useless without the ability to run arbitrary SQL commands when the request arrives at the database machine.

Using the RDBMS as the integration point results in a star-shaped topology -- with the RDBMS at the center. Too much reliance on web services for application integration can easily result in a spaghetti-shaped topology.

Re: Depends on the situation by Mark N

"For example: If a RDBMS is configured correctly, it is great for doing batch processing. The concurrency mechanisms are well documented in most cases. I don't see the need for a different database mechanism in these cases."

Actually, IMS is probably better for batch processing. Either way, getting rid of or minimizing the batch processing should be the primary objective.

Re: HTTP cannot replace SQL by Mark N

Using the RDBMS at the center creates bottlenecks and dependence on db vendors. There are other solutions than web services, like transmitting required information to other systems (aka Loosely Coupled systems).

Re: HTTP cannot replace SQL by Francisco Jose Peredo Noguez

And what language will you use to specify what information you want to be transmitted? Most likely something based on relational algebra... The relational model is the best tool for the job, but SQL is a bad implementation of it, with lots of flaws. What we need is a D.

Re: Depends on the situation by Francisco Jose Peredo Noguez

And how would you do that? (How would you get rid of or minimize the batch processing?)

Re: HTTP cannot replace SQL by Francisco Jose Peredo Noguez

A more flexible and efficient query language has already been invented, and it is of course also based on the relational model. You can read about it in The Third Manifesto; it is called D and it is what SQL should have been.

I agree with you that HTTP that provides access to a small, canned application API will not suffice. You really understand my point: HTTP is only transport, it has no ability to deal with queries, and you need them to actually manipulate data.

The spaghetti-shaped topology of web services can be avoided with an ESB, but that still does not solve the need to have a relational-model-based language to manipulate data.

What about XML databases? by Miguel Vitorino

We see more and more data being transmitted over the wire in XML formats (and, yeah, the protocol is mostly HTTP...but we can't query anything with HTTP alone).

Databases like eXist and Mark Logic support both structured and semi-structured data and minimize the typically necessary data transformations (relational <-> object <-> XML/JSON...).
They can store documents, hierarchical data and very strong typed data.
We can more easily support schema versioning.
The query languages (XQuery and XPath) and schema languages (XSD/DTD) are standard.
Replication and clustering are performed more naturally.
They discourage monolithic database designs.
They scale better to a web world because of their native integration with HTTP and greater record granularity...

Any thoughts?

XML Databases by Miguel Vitorino

For those of you who may be curious, check out an instance of the Mark Logic database at markmail.org.

Also, I would like to make clear that I have no affiliation with either eXist or MarkLogic. I'm merely interested in following their progress.

Re: What about XML databases? by Francisco Jose Peredo Noguez

And we can't query anything with XML alone either. You need a program, written in something else. You need a query language, preferably one with strong relational theory supporting it.

Data transformations can be built easily as a thin layer that exposes the output of your relational queries as XML/JSON... I do not see the big deal here...

What is "very" strongly typed data? What is the difference from "plain" strongly typed data?

Why do they more easily support schema versioning?

And XQuery and XPath are immune to SQL's flaws?

Why are replication and clustering performed more naturally? What does XML have to do with it?

Why do they discourage monolithic database designs?

I don't see how their native integration with HTTP and greater record granularity make them scale better... can you explain?

Re: What about XML databases? by Miguel Vitorino

Do we absolutely need a strong relational theory behind a database? Is that the _only_ way to have nice performance? Or do you admit there may be other options? Do you see any relational theory behind Google's BigTable or Amazon's Simple DB?

Schema versioning is easier for the simple reason that a schema is not required at all in an XML database. You can choose to use a schema, and have several options for that, or you can store raw data and still be able to query that data. Can you do that in a relational database?

Replication and clustering are done more easily because you store related data logically and physically closer. With XML you work at an aggregate level, not at the record/tuple level.

If access to XML databases is designed to be inherently RESTful, it will naturally scale better.
The easiest way to do this is to embrace the benefits of HTTP - which many of them already do. Of course this will only work with greater record granularity (you don't wanna repeat the same mistakes from CORBA...).

That "thin" layer you refer to only performs format conversions. I believe that is different from data transformations. And no matter how "thin" that layer is, it always has to be there, and usually done by hand, if you have an underlying relational data source that does not quite have the same data representation capabilities your destination formats do.

InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.