Debate: ODBMS sometimes a better alternative to O/R Mapping?

In a recently released article on ODBMS.org, Ted Neward elaborated upon his idea that Object/Relational Mapping (ORM) is the Vietnam of Computer Science. The main idea that he presents is that Object-oriented Database Management Systems (OODBMS) are better than Relational Database Management Systems (RDBMS) for some applications, for example:
"In situations such as "silo" applications where a single user interface accesses a single database (the traditional "baby webapp on top of a big database"), or the more leading-edge "service" implementations, however, all interaction will be through that user interface or service interface, and never against the database itself, thereby making persistence truly an implementation concern only. In these situations, an OODBMS back-end can be invaluable in defining and preserving a rich domain model, as now there are no entity definitions in two languages (Java/C# and SQL DDL) to be reconciled."
He also identifies the main issue that OODBMS tries to solve as the Dual-schema problem:
"[...] in a traditional object/relational world, two sets of entity definitions are in play: one defined by the programming language itself, the other by the relational model using SQL DDL. This sets up an inherent challenge, as now two sets of definitions must be kept up to date as the system grows and evolves, either by "slaving" one to the other (frequently seen by the use of code generation tactics, either from schema to classes or the other way around), or by editing/adjusting the two separately and hand-tuning the mapping between them as necessary. This creates a tension between the two, and frequently developers are forced to make sacrifices in the purity of both models in order to keep the two in sync with one another.

Again, in an OODBMS, the fact that the class definitions are the only schema present means that no such “dual schema” problem exists; the domain model need not be slaved to the storage definitions, and the storage definitions need not be twisted into strange formations just to support the storage of a rich domain model."
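
The dual-schema tension Neward describes can be seen even in a trivial entity. The following sketch is purely illustrative (the class and table names are invented): the same `Customer` is defined once in Java and once in SQL DDL, and any change, such as adding an `email` field, must be made in both places and the mapping between them kept in sync.

```java
// Illustrative only: one entity, two schemas. The SQL DDL in the comment is
// the second, parallel definition of the same "Customer".
public class Customer {
    //  CREATE TABLE customer (
    //      id    BIGINT PRIMARY KEY,
    //      name  VARCHAR(255) NOT NULL
    //  );
    private final long id;
    private final String name;

    public Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }

    public long getId() { return id; }
    public String getName() { return name; }

    // Adding a field here without touching the DDL (or vice versa) is exactly
    // the drift the "dual schema" problem refers to.
}
```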

Some people agreed with this viewpoint, such as Andrew McVeigh:
"OO --> storing of complex graphs, fast navigation / traversal between objects, low impedance mismatch.

RDBMS --> data independence, suitable for complex reports, much better schema management (DDL).

Using an RDBMS for storage for a CAD system's diagrams, or an OODB for a reporting database is just asking for trouble."

However, there were also those who disagreed with this viewpoint, notably Gavin King, who wrote an article defending the role of the RDBMS. He brings up several points, a brief summary of them being:
  • ORM is required for legacy data - ORM is the only way that you can handle existing schemas or support legacy data, so replacing it isn't an option.
  • ORM can handle the DB for you - ORM solutions can generate the mappings and database schema for you if you don't have any backwards compatibility requirements.
  • Data lasts longer than applications - Mapping is needed because data will almost always last longer than the application that created it.
  • OODBMS is bad for compatibility - Because you store strongly-typed objects in the database, OODBMS is difficult to use with multiple development languages, whereas simple strings and numbers in an RDBMS can be mapped by each language.
  • OODBMS is not mature enough - OODBMS isn't seen in major data management systems because it's very immature compared to most RDBMS systems
  • Benchmarks showing OODBMS as being faster are flawed - OODBMS systems will normally run either in the same process as the app, or are written in an unscalable way; as a result, they do well in the small case but are no good in the large case. Also, ORMs are slower because they are more robust; once OODBMSes incorporate a robust, mature featureset, they will be the same speed.
In particular, Gavin states:
"To be clear, using ORM technology introduces no new no "mapping" or "dual schema" problem unless one already exists, due to the requirement of access to legacy data. If you just want to "throw some objects in the database", you'll never need to write a single mapping annotation. So, from this point of view, ORM is at least as good as an object database for all usecases, and handles other usecases (indeed, the common cases) which the object database approach does not."
Gavin also said:
"If you think that relational technology is for persisting the state of your application, you've missed the point. The value of the relational model is that it's democratic. Anyone's favorite programming language can understand sets of tuples of primitive values. Relational databases are an integration technology, not just a persistence technology. And integration is important. That's why we are stuck with them."
This resulted in a long follow-up by Ted Neward:
"since when does one tool solve all problems? They have their own raisons d'etre, and to simply say that the OODBMS or HODBMS should be ignored just because "we've always used an RDBMS" is a crime just as great."
Ted also disputes many of Gavin's points, including the following response to Gavin's assertion about the Dual-schema problem:
"Sorry, Gavin, but the fact is, this remains, and always will remain, a point of difference between you and I, and between you and a fairly large number of developers I've spoken to over the years at conferences and consulting engagements and classes. For simple table-to-class mappings, you're right, it's a pretty simple thing. It is, however, still a "dual schema" problem, in that now you have two competing "sources of truth" that have to be reconciled to one another, the database schema, and the object model. Now, perhaps if all the projects you've ever done are projects where the developer gets to define both, then the problem doesn't appear, but if you're in an "enterprise" world where the database schema is managed by a team of DBAs and is shared across projects, you don't have the flexibility to "refactor" the schema like you can your object model."
The debate appears to be only getting started - care to weigh in with your own opinion?


Community comments

  • OODBMS

    by Arnon Rotem-Gal-Oz,

    I think OODBMS can be a good option when the data requirements are modest.
    I've successfully used db4o in a couple of projects, and there are some other nice OODBMSs out there. However, I wouldn't use them for larger projects, especially if the project is data intensive. I also talk about that in a paper I wrote on O/R mapping (when and why).

    Arnon

  • Re: OODBMS

    by Stefan Edlich,

    > And what about the case in which data is shared between
    > applications having slightly different object models?
    > Are OODBMS able to handle that kind of case?
    Well, this depends on the vendor. Some handle it better than others.

    For example, db4o can handle different classes with different names
    and partially different data if you use an alias. This allows e.g. the
    usage of Java and C# classes in different apps but on the same database.
    (That can run under MS or Java).

    By the way: this is a nice extraction of the "meta discussion" between figures such as Gavin King and Ted Neward. And a perfect forum for discussions of "when and why" and "the right tool for the right task" might be the Object Database Conference that will be held in Berlin in 2008 with my support:
    www.icoodb.org

    Try to contact or join!

    Best
    Stefan Edlich

  • Re: OODBMS

    by David Skelly,

    > And what about the case in which data is shared between applications having slightly different object models? Are OODBMS able to handle that kind of case?


    No, because that's not how OODBMSs work. In the OODBMS world, the object model IS the data. There is no distinction between data in the database and objects in memory, no translation, no mapping. So the object model that you load back in from the database is the same as the one you store in the database. If your object model changes, there are various ways to update the data in the database, in the same way that you can do things like ALTER TABLE with SQL (the exact mechanism depends on the OODBMS in question). But two applications cannot access the same data via a different object model because in the OODBMS world that doesn't make sense.
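
The "no translation, no mapping" round trip David describes can be sketched with plain Java serialization standing in for an OODBMS store. This is an illustration only, not any particular product's API: the object graph written out is the same graph read back, with no separate storage schema in between.

```java
import java.io.*;

// Illustrative sketch: serialization as a stand-in for "the class definitions
// are the only schema". Class names are invented for the example.
public class RoundTrip {
    static class Node implements Serializable {
        String label;
        Node next;
        Node(String label) { this.label = label; }
    }

    // "Store" the graph, then "load" it back; the shape is unchanged.
    static Node roundTrip(Node root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Node) in.readObject();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        a.next = new Node("b");
        Node copy = roundTrip(a);
        System.out.println(copy.label + "->" + copy.next.label); // prints a->b
    }
}
```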

    Ted isn't saying that you should always replace an RDBMS with an OODBMS, or that RDBMSs aren't useful or efficient or flexible or extensible or robust or dependable. He's saying that sometimes, for some applications, it's preferable to use an OODBMS because it makes things easier.

    I don't know why people get so defensive about this. If an OODBMS is not suitable for your project, don't use it. It's as simple as that. No one is saying that RDBMS skills will become redundant, because we all know that won't happen.

    For the record I have nearly two decades of RDBMS experience, and also about four years working on a system which used ObjectStore and my conclusion is that an OODBMS can be a god-send if you use it for the right thing, but can be a nightmare if you try to use it where an RDBMS is a better fit.

  • Re: OODBMS

    by David Skelly,

    Apologies if my post came across as aggressive or critical at all. That wasn't how it was intended, although reading it back now I can see that it might be read that way.

  • Ted's argument falls flat

    by Jason Carreira,


    > Now, perhaps if all the projects you've ever done are projects where the developer gets to define both, then the problem doesn't appear, but if you're in an "enterprise" world where the database schema is managed by a team of DBAs and is shared across projects, you don't have the flexibility to "refactor" the schema like you can your object model.


    This argument of Ted's is spurious. Show me an environment like that where you can get them to OK an OODBMS. You think DBAs and operations teams dislike developer-built RDBMS schemas? How are they going to feel about a new technology that they're unfamiliar with and have no tools for?

  • hmm

    by andrew mcveigh,

    I find myself violently disagreeing with myself, or at least the way I've been quoted ;-)

    I don't believe that the correct way of looking at an OODBMS is that it minimises the need to keep an object schema and an SQL schema. To me this is spurious thinking.

    The key (IMO) to using an OODBMS correctly relative to an RDBMS is to note that they have very different (performance) characteristics.

    1. OODBs have a very close match to single-link OO navigation and a good match to OO business logic. They are optimised for that.
    2. OODBs are very good with complex graphs -- like the ones required by CAD tools etc.
    3. OODBs are very poor at data independence. Use an RDBMS for any business data.

    Areas in which I've used an OODB very successfully are with a CAD tool, and also in telecomms where there were millions of customer records with some complex graphs attached. Also, for batching and queuing messages spooled straight from a financial exchange where realtime operation was paramount.

    As to Gavin's claim about performance, I respectfully disagree that an RDBMS is always faster, all other things being equal. The key is that OODBs excel at single-link navigation, and going against the grain of these links will produce terrible performance. If you keep to this rule, and don't require set-based operations, performance in an advanced OODB over many millions of records is generally quite surprisingly spectacular.

    He is correct, however, that OODBs are very much more immature than their RDBMS counterparts. In addition, the tight coupling of the object model and the schema actually produces problems in an enterprise when adding fields and morphing a schema...

    As for the use of embedded OODBs, most OODBs like Versant and others are client-server. Even "small" OODBs like objectdb offer a superb client server mode supporting thousands of transactions a second on modest hardware. I've never used an OODB in embedded mode except for the simplest of apps.

    Andrew

  • Re: hmm

    by Gavin King,

    > As to Gavin's claim about performance, I respectfully disagree that an RDBMS is always faster all other things being equal.


    I most certainly never made that claim, or anything remotely like it. All I did was point out that there are many complex variables affecting performance, and, a priori, the dominant variable is *not* whether the underlying conceptual model is relational or object-oriented.

    That's not to say that any particular OODBMS is not faster than some particular RDBMS for some particular task.

    It is not even to say that, with their emphasis upon particular kinds of tasks, existing OODBMSs in general are not faster than existing RDBMSs in general. The usecase of CAD tools that you quote is a good example of where existing OODBMSs have a strong featureset, and existing RDBMSs are weak.

    All I'm saying is that this is not a function of the conceptual model, but rather of the implementation. There is no reason why you could not build an RDBMS that optimized navigation of hierarchical graphs of data.

    So, if we want to do better, as an industry, at optimizing usecases like CAD tools (and other usecases for which people have proposed OODBMSs as a panacea), which is the faster, easier, cheaper and more practical approach:


    (1) Spend hundreds of millions of dollars re-educating developers and data-management professionals on OODBMS technology and throw away decades of hard-won experience with relational technology, just to get optimized CAD tools (and in the process lose all the wonderful data integrity, ad hoc querying, and interoperability features of RDBMSs).

    OR:

    (2) Add some features for hierarchical graph navigation to Oracle.


    To be fair, the current interlocutors (i.e. the employees of db4o) are not proposing a wholesale migration to OODBMS (though some others do), but even a partial migration carries many costs and inefficiencies, due to exchanging one data management technology for two. Can you imagine the pain and suffering in the inevitable need to occasionally migrate data between your OODBMS and your RDBMS?

    Indeed, I think many of these arguments about "some applications" and "for some kinds of data" are really missing the point that, yes, *today*, your little Java application is the only one that needs the data ... but *tomorrow*, who knows?

  • Re: hmm

    by andrew mcveigh,

    > As to Gavin's claim about performance, I respectfully disagree that an RDBMS is always faster all other things being equal.

    > I most certainly never made that claim, or anything remotely like it. All I did was point out that there are many complex variables affecting performance, and, a priori, the dominant variable is *not* whether the underlying conceptual model is relational or object-oriented.


    Yes, performance evaluation and estimation is complex. However, the two database models are very different under the hood, resulting in different performance characteristics. Integrating the link-navigation facilities of an OO database into a relational, set-based model is apparently difficult and carries tradeoffs. I haven't looked at the technical side for over 5 years, and I've never implemented either product fully myself, so bear with me, but OO dbs (under the covers) have effectively a pointer concept -- the ODMG used to call them "swizzled pointers". Under the covers of an RDBMS are set structures and indexes.

    Merging the two concepts used to be popular years ago. Versant's attempt to build a SQL layer failed dismally, in my opinion. The other way around, Oracle and products like PostgreSQL have added OO-like features where they don't compromise their underlying model. So far this has included nested tables and the like, but not direct links, which assume a pointer structure and carry an associated maintenance cost as items are added, etc.

    Everything added to a model has a conceptual and practical cost. Indeed, one of the complaints levelled at OODBMSs was that they had no rigorous conceptual basis, unlike RDBMSs, which were based on relational algebra.

    The latest db product I've architected with was KDB+, an APL-based vector database capable of storing billions of rows of time-series data. It uses a table-like model and also has SQL-like facilities. Is it worth adding full SQL when it might double the storage? Definitely not when you are talking about terabytes of financial/market data... the db is heavily optimised for the domain. Could we use a SQL database for the same thing? Possibly, but I don't want to make the cover of SQL Server or Oracle magazine for how I'm pushing the relational envelope. Horses for courses. No one size fits all...

    (To be honest, the last thing I want to be portrayed as is the defender of OO databases. I've not recommended one in a business setting for over 7 years, and the last big project that I inherited with Versant, I got them to remove it as they planned to put business data in it. Don't even get me started on the limitations of db4o. I have no affiliation with any db vendor)

    However, for some situations OODBs and their associated performance characteristics are a big win. I won't bore you with the details of why, in a CAD situation they are better, but suffice to say there are reasons related to the depth of linking in the diagrams and the granularity of (often lazy) access to nested structures that make an OODBMS a big win here... In addition, the server can follow links to an arbitrary depth across "tables" to resolve complex graphs. If you truly have worked with such structures and have a solution, then i'd be interested in how you map it efficiently onto an RDBMS structure as I can only imagine it working with simple diagrams...

    Andrew

  • Re: hmm

    by Thomas Meeks,

    OODBMSs are no panacea, but neither are RDBMSs. While there are some very solid reasons for not using an OODBMS in certain situations, like data warehousing, every argument I have seen against the use of OODBMSs in any situation boils down to one sentence:

    "OODBMS's are not mature enough"

    There is nothing that precludes an OODBMS from having data integrity, ad hoc querying (couldn't they just support HQL?), or interoperability.

    Furthermore, while it is true that another application may need to get into the database at some point in the future (which, I agree, is an important thing to allow for), what would prevent it from using the database in the same manner as the Java application? At least assuming a driver exists for the language, which it would -- if the OODBMS were mature.

    Hesitation to use something off the beaten track is understandable, but that does not mean the entire concept is invalid.

    Sure, an OODBMS is not a mature (in terms of widespread usage), battle-tested concept, but neither was an RDBMS at one point (or, more recently, ORM). It is good to see some more practical effort going into the field -- it will only help expand and improve the tools we have available.

  • Re: hmm

    by andrew mcveigh,


    > Well my concern is not about the platform but about the OO model. I have never seen two applications use the exact same domain design (or seen any domain-based libraries succeed). After all, a design is just an approximation of the world out there based on the application needs. Therefore I always thought it was quite normal to have some kind of mapping (as light as possible) between the data storage model and the domain model in cases where the data is or may be shared between two applications.


    In my experience, OODBMSs and database sharing are not a good match for a family of applications, for precisely the reasons you mentioned. An RDBMS, as a more "data independent" model, is far more appropriate.

    Andrew

  • Re: hmm

    by andrew mcveigh,

    To see the pros and cons of an OODBMS approach, you really need to look into the actual database model. It's not really anything to do with maturity. OODBs have a model very similar to Java objects, where links are directly encoded and are (almost always) one-way, i.e. if you have a reference in class A to class B, then you can navigate from A to B, but vice versa is very difficult.

    So, what makes an OODBMS fast also ties it to the application logic that the links encode... That's why adding ad hoc queries is so difficult.

    Interoperability and data independence are limited by the navigation paths you have encoded.

    Andrew
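
Andrew's one-way-link point can be sketched in plain Java (class names are invented; no particular OODBMS API is implied): following the encoded link is a single dereference, while the reverse direction has nothing to follow and degenerates into a scan of every object.

```java
import java.util.*;

// Illustrative sketch of one-way links: Trade holds a direct reference to its
// Account, but there is no encoded path back from Account to its trades.
public class OneWayLinks {
    static class Account {
        final String name;
        Account(String name) { this.name = name; }
    }

    static class Trade {
        final String id;
        final Account account;   // the only encoded link: Trade -> Account
        Trade(String id, Account account) { this.id = id; this.account = account; }
    }

    // Forward navigation: a single dereference along the stored link.
    static String accountOf(Trade t) {
        return t.account.name;
    }

    // Reverse navigation: no link to follow, so every trade must be scanned.
    static List<String> tradesFor(Account a, List<Trade> allTrades) {
        List<String> result = new ArrayList<>();
        for (Trade t : allTrades) {
            if (t.account == a) {
                result.add(t.id);
            }
        }
        return result;
    }
}
```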

  • Re: hmm

    by Thomas Meeks,

    > Well my concern is not about the platform but about the OO model. I have never seen two applications use the exact same domain design (or seen any domain-based libraries succeed). After all, a design is just an approximation of the world out there based on the application needs. Therefore I always thought it was quite normal to have some kind of mapping (as light as possible) between the data storage model and the domain model in cases where the data is or may be shared between two applications.


    You are right, it is rare for two applications to have exactly the same data model. But if you are letting two applications share the same database, chances are they are both custom applications to which you have the code.

    At least I have yet to see a commercial black-box application that maps itself to whatever legacy database you might have. I suppose it is possible, however.

    So, with that in mind, why do they need to have completely different data models? A person is a person right? You shouldn't need to represent them with a completely different object just like you don't need to have two separate persons tables for each application. When you code the second application, you might add to the data model, just like you might to a SQL schema. An OODBMS could provide field-level visibility for the applications to ensure each application only gets what it is concerned with.

    An OODBMS can still conceptually provide views, security, stored procedures, and all those other creature comforts we are used to when integrating with an RDBMS.

    Like I said, there are clear cases where an OODBMS isn't useful, mostly having to do with heavy set operations and the like. I'm not saying they are everything to everyone, but I'm still looking for that un-fixable conceptual problem that makes them so completely worthless as a number of people seem to think.

  • Re: hmm

    by Thomas Meeks,

    > To see the pros and cons of an OODBMS approach, you really need to look into the actual database model. It's not really anything to do with maturity. OODBs have a model very similar to java objects, where links are directly encoded and they are (almost always) one way. i.e. if you have a reference in class A to class B, then you can navigate from A to B but vice versa is very difficult.


    But must OODBMSs have a Java-like model? Is having one-way associations a requirement? If you need to go back and forth, what's stopping you from putting in the other link in the form of a collection or map?

    There might be performance problems with that, but such a thing depends on the implementation. It shouldn't need any more overhead than a Hibernate many-to-one relationship.

    > so, what makes an OODBMS fast also makes it tied to the application logic that the links encode... That's why adding ad-hoc queries is so difficult.

    > Interoperability and data independence is limited by the navigation paths you have encoded.


    I see where you are coming from, but it doesn't have to be that way. You could program in the link where it is needed. Also, there is really nothing stopping an OODBMS from having arbitrary join functionality in the same manner as SQL. It may come with its own performance quirks, but then again I'm not saying an OODBMS is useful everywhere, in every situation.

  • Re: hmm

    by andrew mcveigh,

    > But must OODBMS's have a java-like model? Is having one-way associations a requirement? If you need to go back and forth, what's stopping you from putting in that other link in the form of a collection or map?

    > There might be performance problems with that, but such a thing is dependent on the implementation. It shouldn't need to have any more overhead than a hibernate many-to-one relationship.


    That's correct: you can put in bidirectional links and manage them just like you can in an OO language. Versant has "bilinks", for instance, which are unidirectional links managed in both directions. However, it's *very* painful to put them in everywhere, and you lose any potential layering of your data model. In practice no one does it.

    There is a bigger problem than just layering, though, particularly when there are a lot of things in a collection. If you did maintain the link both ways, you very quickly get to a situation where the incremental addition of one element means you must read in and add to a very large collection. Very unscalable, even for small-ish data sets. In this sense, the relational model of having a foreign key, and being able to join in both directions, gives you that nice quality of data being independent of business logic and navigation.

    So, it's not the data; it's the fact that links in your model have encoded a particular "business logic". I.e., the transactional app takes a trade and associates it with an account (trade to account). Someone then wants a GUI that can look at all trades in an account (account to trades), and you are navigating against the natural link direction. Easy to fix for this example, but in practice, in a large schema, something will trip you up. And don't even get me started on ad hoc reporting :-)

    Cheers,
    Andrew
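
The relational-style alternative Andrew contrasts this with can be sketched as a "foreign key" plus an index (again purely illustrative; names are invented): adding a trade updates a small index entry rather than rewriting a large managed collection on the account object, and the reverse direction is served by the index.

```java
import java.util.*;

// Illustrative sketch: the reverse direction (account -> trades) is served by
// an index keyed on the "foreign key", not by a link encoded in the objects.
public class ForeignKeyIndex {
    private final Map<String, List<String>> tradesByAccount = new HashMap<>();

    public void addTrade(String tradeId, String accountId) {
        // Cheap index maintenance; no large domain collection is rewired.
        tradesByAccount.computeIfAbsent(accountId, k -> new ArrayList<>())
                       .add(tradeId);
    }

    public List<String> tradesFor(String accountId) {
        return tradesByAccount.getOrDefault(accountId, Collections.emptyList());
    }
}
```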

  • Re: hmm

    by Gavin King,

    > yes, performance evaluation and estimation is complex. however, the 2 models of db are very different under the hood resulting in different performance characteristics. ... I haven't looked at the technical side for over 5 years, and I've never implemented either product fully myself, so bear with me, but OO dbs (under the covers) have effecively a pointer concept -- the ODMG used to call them "swizzled pointers". Under the covers of a RDBMS are set structures and indexes.


    Oh c'mon, that's silly. Every implementation of object references boils down to a "primary key" under the covers of a simple programming-level abstraction. Even Java layers object references over some kind of memory address.

    I'm quite sure that once you check the actual underlying implementation of your magical "pointers" in an OODBMS, you'll find some kind of numerical key value. (That's what "swizzle" means, btw.) Not at all different to how ORMs layer object references over primary key in the RDBMS.
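
Gavin's description of "swizzling" (a stored key value resolved into a direct in-memory reference) can be sketched with an identity map. This is illustrative only; real products differ, and the names are invented.

```java
import java.util.*;

// Illustrative sketch of swizzling: the persistent form of a reference is a
// key value; loading resolves it to one canonical in-memory object.
public class Swizzle {
    static class Person {
        final long id;
        final String name;
        Person(long id, String name) { this.id = id; this.name = name; }
    }

    private final Map<Long, Person> identityMap = new HashMap<>();

    public Person load(long id, String name) {
        // "Swizzle": turn the stored key into a direct in-memory reference.
        // Loading the same key twice yields the very same object.
        return identityMap.computeIfAbsent(id, k -> new Person(id, name));
    }
}
```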

    > Integrating link navigation facilities of an OO database into a relational, set-based model is apparently difficult and carries tradeoffs.


    Well, yes, it is somewhat difficult and there are indeed some tradeoffs. But I've done it, it works well, it is efficient for 98% of usecases and it has been adopted by probably > 75% of Java projects.

    You can find it here: hibernate.org

    It's called ORM. It's a *way* more successful technology than OODBMSs. It's not perfect by any means, but it solves the problem for most people.

  • Good Free Candidate

    by Geoffrey Wiseman,

    Personally, I think the role of OODBMS has been hampered by the fact that there isn't a good, clearly-recognized, free (as in beer), commercial-friendly-licensed (Apache, BSD, etc.) OODBMS. There are a number that match one or more of those criteria, but none that I'm aware of that gets the whole set.

  • Re: hmm

    by andrew mcveigh,


    > Oh c'mon, that's silly. Every implementation of object references boils down to a "primary key" under the covers of a simple programming-level abstraction. Even Java layers object references over some kind of memory address.


    Umm, no, it's not silly. In an object database these are literally "pointers", not a set of numbers that must be further resolved into a set of addresses through a filter/join. (Of course, implementations differ. ObjectStore was closest to this concept but was limited to the size of the virtual address space.) Have a look under the covers at how these things work in the products, not just in the mapping layers. Single-link navigation and set-based join are very different primitive operations with different performance characteristics and a different conceptual basis. They are much closer to the old hierarchical databases which preceded relational.

    (Carl, Ilan, any other OODB implementers out there who care to comment further on the implementation side?)


    > It's called ORM. It's a *way* more successful technology than OODBMs. It's not perfect by any means, but it solves the problem for most people.


    Gavin, I'm not denying the usefulness of ORM. In fact, I used to consider myself a Toplink expert, having introduced it on several successful large projects from '99 to '04 (I gave up when per-CPU licensing fees were introduced). From what I've seen, Hibernate is Toplink done better/properly, not something new. Having said that, I'm very grateful that you created Hibernate. Toplink is way too proprietary and was very expensive... The ORM layer is now effectively a commodity.

    It works well for most "business" situations, although despite a cache and support for object identity, ORM still retains many of the essential performance characteristics of the relational store (which is often an advantage for reporting, etc.). Further, caching is often problematic for horizontal scaling in large enterprise apps, requiring classification of different data types and expiry times. (As an aside, none of the object dbs I've used has ever needed a cache for performance reasons.)

    Anyway, it's very easy to become monomaniacal about these things. Horses for courses. Seriously, if you haven't already, get a good OODB and use it in anger.

    Andrew

  • Re: Good Free Candidate

    by andrew mcveigh,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Personally, I think the role of OODBMS has been hampered by the fact that there isn't a good, clearly-recognized, free (as in beer), commercial-friendly-licensed (Apache, BSD, etc.) OODBMS. There are a number that match one or more of those criteria, but none that I'm aware of that gets the whole set.


    I think this is definitely true. The lack of a good free client-server object database with support for transparent transitive persistence has held the technology back. I think this was also true of Smalltalk, which never had a freely available commercial-quality implementation, so it never became widely used...

    Andrew

  • Re: hmm

    by Gavin King,



    Oh c'mon, that's silly. Every implementation of object references boils down to a "primary key" under the covers of a simple programming-level abstraction. Even Java layers object references over some kind of memory address.


    Umm, no, it's not silly. In an object database these are literally "pointers", not a set of numbers that must be further resolved into a set of addresses through a filter/join. ... Have a look under the covers of how these things work in the products, not just in the mapping layers. Single-link navigation and set-based join are very different primitive operations, with different performance characteristics and a different conceptual basis. They are much closer to the old hierarchical databases which preceded relational.


    OK, I may be being dense here, but I'm just not seeing it.



    • An object database stores a to-one association as a "pointer", by which I understand you to mean an address of a disk location. (I'm skeptical that there is not some additional indirection there, but I've not implemented an object database so I'm not sure of that.)


    • A relational database stores a to-one association as a primary key value, which is resolved to an address of a disk location via an intermediate (efficient, purely in-memory) index lookup.




    So this index lookup is what is making the relational database so much slower?

    I don't get it. I've simply never met an application where performance was bounded by the cost of index lookups used for resolving to-one associations. Every application I ever met was bounded by the cost of server round-trips.
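    To pin down what is being compared here, a toy sketch (hypothetical classes, nothing like any database's actual internals) of the two ways a to-one association can be represented:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the two representations under discussion: in the "object"
// style a to-one association is a direct reference; in the "relational"
// style it is a key that must first go through a primary-key index.
public class AssociationSketch {
    public static class Dept {
        public final String name;
        public Dept(String name) { this.name = name; }
    }

    // Object style: the association is a direct pointer.
    public static String navigateDirect(Dept dept) {
        return dept.name;                        // one dereference
    }

    // Relational style: a foreign key, resolved via the PK index first.
    public static String navigateByKey(int deptId, Map<Integer, Dept> index) {
        return index.get(deptId).name;           // index lookup, then dereference
    }

    public static void main(String[] args) {
        Dept eng = new Dept("Engineering");
        Map<Integer, Dept> deptIndex = new HashMap<>();
        deptIndex.put(42, eng);                  // stands in for the PK index

        System.out.println(navigateDirect(eng));           // Engineering
        System.out.println(navigateByKey(42, deptIndex));  // Engineering
    }
}
```

    Either way the same data is reached; the argument is over how much that extra index lookup costs in practice.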

    Further, caching is often problematic for horizontal scaling in large enterprise apps, requiring classification of different data types and expiry times. (as an aside, in all the object dbs I've used, they've never had a need for a cache for performance reasons).


    Now you've really, really lost me. Why on earth should the caching be a different problem? I have a server, with data on disk. And a remote client, which needs that data. Ergo I need a cache. Different types of data in my system are accessed in different ways, with different consistency requirements. Ergo I need different expiry policies.

    I did not mention the words "object database" or "relational database" in the above paragraph.

  • Re: hmm

    by andrew mcveigh,



    So this index lookup is what is making the relational database so much slower?


    (I'm not an OODBMS implementation expert so I can't give you definitive answers, particularly about the extra level of indirection. If it is important we can take this offline, and I can get some of my mates who work on these products to comment)

    My understanding is that the difference shows up in situations where the RDBMS back end can't keep all of the relational indices in memory. It has to make a choice. Also, since the primary way of moving from one table to another is a relational join, which is inherently set-based, it gets slower as the indices grow, because you have more to intersect. They have clever algorithms, index types (bitmap etc.) and policies, but the principle is there.

    E.g. on a recent risk project I worked on, having 25 million risk vectors meant a query joining 12 tables to get one result, which took 90s in this case. Getting 1000 results back only doubled the time... Navigating a single link isn't a particularly effective use of the set paradigm.

    For OODBMSs, the initial lookup is still slow (often slower than an RDBMS) as the indices (you still have them) get larger, but once you are past that, you are literally navigating directly from disk location to disk location, with no need for set intersection etc.


    I don't get it. I've simply never met an application where performance was bounded by the
    cost of index lookups used for resolving to-one associations. Every application I ever met was bounded by the cost of server round-trips.


    So, I've seen both, but as you say, the round trips are usually the bottleneck in my experience in most domains. However, this is getting to the good stuff -- if the model is very granular, and there is a lot of it (i.e. fine-grained object link resolution), then an OODBMS can be a good choice due to the more efficient link navigation. I must caveat this by saying that it's not often you have a model like this. The only real case I have is a CAD system with lots of diagrams. You wouldn't want to design like this unless you had to, although this is a trap that people using an OODBMS usually fall into regardless of the domain.

    Back to round trips -- in a client-server OODBMS with lazy resolution of links, the server will often do a whole lot of object navigation and link resolution in response to one link traversal on the client side. Because the server side understands the way objects are linked, it can resolve a graph to depth 3, say, and silently cache the rest (in the client) for the transaction. This resolution can happen over many classes/tables. This is often the case when you are navigating through a complex graph, where it will pull back more than is strictly needed to minimise round trips.
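    A minimal sketch of that prefetching idea, assuming a simple linked node graph (hypothetical classes, no particular product's API):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of server-side prefetching: in response to one client request the
// server resolves the object graph to a fixed depth and returns the whole
// batch, so later link traversals hit the client cache, not the network.
public class PrefetchSketch {
    public static class Node {
        public final String id;
        public final List<Node> links = new ArrayList<>();
        public Node(String id) { this.id = id; }
    }

    // One "round trip": everything reachable within maxDepth links.
    public static Set<Node> fetch(Node root, int maxDepth) {
        Set<Node> batch = new LinkedHashSet<>();
        collect(root, maxDepth, batch);
        return batch;
    }

    private static void collect(Node n, int depth, Set<Node> batch) {
        if (n == null || !batch.add(n) || depth == 0) return;
        for (Node next : n.links) {
            collect(next, depth - 1, batch);
        }
    }

    public static void main(String[] args) {
        Node diagram = new Node("diagram");
        Node node = new Node("node");
        Node element = new Node("element");
        Node style = new Node("style");
        diagram.links.add(node);
        node.links.add(element);
        element.links.add(style);

        // Depth 2 returns diagram, node and element in a single batch;
        // reaching style would take one more trip (or a deeper prefetch).
        System.out.println(fetch(diagram, 2).size()); // 3
        System.out.println(fetch(diagram, 3).size()); // 4
    }
}
```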


    Now you've really, really lost me. Why on earth should the caching be a different problem? I have a server, with data on disk. And a remote client, which needs that data. Ergo I need a cache. Different types of data in my system are accessed in different ways, with different consistency requirements. Ergo I need different expiry policies.


    I mean a client-side cache which keeps objects between transactions...

    In any Toplink system I've used, there is a client-side cache to retain object identity (i.e. pointer to primary key) and to speed things up. As you scale out the mid tier you end up with dozens of client-side caches, which need to be kept in sync through policies; i.e. if you are updating a frequently used account balance, you can't cache it on the client side between transactions. A good client-server OODBMS only keeps the cache for a transaction, retaining only references between transactions. When you commit, the contents of the local transaction cache are cleared, meaning that it will go back to the DB for anything in another transaction. Because of the way that round trips are handled, as explained above, it is usually very quick.
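    A sketch of that transaction-scoped cache -- a hypothetical API, not Toplink's or Hibernate's:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a transaction-scoped cache: objects loaded during a transaction
// are held in an identity map, and commit() discards it, so the next
// transaction goes back to the database for fresh state rather than
// invalidating caches across a farm of mid-tier servers.
public class TxCacheSketch {
    private final Map<Long, Object> identityMap = new HashMap<>();

    // Load-or-reuse, but only within the current transaction.
    public Object find(long id, Function<Long, Object> loadFromDb) {
        return identityMap.computeIfAbsent(id, loadFromDb);
    }

    // Commit ends the transaction; cached state is cleared, not expired.
    public void commit() {
        identityMap.clear();
    }

    public int cachedCount() {
        return identityMap.size();
    }
}
```

    Within one transaction, find() returns the same instance for the same id; after commit() the map is empty, so the next transaction reloads from the database, sidestepping the cross-cache expiry policies a between-transaction cache would need.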

    In fact, I hadn't even understood the need for a client-side cache before using an ORM library back in '99. Before that I'd only used object databases. Indeed, one of the severe flaws of Toplink (still?) is the poor control you have over the client-side cache. I believe that Hibernate is vastly superior here, allowing different policies for different types of data.

    So, consider using an ODBMS for my CAD system. (Actually, it's a complex UML2 CASE tool with multi-user facilities I'm finishing off for my PhD.) The diagram is represented as a set of node and arc objects, and each then links to an element of the model (257 different classes), which are further linked to other bits and so on in a complex web (have a look at the UML2 metamodel if you want a fright!). Drawing a diagram involves drawing all the nodes and arcs from a diagram, and then traversing to an arbitrary depth to find the details for the name, the colour, whether the element is active etc. This is expressed in Java code. With a good OODBMS, drawing a screen of say 1000 elements will only involve a few round trips, because the server side understands the graph and can bring more in without you explicitly asking for it.

    (You could easily implement this sort of stuff in a server process using Hibernate as the back end, but you'd essentially be building an object DB.)

    Also, as I mentioned earlier, I fully agree with you that much business data belongs in a relational DB. Suffice to say, the retail bank where I work has most of its data on a mainframe in a hierarchical DB :-) Very old school, and blisteringly fast, but it has the same problem with lack of data independence that an ODBMS has.

    Cheers,
    Andrew

  • Re: hmm

    by Gavin King,


    You're now deep in implementation details. So I'll return to my original argument: nothing you are describing here is a fundamental attribute of the object-oriented or relational conceptual model; what you are describing is how (some) existing systems happen to implement that conceptual model. A priori, I don't see any reason why a primarily relational database could not make the optimizations you are describing. Likewise, I don't see why a primarily object-oriented system couldn't provide great reporting and data integrity, just like a relational database.

    It's not about "sets" vs. "pointers". Those are just abstractions that exist at the API level and at the conceptual level.

    Of course, it turns out that most relational systems that exist today happen to not implement optimizations for operations on the kind of hierarchical data that crops up in a number of niche usecases (CAD being the example that is often given). I think that's primarily because those usecases are rare in business applications. Which is not to say they're unimportant.

    Most importantly, none of this has to do with any "dual schema" or "paradigm mismatch" problem.

  • Re: hmm

    by Thomas Meeks,


    Yeah, but a person doesn't mean the exact same thing to both applications even though they use the same data. In application #1, the person may have an association to a department and 5 or 6 subclasses, while in the other application this association to a department has no meaning and shouldn't exist (even though the data is present), and it doesn't support any person subclass.

    If it's not a case an OODBMS can handle, then I think most applications should stay away from this technology, because nobody can predict the enterprise requirements. Programming for phantom requirements may be a bad habit, but your architecture has to be able to scale up or down.


    It is a case an OODBMS could handle. Try thinking about an OODBMS in more generic terms: storing objects in a custom binary format, acting as a server, working through drivers in much the same way an RDBMS does.

    There is nothing that technically ties an OODBMS to the limitations of, for example, Java. With that in mind, it is quite possible for an OODBMS to provide a view to each application that satisfies every concern you just stated.

    Are there any implementations like it? I don't know. But until there is a widespread standard (i.e. the equivalent of SQL for OODBMS), we shouldn't write off what an OODBMS can and cannot do.

  • Re: hmm

    by andrew mcveigh,


    You're now deep in implementation details. So I'll return to my original argument: nothing you are describing here is a fundamental attribute of the object-oriented or relational conceptual model; what you are describing is how (some) existing systems happen to implement that conceptual model. A priori, I don't see any reason why a primarily relational database could not make the optimizations you are describing. Likewise, I don't see why a primarily object-oriented system couldn't provide great reporting and data integrity, just like a relational database.

    It's not about "sets" vs. "pointers". Those are just abstractions that exist at the API level and at the conceptual level.


    I disagree with the statement that "nothing you are describing here is a fundamental attribute of the object-oriented or relational conceptual model". Look at the maths in Codd's paper if you haven't had a chance to already:

    www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
    (it was a significant enough shift to warrant the Turing award for the guy in '81)

    The relational model has a conceptual foundation in set theory in maths. Object dbs do not. This conceptual link is what gives the relational model its power. To add in hierarchy, it needs to be done at this level, where sets are the main construct. How do you get a link? It's a set intersection. Hierarchy is subordinate to sets, and essentially missing from the base concepts in the model.

    The conceptual level is vitally important when merging two different paradigms (let's ignore implementation details; they are unimportant here). I'm not saying it can't be done, and some future paradigm will no doubt do just this. But I've not seen any practical product (OODB or relational) do it successfully, and I've seen lots of academic papers that attempt to address the area but have failed.


    Of course, it turns out that most relational systems that exist today happen to not implement optimizations for operations on the kind of hierarchical data that crops up in a number of niche usecases (CAD being the example that is often given). I think that's primarily because those usecases are rare in business applications. Which is not to say they're unimportant.


    I've not thought about it in detail for around 5 years, but I think you're right. Hierarchical concepts (and their tie-in to understanding OO linking patterns) are important and could be introduced into a relational system, or both concepts merged somehow. However, I think history has shown that hierarchy is far less important for business apps than a set-based view, and ORM/Hibernate does a nice job of adding almost all of the missing pieces. The fact that a relational view of data is independent of app logic has won out. RDBMSs have proved their superiority over ODBMSs for this reason, and the advantage is so compelling that RDBMS vendors haven't needed to concentrate much on this area. I'd say in that regard, ORM is "paying off the object/hierarchical debt of the set paradigm".


    Most importantly, none of this has to do with any "dual schema" or "paradigm mismatch" problem.


    I seriously agree with you about the dual-schema problem, in the sense that I don't think it is really a big advantage of OODBMS systems. It's convenient, but it's not the real reason to choose an OODB over an RDBMS.

    I disagree about the paradigm mismatch point. It's very real, and has real consequences in that certain use cases will be a better fit to the paradigm of OODB or RDBMS, more or less independent of how they are implemented. "sets versus hierarchy" gets to the heart of the difference if you trace it back to concepts, and this difference has turned out to have huge ramifications.

    Andrew

  • Re: hmm

    by andrew mcveigh,



    There is nothing that technically limits an OODBMS to the same limitations of, for example, Java. That in mind, it is quite possible for an OODBMS to provide a view to each application that satisfies every concern you just stated.


    Yes, a DB system could be created that does this, but it wouldn't be an "object database" in the conventional sense. The driver for OODBMSs was literally just that -- a direct correlation with an object view of the world, with a literal object mapping. That literal mapping has real consequences which don't work well for long-lived data.


    Are there any implementations like it? Don't know. But until there is a widespread standard (i.e. the equivalent to SQL for OODBMS), we shouldn't write off what an OODBMS can and cannot do.


    We could all start again, but the ODMG did publish standards in this area. It was the database equivalent of the OMG (which defines UML). However, it's all but died out now. The whole area could be revisited, but it would need to be revisited with an understanding of what limited the previous generation.

    Andrew

  • Re: hmm

    by Gavin King,


    I disagree with the statement that "nothing you are describing here is a fundamental attribute of the object-oriented or relational conceptual model". Look at the maths in Codd's paper if you haven't had a chance to already:

    www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
    (it was a significant enough shift to warrant the Turing award for the guy in '81)

    The relational model has a conceptual foundation in set theory in maths. Object dbs do not. This conceptual link is what gives the relational model its power. To add in hierarchy, it needs to be done at this level, where sets are the main construct. How do you get a link? It's a set intersection. Hierarchy is subordinate to sets, and essentially missing from the base concepts in the model.


    OK, here's where I'm checking out of this discussion. My eyes glaze over the minute people start talking about the supposed magical set theoretical foundations of relational databases.

    Relational databases have almost nothing to do with what mathematicians call set theory - which is primarily concerned with the study of transfinite numbers (a la Cantor), or with reducing mathematics to primitive axioms (RW, ZF, etc).

    Relational databases bear the same relationship to set theory that arithmetic bears to number theory: ie. the most trivial, pedestrian application of the most basic definitions.

    I understand that most computing professionals have not studied set theory and are easily intimidated by the invocation of this term, but my major was pure mathematics, and I'm not intimidated, indeed I'm well aware that set theory has almost nothing to do with the practice of data management.

    And yes, I've heard of Codd before. And yes, I understand that linking to a Codd paper might make you seem knowledgeable, but it really doesn't advance the discussion at hand.

    Cheers, Gavin.

  • REST for Object-Relational mapping

    by Benjamin Carlyle,


    Objects see databases as memento and object-graph storage. Databases see objects as data exposed in table rows. RDF databases see objects as data exposed in schema-constrained graphs. The private of one is the public of the other. The benefits of each conflict with the design goals of the other.

    Perhaps REST is the middle ground that everyone can agree on. Objects interface easily using REST. They simply structure their mementos as standard document types. Now their state can easily be stored and retrieved. Databases interface easily using REST. They just map data to data. So the data in an object and the data in a database don't necessarily have precisely-matched schemas. They just map to the same set of document types and these document types define the O-R mapping. The document type pool can evolve over time based on Web and REST principles, meaning that tugs from one side of the interface don't necessarily pull the other side in exactly the same direction.
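    A minimal sketch of the "interface via document types" idea (hypothetical classes and field names, using a plain map to stand in for a standard document type):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of interfacing via a shared document type: the object side emits
// its state as a memento in an agreed document shape, and the storage side
// maps that same shape to a row, so neither schema is slaved to the other.
public class MementoSketch {
    public static class Person {
        public final String name;
        public final int age;
        public Person(String name, int age) { this.name = name; this.age = age; }

        // Object side: state out as a standard document.
        public Map<String, String> toMemento() {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("name", name);
            doc.put("age", Integer.toString(age));
            return doc;
        }
    }

    // Storage side: map the shared document to a row, never the object itself.
    public static Map<String, String> toRow(Map<String, String> doc) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("PERSON_NAME", doc.get("name"));
        row.put("PERSON_AGE", doc.get("age"));
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toRow(new Person("Ada", 36).toMemento()));
        // {PERSON_NAME=Ada, PERSON_AGE=36}
    }
}
```

    Each side only depends on the document shape, so the object fields and the column names can evolve separately as long as both keep mapping to the same document type.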

    If O-R mapping is the Vietnam of computer science, perhaps we should stop mapping between our object and our relational components. Perhaps we should start interfacing between them, instead.

    Benjamin.

  • Re: hmm

    by andrew mcveigh,



    OK, here's where I'm checking out of this discussion. My eyes glaze over the minute people start talking about the supposed magical set theoretical foundations of relational databases.

    Relational databases have almost nothing to do with what mathematicians call set theory - which is primarily concerned with the study of transfinite numbers (a la Cantor), or with reducing mathematics to primitive axioms (RW, ZF, etc).

    Relational databases bear the same relationship to set theory that arithmetic bears to number theory: ie. the most trivial, pedestrian application of the most basic definitions.


    The paper I quoted contains only the most trivial examples of set theory, accessible to anyone with a comp-sci background (it is comp-sci set theory). A relational join is a set intersection. However, I argue that the conceptual foundation led to direct implementations and a revolution in data storage. It's one of the most successful examples of concept preceding implementation. The connection is not trivial (and most purists argue that modern RDBMSs are not truly relational), but it is fundamental and is generally acknowledged.
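    To make the "join is set intersection" point concrete, a toy sketch (tiny in-memory relations, nothing like a real engine's execution plan):

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of "relational join is set intersection": the rows of an
// equi-join between two relations are built from the intersection of
// their key sets.
public class JoinSketch {
    // Keys appearing on both sides of the join.
    public static Set<Integer> joinKeys(Map<Integer, String> left, Map<Integer, String> right) {
        Set<Integer> keys = new TreeSet<>(left.keySet());
        keys.retainAll(right.keySet());   // literal set intersection
        return keys;
    }

    public static void main(String[] args) {
        Map<Integer, String> emp = Map.of(1, "andrew", 2, "gavin", 3, "ted");
        Map<Integer, String> desk = Map.of(2, "desk-9", 3, "desk-4", 4, "spare");

        // Join rows are assembled from the intersected key set.
        for (int k : joinKeys(emp, desk)) {
            System.out.println(emp.get(k) + " -> " + desk.get(k));
        }
        // gavin -> desk-9
        // ted -> desk-4
    }
}
```

    The intersection grows more expensive as the key sets grow, which is the set-based cost being contrasted with single-link navigation above.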


    I understand that most computing professionals have not studied set theory and are easily intimidated by the invocation of this term, but my major was pure mathematics, and I'm not intimidated, indeed I'm well aware that set theory has almost nothing to do with the practice of data management.


    I didn't quote the paper to show that I am smart, to intimidate you, to condescend to you, or to grandstand. I have only a rudimentary maths background. I did it to argue my point that relational DBs are in fact successful largely because they have a conceptual basis which has real power. They are based around Codd's 12 rules, which came directly from the theory. Object DBs aren't, and have been roundly criticised for this.

    And yes, given the world that I am from, if the first sight of a seminal (but accessible) academic paper makes you "run for the hills" then we are going to have to respectfully disagree on the point about conceptual importance. I'm checking out of the discussion also.

    Cheers,
    Andrew

  • Gavin's World

    by Christof Wittig,


    The biggest problem with Gavin's position is that he assumes everybody sees the world like him. Apparently, though, there are tens of thousands of Java and .NET developers who see that limiting one's options is usually not the best strategy to improve the outcome.

    The main flaws in the ORM vendors' arguments are:

    spend hundreds of millions of dollars re-educating developers and data management professionals


    They assume that the learning curve for ODBMS/db4o is as steep as for ORM/Hibernate - in fact, it's not. 10 APIs and zero modelling requirements get you up and running in 5 minutes. It just costs $10 to train you in db4o, not $1,000,000 like for Hibernate and Oracle.

    As for DBAs: second-generation ODBMSs are positioned as embedded, zero-admin databases. In fact you SAVE all the money for these expensive gatekeepers to your data and put the developers back in charge.

    Crazy? Well: think of cellphones, packaged software, SCADA systems and SOA - the data is invisible to the end user and only accessible to developers. It's not only a CAD tool; it's 10-20% of the market, and with more mature technology it will be 50% of object persistence that is prone to use the more cost-efficient ODBMS technology.

    Why pay a DBA -- other than in the data center of GE?

    [ODBMS lack] ad hoc querying


    wrong.

    db4o provides Native Java/.NET Queries for ad hoc querying.


    even a partial migration carries many costs and inefficiencies due to the exchange of one data management technology to two. Can you imagine the pain and suffering in the inevitable need to occasionally migrate data between your OODBMS and your RDBMS?


    In an embedded environment there is no legacy and no migration. And we don't change from one system to two, but from dual schema to one - the object schema which you already use in any case.


    Other arguments from this thread:

    * ODBMS technology is immature

    It sure was, and it sure isn't as mature as 30-year-old Oracle yet. By the same argument you should probably use hosts (mainframes). They are even more mature.

    We invite you to contribute to the db4o project and make it mature. Every new technology is immature. Open source is our tool to change that. Participate!

    And have a look at the feature flow of db4o over the last 2 years, which gives you an idea where we are and where we will be in a year or two.

    * ODBMSs don't share data between applications

    Yes. That's the idea. They are useful where you don't share data - at the database level. In SOA you don't share data at the database level. So is SOA "bad" technology, too?

    In fact, sharing data in an RDBMS between many applications is overhyped. In most large enterprises you still find a separate namespace for each application (unless it's a monster like SAP or so) and awkward, opaque synchronization services, operated at the database level with triggers and batches.

    And don't get me started on working with Hibernate against a legacy RDBMS. It's painful, to say the least. Hibernate is great for building a green-field application until V1.0. After that it's just as painful as if you were not using an ORM. That's Ted's main point in the "Vietnam" essays - decreasing marginal returns hit you down the road.

    * I cannot get my data out of an ODBMS

    Important point, but entirely untrue.

    In fact, we offer a Hibernate-based (kudos) replication service to replicate your data into an RDBMS.

    So see the ODBMS as a realtime, client-centric persistence extension, and the RDBMS may, in most cases, just be the backup for archiving and BI. E.g. Postbank, Germany's largest retail bank, uses db4o for their mobile field force of insurance agents: store data on a PDA, replicate later when you come to the office. Or Boeing, which builds a data acquisition system into the P-8A aircraft based on db4o: store while airborne, retrieve when landed. Or INDRA Sistemas, which builds high-speed train control systems: store 200,000 objects per second in db4o (Oracle can do 30,000), and later aggregate them heavily to store the data in the enterprise data center (running on Oracle). Not all of these guys had smoked pot when they decided that ORMs were no option for them, even if this is what Gavin and his gang want to make you believe.

    * There is no free ODBMS, which inhibits proliferation

    Partly true, but if it were entirely free, it wouldn't be good. Someone has to invest, and thus secure a revenue stream to pay for that. We at db4objects do it at the most careful, smallest denominator: only if you want to *distribute* it with a *commercial* application (e.g. as part of a photocopier system) do you need to pay a small license fee (at $199 a pop, down to less than a dollar in large volumes). Unlike the other vendors and JBoss/Hibernate, we charge close to nothing for support and services - simply because it's so dead easy that you don't need it.

    For all inhouse users and for all open source projects (GPL and non-GPL under the dOCL) db4o is free.


    Disclaimer: I am a db4o employee (the only one posting here). Please note that Gavin, who accuses "the current interlocutors" of being "the employees of db4o", is an employee of JBoss/Red Hat, maker of Hibernate, in case you don't know it.

  • Performance of ODBMS and RDBMS is indifferent, subject to other factors

    by Christof Wittig,


    As to Gavin's claim about performance, I respectfully disagree that an RDBMS is always faster all other things being equal.


    I most certainly never made that claim, or anything remotely like it. All I did was point out that there are many complex variables affecting performance, and, a priori, the dominant variable is *not* whether the underlying conceptual model is relational or object-oriented.

    That's not to say that any particular OODBMS is not faster than some particular RDBMS for some particular task.

    It is not even to say that, with their emphasis upon particular kinds of tasks, existing OODBMSs in general are not faster than existing RDBMSs in general. The usecase of CAD tools that you quote is a good example of where existing OODBMSs have a strong featureset, and existing RDBMSs are weak.


    I am glad we got this FUD finally out of the way.

    Thanks, Gavin, for clarifying!

    Christof

  • Declarative Relational Programming (warning: Shameless Plug)

    by Mark Proctor,


    Object-oriented deep graph models, POJOs specifically, cannot easily be reasoned over declaratively - which can create a reliance on imperative programming. With the work that the W3C has done to standardise Description Logic with OWL-DL, combined with declarative reasoning systems like Drools (we'll be adding Description Logic based modelling after 4.0), you have a much more powerful metaphor for application development (although probably not framework/subsystem development). Also forget OWL Full; it's an academic exercise. And RDF triples are unfortunate, but luckily they can just be considered a transport mechanism. Declarative relational programming obviously has a much closer one-to-one mapping with the database itself.

    For a simple example, look at the Conway's Game of Life example we provide (it will be updated to ruleflow soon, instead of agenda groups, which will make it more declarative). In this example we have a large NxN grid of Cell objects. The previous approach was for each Cell to have a HashSet of its surrounding Cells; the only way to calculate the number of surrounding Dead/Live cells was to imperatively iterate that HashSet for each Cell. This creates repeated redundant work, as we don't know what has and hasn't changed. We could track that, but again it's more imperative code and tracking we have to do. The updated Conway's example uses a relational approach with no nested objects (although no DL yet; that is post-4.0): instead we use a Neighbor class to bi-directionally relate each surrounding cell. This means we simply declare what we want it to do to track Dead/Live cells, and the system, with its understanding of the relations and of what has and hasn't changed, does the rest for us.
    anonsvn.labs.jboss.com/labs/jbossrules/trunk/dr...
    rule "Calculate Live"
        agenda-group "calculate"
        lock-on-active
    when
        theCell: Cell(cellState == CellState.LIVE)
        Neighbor(cell == theCell, $neighbor : neighbor)
    then
        $neighbor.setLiveNeighbors( $neighbor.getLiveNeighbors() + 1 );
        $neighbor.setPhase( Phase.EVALUATE );
        modify( $neighbor );
    end

    rule "Calculate Dead"
        agenda-group "calculate"
        lock-on-active
    when
        theCell: Cell(cellState == CellState.DEAD)
        Neighbor(cell == theCell, $neighbor : neighbor )
    then
        $neighbor.setLiveNeighbors( $neighbor.getLiveNeighbors() - 1 );
        $neighbor.setPhase( Phase.EVALUATE );
        modify( $neighbor );
    end
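To make the rules above concrete, the facts they pattern-match could look like the following plain Java beans. This is a sketch only: the names (Cell, CellState, Phase, Neighbor and their fields) are taken from the rules themselves, but the real classes in the Drools example may differ in detail.

```java
// Hypothetical sketch of the fact classes the rules above match on.
enum CellState { LIVE, DEAD }
enum Phase { EVALUATE, DONE }

class Cell {
    private CellState cellState = CellState.DEAD;
    private Phase phase = Phase.DONE;
    private int liveNeighbors;              // maintained by the rules, not by Java code

    CellState getCellState()           { return cellState; }
    void setCellState(CellState s)     { cellState = s; }
    Phase getPhase()                   { return phase; }
    void setPhase(Phase p)             { phase = p; }
    int getLiveNeighbors()             { return liveNeighbors; }
    void setLiveNeighbors(int n)       { liveNeighbors = n; }
}

// One Neighbor fact per directed pair of adjacent cells, so each
// adjacency is asserted twice -- once in each direction.  The engine
// joins on the 'cell' field, so no nested collections are needed.
class Neighbor {
    private final Cell cell;
    private final Cell neighbor;

    Neighbor(Cell cell, Cell neighbor) { this.cell = cell; this.neighbor = neighbor; }
    Cell getCell()                     { return cell; }
    Cell getNeighbor()                 { return neighbor; }
}
```

Note the design choice: the relation lives in its own fact class rather than inside Cell, which is exactly what lets the engine, rather than imperative code, maintain the live-neighbour counts.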

    I also invite you to look at the "register neighbor" set of rules, so you can see how these Neighbour relations are set up declaratively, exploiting the cross products of the column and row fields in the Cell.

    While this is just a simple example using propositional logic, you can exploit these relations much further, especially when working with sets of data and first-order logic using constructs like 'collect', 'accumulate' and 'forall'. For more info see "What's New in JBoss Rules 4.0", which will be released in the middle of next month.

    Mark
    markproctor.com
    markproctor.blogspot.com

  • Re: hmm

    by remco greve,


    I also used to study math, but is saying this:

    "which is primarily concerned with the study of transfinite numbers"

    not the same as what you accuse the parent of?

    "I understand that most computing professionals have not studied set theory and are easily intimidated by the invocation"

    And is this:

    "with reducing mathematics to primitive axioms (RW, ZF, etc.)"

    not what relational algebra also tries to do?

  • A DBMS can be both *ObjectOriented* and *Relational*

    by Yue Still Compl,


    As far as I understand, it's the access interface we are concerned with, that is:
    *SQL* vs *Objects*

    IMHO, ORM is not perfect because it simply translates *Data Operations* expressed via objects into *SQL transaction scripts*, at the CLIENT side. They cannot map perfectly, especially when *Data Semantics* are seriously considered.

    My approach is to take the RDBMS as a storage engine, and thus provide a *Relational* interface to analytical applications that need read-only access via SQL, while keeping an ever-running *Object Graph Environment* as a standalone server instance, thus allowing transactional applications to send executable objects to this environment for data manipulation. The *Object Graph Server* manages *Object Transactions* itself (unlike ORM, which delegates that to the RDBMS).

    See:
    www.ableverse.org/articles/fakeorm.pdf
    tob.ableverse.org

  • Re: hmm

    by Joubin Houshyar,


    You're now deep in implementation details. So I'll return to my original argument: nothing you are describing here is a fundamental attribute of the object-oriented or relational conceptual model; what you are describing is how (some) existing systems happen to implement that conceptual model. A priori, I don't see any reason why a primarily relational database could not make the optimizations you are describing. Likewise, I don't see why a primarily object-oriented system couldn't provide great reporting and data integrity, just like a relational database.

    It's not about "sets" vs. "pointers". Those are just abstractions that exist at the API level and at the conceptual level.


    I disagree with the statement that "nothing you are describing here is a fundamental attribute of the object-oriented or relational conceptual model". Look at the maths in Codd's paper if you haven't had a chance to already:

    www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
    (it was a significant enough shift to warrant the Turing award for the guy in '81)

    The relational model has a conceptual foundation in set theory in maths. Object dbs do not. This conceptual link is what gives the relational model its power. To add in hierarchy, it needs to be done at this level, where sets are the main construct. How do you get a link? It's a set intersection. Hierarchy is subordinate to sets, and essentially missing from the base concepts in the model.
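The point that a relational link is a set operation rather than a stored pointer can be illustrated in a few lines of Java. This is a hypothetical sketch: the relation names and part ids are invented for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Illustration only: two hypothetical relations keyed by a shared
// attribute (a part id).  "Linking" them is a set intersection over
// that attribute -- an operation on sets -- not a pointer dereference.
public class RelationalLink {

    // Intersection of two relations' key sets: the rows that "join".
    static Set<Integer> link(Set<Integer> left, Set<Integer> right) {
        Set<Integer> result = new HashSet<>(left);
        result.retainAll(right);    // keep only keys present in both relations
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> partsInStock = Set.of(1, 2, 3, 5);
        Set<Integer> partsOnOrder = Set.of(2, 3, 8);
        System.out.println(link(partsInStock, partsOnOrder)); // parts in both relations
    }
}
```

Nothing in either relation "points at" the other; the link exists only as the result of the set operation, which is the conceptual distinction being argued here.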

    [...]

    Andrew


    Andrew,

    First off, I'd like to thank you for your informative posts here and elsewhere -- over the past few days, in the course of ramping up on databases (relational or "OO"), I've been reading some of your posts. Thank you for sharing your experience and understanding.

    (And thanks to Gavin, as well, since I will be (God Willing) using Hibernate in a project in the near future).

    Gavin is right, of course, that at the implementation level, all that is happening is index lookups. Surely, indirection is King in software <g> and not just in database software.

    The crux of the issue is precisely what Gavin dismisses as (apparently) irrelevant: "Conceptual level" and "API".

    It's difficult to give a robust refutation of the first tuple of the relation above, but the tuple stating that "API" is merely an "abstraction" can be challenged by the relation stating that 'the semantics of interacting with a software system are directly related to its functional characteristics', and that 'APIs are a fundamental means of capturing semantics'.

    A relational database is a system that embodies a set-theoretic approach to the 'discovery of facts' from a 'knowledge base' (i.e. a collection of facts). The first tuple ('discovery of ...') is the 'semantics' bit, and the table-based RDBMS (and its attendant SQL flavor) is the "API" bit.

    Use an RDBMS to discover facts about the domain. (E.g. 'reporting', "ad hoc" query ..)

    An object database does not have a 'conceptual' underpinning. Why is that? Because it is an entirely *practical* notion: it is a system for facilitating the 'usage' of facts.

    (And that explains why ODBMS excel as the persistence engines of 'tool' like applications (e.g. CAD systems))

    To sum:

    Relational databases are systems for the 'discovery of facts': ("What parts can I put together?" "What is the cross product utility rate of part family x?")

    Object databases are for the 'usage of facts': ("get me the component parts of this part in my (cad) model ..")

    Everything I have read so far on the topic (C. J. Date's O'Reilly book and our own A. McVeigh) makes sense in the light of the above distinction.

    Q: Is it an accident that ODBMS are a poor fit for 'data sharing' between applications and that they excel in the embedded domain?

    A: No.

    The need for ad-hoc access to sub-graphs of a complex persistent object graph arises when we are operating 'on' facts gathered and organized into a 'whole'. The persistent graph is a 'model' of a 'domain fact at hand'. It is the *result* of a very deep question posed to a knowledge base. That question is effectively this: what is the domain meta model for this complex element of my domain? I need to know since I will be acting (based) on this information to solve this specific problem in the domain (aka 'the application').

    Is the answer to application X's question (object graph og(X)) also a good enough answer to application Y (also in the same domain and "sharing" the domain knowledge store (aka RDBMS))?

    Well, unless X and Y are doing very similar things, it should not come as a surprise that it is not.

    Q: What is the difference between the lookups in RDBMS and lookups in ODBMS?

    A: Conceptually, it is analogous to the difference between predicate-based flow branching in a procedural system (the RDBMS) and in a declarative rules engine (the ODBMS). Both systems boil down, effectively, to the activation of a sequence of statements (minimally, the raising of an event indicating the assertion of the predicate).

    In the procedural system, we test predicates (again and again) to finally resolve a (set of) statement blocks to execute.

    In the rules engine, specifically a Rete-based engine such as Jess, as 'facts' are fed into the system, the underlying data structure (side-effected by fact insertion; it's a graph) is effectively a pre-computation of a whole set of predicate tests. This is so because we have, in fact, told the rules engine which specific predicates we are interested in keeping a watch on.
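That pre-computation idea can be caricatured in a few lines of Java. This is a sketch with invented names, showing only the "pay on insert, not on read" shape; a real Rete network also shares conditions between rules and pre-computes joins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy sketch: predicates are registered up front, each fact is tested
// once at insertion time, and reading the matches later involves no
// predicate evaluation at all -- the work was done when the fact arrived.
class TinyWorkingMemory<T> {
    private final List<Predicate<T>> watches = new ArrayList<>();
    private final List<List<T>> matches = new ArrayList<>();

    int watch(Predicate<T> p) {          // declare interest in a predicate
        watches.add(p);
        matches.add(new ArrayList<>());
        return watches.size() - 1;       // handle for later lookup
    }

    void insert(T fact) {                // evaluate once, at insertion time
        for (int i = 0; i < watches.size(); i++) {
            if (watches.get(i).test(fact)) {
                matches.get(i).add(fact);
            }
        }
    }

    List<T> activations(int handle) {    // no predicate tests happen here
        return matches.get(handle);
    }
}
```

The analogy to the ODBMS case is that declaring the watched predicate plays the role of declaring which graph traversals you intend to make.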

    (So that's just like telling the ODBMS which pathways for graph traversal you will be using. You will *not* be interested in knowing how many different (say CAD) models a certain element appears in, and where, and when, etc.; you use a knowledge base (e.g. an RDBMS) for that. What you are interested in is having access to an instance of that element from another element, and you indicate that interest by declaring an 'association' between, say, Fu and Bar.)

    For example:

    Me me = new Me( "<your name here>" );
    Life myLife = me.getLife();
    Fu fu = myLife.getFu();
    Bar myFuBar = fu.getBar();

    which is not the same thing as:

    SELECT GOV.citizen_id AS dissident, S.situation_id AS situation
    FROM HLS_TOA.dossiersOnCitizens GOV, HLS_TOA.Situations S
    WHERE GOV.citizen_id = ? AND GOV.citizen_id = S.citizen_id AND S.situation LIKE 'fu%' AND S.recognition_factor IS NULL

    [That is a "prepared statement", btw ...]

    You see?

    /Best Regards

    </g>
