Debate: ODBMS sometimes a better alternative to O/R Mapping?
Posted by Ryan Slobojan on Jun 14, 2007 08:40 PM
In a recently released article on ODBMS.org, Ted Neward elaborated upon his idea that Object/Relational Mapping (ORM) is the Vietnam of Computer Science. The main idea that he presents is that Object-oriented Database Management Systems (OODBMS) are better than Relational Database Management Systems (RDBMS) for some applications, for example:
"In situations such as "silo" applications where a single user interface accesses a single database (the traditional "baby webapp on top of a big database"), or the more leading-edge "service" implementations, however, all interaction will be through that user interface or service interface, and never against the database itself, thereby making persistence truly an implementation concern only. In these situations, an OODBMS back-end can be invaluable in defining and preserving a rich domain model, as now there are no entity definitions in two languages (Java/C# and SQL DDL) to be reconciled."
He also identifies the main issue that OODBMS tries to solve as the Dual-schema problem:
"[...] in a traditional object/relational world, two sets of entity definitions are in play: one defined by the programming language itself, the other by the relational model using SQL DDL. This sets up an inherent challenge, as now two sets of definitions must be kept up to date as the system grows and evolves, either by "slaving" one to the other (frequently seen by the use of code generation tactics, either from schema to classes or the other way around), or by editing/adjusting the two separately and hand-tuning the mapping between them as necessary. This creates a tension between the two, and frequently developers are forced to make sacrifices in the purity of both models in order to keep the two in sync with one another.
Again, in an OODBMS, the fact that the class definitions are the only schema present means that no such "dual schema" problem exists; the domain model need not be slaved to the storage definitions, and the storage definitions need not be twisted into strange formations just to support the storage of a rich domain model."
Some people agreed with this viewpoint, such as Andrew McVeigh:
"OO --> storing of complex graphs, fast navigation / traversal between objects, low impedance mismatch.
RDBMS --> data independence, suitable for complex reports, much better schema management (DDL).
Using an RDBMS for storage for a CAD system's diagrams, or an OODB for a reporting database is just asking for trouble."
However, there were also those who disagreed with this viewpoint, notably Gavin King, who wrote an article defending the role of the RDBMS. He brings up several points, a brief summary of them being:
- ORM is required for legacy data - ORM is the only way that you can handle existing schemas or support legacy data, so replacing it isn't an option.
- ORM can handle the DB for you - ORM solutions can generate the mappings and database schema for you if you don't have any backwards compatibility requirements.
- Data lasts longer than applications - Mapping is needed because data will almost always last longer than the application that created it.
- OODBMS is bad for compatibility - Because you store strongly-typed objects in the database, OODBMS is difficult to use with multiple development languages, whereas simple strings and numbers in an RDBMS can be mapped by each language.
- OODBMS is not mature enough - OODBMS isn't seen in major data management systems because it's very immature compared to most RDBMS systems.
- Benchmarks showing OODBMS as being faster are flawed - OODBMS systems will normally run either in the same process as the app, or are written in an unscalable way; as a result, they do well in the small case, but are no good in the large case. As well, ORMs are slower because they are more robust - once OODBMSes incorporate a robust, mature feature set, they will be the same speed.
"To be clear, using ORM technology introduces no new "mapping" or "dual schema" problem unless one already exists, due to the requirement of access to legacy data. If you just want to "throw some objects in the database", you'll never need to write a single mapping annotation. So, from this point of view, ORM is at least as good as an object database for all usecases, and handles other usecases (indeed, the common cases) which the object database approach does not."
Gavin also said:
"If you think that relational technology is for persisting the state of your application, you've missed the point. The value of the relational model is that it's democratic. Anyone's favorite programming language can understand sets of tuples of primitive values. Relational databases are an integration technology, not just a persistence technology. And integration is important. That's why we are stuck with them."
This resulted in a long follow-up by Ted Neward:
"since when does one tool solve all problems? They have their own raisons d'etre, and to simply say that the OODBMS or HODBMS should be ignored just because "we've always used an RDBMS" is a crime just as great."
Ted also disputes many of Gavin's points, including the following response to Gavin's assertion about the Dual-schema problem:
"Sorry, Gavin, but the fact is, this remains, and always will remain, a point of difference between you and I, and between you and a fairly large number of developers I've spoken to over the years at conferences and consulting engagements and classes. For simple table-to-class mappings, you're right, it's a pretty simple thing. It is, however, still a "dual schema" problem, in that now you have two competing "sources of truth" that have to be reconciled to one another, the database schema, and the object model. Now, perhaps if all the projects you've ever done are projects where the developer gets to define both, then the problem doesn't appear, but if you're in an "enterprise" world where the database schema is managed by a team of DBAs and is shared across projects, you don't have the flexibility to "refactor" the schema like you can your object model."
The debate appears to be only getting started - care to weigh in with your own opinion?
-
I think OODBMS can be a good option when the data requirements are modest. I've successfully used db4o in a couple of projects, and there are some other nice OODBMSs out there. However, I wouldn't use them for larger projects, especially if the project is data-intensive. I also talk about that in a paper I wrote on OR/Mapping when/why. Arnon
-
And what about the case in which data is shared between applications having slightly different object models? Are OODBMSs able to handle that kind of case?
-
> And what about the case in which data is shared between applications having slightly different object models? Are OODBMSs able to handle that kind of case?
Well, this depends on the vendor. Some handle it better, some cannot. For example, db4o can handle different classes with different names and partially different data if you use an alias. This allows, e.g., the use of Java and C# classes in different apps but on the same database (which can run under MS or Java). By the way: this is a nice extraction of the "meta discussion" between camps such as those of Gavin King and Ted Neward. And a perfect forum for discussing the "when and why" and "the right tool for the right task" might be the Object Database Conference, which will be held in 2008 in Berlin with my support (http://www.icoodb.org). Feel free to get in touch or join! Best, Stefan Edlich
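Stefan's point about aliases can be pictured with a small sketch. This does not use db4o's actual API; it is a hypothetical Python illustration in which stored records carry the type name the writing application used, and the reading application supplies an alias table (`ALIASES`, invented here) mapping that stored name onto its own local class, keeping only the fields that class declares.

```python
from dataclasses import dataclass, fields

# Records as another application stored them: its type name plus field values.
stored = [
    {"type": "com.acme.Person", "data": {"name": "Ada", "dept": "R&D", "age": 36}},
]

@dataclass
class Person:  # this application's class: partially different fields
    name: str
    age: int = 0
    email: str = ""

# Hypothetical alias table: stored type name -> local class.
ALIASES = {"com.acme.Person": Person}

def load(record):
    """Materialise a stored record as an instance of the aliased local class."""
    cls = ALIASES[record["type"]]
    known = {f.name for f in fields(cls)}
    # Drop fields the local class does not declare; missing ones use defaults.
    return cls(**{k: v for k, v in record["data"].items() if k in known})

people = [load(r) for r in stored]
```

Here the `dept` value is simply not visible to this application, and `email` takes its default.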
-
And what about the case in which data is shared between applications having slightly different object models? Are OODBMSs able to handle that kind of case?
No, because that's not how OODBMSs work. In the OODBMS world, the object model IS the data. There is no distinction between data in the database and objects in memory, no translation, no mapping. So the object model that you load back in from the database is the same as the one you store in the database. If your object model changes, there are various ways to update the data in the database, in the same way that you can do things like ALTER TABLE with SQL (the exact mechanism depends on the OODBMS in question). But two applications cannot access the same data via different object models, because in the OODBMS world that doesn't make sense. Ted isn't saying that you should always replace an RDBMS with an OODBMS, or that RDBMSs aren't useful or efficient or flexible or extensible or robust or dependable. He's saying that sometimes, for some applications, it's preferable to use an OODBMS because it makes things easier. I don't know why people get so defensive about this. If an OODBMS is not suitable for your project, don't use it. It's as simple as that. No one is saying that RDBMS skills will become redundant, because we all know that won't happen. For the record, I have nearly two decades of RDBMS experience, and also about four years working on a system which used ObjectStore, and my conclusion is that an OODBMS can be a godsend if you use it for the right thing, but can be a nightmare if you try to use it where an RDBMS is a better fit. -
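As a rough analogy for "the object model IS the data", Python's standard `shelve` module persists pickled objects under string keys: what you read back is an instance of the same class, with no table definition or mapping in between. This is only an illustration under that assumption; `shelve` is a simple key-value store, not an OODBMS, and has none of the querying, transaction, or concurrency features of a product such as ObjectStore. The `Order` class is made up for the example.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass, field

@dataclass
class Order:
    id: int
    lines: list = field(default_factory=list)

path = os.path.join(tempfile.mkdtemp(), "orders")

# Store a live object; shelve pickles it, so no separate schema is declared.
with shelve.open(path) as db:
    db["order:1"] = Order(1, ["widget", "gadget"])

# Read it back in a new session: the result is an Order again, not a row to be mapped.
with shelve.open(path) as db:
    loaded = db["order:1"]
```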
Actually I wasn't defensive. I have no opinions at the moment on the use of OODBMS but I am quite interested in the subject and this is a situation I've always been wondering about.
-
Apologies if my post came across as aggressive or critical at all. That wasn't how it was intended, although reading it back now I can see that it might be read that way.
-
Now, perhaps if all the projects you've ever done are projects where the developer gets to define both, then the problem doesn't appear, but if you're in an "enterprise" world where the database schema is managed by a team of DBAs and is shared across projects, you don't have the flexibility to "refactor" the schema like you can your object model.
This argument of Ted's is spurious. Show me an environment like that where you can get them to OK an OODBMS. You think DBAs and operations teams dislike developer-built RDBMSs? How are they going to feel about a new technology that they're unfamiliar with and have no tools for? -
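In the setting discussed here, where a DBA team owns a shared schema, the table and the class evolve separately and can only be reconciled after the fact. A minimal sketch, assuming SQLite stands in for the shared database and a Python dataclass for the object model: compare the class's fields with the table's columns via `PRAGMA table_info` and report the drift in each direction. The `schema_drift` helper and both definitions are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, fields

@dataclass
class Account:
    id: int
    owner: str
    nickname: str  # added by the developers; no matching column yet

def schema_drift(conn: sqlite3.Connection, table: str, cls) -> dict:
    """Report fields missing from the table and columns missing from the class."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so it must come from trusted code.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    attrs = {f.name for f in fields(cls)}
    return {
        "missing_in_table": sorted(attrs - columns),
        "missing_in_class": sorted(columns - attrs),
    }

conn = sqlite3.connect(":memory:")
# The DBA-managed definition: an extra column, and no 'nickname'.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, branch_code TEXT)")
drift = schema_drift(conn, "account", Account)
```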
I find myself violently disagreeing with myself, or at least the way I've been quoted ;-) I don't believe that the correct way of looking at an OODBMS is that it minimises the need to keep an object schema and an SQL schema in sync. To me this is spurious thinking. The key (IMO) to using an OODBMS correctly relative to an RDBMS is to note that they have very different (performance) characteristics:
1. OODBs have a very close match to single-link OO navigation, a good match to OO business logic. They are optimised for that.
2. OODBs are very good with complex graphs, like the ones required by CAD tools etc.
3. OODBs are very poor at data independence. Use an RDBMS for any business data.
Areas in which I've used an OODB very successfully are with a CAD tool, and also in telecomms, where there were millions of customer records with some complex graphs attached. Also, for batching and queuing messages spooled straight from a financial exchange, where real-time operation was paramount. As to Gavin's claim about performance, I respectfully disagree that an RDBMS is always faster all other things being equal. The key is that OODBs excel at single-link navigation, and going against the grain of these links will produce terrible performance. If you keep to this rule, and don't require set-based operations, performance in an advanced OODB is generally quite surprisingly spectacular over many millions of records. Gavin is correct, however, that OODBs are very much more immature than their RDBMS counterparts. In addition, the tight coupling of the object model and the schema actually produces problems in an enterprise when adding fields and morphing a schema... As for the use of embedded OODBs, most OODBs, like Versant and others, are client-server. Even "small" OODBs like objectdb offer a superb client-server mode supporting thousands of transactions a second on modest hardware. I've never used an OODB in embedded mode except for the simplest of apps. Andrew
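To illustrate what "adding fields" means when the class is the schema, here is a Python sketch using `pickle`, which, like many object stores, restores an object's saved state without calling `__init__`. Objects saved before a field existed therefore come back without it unless the class upgrades them on load. The `__setstate__` hook below is one common Python idiom for that; it is an analogy, not how any particular OODBMS handles schema evolution, and the `Customer` class is invented.

```python
import pickle

class Customer:
    def __init__(self, name, email=None):
        self.name = name
        self.email = email  # field added in "version 2" of the class

    def __setstate__(self, state):
        # Called by pickle on load instead of __init__: default any field old objects lack.
        state.setdefault("email", None)
        self.__dict__.update(state)

# Simulate an object saved by "version 1", which only had a 'name' attribute.
old = Customer.__new__(Customer)
old.__dict__["name"] = "Ada"
blob = pickle.dumps(old)

restored = pickle.loads(blob)
```

Without the hook, `restored.email` would raise `AttributeError`, which is the small-scale version of the schema-morphing problem described above.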
-
As to Gavin's claim about performance, I respectfully disagree that an RDBMS is always faster all other things being equal.
I most certainly never made that claim, or anything remotely like it. All I did was point out that there are many complex variables affecting performance, and, a priori, the dominant variable is *not* whether the underlying conceptual model is relational or object-oriented. That's not to say that any particular OODBMS is not faster than some particular RDBMS for some particular task. It is not even to say that, with their emphasis upon particular kinds of tasks, existing OODBMSs in general are not faster than existing RDBMSs in general. The usecase of CAD tools that you quote is a good example of where existing OODBMSs have a strong featureset, and existing RDBMSs are weak. All I'm saying is that this is not a function of the conceptual model, but rather of the implementation. There is no reason why you could not build an RDBMS that optimized navigation of hierarchical graphs of data. So, if we want to do better, as an industry, at optimizing usecases like CAD tools (and other usecases for which people have proposed OODBMSs as a panacea), which is the faster, easier, cheaper and more practical approach: (1) spend hundreds of millions of dollars re-educating developers and data management professionals on OODBMS technology and throw away decades of hard-won experience with relational technology just to get optimized CAD tools (and in the process lose all the wonderful data integrity, ad hoc querying, and interoperability features of RDBMS). OR: (2) Add some features for hierarchical graph navigation to Oracle. To be fair, the current interlocutors (i.e. the employees of db4o) are not proposing a wholesale migration to OODBMS (though some others do), but even a partial migration carries many costs and inefficiencies due to the move from one data management technology to two. Can you imagine the pain and suffering in the inevitable need to occasionally migrate data between your OODBMS and your RDBMS? 
Indeed, I think many of these arguments about "some applications" and "for some kinds of data" are really missing the point that, yes, *today*, your little Java application is the only one that needs the data ... but *tomorrow*, who knows? -
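Gavin's option (2) has at least a standard starting point: SQL:1999 recursive common table expressions (and Oracle's older CONNECT BY) already let a relational database walk a hierarchy to arbitrary depth in one query. A minimal sketch using SQLite from Python, assuming a made-up `part` table for a CAD-style assembly; it shows that the traversal can be expressed relationally and says nothing about whether it matches an OODBMS's navigation speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES part(id), name TEXT);
    INSERT INTO part VALUES (1, NULL, 'assembly'), (2, 1, 'frame'), (3, 2, 'bolt'), (4, 1, 'panel');
""")

# Collect every descendant of part 1, however deep, recording each node's depth.
rows = conn.execute("""
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, 0 FROM part WHERE id = 1
        UNION ALL
        SELECT p.id, p.name, t.depth + 1
        FROM part AS p JOIN tree AS t ON p.parent_id = t.id
    )
    SELECT name, depth FROM tree ORDER BY depth, name
""").fetchall()
```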
Yes, performance evaluation and estimation is complex. However, the two models of DB are very different under the hood, resulting in different performance characteristics. Integrating link navigation facilities of an OO database into a relational, set-based model is apparently difficult and carries tradeoffs. I haven't looked at the technical side for over 5 years, and I've never implemented either product fully myself, so bear with me, but OO DBs (under the covers) effectively have a pointer concept; the ODMG used to call them "swizzled pointers". Under the covers of an RDBMS are set structures and indexes. Merging the two concepts used to be popular years ago. Versant's attempt to build a SQL layer failed dismally, in my opinion. The other way around, Oracle and products like PostgreSQL have added OO-like features where they don't compromise their underlying model. So far this has included nested tables and the like, but not direct links, which assume a pointer structure and carry an associated maintenance cost as items are added etc. Everything added to a model has a conceptual and practical cost. Indeed, one of the complaints levied at OODBMSs was that they had no rigorous conceptual basis, unlike RDBMSs, which were based on relational algebra. The latest DB product I've architected with was KDB+, which is an APL-based vector database capable of storing billions of rows of time series data. It uses a table-like model and also has SQL-like facilities. Is it worth adding full SQL when it might double the storage? Definitely not when you are talking about TBs of financial / market data... the DB is heavily optimised for the domain. Could we use a SQL database for the same thing? Possibly, but I don't want to make the cover of SQL Server or Oracle magazine for how I'm pushing the relational envelope. Horses for courses. No one size fits all... (To be honest, the last thing I want is to be portrayed as the defender of OO databases.
I've not recommended one in a business setting for over 7 years, and on the last big project that I inherited with Versant, I got them to remove it as they planned to put business data in it. Don't even get me started on the limitations of db4o. I have no affiliation with any DB vendor.) However, for some situations OODBs and their associated performance characteristics are a big win. I won't bore you with the details of why they are better in a CAD situation, but suffice to say there are reasons related to the depth of linking in the diagrams and the granularity of (often lazy) access to nested structures that make an OODBMS a big win here... In addition, the server can follow links to an arbitrary depth across "tables" to resolve complex graphs. If you truly have worked with such structures and have a solution, then I'd be interested in how you map it efficiently onto an RDBMS structure, as I can only imagine it working with simple diagrams... Andrew
-
OODBMSs are no panacea, but neither are RDBMSs. While there are some very solid reasons for not using an OODBMS in certain situations, like data warehousing, every argument I have seen that argues against the use of OODBMSs in any situation boils down to one sentence: "OODBMSs are not mature enough". There is nothing that precludes an OODBMS from having data integrity, ad hoc querying (couldn't they just support HQL?), or interoperability. Furthermore, while it is true that another application may need to get into the database at some point in the future (which, I agree, is an important thing to have), what would prevent them from using it in the same manner as the java application? At least assuming a driver exists for the language, which it would, if the OODBMS were mature. Hesitation to use something off the beaten track is understandable, but that does not mean the entire concept is invalid. Sure, an OODBMS is not a mature (in terms of widespread usage), battle-tested concept, but neither was an RDBMS at one point (or, more recently, ORM). It is good to see some more practical effort going into the field; it will only help expand and improve the tools we have available.
-
Furthermore, while it is true that another application may need to get into the database at some point in the future (which I agree, is an important thing to have), what would prevent them from using it in the same manner as the java application? At least assuming a driver exists for the language, which it would -- if the OODBMS were mature.
Well, my concern is not about the platform but about the OO model. I have never seen two applications use the exact same domain design (or seen any domain-based libraries succeed). After all, a design is just an approximation of the world out there based on the application's needs. Therefore I always thought it was quite normal to have some kind of mapping (as light as possible) between the data storage model and the domain model in cases where the data is or may be shared between two applications. -
In my experience, OODBMSs and database sharing are not a good match for a family of applications, for precisely the reasons you mentioned. RDBMS, as a more "data independent" model, is far more appropriate. Andrew -
To see the pros and cons of an OODBMS approach, you really need to look into the actual database model. It's not really anything to do with maturity. OODBs have a model very similar to Java objects, where links are directly encoded and they are (almost always) one-way; i.e., if you have a reference in class A to class B, then you can navigate from A to B, but vice versa is very difficult. So, what makes an OODBMS fast also makes it tied to the application logic that the links encode... That's why adding ad-hoc queries is so difficult. Interoperability and data independence are limited by the navigation paths you have encoded. Andrew
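A minimal Python sketch of the one-way link described above, using plain object references as a stand-in for an OODBMS's stored links (an assumption for illustration; the `Trade`/`Account` classes and `trades_for` helper are invented). Going from a trade to its account is a single dereference, while going from an account to its trades means visiting every trade, because nothing records the reverse direction.

```python
from dataclasses import dataclass

@dataclass(eq=False)  # identity semantics, like stored object references
class Account:
    number: str

@dataclass(eq=False)
class Trade:
    trade_id: int
    account: Account  # the only link: trade -> account

acct_a, acct_b = Account("A-1"), Account("B-2")
trades = [Trade(1, acct_a), Trade(2, acct_b), Trade(3, acct_a)]

# Forward navigation: one pointer hop.
owner = trades[2].account

def trades_for(account, all_trades):
    """Reverse navigation: no link to follow, so scan every trade."""
    return [t for t in all_trades if t.account is account]

a_trades = [t.trade_id for t in trades_for(acct_a, trades)]
```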
-
Well, my concern is not about the platform but about the OO model. I have never seen two applications use the exact same domain design (or seen any domain-based libraries succeed). After all, a design is just an approximation of the world out there based on the application's needs. Therefore I always thought it was quite normal to have some kind of mapping (as light as possible) between the data storage model and the domain model in cases where the data is or may be shared between two applications.
You are right, it is rare for two applications to have exactly the same data model. But if you are looking at letting two applications share the same database, chances are they are both custom applications to which you have the code. At least, I have yet to see a commercial black-box application that maps itself to whatever legacy database you might have. I suppose it is possible, however. So, with that in mind, why do they need to have completely different data models? A person is a person, right? You shouldn't need to represent them with a completely different object, just like you don't need to have two separate person tables, one for each application. When you code the second application, you might add to the data model, just like you might to a SQL schema. An OODBMS could provide field-level visibility for the applications to ensure each application only gets what it is concerned with. An OODBMS can still conceptually provide views, security, stored procedures, and all those other creature comforts we are used to when integrating with an RDBMS. Like I said, there are clear cases where an OODBMS isn't useful, mostly having to do with heavy set operations and the like. I'm not saying they are everything to everyone, but I'm still looking for that unfixable conceptual problem that makes them as completely worthless as a number of people seem to think. -
To see the pros and cons of an OODBMS approach, you really need to look into the actual database model. It's not really anything to do with maturity. OODBs have a model very similar to Java objects, where links are directly encoded and they are (almost always) one-way; i.e., if you have a reference in class A to class B, then you can navigate from A to B, but vice versa is very difficult.
But must OODBMSs have a Java-like model? Is having one-way associations a requirement? If you need to go back and forth, what's stopping you from putting in that other link in the form of a collection or map? There might be performance problems with that, but such a thing is dependent on the implementation. It shouldn't need to have any more overhead than a Hibernate many-to-one relationship.
So, what makes an OODBMS fast also makes it tied to the application logic that the links encode... That's why adding ad-hoc queries is so difficult. Interoperability and data independence are limited by the navigation paths you have encoded.
I see where you are coming from, but it doesn't have to be that way. You could program in the link where it is needed. Also, there is really nothing stopping an OODBMS from having arbitrary join functionality in the same manner as SQL. It may come with its own performance quirks, but then again I'm also not saying an OODBMS is useful everywhere, in every situation. -
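The "put in the other link" approach amounts to maintaining both directions on every update. A hypothetical Python sketch (not Versant's bilink API or any product's): `link` sets the forward reference, appends to the reverse collection, and, when a trade moves, also removes it from the old account's collection. Note that every new trade also writes to its account's collection, which grows without bound.

```python
class Account:
    def __init__(self, number):
        self.number = number
        self.trades = []  # reverse side of the link, maintained by hand

class Trade:
    def __init__(self, trade_id):
        self.trade_id = trade_id
        self.account = None  # forward side of the link

def link(trade, account):
    """Set both directions of the trade <-> account association."""
    if trade.account is not None:
        trade.account.trades.remove(trade)  # keep the previous account consistent
    trade.account = account
    account.trades.append(trade)

a, b = Account("A-1"), Account("B-2")
t = Trade(1)
link(t, a)
link(t, b)  # moving the trade updates both collections
```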
But must OODBMSs have a Java-like model? Is having one-way associations a requirement? If you need to go back and forth, what's stopping you from putting in that other link in the form of a collection or map? There might be performance problems with that, but such a thing is dependent on the implementation. It shouldn't need to have any more overhead than a Hibernate many-to-one relationship.
That's correct, you can put in bidirectional links and manage them just like you can in an OO language. Versant has "bilinks", for instance, which are unidirectional links which are managed both ways. However, it's *very* painful to put them in everywhere, and you lose any potential layering of your data model. In practice no one does it. There is also a bigger problem than layering, though, particularly when there are a lot of things in a collection. If you did maintain this both ways, you very quickly get to a situation where the incremental addition of one element to a collection means that you must read in and add to a very large collection. Very unscalable, even for small-ish data sets. In this sense the relational model of having a foreign key, and being able to join in both directions, gives you that nice quality of data being independent from business logic and navigation. So, it's not the data; it's the fact that links in your model have encoded a particular "business logic". For example, the transactional app takes a trade and associates it with an account (trade to account). Someone then wants a GUI that can look at all trades in an account (account to trades), and you are navigating against the natural link direction. Easy to fix for this example, but in practice in a large schema something will trip you up. And don't even get me started on ad hoc reporting :-) Cheers, Andrew -
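For contrast, the relational arrangement described above: a single foreign key column, plus an index on it, serves both navigation directions, so the "account to trades" view needs no extra link to maintain. A minimal SQLite sketch from Python; the table, column, and index names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, number TEXT);
    CREATE TABLE trade (id INTEGER PRIMARY KEY, account_id INTEGER REFERENCES account(id));
    -- An index on the foreign key makes the account -> trades direction cheap as well.
    CREATE INDEX trade_account_idx ON trade(account_id);
    INSERT INTO account VALUES (1, 'A-1'), (2, 'B-2');
    INSERT INTO trade VALUES (10, 1), (11, 2), (12, 1);
""")

# Trade -> account: the direction the transactional app recorded.
acct = conn.execute(
    "SELECT a.number FROM trade t JOIN account a ON a.id = t.account_id WHERE t.id = ?",
    (12,),
).fetchone()

# Account -> trades: the GUI's direction, answered from the same key.
trade_ids = [row[0] for row in conn.execute(
    "SELECT t.id FROM account a JOIN trade t ON t.account_id = a.id "
    "WHERE a.number = ? ORDER BY t.id",
    ("A-1",),
)]
```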
Yes, performance evaluation and estimation is complex. However, the two models of DB are very different under the hood, resulting in different performance characteristics. ... I haven't looked at the technical side for over 5 years, and I've never implemented either product fully myself, so bear with me, but OO DBs (under the covers) effectively have a pointer concept; the ODMG used to call them "swizzled pointers". Under the covers of an RDBMS are set structures and indexes.
Oh c'mon, that's silly. Every implementation of object references boils down to a "primary key" under the covers of a simple programming-level abstraction. Even Java layers object references over some kind of memory address. I'm quite sure that once you check the actual underlying implementation of your magical "pointers" in an OODBMS, you'll find some kind of numerical key value. (That's what "swizzle" means, btw.) Not at all different to how ORMs layer object references over primary keys in the RDBMS.
Integrating link navigation facilities of an OO database into a relational, set-based model is apparently difficult and carries tradeoffs.
Well, yes, it is somewhat difficult and there are indeed some tradeoffs. But I've done it, it works well, it is efficient for 98% of usecases and it has been adopted by probably > 75% of Java projects. You can find it here: http://hibernate.org It's called ORM. It's a *way* more successful technology than OODBMSs. It's not perfect by any means, but it solves the problem for most people. -
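The disagreement over "swizzled pointers" is easier to follow with a sketch of what swizzling does: on disk, an object refers to another by an object identifier (OID), and when objects are loaded those OIDs are replaced with direct in-memory references, so later navigation is a plain pointer dereference with no lookup. The Python version below is hypothetical and swizzles eagerly; real systems often swizzle lazily, on first dereference. `STORE`, `Node`, and `load_graph` are invented names.

```python
# On "disk": each record refers to other objects by OID.
STORE = {
    101: {"name": "line", "next": 102},
    102: {"name": "arc", "next": None},
}

class Node:
    def __init__(self, name):
        self.name = name
        self.next = None  # becomes a direct reference after swizzling

def load_graph(store):
    """Materialise every record, then swizzle OID references into object references."""
    objects = {oid: Node(rec["name"]) for oid, rec in store.items()}
    for oid, rec in store.items():
        ref = rec["next"]
        objects[oid].next = objects[ref] if ref is not None else None
    return objects

graph = load_graph(STORE)
```

Both commenters' descriptions fit this picture: there is a key-like OID in storage, and after swizzling there is an ordinary pointer in memory.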
So, with that in mind, why do they need to have completely different data models? A person is a person right? You shouldn't need to represent them with a completely different object just like you don't need to have two separate persons tables for each application. When you code the second application, you might add to the data model, just like you might to a SQL schema. An OODBMS could provide field-level visibility for the applications to ensure each application only gets what it is concerned with.
Yeah, but a person doesn't mean exactly the same thing to both applications, even though they use the same data. In application #1, the person may have an association to a department and 5 or 6 subclasses, while in the other application this association to a department has no meaning and shouldn't exist (even though the data is present), and no person subclasses are supported. If this is not a case an OODBMS can handle, then I think most applications should stay away from this technology, because nobody can predict the enterprise's requirements. Programming for phantom requirements may be a bad habit, but your architecture has to be able to scale up or down. -
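One way to picture the light mapping being asked for: both applications read the same stored person records, and each projects them onto its own classes, so the department association and the subclass exist only in the application that needs them. A hypothetical Python sketch; the record layout, the `kind` discriminator, and all class and function names are invented, and whether a given OODBMS can do this natively varies by vendor.

```python
from dataclasses import dataclass

# Shared stored data: the department is present for every person.
RECORDS = [{"id": 1, "name": "Ada", "department": "R&D", "kind": "engineer"}]

@dataclass
class Person:  # application #1: knows about departments and subclasses
    id: int
    name: str
    department: str

@dataclass
class Engineer(Person):
    pass

@dataclass
class Contact:  # application #2: same data, no department, no subclasses
    id: int
    name: str

def app1_view(rec):
    cls = Engineer if rec["kind"] == "engineer" else Person
    return cls(rec["id"], rec["name"], rec["department"])

def app2_view(rec):
    return Contact(rec["id"], rec["name"])  # ignores the fields it has no use for

p1, p2 = app1_view(RECORDS[0]), app2_view(RECORDS[0])
```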
Personally, I think the role of OODBMS has been hampered by the fact that there isn't a good, clearly-recognized, free (as in beer), commercial-friendly-licensed (Apache, BSD, etc.) OODBMS. There are a number that match one or more of those criteria, but none that I'm aware of that gets the whole set.
-
Oh c'mon, that's silly. Every implementation of object references boils down to a "primary key" under the covers of a simple programming-level abstraction. Even Java layers object references over some kind of memory address.
Umm, no, it's not silly. In an object database these are literally "pointers", not a set of numbers that must be further resolved into a set of addresses through a filter/join. (Of course, implementations differ. ObjectStore was closest to this concept but was limited to the size of the virtual address space.) Have a look under the covers of how these things work in the products, not just in the mapping layers. Single-link navigation and set-based join are very different primitive operations with different performance characteristics and a different conceptual basis. They are much closer to the old hierarchical databases which preceded relational. (Carl, Ilan, any other OODB implementers out there who care to comment further on the implementation side?)
It's called ORM. It's a *way* more successful technology than OODBMSs. It's not perfect by any means, but it solves the problem for most people.
Gavin, I'm not denying the usefulness of ORM. In fact, I used to consider myself as a Toplink expert, having introduced it on several successful large projects from '99 to '04 (I gave up when per-CPU licensing fees were introduced). From what I've seen, Hibernate is Toplink done better/properly, not something new. Having said that, I'm very grateful that you have created Hibernate. Toplink is way too proprietary and was very expensive... Now the ORM layer is now effectively a commodity. It works well for most "business" situations, although despite a cache and support for object identity, ORM still retains many of the essential performance characteristics of the relational store (which is often an advantage for reporting etc). Further, caching is often problematic for horizontal scaling in large enterprise apps, requiring classification of different data types and expiry times. (as an aside, in all the object dbs I've used, they've never had a need for a cache for performance reasons). Anyway, it's very easy to become monomaniacal about these things. Horses for course. Seriously, if you haven't already get a good OODB and use it in anger. Andrew -
Personally, I think the role of OODBMS has been hampered by the fact that there isn't a good, clearly-recognized, free (as in beer), commercial-friendly-licensed (Apache, BSD, etc.) OODBMS. There are a number that match one or more of those criteria, but none that I'm aware of that gets the whole set.
I think this is definitely true. Tee lack of a good free client-server object database with support for transparent transitive persistence has held the technology back. I think this was also a true statement for smalltalk, which never had a freely available commercial quality implementation so it could become widely used... Andrew -
OK, I may be being dense here, but I'm just not seeing it.
Oh c'mon, that's silly. Every implementation of object references boils down to a "primary key" under the covers of a simple programming-level abstraction. Even Java layers object references over some kind of memory address.
Umm, no, it's not silly. In an object database these are literally "pointers", not a set of numbers that must be further resolved into a set of addresses through a filter/join. .... Have a look under the covers of how these things work in the products, not just in the mapping layers. Single-link navigation and set-based join are very different primitive operations with different performance characteristics and a different conceptual basis. Object databases are much closer to the old hierarchical databases which preceded relational.
- An object database stores a to-one association as a "pointer", by which I understand you to mean the address of a disk location. (I'm skeptical that there is not some additional indirection there, but I've not implemented an object database so I'm not sure of that.)
- A relational database stores a to-one association as a primary key value, which is resolved to the address of a disk location via an intermediate (efficient, purely in-memory) index lookup.
Further, caching is often problematic for horizontal scaling in large enterprise apps, requiring classification of different data types and expiry times. (As an aside, none of the object dbs I've used has ever needed a cache for performance reasons.)
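The two bullet points can be made concrete with a toy Python model. It is a sketch under loud assumptions: a plain Python reference stands in for an object database's stored "pointer", and a dict stands in for the relational primary-key index; neither reflects how any real product lays data out on disk, and all names below are hypothetical.

```python
from dataclasses import dataclass

# Object-database style (hypothetical): the to-one association is a direct
# reference, so navigating it is a single dereference.
@dataclass
class Department:
    name: str

@dataclass
class Employee:
    name: str
    department: Department  # stands in for a stored "pointer"

# Relational style (hypothetical): the association is a foreign key value
# that must be resolved through an index keyed by primary key.
employees_table = [{"emp_id": 1, "name": "Ann", "dept_id": 10}]
departments_table = [{"dept_id": 10, "name": "Risk"}]
dept_index = {row["dept_id"]: row for row in departments_table}  # the "index"

def department_of_object(emp: Employee) -> str:
    # Pointer navigation: no lookup step between the two records.
    return emp.department.name

def department_of_row(emp_row: dict) -> str:
    # Key navigation: one extra index lookup turns the key into a record.
    return dept_index[emp_row["dept_id"]]["name"]

ann = Employee("Ann", Department("Risk"))
print(department_of_object(ann))              # Risk
print(department_of_row(employees_table[0]))  # Risk
```

Both paths reach the same department; what the thread argues about is whether that extra index step matters in practice, which a toy like this cannot settle.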
Now you've really, really lost me. Why on earth should the caching be a different problem? I have a server, with data on disk. And a remote client, which needs that data. Ergo I need a cache. Different types of data in my system are accessed in different ways, with different consistency requirements. Ergo I need different expiry policies. I did not mention the words "object database" or "relational database" in the above paragraph. -
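Gavin's point that different kinds of data need different expiry policies, whatever the storage model, can be sketched as a small per-type cache. This is a hypothetical illustration, not the cache API of Hibernate, TopLink or any other product; the type labels and TTL values are made up, and the clock is injectable only so the behaviour can be checked.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

class PerTypeCache:
    """Client-side cache whose entry lifetime depends on the kind of data.

    `ttl_by_type` maps a data-type label to a lifetime in seconds; a TTL of 0
    (or a missing label) means "never serve this from the cache".
    """

    def __init__(self, ttl_by_type: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_by_type
        self._clock = clock
        self._entries: Dict[Tuple[str, Hashable], Tuple[Any, float]] = {}

    def put(self, data_type: str, key: Hashable, value: Any) -> None:
        ttl = self._ttl.get(data_type, 0.0)
        if ttl <= 0:
            return  # e.g. a frequently updated balance is never cached
        self._entries[(data_type, key)] = (value, self._clock() + ttl)

    def get(self, data_type: str, key: Hashable) -> Optional[Any]:
        entry = self._entries.get((data_type, key))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(data_type, key)]  # stale: caller must reload
            return None
        return value

# Usage with a fake clock so expiry is deterministic.
now = [0.0]
cache = PerTypeCache({"currency": 3600.0, "account_balance": 0.0},
                     clock=lambda: now[0])
cache.put("currency", "GBP", "Pound sterling")
cache.put("account_balance", 42, 100.0)
hit = cache.get("currency", "GBP")
never_cached = cache.get("account_balance", 42)
now[0] = 3601.0
expired = cache.get("currency", "GBP")
print(hit, never_cached, expired)  # Pound sterling None None
```

Reference data such as currency names can live for an hour, while a balance is always re-read; the policy is attached to the data type, not to the kind of database behind it.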
So this index lookup is what is making the relational database so much slower?
(I'm not an OODBMS implementation expert, so I can't give you definitive answers, particularly about the extra level of indirection. If it is important we can take this offline, and I can get some of my mates who work on these products to comment.) My understanding is that the difference shows up in situations where the RDBMS back end can't keep all of the relational indices in memory and has to make a choice. Also, since the primary way of moving from one table to another is via relational join, which is inherently set-based, it gets slower as the indices get larger, because there is more to intersect. RDBMSs have clever algorithms, index types (bitmap etc.) and policies, but the principle holds. E.g. on a recent risk project I worked on, having 25 million risk vectors meant a query across 12 tables to get one result, which took 90 seconds. Getting 1000 results back only doubled the time... Navigating a single link isn't a particularly effective use of the set paradigm. For OODBMSs, the initial lookup is still slow (often slower than in an RDBMS) as the indices (you still have them) get larger, but once you are past that, you are literally navigating directly from disk location to disk location with no need for set intersection etc.
I don't get it. I've simply never met an application where performance was bounded by the cost of index lookups used for resolving to-one associations. Every application I ever met was bounded by the cost of server round-trips.
So, I've seen both, but as you say, in my experience the roundtrips are usually the bottleneck in most domains. However, this is getting to the good stuff: if the model is very granular and there is a lot of it (i.e. fine-grained object link resolution), then an OODBMS can be a good choice due to its more efficient link navigation. I must caveat this by saying that it's not often you have a model like this. The only real time I have is in a CAD system with lots of diagrams. You wouldn't want to design like this unless you had to, although it's a trap that people using an OODBMS usually fall into regardless of the domain. Back to roundtrips: in a client-server OODBMS with lazy resolution of links, the server will often do a whole lot of object navigation and link resolution in response to one link traversal on the client side. Because the server side understands the way objects are linked, it can resolve a graph to depth 3, say, and silently cache the rest (in the client) for the transaction. This resolution can happen over many classes/tables. This is often the case when you are navigating through a complex graph, where it will pull back more than is strictly needed to minimise roundtrips.
Now you've really, really lost me. Why on earth should the caching be a different problem? I have a server, with data on disk. And a remote client, which needs that data. Ergo I need a cache. Different types of data in my system are accessed in different ways, with different consistency requirements. Ergo I need different expiry policies.
I mean a client-side cache which keeps objects between transactions... In any TopLink system I've used, there is a client-side cache to retain object identity (i.e. pointer to primary key) and to speed things up. As you scale out the mid tier you end up with dozens of client-side caches, which need to be kept in sync through policies, i.e. if you are updating a frequently used account balance, you can't cache it on the client side between transactions. A good client-server OODBMS keeps the cache only for the duration of a transaction, retaining only references between transactions. When you commit, the contents of the local transaction cache are cleared, meaning that it will go back to the db for anything in another transaction. Because of the way roundtrips are handled, as explained above, this is usually very quick. In fact, I hadn't even understood the need for a client-side cache before using an ORM library back in '99; before that I'd only used object databases. Indeed, one of the severe flaws of TopLink (still?) is the poor control you have over the client-side cache. I believe that Hibernate is vastly superior here, allowing different policies for different types of data. So, consider using an ODBMS for my CAD system. (Actually, it's a complex UML2 CASE tool with multi-user facilities that I'm finishing off for my PhD.) The diagram is represented as a set of node and arc objects, each of which links to an element of the model (257 different classes), which are further linked to other bits and so on in a complex web (have a look at the UML2 metamodel if you want a fright!). Drawing a diagram involves drawing all of its nodes and arcs, and then traversing to an arbitrary depth to find the name, the colour, whether the element is active etc. This is expressed in Java code.
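Andrew's description of a transaction-scoped client cache, emptied at commit so that only references survive between transactions, can be sketched as follows. `FakeServer` and `Transaction` are hypothetical; a real client-server OODBMS would also track dirty objects, locking and prefetching, and this sketch shows only the cache lifetime.

```python
from typing import Any, Dict

class FakeServer:
    """Stands in for a database server; counts round-trips."""

    def __init__(self, data: Dict[int, Any]):
        self._data = data
        self.roundtrips = 0

    def load(self, oid: int) -> Any:
        self.roundtrips += 1
        return self._data[oid]

class Transaction:
    """Cache that lives only as long as one unit of work.

    Repeated reads of the same object id within a transaction are served
    locally; commit() empties the cache, so later reads go back to the
    server and see current state.
    """

    def __init__(self, server: FakeServer):
        self._server = server
        self._cache: Dict[int, Any] = {}

    def get(self, oid: int) -> Any:
        if oid not in self._cache:
            self._cache[oid] = self._server.load(oid)
        return self._cache[oid]

    def commit(self) -> None:
        # Writes would be flushed here; then the local state is discarded.
        self._cache.clear()

server = FakeServer({1: "account #1"})
tx = Transaction(server)
tx.get(1)
tx.get(1)                               # served from the transaction cache
reads_in_first_tx = server.roundtrips   # 1
tx.commit()                             # only the id 1 is "remembered"
tx.get(1)                               # next unit of work reloads
reads_total = server.roundtrips         # 2
```

Contrast this with a cache that outlives transactions, where a frequently updated balance would need an invalidation or expiry policy.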
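Andrew's round-trip argument (the server resolving a graph to some depth, say 3, and shipping it in one go) can be illustrated with a toy model. The graph, the `PrefetchingServer` class and the depth values are hypothetical; real products decide what and how much to prefetch in their own ways.

```python
from typing import Dict, List, Set

# Hypothetical object graph: object id -> ids of the objects it links to,
# e.g. a diagram node linking to model elements, which link further on.
GRAPH: Dict[int, List[int]] = {1: [2, 3], 2: [4], 3: [5], 4: [6], 5: [], 6: []}

class PrefetchingServer:
    """Returns an object plus everything reachable within `depth` links."""

    def __init__(self, graph: Dict[int, List[int]], depth: int):
        self.graph = graph
        self.depth = depth
        self.roundtrips = 0

    def fetch(self, root: int) -> Set[int]:
        self.roundtrips += 1
        seen = {root}
        frontier = [root]
        for _ in range(self.depth):  # breadth-first, level by level
            next_frontier = []
            for oid in frontier:
                for child in self.graph[oid]:
                    if child not in seen:
                        seen.add(child)
                        next_frontier.append(child)
            frontier = next_frontier
        return seen

def traverse(server: PrefetchingServer, root: int) -> Set[int]:
    """Client-side walk that contacts the server only on a local miss."""
    local: Set[int] = set()    # objects already shipped to the client
    visited: Set[int] = set()
    stack = [root]
    while stack:
        oid = stack.pop()
        if oid in visited:
            continue
        visited.add(oid)
        if oid not in local:
            local |= server.fetch(oid)   # one round-trip brings a subgraph
        # In a real system the links would arrive as part of the object state.
        stack.extend(server.graph[oid])
    return visited

roundtrips = {}
for depth in (0, 3):
    server = PrefetchingServer(GRAPH, depth)
    traverse(server, 1)
    roundtrips[depth] = server.roundtrips
print(roundtrips)  # {0: 6, 3: 1}
```

With no prefetching every object costs a round-trip; resolving to depth 3 brings this small graph back in one.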
You're now deep in implementation details. So I'll return to my original argument: nothing you are describing here is a fundamental attribute of the object-oriented or relational conceptual model; what you are describing is how (some) existing systems happen to implement that conceptual model. A priori, I don't see any reason why a primarily relational database could not make the optimizations you are describing. Likewise, I don't see why a primarily object-oriented system couldn't provide great reporting and data integrity, just like a relational database. It's not about "sets" vs. "pointers". Those are just abstractions that exist at the API level and at the conceptual level. Of course, it turns out that most relational systems that exist today happen to not implement optimizations for operations on the kind of hierarchical data that crops up in a number of niche usecases (CAD being the example that is often given). I think that's primarily because those usecases are rare in business applications. Which is not to say they're unimportant. Most importantly, none of this has to do with any "dual schema" or "paradigm mismatch" problem.
-
Yeah, but a person doesn't mean exactly the same thing to both applications, even though they use the same data. In application #1, the person may have an association to a department and 5 or 6 subclasses, while in the other application the association to a department has no meaning and shouldn't exist (even though the data is present), and no person subclasses are supported. If this isn't a case an OODBMS can handle, then I think most applications should stay away from this technology, because nobody can predict the enterprise requirements. Programming for phantom requirements may be a bad habit, but your architecture has to be able to scale up or down.
It is a case an OODBMS could handle. Try thinking about an OODBMS in more generic terms: storing objects in a custom binary format, acting as a server, working through drivers in much the same way an RDBMS does. There is nothing that technically subjects an OODBMS to the same limitations as, for example, Java. With that in mind, it is quite possible for an OODBMS to provide a view to each application that satisfies every concern you just stated. Are there any implementations like it? I don't know. But until there is a widespread standard (i.e. the equivalent of SQL for OODBMSs), we shouldn't write off what an OODBMS can and cannot do. -
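The scenario debated here (one stored person, two applications that need different shapes of it) can be sketched as per-application views over a shared record. Everything below is hypothetical; whether such views would live in the database, in a driver, or in an ORM layer is exactly what the thread is arguing about.

```python
from dataclasses import dataclass
from typing import Dict

# One stored record, shared by both applications (hypothetical fields).
STORED_PERSON: Dict[str, str] = {"id": "p-7", "name": "Ann", "department": "Risk"}

@dataclass
class HrPerson:
    """Application #1's view: the department association matters here."""
    id: str
    name: str
    department: str

@dataclass
class MailingPerson:
    """Application #2's view: no department, even though the data exists."""
    id: str
    name: str

def as_hr_person(rec: Dict[str, str]) -> HrPerson:
    return HrPerson(rec["id"], rec["name"], rec["department"])

def as_mailing_person(rec: Dict[str, str]) -> MailingPerson:
    # The department field is deliberately not exposed to this application.
    return MailingPerson(rec["id"], rec["name"])

print(as_hr_person(STORED_PERSON))
print(as_mailing_person(STORED_PERSON))
```

Each application sees its own class, while the stored data is defined once; subclasses specific to application #1 could be layered onto `HrPerson` in the same way.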
You're now deep in implementation details. So I'll return to my original argument: nothing you are describing here is a fundamental attribute of the object-oriented or relational conceptual model; what you are describing is how (some) existing systems happen to implement that conceptual model. A priori, I don't see any reason why a primarily relational database could not make the optimizations you are describing. Likewise, I don't see why a primarily object-oriented system couldn't provide great reporting and data integrity, just like a relational database. It's not about "sets" vs. "pointers". Those are just abstractions that exist at the API level and at the conceptual level.
I disagree with the statement that "nothing you are describing here is a fundamental attribute of the object-oriented or relational conceptual model". Look at the maths in Codd's paper if you haven't had a chance to already: http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf (it was a significant enough shift to earn him the Turing Award in '81). The relational model has a conceptual foundation in mathematical set theory. Object dbs do not. This conceptual link is what gives the relational model its power. To add hierarchy, it needs to be done at this level, where sets are the main construct. How do you get a link? It's a set intersection. Hierarchy is subordinate to sets, and essentially missing from the base concepts in the model. The conceptual level is vitally important when merging 2 different paradigms (let's ignore implementation details, they are unimportant). I'm not saying it can't be done, and some future paradigm will no doubt do just this. I've not seen any practical product do it successfully (OODB or relational), and I've seen lots of academic papers that attempt to address the area but fail.
Of course, it turns out that most relational systems that exist today happen not to implement optimizations for operations on the kind of hierarchical data that crops up in a number of niche use cases (CAD being the example that is often given). I think that's primarily because those use cases are rare in business applications. Which is not to say they're unimportant.
I've not thought about it in detail for around 5 years, but I think you're right. Hierarchical concepts (and their tie-in to understanding OO linking patterns) are important and could be introduced into a relational system, or both concepts merged somehow. However, I think history has shown that hierarchy is far less important for business apps than a set-based view, and ORM/Hibernate does a nice job of adding almost all of the missing pieces. The fact that a relational view of data is independent of app logic has won out. RDBMSs have proved their superiority over ODBMSs for this reason, and the advantage is so compelling that RDBMS vendors haven't needed to concentrate much on this area. I'd say that, in this regard, ORM is "paying off the object/hierarchical debt of the set paradigm".
Most importantly, none of this has to do with any "dual schema" or "paradigm mismatch" problem.
I seriously agree with you about the dual schema mismatch problem, in the sense that I don't think it is really a big advantage of OODBMS systems. It's convenient, but it's not the real reason to choose an OODB over an RDBMS. I disagree about the paradigm mismatch point. It's very real, and has real consequences, in that certain use cases will be a better fit for the paradigm of OODB or RDBMS, more or less independent of how they are implemented. "sets versus hierarchy" gets to the heart of the difference if you trace it back to concepts, and this difference has turned out to have huge ramifications. Andrew -
There is nothing that technically subjects an OODBMS to the same limitations as, for example, Java. With that in mind, it is quite possible for an OODBMS to provide a view to each application that satisfies every concern you just stated.
Yes, a db system could be created that does this, but it wouldn't be an "object database" in the conventional sense. The driver for OODBMSs was literally just that: a direct correlation with an object view of the world, with a literal object mapping. That literal mapping has real consequences which don't work well for long-lived data.
Are there any implementations like it? I don't know. But until there is a widespread standard (i.e. the equivalent of SQL for OODBMSs), we shouldn't write off what an OODBMS can and cannot do.
We could all start again, but the ODMG did publish standards in this area. It was the database equivalent of the OMG (which defines UML). However, it has all but died out now. The whole area could be revisited, but that would need to be done with an understanding of what limited the previous generation. Andrew -
I disagree with the statement that "nothing you are describing here is a fundamental attribute of the object-oriented or relational conceptual model". Look at the maths in Codd's paper if you haven't had a chance to already: www.seas.upenn.edu/~zives/03f/cis550/codd.pdf (it was a significant enough shift to earn him the Turing Award in '81). The relational model has a conceptual foundation in mathematical set theory. Object dbs do not. This conceptual link is what gives the relational model its power. To add hierarchy, it needs to be done at this level, where sets are the main construct. How do you get a link? It's a set intersection. Hierarchy is subordinate to sets, and essentially missing from the base concepts in the model.
OK, here's where I'm checking out of this discussion. My eyes glaze over the minute people start talking about the supposed magical set theoretical foundations of relational databases. Relational databases have almost nothing to do with what mathematicians call set theory, which is primarily concerned with the study of transfinite numbers (a la Cantor), or with reducing mathematics to primitive axioms (RW, ZF, etc.). Relational databases bear the same relationship to set theory that arithmetic bears to number theory: i.e. the most trivial, pedestrian application of the most basic definitions. I understand that most computing professionals have not studied set theory and are easily intimidated by the invocation of this term, but my major was pure mathematics, and I'm not intimidated; indeed I'm well aware that set theory has almost nothing to do with the practice of data management. And yes, I've heard of Codd before. And yes, I understand that linking to a Codd paper might make you seem knowledgeable, but it really doesn't advance the discussion at hand. Cheers, Gavin. -
Objects see databases as memento and object-graph storage. Databases see objects as data exposed in table rows. RDF databases see objects as data exposed in schema-constrained graphs. The private of one is the public of the other. The benefits of each conflict with the design goals of the other. Perhaps REST is the middle ground that everyone can agree on. Objects interface easily using REST: they simply structure their mementos as standard document types. Now their state can easily be stored and retrieved. Databases interface easily using REST: they just map data to data. So the data in an object and the data in a database don't necessarily have precisely matched schemas. They just map to the same set of document types, and these document types define the O-R mapping. The document type pool can evolve over time based on Web and REST principles, meaning that tugs from one side of the interface don't necessarily pull the other side in exactly the same direction. If O-R mapping is the Vietnam of computer science, perhaps we should stop mapping between our object and our relational components. Perhaps we should start interfacing between them instead. Benjamin.
-
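Benjamin's suggestion, that objects and tables each map to a shared document type rather than to each other, can be sketched as below. The media type, field names and column names are all made up, and a real REST setup would exchange these documents over HTTP rather than through function calls.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Made-up media type standing in for a shared, evolvable document type.
DOC_TYPE = "application/vnd.example.customer+json"

@dataclass
class Customer:
    """Object side: state stays private and leaves only as a document."""
    _name: str
    _credit_limit: int

    def to_document(self) -> Dict[str, Any]:
        return {"type": DOC_TYPE, "name": self._name,
                "creditLimit": self._credit_limit}

    @classmethod
    def from_document(cls, doc: Dict[str, Any]) -> "Customer":
        if doc.get("type") != DOC_TYPE:
            raise ValueError("unexpected document type")
        return cls(doc["name"], doc["creditLimit"])

# Database side: its own column names, mapped only to the document type,
# never directly to the Customer class.
def document_to_row(doc: Dict[str, Any]) -> Dict[str, Any]:
    return {"CUST_NAME": doc["name"], "CREDIT_LIM": doc["creditLimit"]}

def row_to_document(row: Dict[str, Any]) -> Dict[str, Any]:
    return {"type": DOC_TYPE, "name": row["CUST_NAME"],
            "creditLimit": row["CREDIT_LIM"]}

original = Customer("Acme Ltd", 5000)
row = document_to_row(original.to_document())       # what the table stores
restored = Customer.from_document(row_to_document(row))
print(restored == original)  # True
```

Neither side knows the other's field names; renaming a column or a private attribute only touches that side's mapping to the document.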
OK, here's where I'm checking out of this discussion. My eyes glaze over the minute people start talking about the supposed magical set theoretical foundations of relational databases. Relational databases have almost nothing to do with what mathematicians call set theory, which is primarily concerned with the study of transfinite numbers (a la Cantor), or with reducing mathematics to primitive axioms (RW, ZF, etc.). Relational databases bear the same relationship to set theory that arithmetic bears to number theory: i.e. the most trivial, pedestrian application of the most basic definitions.
The paper I quoted has only the most trivial examples of set theory in it, and is accessible to anyone with a comp-sci background (these are comp-sci sets). Relational join is set intersection. However, I'd argue that the conceptual foundation led to direct implementations and a revolution in data storage. It's one of the most successful examples of concept preceding implementation. The connection is not trivial (and most purists argue that modern RDBMSs are not truly relational), but it is fundamental and generally acknowledged.
I understand that most computing professionals have not studied set theory and are easily intimidated by the invocation of this term, but my major was pure mathematics, and I'm not intimidated; indeed I'm well aware that set theory has almost nothing to do with the practice of data management.
I didn't quote the paper to show that I am smart, to intimidate you, to condescend to you, or to grandstand. I only have a rudimentary maths background. I did it to argue my point that relational dbs are in fact successful largely because they have a conceptual basis which has real power. They are based around Codd's 12 rules, which came directly from the theory. Object dbs aren't, and have been roundly criticised for this. And yes, given the world I come from, if the first sight of a seminal (but accessible) academic paper makes you "run for the hills", then we are going to have to respectfully disagree on the point about conceptual importance. I'm checking out of the discussion also. Cheers, Andrew -
Sounds like a mapping layer to me, as ORM is, the difference being that it is located directly in the database.
Yeah, but a person doesn't mean exactly the same thing to both applications, even though they use the same data. In application #1, the person may have an association to a department and 5 or 6 subclasses, while in the other application the association to a department has no meaning and shouldn't exist (even though the data is present), and no person subclasses are supported. If this isn't a case an OODBMS can handle, then I think most applications should stay away from this technology, because nobody can predict the enterprise requirements. Programming for phantom requirements may be a bad habit, but your architecture has to be able to scale up or down.
There is nothing that technically subjects an OODBMS to the same limitations as, for example, Java. With that in mind, it is quite possible for an OODBMS to provide a view to each application that satisfies every concern you just stated.


