InfoQ


O/R Mapping, Caching, and Performance

Posted by Jonathan Allen on Jan 03, 2007 06:00 AM


According to Frans Bouma, one of the common misconceptions about Object/Relational Mapping (O/R Mapping) frameworks is that they give developers caching for free and that caching improves performance. While O/R Mapping frameworks do rely on caching, improved performance isn't in the cards.

Essentially, caching improves performance by eliminating calls to the database. Database calls tend to be orders of magnitude more expensive than retrieving the information locally.

Caching is very important for most high-performance applications. Without caching, an application under heavy load can grind to a halt. However, caching doesn't automatically improve performance. The right information has to be cached.

The problem with O/R Mapping is that the full set of records is almost never in the cache. So if multiple records are requested based on some search criteria, there is no way to know if they are all in the cache. Frans Bouma continues...

This thus causes a roundtrip and a query execution on the database. As roundtrips and query executions are a big bottleneck of the complete entity fetch pipeline, the efficiency the myth talks about is nowhere in sight. But it gets worse. With a cache, there's actually more overhead. This is caused by the uniquing feature of a cache. So every entity fetched from the database matching the query for the customers has to be checked with the cache: is there already an instance available? If so, update the field values and return that instance, if not, create a new instance (but that's to be done anyway) and store it in the cache.
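To make the argument above concrete, here is a minimal Python sketch. The names `FakeDatabase` and `CachingSession` are hypothetical stand-ins invented for this illustration and do not come from any real O/R mapper. The sketch assumes a cache keyed by primary key: a lookup by key can be served locally, but a query by search criteria still costs a roundtrip, and every returned row is then checked against the cache.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDatabase:
    """Hypothetical in-memory stand-in for a database that counts roundtrips."""
    rows: dict  # primary key -> row dict
    roundtrips: int = 0

    def select_by_id(self, pk):
        self.roundtrips += 1
        row = self.rows.get(pk)
        return dict(row) if row is not None else None

    def select_where(self, predicate):
        self.roundtrips += 1
        return [dict(row) for row in self.rows.values() if predicate(row)]


@dataclass
class CachingSession:
    """Hypothetical session with an entity cache keyed by primary key."""
    db: FakeDatabase
    cache: dict = field(default_factory=dict)
    cache_checks: int = 0

    def get(self, pk):
        # A primary-key lookup can be answered from the cache alone.
        self.cache_checks += 1
        if pk in self.cache:
            return self.cache[pk]
        row = self.db.select_by_id(pk)
        if row is not None:
            self.cache[pk] = row
        return row

    def find(self, predicate):
        # The cache cannot prove it holds every matching row,
        # so a criteria query always goes to the database.
        rows = self.db.select_where(predicate)
        result = []
        for row in rows:
            # Uniquing: one extra cache check per fetched row.
            self.cache_checks += 1
            cached = self.cache.get(row["id"])
            if cached is None:
                self.cache[row["id"]] = row
                cached = row
            else:
                cached.update(row)  # refresh the existing instance
            result.append(cached)
        return result


db = FakeDatabase(rows={
    1: {"id": 1, "name": "Alice", "country": "NL"},
    2: {"id": 2, "name": "Bob", "country": "NL"},
    3: {"id": 3, "name": "Carol", "country": "US"},
})
session = CachingSession(db)

session.get(1)
session.get(1)  # served from the cache: still one roundtrip in total
dutch = session.find(lambda r: r["country"] == "NL")  # a second roundtrip
print(db.roundtrips, session.cache_checks)
```

In this toy setup the second `get(1)` avoids the database, while `find` pays for a roundtrip even though one of the matching rows is already cached, and it adds one cache check per returned row on top.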

Caching does serve a purpose for O/R Mapping frameworks: it solves the problem of uniqueness. Normally, multiple calls to the database for the same information result in multiple objects holding the same data. Usually this is acceptable.

However sometimes it can be a problem or an inconvenience. When that happens, it's good that there's a way to have unique objects per entity loaded. Most O/R mappers use a cache for this: when an entity is loaded from the database, the cache is consulted if there's already an entity object with the entity data of the same entity fetched. If that's the case, that instance is updated with the data read from the database, and that instance is returned as the object holding the data. If there's no object already containing the same entity, a new instance is created, the entity data fetched is stored in that instance, that instance is stored in the cache and the instance is returned. This leads to unique objects per entity.
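The uniquing steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation of any particular O/R mapper: `Customer`, `IdentityMap`, and `materialize` are hypothetical names, and rows are assumed to arrive as plain dictionaries.

```python
class Customer:
    """Hypothetical entity class used only for this illustration."""

    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name


class IdentityMap:
    """Keeps at most one Customer instance per primary key (hypothetical)."""

    def __init__(self):
        self._instances = {}

    def materialize(self, row):
        """Turn a fetched row into the unique instance for its key."""
        key = row["customer_id"]
        instance = self._instances.get(key)
        if instance is None:
            # No object for this entity yet: create it and remember it.
            instance = Customer(key, row["name"])
            self._instances[key] = instance
        else:
            # Already loaded: update the field values and reuse the object.
            instance.name = row["name"]
        return instance


identity_map = IdentityMap()
first = identity_map.materialize({"customer_id": 7, "name": "Acme"})
second = identity_map.materialize({"customer_id": 7, "name": "Acme Corp"})

# Without the map, each fetch would produce a separate object.
without_map = Customer(7, "Acme Corp")

print(first is second, first.name, without_map is first)
```

Both fetches yield the same object, so a change seen through one reference is visible through the other. The map is there for object identity; on its own it does not save a database call.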

 

 

5 comments

  1. Oversimplified.

    Jan 3, 2007 8:50 AM by Sam Smoot

    Just because some implementation details may be similar does not make an IdentityMap a cache. It serves a different purpose. Frans obviously understands this, so it's a little disingenuous of him to claim a "caching overhead" when he knows full well it's really a mechanism for maintaining object identity. Basically, two User objects in two ILists that have the same Id should be references to the same object. If you add a new Invoice object to one, you expect the other to reflect that, because you expect the other to actually *be* a reference to that same object.

    Frans is a wickedly smart guy, but I think it helps if you think of his comments on caching as part justification for a design decision in his product, LLBLGenPro. Not that he's wrong. He's absolutely right that using, say, ActiveRecord with MemCached caching could very easily result in stale data being used in an application if you allow just anyone (or another application) to modify the underlying data directly in the database. It's a trade-off. Frans is trading performance for correctness by limiting the IdentityMap to an individual Session. With the blazing performance of .NET, it's not really a factor that should influence your decision on which O/R Mapper you use either, in my opinion.

    Caching is really a separate concern, I think. One you should be able to apply on simple "lookup" entities manually, limiting your JOINs, speeding access, and limiting the impact of stale data by only caching basically static entities. It'd be nice if LLBLGenPro included such a mechanism in a "Contrib"-like project though. (It might, I haven't used it in a very long time.)

    So in summary, Frans is right, but the semantics of the conversation could be better. I understand he probably has to address this nearly every day though, with email about why LLBLGenPro doesn't include "caching". Don't discount LLBLGenPro because it doesn't include caching. Instead, focus on O/R Mappers like NHibernate and LLBLGenPro that solve the N+1 Query problem effectively, and then choose the one whose syntax and tool support you like best.

  2. General Guidelines

    Jan 3, 2007 10:30 AM by Matt Giacomini

    This article is interesting not because I agree with everything the author says, but because I have been spending a lot of time trying to get a good understanding of when to cache and when not to cache. First let me say that we use Toplink, not EJB3 based on Toplink, but just raw Toplink. All of our applications work on a DB that is being updated by other applications. First instinct is "Oh crap, I better not cache anything, because being slow would be much better than displaying incorrect data." Working in the financial sector, incorrect data is an absolute 'no-no'. On the other hand, many of our applications are constantly calling up the same data, and when testing our apps using Toplink's built-in caching, the performance increase is impressive. Contrary to what the author says, just letting Toplink cache everything has had a very positive performance impact on all of the applications that we tested. We have almost considered in some cases using database triggers to call the application server and invalidate targeted areas of the cache when necessary, but this is a terrible idea from an architectural point of view. I guess I'm just rambling. Too bad there is not a product out there that works like a DB plugin to aid in application server cache management.

  3. My mileage

    Jan 3, 2007 5:02 PM by Chris Seymour

    It may be stating the obvious, but caching only really works when:

        cost of checking whether the cache is stale < cost of just returning the data

    What "cost" means will depend on your specific implementation, but it is typically determined by network latency, query processor hit, etc. So far, I've only been able to guarantee this cost in 2 cases:

    (1) For data that rarely becomes stale (e.g. reference data)
    (2) Where there is a simple way to check whether a range of data has become stale (e.g. a large object graph is always updated atomically, so stamp and check the top-level object).

    (1) is easily identifiable, so your performance gains really come from identifying or creating the (2)s. In both these cases I've found the caching used by ORMs to be insufficient (or overkill, depending on your perspective) and ended up with relatively simple custom implementations.

  4. Alternative approach

    Jan 11, 2007 2:21 PM by Cameron Purdy

    "It may be stating the obvious but caching only really works when: cost of checking whether the cache is stale < cost of just returning the data"

    If you replace "returning" with "obtaining", then in a traditional caching sense, this is true. However, a lot of modern cache-based architectures manage the up-to-date live transactional data in the cache, so that is where the application goes for the up-to-date data. The trade-off is that this works terribly when lots of applications go directly to a database to conduct their transactions. However, when it is applicable, the scalable performance delta is ludicrous, often speeding things up and dropping the load on the database by orders of magnitude.

    Hope it helps :-)

    Peace,
    Cameron Purdy
    Tangosol Coherence: Clustered Caching for Java and .NET

  5. Re: Alternative approach

    Jun 30, 2008 6:06 PM by berkay NiQuiL
