

O/R Mapping, Caching, and Performance


According to Frans Bouma, one of the common misconceptions about Object/Relational Mapping (O/R Mapping) frameworks is that they give developers caching for free and that caching improves performance. While O/R Mapping frameworks do rely on caching, improved performance isn't in the cards.

Essentially, caching improves performance by eliminating calls to the database. Database calls tend to be orders of magnitude more expensive than retrieving the information locally.

Caching is very important for most high-performance applications. Without caching, an application under heavy load can grind to a halt. However, caching doesn't automatically improve performance. The right information has to be cached.
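Caching's payoff comes entirely from skipping that database roundtrip. A minimal read-through cache illustrates the mechanism (all names here are hypothetical, not any particular framework's API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal read-through cache: a hit skips the expensive loader entirely.
class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader; // stands in for a database call
    int loads = 0;                       // counts actual "roundtrips"

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // Only invoke the loader (i.e. hit the "database") when the key is absent.
        return entries.computeIfAbsent(key, k -> {
            loads++;
            return loader.apply(k);
        });
    }
}
```

A second `get` for the same key never reaches the loader, which is the entire performance argument for caching; the article's point is that O/R mappers rarely get to take this shortcut for set-based queries.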

The problem with O/R Mapping is that the full set of records is almost never in the cache. So if multiple records are requested based on some search criteria, there is no way to know if they are all in the cache. Frans Bouma continues...

This thus causes a roundtrip and a query execution on the database. As roundtrips and query executions are a big bottleneck of the complete entity fetch pipeline, the efficiency the myth talks about is nowhere in sight. But it gets worse. With a cache, there's actually more overhead. This is caused by the uniquing feature of a cache. So every entity fetched from the database matching the query for the customers has to be checked with the cache: is there already an instance available? If so, update the field values and return that instance, if not, create a new instance (but that's to be done anyway) and store it in the cache.

Caching does serve a purpose for O/R Mapping frameworks: it solves the problem of uniqueness. Normally, multiple calls to the database for the same information result in multiple objects holding the same data. Usually this is acceptable.

However, sometimes it can be a problem or an inconvenience. When that happens, it's good that there's a way to get a unique object per entity loaded. Most O/R mappers use a cache for this: when an entity is loaded from the database, the cache is consulted to see whether an object holding that same entity's data already exists. If so, that instance is updated with the data read from the database and returned as the object holding the data. If not, a new instance is created, the fetched entity data is stored in it, the instance is stored in the cache, and the instance is returned. This leads to unique objects per entity.
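The uniquing procedure described above can be sketched as an identity map keyed by primary key (class and method names here are illustrative, not the API of any specific O/R mapper):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative entity: mutable fields, identified by a primary key.
class Customer {
    final int id;
    String name;
    Customer(int id, String name) { this.id = id; this.name = name; }
}

// Identity map: guarantees one Customer instance per primary key.
class IdentityMap {
    private final Map<Integer, Customer> byId = new HashMap<>();

    // Called for every row fetched from the database.
    Customer materialize(int id, String freshName) {
        Customer existing = byId.get(id);
        if (existing != null) {
            existing.name = freshName;  // refresh fields on the known instance
            return existing;            // hand back the one unique object
        }
        Customer created = new Customer(id, freshName);
        byId.put(id, created);
        return created;
    }
}
```

Note that `materialize` never avoids the database query; it only decides which instance the fetched row is poured into, which is exactly the per-row overhead the article describes.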

 

 


Community comments

  • Oversimplified.

    by Sam Smoot,


    Just because some implementation details may be similar does not make an IdentityMap a cache. It serves a different purpose. Frans obviously understands this, so it's a little disingenuous of him to claim a "caching overhead" when he knows full well it's really a mechanism for maintaining object identity.

    Basically, two User objects in two ILists that have the same Id should be references to the same object. If you add a new Invoice object to one, you expect the other to reflect that, because you expect the other to actually *be* a reference to that same object.

    Frans is a wickedly smart guy, but it helps, I think, to read his comments on caching as partly a justification for a design decision in his product, LLBLGenPro. Not that he's wrong. He's absolutely right that using, say, ActiveRecord with MemCached caching could very easily result in stale data being used in an application if you allow just anyone (or another application) to modify the underlying data directly in the database.

    It's a trade-off. Frans is trading performance for correctness by limiting the IdentityMap to an individual Session. With the blazing performance of .NET, it's not really a factor that should influence your decision on which O/R Mapper you use either. In my opinion.

    Caching is really a separate concern, I think. One you should be able to apply manually on simple "lookup" entities, limiting your JOINs, speeding access, and limiting the impact of stale data by only caching basically static entities. It'd be nice if LLBLGenPro included such a mechanism in a "Contrib"-like project though. (It might; I haven't used it in a very long time.)

    So in summary, Frans is right, but the semantics of the conversation could be better. I understand he probably has to address this nearly every day though with email about why LLBLGenPro doesn't include "caching". Don't discount LLBLGenPro because it doesn't include caching. Instead focus on O/R Mappers like NHibernate and LLBLGenPro that solve the N+1 Query problem effectively, and then choose the one whose syntax and tool support you like best.

  • General Guidelines

    by Matt Giacomini,


    This article is interesting not because I agree with everything the author says, but because I have been spending a lot of time trying to get a good understanding of when to cache and when not to cache.

    First let me say that we use Toplink, not EJB3 based on Toplink, but just raw Toplink.

    All of our applications work on a DB that is being updated by other applications. The first instinct is "Oh crap, I better not cache anything, because being slow would be much better than displaying incorrect data." Working in the financial sector, incorrect data is an absolute 'no no'. On the other hand, many of our applications are constantly calling up the same data, and when testing our apps using Toplink's built-in caching the performance increase is impressive. Contrary to what the author says, just letting Toplink cache everything has had a very positive performance impact on all of the applications that we tested.

    We have almost considered in some cases using database triggers to call to the application server and invalidate targeted areas of the cache when necessary, but this is a terrible idea from an architectural point of view.

    I guess I'm just rambling. Too bad there is not a product out there that was like a DB plugin that would aid in application server cache management.

  • My mileage

    by Chris Seymour,


    It may be stating the obvious but caching only really works when:

    cost of checking whether the cache is stale < cost of just returning the data

    What "cost" means will depend on your specific implementation but is typically determined by network latency, query processor hit, etc. So far, I've only been able to guarantee this cost in 2 cases:

    (1) For data that rarely becomes stale (e.g. reference data)
    (2) Where there is a simple way to check whether a range of data has become stale (e.g. a large object graph is always updated atomically, so stamp and check the top-level object).

    (1) is easily identifiable, so your performance gains really come from identifying or creating the (2)s.

    In both these cases I've found the caching used by ORMs to be insufficient (or overkill, depending on your perspective) and ended up with relatively simple custom implementations.
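    Case (2), stamping and checking the top-level object, might be sketched like this (the version stamp and every name below are assumptions for illustration, not features of any particular ORM):

    ```java
    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Supplier;

    // Cache a whole object graph under its root's id, and trust the cached
    // copy only while the root's version stamp still matches the database's.
    class VersionedGraphCache {
        private record Entry(long version, Object graph) {}
        private final Map<Integer, Entry> cache = new HashMap<>();

        // dbVersion stands in for a cheap "SELECT version FROM root WHERE id = ?";
        // expensiveLoad stands in for rebuilding the full graph with its JOINs.
        Object get(int rootId, long dbVersion, Supplier<Object> expensiveLoad) {
            Entry e = cache.get(rootId);
            if (e != null && e.version() == dbVersion) {
                return e.graph();               // cheap check passed: reuse the graph
            }
            Object fresh = expensiveLoad.get(); // stale or absent: full reload
            cache.put(rootId, new Entry(dbVersion, fresh));
            return fresh;
        }
    }
    ```

    The cache only pays off when the one-row version check is much cheaper than rebuilding the graph, which is the inequality stated above.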

  • Alternative approach

    by Cameron Purdy,


    It may be stating the obvious but caching only really works when:

    cost of checking whether the cache is stale < cost of just returning the data


    If you replace "returning" with "obtaining", then in a traditional caching sense, this is true.

    However, a lot of modern cache-based architectures manage the up-to-date live transactional data in the cache, so that is where the application goes for the up-to-date data. The trade-off is that this works terribly when lots of applications go directly to a database to conduct their transactions. However, when it is applicable, the scalable performance delta is ludicrous, often speeding things up and dropping the load on the database by orders of magnitude.
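    That inversion, where the cache holds the live data and writes are pushed through to the database rather than every read consulting it, can be roughly sketched as follows (all names are hypothetical; Tangosol Coherence's real API differs):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Cache-as-system-of-record: reads are served from memory; writes go
    // to the cache first and are written through to the backing store.
    class WriteThroughStore<K, V> {
        private final Map<K, V> live = new HashMap<>();     // authoritative copy
        private final Map<K, V> database = new HashMap<>(); // stands in for the real DB
        int databaseReads = 0;

        void put(K key, V value) {
            live.put(key, value);
            database.put(key, value); // write-through keeps the DB consistent
        }

        V get(K key) {
            V v = live.get(key);
            if (v == null) {          // cold start only: fall back to the DB
                databaseReads++;
                v = database.get(key);
                if (v != null) live.put(key, v);
            }
            return v;
        }
    }
    ```

    Staleness checks disappear because nothing but the cache writes the data; the sketch ignores clustering, eviction, and concurrency, which are where real products earn their keep.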

    Hope it helps :-)

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Caching for Java and .NET
