Agile Project Management: Lessons Learned at Google
In this presentation filmed during QCon 2007, Jeff Sutherland, the creator of Scrum, talks about his visit at Google to do an analysis of Google's first implementation of Scrum.
Posted by Jonathan Allen on Jan 03, 2007 06:00 AM
According to Frans Bouma, one of the common misconceptions about Object/Relational Mapping (O/R Mapping) frameworks is that they give developers caching for free and that caching improves performance. While O/R Mapping frameworks do rely on caching, improved performance isn't in the cards.
Essentially, caching improves performance by eliminating calls to the database. Database calls tend to be orders of magnitude more expensive than retrieving the information locally.
Caching is very important for most high-performance applications. Without caching, an application under heavy load can grind to a halt. However, caching doesn't automatically improve performance. The right information has to be cached.
The problem with O/R Mapping is that the full set of records is almost never in the cache. So if multiple records are requested based on some search criteria, there is no way to know whether the cache holds every matching record. Frans Bouma continues...
This thus causes a roundtrip and a query execution on the database. As roundtrips and query executions are a big bottleneck of the complete entity fetch pipeline, the efficiency the myth talks about is nowhere in sight. But it gets worse. With a cache, there's actually more overhead. This is caused by the uniquing feature of a cache. So every entity fetched from the database matching the query for the customers has to be checked with the cache: is there already an instance available? If so, update the field values and return that instance, if not, create a new instance (but that's to be done anyway) and store it in the cache.
Caching does serve a purpose for O/R Mapping frameworks: it solves the problem of uniqueness. Normally, multiple calls to the database for the same information result in multiple objects holding the same data. Usually this is acceptable.
However sometimes it can be a problem or an inconvenience. When that happens, it's good that there's a way to have unique objects per entity loaded. Most O/R mappers use a cache for this: when an entity is loaded from the database, the cache is consulted if there's already an entity object with the entity data of the same entity fetched. If that's the case, that instance is updated with the data read from the database, and that instance is returned as the object holding the data. If there's no object already containing the same entity, a new instance is created, the entity data fetched is stored in that instance, that instance is stored in the cache and the instance is returned. This leads to unique objects per entity.
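The uniquing behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of any particular O/R mapper: `Customer`, `FakeDatabase`, `Session`, and `find_by_country` are hypothetical names invented for the example, and the "database" is an in-memory dictionary that counts round trips. It shows that a lookup by primary key can be served from the map, while a query by search criteria still goes to the database and then pays the extra uniquing check for every row.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    name: str
    country: str


class FakeDatabase:
    """Hypothetical stand-in for a real database; counts round trips."""

    def __init__(self, rows):
        self.rows = {row["id"]: dict(row) for row in rows}
        self.round_trips = 0

    def select_by_id(self, customer_id):
        self.round_trips += 1
        row = self.rows.get(customer_id)
        return dict(row) if row else None

    def select_where_country(self, country):
        self.round_trips += 1
        return [dict(r) for r in self.rows.values() if r["country"] == country]


class Session:
    """Holds an identity map: at most one Customer object per id."""

    def __init__(self, db):
        self.db = db
        self.identity_map = {}

    def _unique(self, row):
        # Uniquing: reuse the existing instance if there is one and
        # refresh its fields; otherwise create and register a new one.
        entity = self.identity_map.get(row["id"])
        if entity is None:
            entity = Customer(**row)
            self.identity_map[entity.id] = entity
        else:
            entity.name = row["name"]
            entity.country = row["country"]
        return entity

    def get(self, customer_id):
        # A primary-key lookup can be answered from the map alone.
        if customer_id in self.identity_map:
            return self.identity_map[customer_id]
        row = self.db.select_by_id(customer_id)
        return self._unique(row) if row else None

    def find_by_country(self, country):
        # The map cannot know whether it holds every match,
        # so a criteria query always goes to the database.
        return [self._unique(r) for r in self.db.select_where_country(country)]


db = FakeDatabase([
    {"id": 1, "name": "Ann", "country": "NL"},
    {"id": 2, "name": "Bob", "country": "NL"},
])
session = Session(db)
first = session.get(1)                   # goes to the database
again = session.get(1)                   # served from the identity map
matches = session.find_by_country("NL")  # goes to the database again
```

After this runs, `first`, `again`, and the matching element of `matches` are all the same object, which is the uniqueness guarantee. The criteria query saved no round trip even though customer 1 was already in memory.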
Just because some implementation details may be similar does not make an IdentityMap a cache. It serves a different purpose. Frans obviously understands this, so it's a little disingenuous of him to claim a "caching overhead" when he knows full well it's really a mechanism for maintaining object identity. Basically, two User objects in two ILists that have the same Id should be references to the same object. If you add a new Invoice object to one, you expect the other to reflect that because you expect the other to actually *be* a reference to that same object.
Frans is a wickedly smart guy, but it helps I think if you think of his comments on caching as part justification for a design decision in his product, LLBLGenPro. Not that he's wrong. He's absolutely right that using, say, ActiveRecord with MemCached caching could very easily result in stale data being used in an application if you allow just anyone (or another application) to modify the underlying data directly in the database. It's a trade-off. Frans is trading performance for correctness by limiting the IdentityMap to an individual Session. With the blazing performance of .NET, it's not really a factor that should influence your decision on which O/R Mapper you use either, in my opinion.
Caching is really a separate concern I think. One you should be able to apply on simple "lookup" entities manually, limiting your JOINs, speeding access, and limiting the impact of stale data by only caching basically static entities. It'd be nice if LLBLGenPro included such a mechanism in a "Contrib"-like project though. (It might, I haven't used it in a very long time.)
So in summary, Frans is right, but the semantics of the conversation could be better. I understand he probably has to address this nearly every day though with email about why LLBLGenPro doesn't include "caching". Don't discount LLBLGenPro because it doesn't include caching.
Instead focus on O/R Mappers like NHibernate and LLBLGenPro that solve the N+1 Query problem effectively, and then choose the one whose syntax and tool support you like best.
This article is interesting not because I agree with everything the author says, but because I have been spending a lot of time trying to get a good understanding of when to cache and when not to cache. First let me say that we use Toplink, not EJB3 based on Toplink, but just raw Toplink. All of our applications work on a DB that is being updated by other applications. First instinct is "Oh crap I better not cache anything because being slow would be much better than displaying incorrect data." Working in the financial sector, incorrect data is an absolute 'no no'. On the other hand, many of our applications are constantly calling up the same data, and when testing our apps using Toplink's built-in caching the performance increase is impressive. Contrary to what the author says, just letting Toplink cache everything has had a very positive performance impact on all of the applications that we tested. We have almost considered in some cases using database triggers to call the application server and invalidate targeted areas of the cache when necessary, but this is a terrible idea from an architectural point of view. I guess I'm just rambling. Too bad there is not a product out there that works like a DB plugin to aid in application server cache management.
It may be stating the obvious but caching only really works when:
cost of checking whether the cache is stale < cost of just returning the data
What "cost" means will depend on your specific implementation but is typically determined by network latency, query processor hit, etc. So far, I've only been able to guarantee this cost in 2 cases:
(1) For data that rarely becomes stale (e.g. reference data)
(2) Where there is a simple way to check whether a range of data has become stale (e.g. a large object graph is always updated atomically, so stamp and check the top-level object).
(1) is easily identifiable so your performance gains really come from identifying or creating the (2)s.
In both these cases I've found the caching used by ORMs to be insufficient (or overkill, depending on your perspective) and ended up with relatively simple custom implementations.
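Case (2) above, stamping the top-level object and checking that stamp before trusting the cache, might look roughly like the following Python sketch. Everything here is an assumption made for illustration: `FakeOrderStore`, `StampedCache`, and the integer version stamp are hypothetical, and the in-memory store stands in for a database where reading the stamp is cheap and loading the full graph is expensive. It also ignores the race between reading the stamp and loading the graph, which a real implementation would have to handle.

```python
class FakeOrderStore:
    """Hypothetical stand-in for a database with one object graph per order."""

    def __init__(self):
        self.versions = {}   # order_id -> version stamp on the top-level row
        self.graphs = {}     # order_id -> full object graph
        self.full_loads = 0  # counts the expensive loads

    def save(self, order_id, graph):
        # The graph is always written atomically, bumping the stamp.
        self.graphs[order_id] = dict(graph)
        self.versions[order_id] = self.versions.get(order_id, 0) + 1

    def read_version(self, order_id):
        # Cheap check: one small value from the top-level row.
        return self.versions[order_id]

    def load_graph(self, order_id):
        # Expensive fetch: the whole graph.
        self.full_loads += 1
        return dict(self.graphs[order_id])


class StampedCache:
    """Caches whole graphs and revalidates them with the cheap stamp check."""

    def __init__(self, store):
        self.store = store
        self.entries = {}  # order_id -> (version, graph)

    def get(self, order_id):
        current = self.store.read_version(order_id)
        cached = self.entries.get(order_id)
        if cached is not None and cached[0] == current:
            return cached[1]  # stamp unchanged, cached graph is still fresh
        graph = self.store.load_graph(order_id)
        self.entries[order_id] = (current, graph)
        return graph


store = FakeOrderStore()
cache = StampedCache(store)
store.save(7, {"lines": 3})
cache.get(7)                 # first access loads the full graph
cache.get(7)                 # stamp unchanged, no full load
store.save(7, {"lines": 4})  # another application updates the order
latest = cache.get(7)        # stamp changed, graph is reloaded
```

The cache only pays off when `read_version` is meaningfully cheaper than `load_graph`, which is the inequality stated in the comment.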
It may be stating the obvious but caching only really works when:
cost of checking whether the cache is stale < cost of just returning the data
If you replace "returning" with "obtaining", then in a traditional caching sense, this is true.
However, a lot of modern cache-based architectures manage the up-to-date live transactional data in the cache, so that is where the application goes to for the up-to-date data. The trade-off is that this works terribly when lots of applications go directly to a database to conduct their transactions. However, when it is applicable, the scalable performance delta is ludicrous, often speeding things up and dropping the load on the database by orders of magnitude.
Hope it helps :-)
Peace,
Cameron Purdy
Tangosol Coherence: Clustered Caching for Java and .NET