BT

Performance Anti-Patterns in Database-Driven Applications

Posted by Alois Reitbauer on Jan 05, 2009 |

Nearly every modern application relies on databases for data persistence. The database access layer is very often responsible for serious performance problems. In the case of database problems most people start searching in the database itself. Appropriate indices and database structures are vital for achieving adequate performance. Often, however, the application layer is responsible for poor performance or scalability problems.

The application layer controls and drives database access. Problems at this layer cannot be compensated in the database itself. Therefore the design of adequate data-access logic is vital for achieving performance and scalability. While there a nearly endless different use cases for database-driven applications, the problems can be nailed down to a small set of anti-patterns. Analyzing whether your application implements the following anti-patterns and resolving them will help to easily implement faster and more scalable software with minimal additional effort.

Misuse of O/R Mappers

O/R mappers have become a central part in modern database applications. O/R mappers take away the burden of translating and accessing relational data from object-oriented software. They hide great parts of the complexity of data access from the application programmer. This results in higher productivity as the developer can concentrate on the actual application logic rather than infrastructural details. Complex data graphs can be easily navigated at the object-relational layer without seeing what is going on under the hood. This often creates the wrong impression that these frameworks take away the burden of designing data-access logic.

Often developers think that their data-access framework will simply do things right; however, using O/R mapping frameworks without understanding their inner workings in many cases results in poor application performance. There are two central misunderstandings that cause these problems – loading behavior and load time.

O/R mappers load data on a per-object base. This means when an object is requested or accessed the necessary SQL statements are created and executed. This principle is very generic and at first sight works well in most situations. At the same time it is very often the source of performance and scalability problems.

Let’s take a simple example. In a database for storing address information, we have one table for persons and one for addresses. If we want to get the name for each person and the city they live in, we have to iterate over the persons and then access the address information. The image below shows the result if the out-of-the box query mechanisms are used. This simple use case results in a high number of database queries.

antipattern1.png

This directly brings up the second important detail of O/R mappers – load time. O/R mappers – if not told otherwise – try to load data as late as possible. This behaviour is referred to as lazy loading. Lazy loading ensures that data is loaded as late as possible with the goal to perform as few database queries as possible and at the same time avoid unnecessary creation of objects. While this approach is generally feasible, it may result in serious performance problems and so-called LazyLoadingExceptions on accessing data that has not been loaded when no database connection is present.

In situations like the one described above data loading and at the same time performance can be significantly improved by using specialized data queries.

So while O/R mappers can be of great help in the development of data access they still leave the burden of designing proper data access logic. Dynamic architecture validation with tools such as dynaTrace can be of great help here to identify performance weak points in the application and proactively resolve them.

Load More Data Then Needed

Another anti-pattern in database access that can often be found is that much more data is loaded that actually needed. There are a number of reasons for this. Rapid Application development tools provide easy ways of linking data structures to user interface controls. As the data layer is built of domain objects, they very often contain much more data than is actually visualized. The example uses the address book scenario again. This time the names of the persons and their home cities are visualized. Instead of just loading these three items both objects – addresses and persons – are loaded. This results in massive overhead at the database, network and application level. The usage of specific queries can help to massively reduce the amount of queried data. This performance improvement however comes along with additional effort for maintenance. Adding a new column to the table might require several changes to the data access layer.

antipattern2.png

This anti-pattern can also be found very often in case of improperly designed service interfaces. Service interfaces are often designed to be generic enough to support a large number of use cases. This has the advantage that services have small contracts which can be used in a wide variety of use cases. Additionally uses cases change faster than the backend service implementations. This can result in services interfaces being inappropriate for certain scenarios. Developers will then have to use workarounds which might result in highly inefficient data access logic. This problem often arises in data-driven Web Services.

In order to overcome these problems data access patterns should be continuously analyzed during development. In the case of agile development approaches, data access logic should be checked for each finished user story. Additionally data access patterns should also be analyzed across application use cases to understand data access logic to be able to optimize data access logic according during development.

Inadequate Usage of Resources

Databases are a bottleneck for resources in applications, so they should be used as little as possible. Very often too little attention is paid to the usage of database connections. As with any shared resource such connections massively affect overall system performance. Specifically, web applications and applications using O/R mapping frameworks with lazy initialization tend to keep database connections longer than needed. Connections are acquired at the beginning of processing and kept until rendering is finished or no further data access is required. In applications using O/R mappers, they are often kept to avoid nasty lazy initialization problems. By redesigning data access logic and separating it from post-processing (like rendering), the performance and scalability of an application can be dramatically improved.

The chart below shows the response time of ten concurrent data processing threads. In the first part one database connection is used. In the second scenario ten connections are used. In the third scenario two database connections are used but two thirds of the processing is performed after having returned the connection. With better designed data access the third scenario nearly achieves the same performance with a tenth of the resources.

antipattern3.png

One Bunch of Everything

One Bunch of Everything is an anti-pattern that can generally be observed in development but even more in agile teams. The characteristic of this anti-pattern is that primarily features are developed and all data access is treated equally, as if there would not be any differences. However treating different types of data and queries differently can significantly improve application performance and scalability.

Data should be analyzed regarding its lifetime characteristics. How often does it change or if it is modified or only read. Access frequency of data, together with access patterns, provides hints on potential sources for caching. Access frequency also provides hints as to where optimizations make the most sense. This avoids premature and unnecessary optimization and guarantees the highest impact of performance tuning.

Analyzing usage patterns of data also helps to tune the data access layer. Understanding which data is really used helps to optimize loading strategies. Understanding how users browse search results, for example, helps to optimize fetch sizes. Knowing whether users look at order details helps to select lazy or eager loading for order positions.

In addition to data, queries should also be analyzed and categorized. Important factors are query duration, execution frequency and whether they are used in an interactive user context or batch-processing scenario. Transactional characteristics further help to fine tune isolation levels of queries.

Running short-running interactive queries of users and long-running reporting queries on the same connection for example may easily result in bad end user experience. Long-running reporting queries can greedily acquire database connections leaving end-user queries starving. Using different connection pools for different query types results in much more predicable end user performance. Softening isolation level on database queries where they are not required can also lead to significantly improved performance and scalability.

Bad Testing

Finally, missing or improper testing is one of the major reasons for performance and scalability problems in database-accessing applications. I recently gave a talk on this topic and asked the audience whether they see database access as a performance problem in their applications. While all of them agreed, nobody had testing procedures in place to test data access performance. So while it seems to be an important topic, people do not seem to invest in it.

However, even if testing procedures are in place, this does not necessarily mean that testing is done correctly. Although a lot of problems in data access logic can be found right after the code has been developed, testing is performed much later in the load testing phase. This introduces unnecessarily high costs as changes are performed late in the lifecycle possibly requiring architectural changes leading to additional development and testing efforts.

Furthermore test cases have to be designed to test real world data access scenarios. Data access has to be tested in a concurrent mode and using different access types. Only combined read/write access helps to identify locking and concurrency problems. Additionally adequate variation of input data is required to avoid unrealistically frequent cache hits.

Very often people also do not know for which load to test as they have no adequate information on expected load. In my experience this is very often the case – unfortunately. This, however, is not an excuse for not defining load and performance criteria. It is still better to have some criteria defined instead of not defining them at all.

In case you really have no clue on performance characteristics the best approach is to use load testing criteria with increasing load until the saturation point of the application is reached. Then you have identified the peak load of the application. If this sounds reasonable and realistic you are on a good way. Otherwise you know where you have to improve performance. In most cases initial tests show that application can cope with much less load as expected.

Conclusion

Database access is one of the most critical areas impacting performance and scalability in modern applications. While frameworks support in building data access logic, a serious amount of thought still has to be put into the design of data access logic to avoid pitfalls and problems. The key is to understand the details of the dynamics and characteristics of an application’s data-access layer.

About the Author

Alois Reitbauer works as a Sr. Performance Architect for dynaTrace Software. In his role as a member of the R&D department he influences the dynaTrace product strategy and works closely with key customers in implementing performance management solution for the entire application lifecycle. Alois Reitbauer has 10 years experience as architect and developer in the Java and .NET space.

About the author's employer: dynaTrace software, Inc.

dynaTrace is a leader in continuous application performance management for business-critical Java /.NET applications. Industry leaders such as UBS, LinkedIn, EnerNOC, Fidelity and Thomson Reuters use dynaTrace’s patent-pending PurePath Technology® to trace individual transactions, even in production, and gain true visibility into and across their complex Web services-based applications. dynaTrace enables all lifecycle stakeholders to collaborate – development, QA, and production – to find problems faster and quickly coordinate resolution, dramatically speeding the isolation and resolution of performance problems by 10x – or more.

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Agile susceptability to "One Bunch of Everything" by Jeff Santini

I would like to hear more about how Agile projects are more susceptible to this anti-pattern. As I understand the "anti-pattern", I would correlate susceptability inversely with the number of times you have been stung before. I don't really see what "agile-ness" has to do with it.

Re: Agile susceptability to by Alois Reitbauer

Agility for sure is not generally related to performance anti-patterns. However we have seen that highly iterative feature-driven development is very oriented towards functionality and sometimes neglects performance characteristics. Incrementially improving features may lead to losing the global pictures.

A simple example the development of two distinct features each one requiring same type of data. Both features are developed independently and nobody realizes that some common data is required much to often. Now make these features services that interact with each other.

This is mainly caused as nobody is responsible for performance directly who identifies these problems. Common code ownership like in may agile approaches approaches does not assign this role to somebody specific.

I personally like agile approaches a lot. I just want to point out that incremental development of a larger set of features with are interdependent without a global "guardian" on performance and scalability can easily result problems like the ones described.

How many agile teams that have CI environments and unit tests also have performance and scalability tests for the most important parts of their application functionality?

So agility increases dynamics - which is good. However more dynamics needs more control over regressions - as they are more likely to occur. If this control is missing, you easily run into problems

Re: Agile susceptability to by Mileta Cekovic

Maybe Author thinks Agile projects are more likely to use ORM tool then non-agile projects, which probably is true for several reasons: first, if project is agile, it probably has short time-to-market and thus needs ORM more then non-agile project; second, agile projects are more likely to have progressive developers which are more willing to use ORM then non-agile project's, which are more likely to have conservative developers, not willing to give ORM a try.

Re: Agile susceptability to by Mirko Novakovic

I think that it is a common misunderstanding of "agile" that agile projects (Scrum, XP, FDD, Crystal, etc.) do not have an architect that is responsible for the global picture - its just that the architecture is not defined entirely at the beginning of a project (as in the waterfall approach) but is refined in every sprint based on feedback of the running application and new requirements that will occur during the development lifecycle.

In my point of view agility is the only way to achieve good performance in a usal project. Why? Because requirements change and so do the performance requirements and only agile projects can adapt to these new requirments quickly.

In a waterfall driven project you would first make you architecture and database access design based on your knowledge and requirements at that time. Normally theses requirements will change very quickly and so will your database design and data. In Agile projects you can test for performance in every sprint and you can refactor based on real application data which is essential for good performance tuning.

As Alois said it is very important to integrate performance into you agile development process. We do this by integrating performance requirements into "the definition of done". That means that a user story is put into the "finished" state only if the functional and non-functional requirments are met. You can also enhance you CI test suite with performance tests based on JMeter and get performance data by using tools like dynaTrace that can be integrated into Ant or Maven.

In Defense of ORM by Ben Murphy

I've seen projects that have decided not to use ORM because ORM is 'too slow' and then made the same performance mistakes that occur in ORM. Unfortunately, they are often harder to fix. For example you can often fix n+1 query problem in orm by changing your query to do a fetch join. Fixing it outside the world of ORM is often more invasive.

Re: In Defense of ORM by William Louth

We have runtime inspections that work at the resource transaction level rather than a call stack entry/exit pairing for all of these database anti-patterns for a few years now.

Automated Transaction Analysis Inspections
www.jinspired.com/products/jxinsight/new-in-5.1...

William

Re: In Defense of ORM by William Louth

It is unfortunate we cannot edit our messages. Here are so more transaction analysis inspections made available in late 2006 to early access customers.

www.jinspired.com/products/jxinsight/new-in-5.0...

Separation of Concerns by Karsten Klein

I guess the most prominent anti-pattern in this area and even in the article is "missing separation of concerns".

OR Mappers are very useful and can be very efficient for mapping objects to a relational database. And they can be even very clever and efficient. In combination with a session they even get more and more interesting.

OR mappers are weak in constructing use case specific queries from the mapping information - as long as the use case is not sufficiently described to the ORM. But therefore ORMs offer special constructs (e.g. prepared statements).

Furthermore I would not say that ORMs load too much. They load what the developer has defined should be loaded. The problem is that the default (which is good for only the simple cases) is not always appropriate. You have to consciously develop your persistence model and you have to consciously think about your use cases. Sometimes a use case can only be executed in a performing way by writing your special query by hand - no tool can take away this burden. The same is true for data aggregation.

I'd like that people start to distinguish finding, loading, aggregating and querying.
Finding: retrieving a bunch of identifiers and possible additional metadata satisfying the search and sorting criteria.
Loading: loading an object (an possibly the complete object graph).
Aggregating: aggregating a particular set of required information from the database.
Querying: a query is a request to the database and nothing more. Queries can be used for finding, loading and aggregating. I.e. ORMs use queries to load (and lazy-load) data to construct the objects.

My conclusion is that software is developed - not created in one go. If you would like to develop a piece of performing software you have to understand your domain (and the target use cases) and you have to understand your tools. You have to position and utilize your tools (technology) to achieve your goal. Goals must be defined - and this also means that key performance indicators must be derived and tested.

If you do not understand software development you anyway have an issue.

The above article is absolutely correct in a general way. However it doesn't look much at the causes of the symptoms - namely the anti-patterns. Blaming ORMs and agile methods is scratching only the (wrong) surface. People have to realize what they doing. Developing software is definitely not a no-brainer where you can through around buzz and jump from technology to technology.

Cheers,
Karsten

Typo? by Ray Davis

In the "Inadequate Usage of Resources" section, you write:

"In the first part one database connection is used. In the second part two connections are used. In the third scenario two database connections are used but two thirds of the processing is performed after having returned the connection."

Should that middle sentence read "In the second scenario *ten* connections are used"?

Re: Separation of Concerns by Rob Bygrave

In case you are not aware.... ORM's can automatically tune a query to reduce lazy loading via a feature called Autofetch (developed by Ali Ibrahim at the university of texas).

In short, Autofetch automatically tunes the ORM query to reduce lazy loading via specifying joins etc. It does this based on profiling the application usage of the object graphs.

www.cs.utexas.edu/~aibrahim/autofetch/

Ali is working on the a Hibernate autofetch extension and I have built the autofetch feature into Ebean ORM.

www.avaje.org/ebean/introquery_autofetch.html
www.avaje.org/autofetch.html

In regards ORM's fetching too much data. This occurs when ORM's do not support "partial objects" (or do not support partial objects well). IMO this is currently the case for the JPA specification and some leading ORM's.

Ebean ORM has partial object support in its query language. Also Ebean's autofetch query tuning will tune the query to use partial objects and ONLY include the properties in the query that are actually used (as a way of solving this problem).

So, yes I agree these are an issues for many ORM's (and some ORM specifications) at the moment but I think Autofetch and "Partial Objects" go a long way to solving these issues.

Don't get me wrong, the issues around ORM loading performance issues are not all resolved but Autofetch and partial objects are a huge step forward.

Cheers, Rob.

Re: Separation of Concerns by David Purcell

I agree that people need to know what they are doing. ORM tools are just that - a tool that should be used for the appropriate situation. But for some situations it isn't appropriate. I often see the case where a team says 'I use this ORM tool', and believe that they can't hit the database outside of the tool's simple mappings or query language. Sometimes you need to create a specific query for performance reasons. In a complex situation it can be difficult to come up with good SQL using an ORM query language, and developers often don't know what SQL is produced from the query.

Re: Agile susceptability to by Vijay zharotia

Agile development if implemented in its true sense would not suffer from this issue. If you doing a live demo every sprint/iteration performance issues are visible to anybody. They are detected faster in Agile mode than in waterfall.

Re: Separation of Concerns by Alois Reitbauer

The case I see here is, that OR mappers are often seen as more as they are. The provide mapping and basic loading functionality. However they can never by themselves optimize loading behavior for specific use cases.

Most OR mapping frameworks provide means to optimize loading behavior. The point is that the actual data loading behavior has to be verified as part of testing or reviews.

Re: Separation of Concerns by William Louth

All these problems can easily be averted with the right process, tooling and reporting.

In software performance engineering (www.jinspired.com/solutions/xpe/index.html) there are two basic models created the software execution and system execution.

The software execution model focuses on collecting the execution behavior of each use case developed (and iterated over the course of the development) capturing associated key performance indicators such as transaction count, database query count and any other relatively expense operation that will be performed during normal operation though its timing might deviate across stages. Each KPI is tracked across each managed (tested) release cycle. When a KPI deviates by more than a specified threshold then a more investigative analysis of the software execution model captured in one or more snapshots (timeline, profile, metrics) is investigated with the deviation explained to the dev/test management team.

William

Re: Separation of Concerns by William Louth

Here is another example (a little bit unusual) of KPI selection and management.

An Exception(ally) Bad Start
williamlouth.wordpress.com/2008/11/18/an-except...

William

Re: Separation of Concerns by Suresh K

I think it is all about procedures.
In an agile project, as you go by sprints and showcases, you run the project through the eyes of the business.
You can't possibly do formal performance and load testing in the beginning. Not because you can't but because you don't have enough features for the performance team to come and join the team and keep on doing performance testing through out the project.

But that doesn't mean performance testing can't be done at all. The performance testing team can be engaged to do their rituals near the end of the project when it all makes sense to them, just like in waterfall model when you allocate time for performance testing.

Having said that, the agile team (mostly developers) can do a lot to help so the project hasn't lost its agile shape and still covers these performance issues. Using the right tools and following the right procedure, you can easily iron out potential issues as you develop.

Profiling tools can give you the percentage of time taken to execute query (generated by ORM or otherwise). If any query looks a suspect, you can tweak it to improve it.

Re: Separation of Concerns by Suresh K

While no ORM are perfect but it all depends on the design of your application and the coding technique you use.
For example, if the domain objects are rich in behaviour, it is highly likely, the underlying ORM framework will generate more Database Queries for complex object graphs as opposed to Anemic Domain Model.

AnemicDomainModel - www.martinfowler.com/bliki/AnemicDomainModel.html
For example in AnemicDomainModel, an object graph is rarely initiated/visited by domain object methods but have specialized queries sitting on the layer above the domain objects where bulk of the processing is done.


It is then all about finding balance or alternatives.
Query Caching and Dependency Injection (of Service, for those problem parts where the query generated by ORM is too costly) on Domain objects could go a long way towards solving the problem.

At the end of the day, it is not the ORM but the developers understanding of the domain that is the weakest link.

Re: Separation of Concerns by William Louth

I disagree.

You can do performance testing on the software execution model (transaction history analysis) but not on the system execution model (resource contention with workload patterns and tx mixes). This is actually important as it allows one to acquire knowledge of the software execution model throughout the lifecycle which helps immensely during production when performing problem management.

You just need to be sensible in choosing which metrics/counters/meters to track at each stage - cost benefit analysis must include knowledge acquisition.

Each feature/use case can be tracked and low hanging fruit can be more easily identified/classified/remove/changes at a much lower cost.

William

Re: Separation of Concerns by William Louth


Profiling tools can give you the percentage of time taken to execute query (generated by ORM or otherwise). If any query looks a suspect, you can tweak it to improve it.


I would not necessarily pick clock time as this is dependent on hardware, database sizing,..... This is so old hat legacy profiling which resulted in people labeling any performance modeling/testing/tuning as premature optimization.

William

Re: Separation of Concerns by Suresh K

I would not necessarily pick clock time as this is dependent on hardware, database sizing,.....


Yes we all agree clock time would be dependent on hardware, network etc. But mind you, it is all relative. If you have 10 queries and all of them are taking a second each except one taking 5 seconds. Would you agree it is the query and not external parameters that could be the cause?

Then again it is just an indication. Of course you would have to validate the cost using query plan and more testing.

Re: Typo? by Alois Reitbauer

Exactly! Thank you for pointing this out

- Alois

Re: Separation of Concerns by William Louth


If you have 10 queries and all of them are taking a second each except one taking 5 seconds. Would you agree it is the query and not external parameters that could be the cause?


Not necessarily depending on the data (table) characteristics and volumes referenced (or filtered) by each query. If the database is representative of a production image then yes one can usually use relative comparisons to point of problems but remember most poor performing SQL statements occur not just with the volume but the load. The impact might not be visible in the statements own execution but in the side effects (lock contention) it has on other statements executing concurrently.

I tend to focus on counts at various levels of execution abstractions during the early stages until we get to the serious and complex task of attempting to simulate real-world conditions which is rarely the case though teams spend an enormous amount of effort. Why? The rarely have sufficient production concurrent activity data broken down for the purpose of test script design.

William

Re: Typo? by Abel Avram

Fixed.

Re: Agile susceptability to by Chris Webster

On the proper use of ORM...

As an Oracle RDBMS developer who's worked on several J2EE projects, my experience is that many Java developers are less familiar with the power and flexibility of the tools available to optimise query performance on an enterprise RDBMS, so they may under-estimate the need to so, or simply feel more comfortable relying on the default behaviour of an ORM tool. There is also still a tendency for some in the Java community to develop systems on the assumption that they might want to swap out an enterprise RDBMS like Oracle and replace it with a lightweight DB like MS Access at any time. This means that some developers are reluctant to take full advantage of the functionality provided by an enterprise RDBMS, because that functionality may not be available on a lightweight alternative. So it's not really a question of Agile being particularly susceptible to this issue, more a case of some Java designers/developers generally needing a better understanding of the tools available on the RDBMS side of the ORM, and of the need to take DB design and performance issues into consideration when building n-tier systems.

Re: Agile susceptability to by Kirk Pepperdine

WHile it is ok to not entirely define architecture at the beginning of your project, I've found that project that *don't* pay attention to architecture suffer from far deeper structural performance problems than those that do. I've said this many many times in the past, agile is not about being fast, it is about being agile. Part of being agile is having a structure or architecture that allows you to be agile and is adaptive to scaling issues. You don't get that on purpose unless you plan for it up front. Many products can't be bolted in after the fact in an agile manner. They need to be designed and architected in for you to take proper advantage of the. This doesn't mean the planning is finished or needs to be completed before you start coding (AKA waterfall). It just means that there needs to be a judicious amount of it before your agile yourself into rigidity.

Re: Separation of Concerns by Kirk Pepperdine

I think there is a place for performance testing in CI. In fact the projects that I've seen implement it do better at time of deployment than those that don't. And, for the same reasons that we don't wait until two weeks before the project ends to test our functional requirements. Time and time again we have learned that waiting to the end is a recipe for disaster. Why is performance any different?

Another point is; limiting performance measurements and tuning to execution profiling is taking your hammer and treating every performance problem like a nail. Execution profilers do not find significant sources of latency nor do they find many other performance problems. You should read some of Williams links. He's mostly right in them ;-)

Re: Separation of Concerns by Kirk Pepperdine

Again, there are more problems that crop up than CPU.. if you are only looking for CPU you are going to miss the major problems that often have greater affects on performance and most of them have to do with other aspects of system performance.

Re: Agile susceptability to by Kirk Pepperdine

With all due respect Chris, my experience is that there is often a large gap in understanding in both directions. I often find problems where db transactions are very quick however the application server is waiting 10s if not 100s of MS for the data to appear. There is a gap in measurements that neither side often understands. Often for that reason it make sense not to use the database for anything more than an record of store and develop data grids or other technologies that hold data in memory using non-transactional data structures. memcache is one such technology that offers tremendous performance advantages over using a full blown RDB. Coherence is yet another.

Getting back to the agile point, using these types of technologies instead of or in concert with ORM requires some up front architectural planning. It is difficult to bolt these products into an application after the fact.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

28 Discuss

Educational Content

General Feedback
Bugs
Advertising
Editorial
InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT