BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage News Article: Performance Anti-Patterns in Database-Driven Applications

Article: Performance Anti-Patterns in Database-Driven Applications

Bookmarks

In this article, Alois Reitbauer, a Performance Architect for dynaTrace Software, specifies several architectural anti-patterns which can downgrade an application’s performance. Knowing those anti-patterns and proactively designing the application to avoid them will keep away certain snags that can impact application’s performance.

Read: Performance Anti-Patterns in Database-Driven Applications

Alois details on the following anti-patterns:

  • Misuse of O/R Mappers
  • Loading More Data Then Needed
  • Inadequate Usage of Resources
  • One Bunch of Everything
  • Bad Testing

Rate this Article

Adoption
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Community comments

  • Agile susceptability to "One Bunch of Everything"

    by Jeff Santini,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I would like to hear more about how Agile projects are more susceptible to this anti-pattern. As I understand the "anti-pattern", I would correlate susceptability inversely with the number of times you have been stung before. I don't really see what "agile-ness" has to do with it.

  • Re: Agile susceptability to

    by Alois Reitbauer,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Agility for sure is not generally related to performance anti-patterns. However we have seen that highly iterative feature-driven development is very oriented towards functionality and sometimes neglects performance characteristics. Incrementially improving features may lead to losing the global pictures.

    A simple example the development of two distinct features each one requiring same type of data. Both features are developed independently and nobody realizes that some common data is required much to often. Now make these features services that interact with each other.

    This is mainly caused as nobody is responsible for performance directly who identifies these problems. Common code ownership like in may agile approaches approaches does not assign this role to somebody specific.

    I personally like agile approaches a lot. I just want to point out that incremental development of a larger set of features with are interdependent without a global "guardian" on performance and scalability can easily result problems like the ones described.

    How many agile teams that have CI environments and unit tests also have performance and scalability tests for the most important parts of their application functionality?

    So agility increases dynamics - which is good. However more dynamics needs more control over regressions - as they are more likely to occur. If this control is missing, you easily run into problems

  • Re: Agile susceptability to

    by Mileta Cekovic,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Maybe Author thinks Agile projects are more likely to use ORM tool then non-agile projects, which probably is true for several reasons: first, if project is agile, it probably has short time-to-market and thus needs ORM more then non-agile project; second, agile projects are more likely to have progressive developers which are more willing to use ORM then non-agile project's, which are more likely to have conservative developers, not willing to give ORM a try.

  • Re: Agile susceptability to

    by Mirko Novakovic,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I think that it is a common misunderstanding of "agile" that agile projects (Scrum, XP, FDD, Crystal, etc.) do not have an architect that is responsible for the global picture - its just that the architecture is not defined entirely at the beginning of a project (as in the waterfall approach) but is refined in every sprint based on feedback of the running application and new requirements that will occur during the development lifecycle.

    In my point of view agility is the only way to achieve good performance in a usal project. Why? Because requirements change and so do the performance requirements and only agile projects can adapt to these new requirments quickly.

    In a waterfall driven project you would first make you architecture and database access design based on your knowledge and requirements at that time. Normally theses requirements will change very quickly and so will your database design and data. In Agile projects you can test for performance in every sprint and you can refactor based on real application data which is essential for good performance tuning.

    As Alois said it is very important to integrate performance into you agile development process. We do this by integrating performance requirements into "the definition of done". That means that a user story is put into the "finished" state only if the functional and non-functional requirments are met. You can also enhance you CI test suite with performance tests based on JMeter and get performance data by using tools like dynaTrace that can be integrated into Ant or Maven.

  • In Defense of ORM

    by Ben Murphy,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I've seen projects that have decided not to use ORM because ORM is 'too slow' and then made the same performance mistakes that occur in ORM. Unfortunately, they are often harder to fix. For example you can often fix n+1 query problem in orm by changing your query to do a fetch join. Fixing it outside the world of ORM is often more invasive.

  • Separation of Concerns

    by Karsten Klein,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I guess the most prominent anti-pattern in this area and even in the article is "missing separation of concerns".

    OR Mappers are very useful and can be very efficient for mapping objects to a relational database. And they can be even very clever and efficient. In combination with a session they even get more and more interesting.

    OR mappers are weak in constructing use case specific queries from the mapping information - as long as the use case is not sufficiently described to the ORM. But therefore ORMs offer special constructs (e.g. prepared statements).

    Furthermore I would not say that ORMs load too much. They load what the developer has defined should be loaded. The problem is that the default (which is good for only the simple cases) is not always appropriate. You have to consciously develop your persistence model and you have to consciously think about your use cases. Sometimes a use case can only be executed in a performing way by writing your special query by hand - no tool can take away this burden. The same is true for data aggregation.

    I'd like that people start to distinguish finding, loading, aggregating and querying.
    Finding: retrieving a bunch of identifiers and possible additional metadata satisfying the search and sorting criteria.
    Loading: loading an object (an possibly the complete object graph).
    Aggregating: aggregating a particular set of required information from the database.
    Querying: a query is a request to the database and nothing more. Queries can be used for finding, loading and aggregating. I.e. ORMs use queries to load (and lazy-load) data to construct the objects.

    My conclusion is that software is developed - not created in one go. If you would like to develop a piece of performing software you have to understand your domain (and the target use cases) and you have to understand your tools. You have to position and utilize your tools (technology) to achieve your goal. Goals must be defined - and this also means that key performance indicators must be derived and tested.

    If you do not understand software development you anyway have an issue.

    The above article is absolutely correct in a general way. However it doesn't look much at the causes of the symptoms - namely the anti-patterns. Blaming ORMs and agile methods is scratching only the (wrong) surface. People have to realize what they doing. Developing software is definitely not a no-brainer where you can through around buzz and jump from technology to technology.

    Cheers,
    Karsten

  • Typo?

    by Ray Davis,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    In the "Inadequate Usage of Resources" section, you write:

    "In the first part one database connection is used. In the second part two connections are used. In the third scenario two database connections are used but two thirds of the processing is performed after having returned the connection."

    Should that middle sentence read "In the second scenario *ten* connections are used"?

  • Re: Separation of Concerns

    by Rob Bygrave,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    In case you are not aware.... ORM's can automatically tune a query to reduce lazy loading via a feature called Autofetch (developed by Ali Ibrahim at the university of texas).

    In short, Autofetch automatically tunes the ORM query to reduce lazy loading via specifying joins etc. It does this based on profiling the application usage of the object graphs.

    www.cs.utexas.edu/~aibrahim/autofetch/

    Ali is working on the a Hibernate autofetch extension and I have built the autofetch feature into Ebean ORM.

    www.avaje.org/ebean/introquery_autofetch.html
    www.avaje.org/autofetch.html

    In regards ORM's fetching too much data. This occurs when ORM's do not support "partial objects" (or do not support partial objects well). IMO this is currently the case for the JPA specification and some leading ORM's.

    Ebean ORM has partial object support in its query language. Also Ebean's autofetch query tuning will tune the query to use partial objects and ONLY include the properties in the query that are actually used (as a way of solving this problem).

    So, yes I agree these are an issues for many ORM's (and some ORM specifications) at the moment but I think Autofetch and "Partial Objects" go a long way to solving these issues.

    Don't get me wrong, the issues around ORM loading performance issues are not all resolved but Autofetch and partial objects are a huge step forward.

    Cheers, Rob.

  • Re: Separation of Concerns

    by David Purcell,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I agree that people need to know what they are doing. ORM tools are just that - a tool that should be used for the appropriate situation. But for some situations it isn't appropriate. I often see the case where a team says 'I use this ORM tool', and believe that they can't hit the database outside of the tool's simple mappings or query language. Sometimes you need to create a specific query for performance reasons. In a complex situation it can be difficult to come up with good SQL using an ORM query language, and developers often don't know what SQL is produced from the query.

  • Re: Agile susceptability to

    by Vijay zharotia,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Agile development if implemented in its true sense would not suffer from this issue. If you doing a live demo every sprint/iteration performance issues are visible to anybody. They are detected faster in Agile mode than in waterfall.

  • Re: Separation of Concerns

    by Alois Reitbauer,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    The case I see here is, that OR mappers are often seen as more as they are. The provide mapping and basic loading functionality. However they can never by themselves optimize loading behavior for specific use cases.

    Most OR mapping frameworks provide means to optimize loading behavior. The point is that the actual data loading behavior has to be verified as part of testing or reviews.

  • Re: Separation of Concerns

    by Suresh K,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I think it is all about procedures.
    In an agile project, as you go by sprints and showcases, you run the project through the eyes of the business.
    You can't possibly do formal performance and load testing in the beginning. Not because you can't but because you don't have enough features for the performance team to come and join the team and keep on doing performance testing through out the project.

    But that doesn't mean performance testing can't be done at all. The performance testing team can be engaged to do their rituals near the end of the project when it all makes sense to them, just like in waterfall model when you allocate time for performance testing.

    Having said that, the agile team (mostly developers) can do a lot to help so the project hasn't lost its agile shape and still covers these performance issues. Using the right tools and following the right procedure, you can easily iron out potential issues as you develop.

    Profiling tools can give you the percentage of time taken to execute query (generated by ORM or otherwise). If any query looks a suspect, you can tweak it to improve it.

  • Re: Separation of Concerns

    by Suresh K,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    While no ORM are perfect but it all depends on the design of your application and the coding technique you use.
    For example, if the domain objects are rich in behaviour, it is highly likely, the underlying ORM framework will generate more Database Queries for complex object graphs as opposed to Anemic Domain Model.

    AnemicDomainModel - www.martinfowler.com/bliki/AnemicDomainModel.html
    For example in AnemicDomainModel, an object graph is rarely initiated/visited by domain object methods but have specialized queries sitting on the layer above the domain objects where bulk of the processing is done.


    It is then all about finding balance or alternatives.
    Query Caching and Dependency Injection (of Service, for those problem parts where the query generated by ORM is too costly) on Domain objects could go a long way towards solving the problem.

    At the end of the day, it is not the ORM but the developers understanding of the domain that is the weakest link.

  • Re: Separation of Concerns

    by Suresh K,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I would not necessarily pick clock time as this is dependent on hardware, database sizing,.....


    Yes we all agree clock time would be dependent on hardware, network etc. But mind you, it is all relative. If you have 10 queries and all of them are taking a second each except one taking 5 seconds. Would you agree it is the query and not external parameters that could be the cause?

    Then again it is just an indication. Of course you would have to validate the cost using query plan and more testing.

  • Re: Typo?

    by Alois Reitbauer,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Exactly! Thank you for pointing this out

    - Alois

  • Re: Typo?

    by Abel Avram,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Fixed.

  • Re: Agile susceptability to

    by Chris Webster,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    On the proper use of ORM...

    As an Oracle RDBMS developer who's worked on several J2EE projects, my experience is that many Java developers are less familiar with the power and flexibility of the tools available to optimise query performance on an enterprise RDBMS, so they may under-estimate the need to so, or simply feel more comfortable relying on the default behaviour of an ORM tool. There is also still a tendency for some in the Java community to develop systems on the assumption that they might want to swap out an enterprise RDBMS like Oracle and replace it with a lightweight DB like MS Access at any time. This means that some developers are reluctant to take full advantage of the functionality provided by an enterprise RDBMS, because that functionality may not be available on a lightweight alternative. So it's not really a question of Agile being particularly susceptible to this issue, more a case of some Java designers/developers generally needing a better understanding of the tools available on the RDBMS side of the ORM, and of the need to take DB design and performance issues into consideration when building n-tier systems.

  • Re: Agile susceptability to

    by Kirk Pepperdine,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    WHile it is ok to not entirely define architecture at the beginning of your project, I've found that project that *don't* pay attention to architecture suffer from far deeper structural performance problems than those that do. I've said this many many times in the past, agile is not about being fast, it is about being agile. Part of being agile is having a structure or architecture that allows you to be agile and is adaptive to scaling issues. You don't get that on purpose unless you plan for it up front. Many products can't be bolted in after the fact in an agile manner. They need to be designed and architected in for you to take proper advantage of the. This doesn't mean the planning is finished or needs to be completed before you start coding (AKA waterfall). It just means that there needs to be a judicious amount of it before your agile yourself into rigidity.

  • Re: Separation of Concerns

    by Kirk Pepperdine,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I think there is a place for performance testing in CI. In fact the projects that I've seen implement it do better at time of deployment than those that don't. And, for the same reasons that we don't wait until two weeks before the project ends to test our functional requirements. Time and time again we have learned that waiting to the end is a recipe for disaster. Why is performance any different?

    Another point is; limiting performance measurements and tuning to execution profiling is taking your hammer and treating every performance problem like a nail. Execution profilers do not find significant sources of latency nor do they find many other performance problems. You should read some of Williams links. He's mostly right in them ;-)

  • Re: Separation of Concerns

    by Kirk Pepperdine,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Again, there are more problems that crop up than CPU.. if you are only looking for CPU you are going to miss the major problems that often have greater affects on performance and most of them have to do with other aspects of system performance.

  • Re: Agile susceptability to

    by Kirk Pepperdine,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    With all due respect Chris, my experience is that there is often a large gap in understanding in both directions. I often find problems where db transactions are very quick however the application server is waiting 10s if not 100s of MS for the data to appear. There is a gap in measurements that neither side often understands. Often for that reason it make sense not to use the database for anything more than an record of store and develop data grids or other technologies that hold data in memory using non-transactional data structures. memcache is one such technology that offers tremendous performance advantages over using a full blown RDB. Coherence is yet another.

    Getting back to the agile point, using these types of technologies instead of or in concert with ORM requires some up front architectural planning. It is difficult to bolt these products into an application after the fact.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

BT