
Revving Up Your Hibernate Engine

Posted by Yongjun Jiao and Stewart Clark on Oct 06, 2010 |

1 Introduction

Hibernate is one of the most popular object-relational mapping (ORM) engines, providing data persistence and query services.

It is straightforward to introduce Hibernate into your projects and make it just work. Making it perform well, however, takes time and quite a lot of experience.

This article covers a number of Hibernate tuning techniques, using examples we found in our energy projects, which use Hibernate 3.3.1 with Oracle 9i. It also provides the database background you need to really grasp some of the techniques.

We assume a basic understanding of Hibernate. If a tuning method is well covered in the Hibernate Reference Documentation (hereafter HRD) or in other tuning articles, we will just provide a reference to the documentation and briefly explain it from a different angle. We focus on tuning methods that are efficient, but poorly documented.

2 Hibernate Performance Tuning

Tuning is an iterative and ongoing process encompassing all phases of the Software Development Life Cycle (SDLC). In a typical Java EE application using Hibernate for persistence, tuning covers the following areas:

  • Tuning business rules
  • Tuning design
  • Tuning Hibernate
  • Tuning Java GC
  • Tuning application containers
  • Tuning underlying systems, including databases and operating systems

Tuning all of them without a well-planned methodology is very time-consuming and may well be ineffective. An important part of a good tuning methodology is prioritizing tuning areas. This can be explained by the Pareto Principle (aka “the 80/20 rule”), which says roughly 80% of application performance improvements come from the top 20% of all the performance problems [5].

Memory- and CPU-based access provides much lower latency and greater throughput than disk- and network-based access. In view of this, IO-related tuning (Hibernate itself and the IO portion of the underlying systems) should take precedence over CPU- and memory-related tuning (GC and the CPU/memory portion of the underlying systems).

Example 1

We tuned an HQL query that selects electricity deals from about 30 seconds down to less than 1 second. If we had worked on the garbage collector instead, the improvement would likely have been much smaller, perhaps a few milliseconds or several seconds at best, which would have been unnoticeable to traders compared to the HQL improvement.

Another important part of a good tuning methodology is deciding when to optimize [4].

Proactive tuning advocates promote tuning right from the start, i.e. in the business rules and design phases, and carrying on tuning across the whole SDLC, because they think it is often very costly to change business rules and redesign later on.

Reactive tuning advocates promote tuning at the end of the SDLC, complaining that early tuning often complicates design and coding. They often quote the phrase “premature optimization is the root of all evil”, popularized by Donald Knuth [6].

A trade-off is required to balance tuning and coding. The authors' experience suggests that proper early tuning results in more prudent design and careful coding. Many projects fail at application tuning because the above “premature optimization” phrase is quoted out of context, and accordingly tuning is either pushed to the very end of a project or given too few resources.

However, it is not always feasible to carry out a lot of early tuning because you don't know exactly where the application's bottlenecks are without profiling it first, on top of which the application is often still evolving.

Profiling our multi-threaded enterprise applications also showed that most of them averaged only 20-50% CPU usage; the rest of the time was spent waiting on database- and network-related IO.

Based on the above analysis, we conclude that tuning Hibernate, along with the business rules and design, falls into the top 20% of the Pareto Principle, and accordingly should have the highest priority.

A practical approach is as follows:

  1. Identify the main bottlenecks, most of which will predictably fall into Hibernate, the business rules and the design (the number depends on your tuning goal; three to five is a good start).
  2. Modify your application to eliminate them.
  3. Test your application and repeat from Step 1 until you have achieved your tuning goal.

More general advice on performance tuning strategy can be found in Jack Shirazi’s book “Java Performance Tuning” [7].

In the following sections we explain specific tuning techniques in an approximate order of tuning impact (that is, those listed first generally have the highest impact).

3 Monitoring and Profiling

Without sufficient monitoring and profiling of your Hibernate application, you won't know where the performance bottlenecks are and what parts need tuning.

3.1.1 Monitoring SQL Generation

Although the main purpose of using Hibernate is to save you from dealing with SQL directly, you have to know what SQL Hibernate is generating in order to tune your application. Joel Spolsky elaborated on this problem quite well in his article “The Law of Leaky Abstractions”.

You can set the log level of the org.hibernate.SQL package to DEBUG in log4j to see all the generated SQL. You may also need to set other packages’ log levels to DEBUG, or even TRACE, to pinpoint some performance issues.
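With log4j, the relevant logger settings look like this (a minimal log4j.properties sketch; adapt it to your own appender configuration):

```properties
# log every SQL statement Hibernate generates
log4j.logger.org.hibernate.SQL=DEBUG
# also log the values bound to JDBC parameters (very verbose)
log4j.logger.org.hibernate.type=TRACE
```

Alternatively, setting the hibernate.show_sql property to true prints the SQL to stdout, and hibernate.format_sql makes the output easier to read.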

3.1.2 Examining Hibernate Statistics

If you enable hibernate.generate_statistics, Hibernate will expose entity, collection, session, second-level cache, query and session factory metrics that are useful for tuning via SessionFactory.getStatistics(). To make life easier, Hibernate can also expose the metrics via JMX using the MBean org.hibernate.jmx.StatisticsService. You can find a configuration sample on this web site.
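A minimal configuration sketch (the property can live in hibernate.cfg.xml, as here, or equally in hibernate.properties):

```xml
<!-- hibernate.cfg.xml: collect performance metrics (small runtime overhead) -->
<property name="hibernate.generate_statistics">true</property>
```

At runtime, SessionFactory.getStatistics() then returns live counters such as entity load counts, query execution counts and times, and second-level cache hit counts.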

3.1.3 Profiling

A good profiling tool benefits not only your Hibernate tuning but other parts of your application as well. However, most commercial tools, such as JProbe [10], are very expensive. Fortunately, Sun/Oracle’s JDK 1.6 bundles a profiling tool called Java VisualVM [11]. Compared to its commercial counterparts it is fairly basic, but it still provides a lot of debug and tuning information.

4 Tuning Techniques

4.1 Tuning Business Rules and Design

Although tuning business rules and design doesn’t belong to Hibernate tuning per se, the decisions made there have a large impact on downstream Hibernate tuning. Accordingly we highlight some points related to Hibernate tuning.

Through business requirements collection and tunings, you should know:

  • Data retrieval characteristics including reference data, read-only data, read groups, read size, searching criteria, and data grouping and aggregation.
  • Data modification characteristics including data changes, change groups, change size, false modification allowance, databases (are all changes in one database or in more than one?), change frequency and concurrency, and change response and throughput requirements.
  • Data relationships such as associations, generalization, realization and dependency.

Based on the business requirements, you come up with an optimal design in which you decide your application type (On-line Transaction Processing (OLTP) or data warehouse, or close to either) and tier structure (separate persistence and service tiers, or combined), create domain objects, usually as POJOs, and decide where to aggregate data. Aggregating in the database takes advantage of powerful database functions and saves you network bandwidth; however, it is usually not portable except for such standard aggregations as COUNT, SUM, AVG, MIN and MAX. Aggregating in the application server allows you to apply more complex business logic, but you need to load the detailed data into your application tier first.

Example 2

Analysts need to see an aggregated list of electricity ISO (Independent System Operator) deals from a large database table. Initially they wanted to show most of the table’s columns, and the application took as long as 30 minutes to load up to 1 million rows into the front-end UI, even though the database responded within about 1 minute. After reanalysis, the analysts cut the list down to 14 columns. Because many of the optional high-cardinality columns were gone, grouping on the remaining columns returned far less data than before, and the data loading time dropped to an acceptable level in most cases.

Example 3

Electricity hourly traders often modify 24 shaped hours, each of which has 2 properties: hourly power volume and price (“shaped” means each hour can have its own volume and price; if all 24 hours have the same volume and price, we call them “standard”). Originally we used Hibernate’s select-before-update feature, which means updating 24 rows required 24 extra selects. Because we only have 2 properties, and no business rule prohibits a false update when a volume or price didn’t change, we turned off select-before-update and avoided the 24 selects.
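The feature is controlled per class in the mapping metadata via the select-before-update attribute; a sketch with assumed class, table and column names:

```xml
<!-- select-before-update="false" (the default) skips the extra select that
     would otherwise compare current state before each UPDATE -->
<class name="HourlyShape" table="HOURLY_SHAPE" select-before-update="false">
    <id name="id" column="ID"/>
    <version name="version" column="VERSION"/>
    <property name="volume" column="VOLUME"/>
    <property name="price" column="PRICE"/>
</class>
```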

4.2 Tuning Inheritance Mapping

Although inheritance mapping is part of domain objects, we treat it separately due to its importance. Chapter 9 “Inheritance Mapping” in HRD [1] already has good coverage, so we will focus on SQL generation and tuning recommendations for each strategy.

Here is the class diagram for the example in the HRD:


[Class diagram from the HRD: an abstract Payment super class (id, amount, currency) with three subclasses: CashPayment, ChequePayment (rtn) and CreditCardPayment (credit_card_type).]

4.2.1 Table per Class Hierarchy

Exactly one table is required. A polymorphic query generates SQL like:

select id, payment_type, amount, currency, rtn, credit_card_type from payment

The SQL generated from a query on a concrete subclass such as CashPayment looks like:

select id, amount, currency from payment where payment_type='CASH'

The pros include a single table, simple query and easy association with other tables. Also the second query doesn’t need to include properties in other subclasses. All these features make performance tuning much easier than other strategies. The approach generally fits well with a data warehouse system since all data is in one table without the need for table joins.

The main con is one big table with columns for every property in the whole class hierarchy. If the class hierarchy has many subclass-specific properties, there will be too many null column values in the database, which makes SQL tuning difficult for current row-based databases (a data warehouse using a column-based DBMS handles this better). Because the single table can become a hotspot unless it is partitioned, OLTP systems typically don’t work well with this strategy.
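In mapping metadata, this strategy is declared with a discriminator plus subclass elements; a sketch following the HRD example (discriminator values are assumptions):

```xml
<class name="Payment" table="PAYMENT">
    <id name="id" column="ID"/>
    <!-- the discriminator column tells Hibernate which subclass a row is -->
    <discriminator column="PAYMENT_TYPE" type="string"/>
    <property name="amount" column="AMOUNT"/>
    <property name="currency" column="CURRENCY"/>
    <subclass name="CashPayment" discriminator-value="CASH"/>
    <subclass name="ChequePayment" discriminator-value="CHEQUE">
        <property name="rtn" column="RTN"/>
    </subclass>
    <subclass name="CreditCardPayment" discriminator-value="CREDIT">
        <property name="creditCardType" column="CREDIT_CARD_TYPE"/>
    </subclass>
</class>
```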

4.2.2 Table per Subclass

Four tables are required. A polymorphic query generates SQL like:

select id, payment_type, amount, currency, rtn, credit_card_type,
        case when c.payment_id is not null then 1
             when ck.payment_id is not null then 2
             when cc.payment_id is not null then 3
             when p.id is not null then 0 end as clazz
from payment p left join cash_payment c on p.id=c.payment_id left join
   cheque_payment ck on p.id=ck.payment_id left join
   credit_payment cc on p.id=cc.payment_id;

The SQL generated from a query on a concrete subclass such as CashPayment looks like:

select id, payment_type, amount, currency
from payment p left join cash_payment c on p.id=c.payment_id; 

The pros include compact tables (no unnecessary nullable columns), data partitioned across the three subclass tables, and easy association with other tables via the top super class table. Compact tables optimize storage blocks for row-based databases and accordingly make SQL perform better. Partitioning the data increases data change concurrency (no hotspot except the super class table), so OLTP systems usually handle this strategy well.

Again the second query doesn’t need to include properties in other subclasses.

The cons include the most tables and outer joins of all the strategies, and slightly more complex SQL (note the long CASE clause for Hibernate’s dynamic discriminator). Because databases take more effort to tune table joins than a single table, data warehouses usually don’t work well with it.

Because you can’t create a composite database index on columns spanning the super class and a subclass, performance will suffer if you need to query on such columns. Also, any data change on a subclass involves two tables: the super class table and the subclass table.

4.2.3 Table per Concrete Class

Three tables are involved (one per concrete class). A polymorphic query generates SQL like:

select p.id, p.amount, p.currency, p.rtn, p.credit_card_type, p.clazz
from (select id, amount, currency, null as rtn, null as credit_card_type,
             1 as clazz from cash_payment union all
      select id, amount, null as currency, rtn, null as credit_card_type,
             2 as clazz from cheque_payment union all
      select id, amount, null as currency, null as rtn, credit_card_type,
             3 as clazz from credit_payment) p;

The SQL generated from a query on a concrete subclass such as CashPayment looks like:

select id, amount, currency from cash_payment;

The pros are similar to the above “Table per Subclass” strategy. Because the super class is usually abstract, exactly three tables are required, and any data change on a subclass only involves one table and accordingly runs faster.

The cons include complex SQL (a sub-query in the FROM clause and UNION ALL). However, most databases can tune such SQL very well.

If another class needs an association to the super Payment class, the database can’t enforce it with referential integrity; you have to use a trigger instead. This makes database performance suffer a bit.

4.2.4 Table per Concrete Class using Implicit Polymorphism

Only three tables are required. A polymorphic query on Payment generates three separate SQL statements each of which is on a subclass. The Hibernate engine uses Java reflection to figure out all three subclasses for Payment.

A query on a concrete subclass only generates SQL for that subclass. These SQL statements are pretty simple and accordingly ignored here.

The pros are similar to the above section: compact tables, data partitions across three concrete subclass tables, and any data change on a subclass only involves one table.

The cons include three separate SQL statements instead of just one union SQL statement, which incurs more network IO. The Java reflection also takes time. Imagine how long it would take if you had a large set of domain objects and you implicitly selected from the top Object class.

It is not easy to make a sound selection for your mapping strategy; it requires you to carefully tune your business requirements and make sound design decisions based on your specific data scenarios.

Here are our recommendations:

  • Design a fine-grained class hierarchy with coarse-grained database tables. Fine-grained database tables mean more table joins and accordingly more complex queries.
  • Don’t use polymorphic queries if you don’t need them. As shown above, queries on concrete classes only select the required data without unnecessary table joins or unions.
  • “Table per Class Hierarchy” is good for a data warehouse system (column-based databases), and an OLTP system with low concurrency and most columns being shared.
  • “Table per Subclass” is good for an OLTP system with high concurrency, simple queries and few shared columns. It is also a sound choice if you want to use database referential integrity to enforce associations.
  • “Table per Concrete Class” is good for an OLTP system with high concurrency, complex queries and few shared columns. Of course you have to sacrifice the association between the super class and other classes.
  • Mix strategies, such as “Table per Subclass” embedded in “Table per Class Hierarchy”, so that you can take advantage of different strategies. You will probably also resort to this if you have to reactively redesign your mapping strategy as your project evolves.
  • “Table per Concrete Class using Implicit Polymorphism” is not recommended due to its verbose configurations, complex association syntax using “any” element and potentially dangerous implicit queries.

Example 4

Here is part of a domain class diagram for a deal capture application:

[Class diagram: a Deal super class with GasDeal, OilDeal and ElectricityDeal subclasses.]

In the beginning, the project only had GasDeal and a small user base. It used “Table per Class Hierarchy”.

OilDeal and ElectricityDeal were added later as more business requirements came in; the mapping strategy stayed the same. However, ElectricityDeal had too many properties of its own, so many electricity-specific nullable columns were added to the Deal table. Because the user base also grew, data changes became slower and slower.

For the redesign we split the electricity-specific properties out into their own table, so Gas/Oil and electricity data live in separate tables. The new mapping was a hybrid of “Table per Class Hierarchy” and “Table per Subclass”. We also redesigned the queries to select on concrete deal subclasses, eliminating unnecessary table columns and joins.

4.3 Tuning Domain Objects

Based on the tuning of business rules and design in Section 4.1, you come up with a class diagram for your domain objects, represented as POJOs. Our recommendations are:

4.3.1 Tuning the POJO

  • Separating out read-only data, such as reference data, and read-mostly data from read-write data.
    Second-level caching of read-only data is the most efficient, followed by non-strict read-write caching of read-mostly data. You can also mark a read-only POJO as immutable as a tuning hint. If a service layer method deals only with read-only data, you can mark its transaction as read-only, which is a hint to optimize Hibernate and the underlying JDBC driver.
  • Fine-grained POJOs with coarse-grained database tables.
    Break up a big POJO based on data change concurrency, frequency, and so on. Although you can define a very fine-grained object model, over-granular tables incur database join overhead, which is especially unacceptable for a data warehouse.
  • Preferring non-final classes.
    Hibernate uses CGLIB proxy to implement lazy association fetching only for non-final classes. If your associated classes are final, Hibernate has to eagerly load all of them, which hurts performance.
  • Implementing equals() and hashCode() using your business key for detached instances.
    In a multi-tiered system, you often use optimistic locking with detached objects to increase system concurrency and achieve high performance.
  • Defining a version or timestamp property.
    Such a column is needed by optimistic locking to implement long conversations (application transactions).
  • Preferring a composite POJO.
    Very often your front-end UI needs data from several different POJOs. You should transfer to the UI a composite POJO instead of individual POJOs to have better network performance.
    There are two ways to build a composite POJO at your service layer. One is to first load all the required individual POJOs, and then extract the necessary properties into your composite POJO; the other is to use HQL projection to directly select the necessary properties from your database.
    The first approach is preferred if those individual POJOs are also looked up by others and they are put into the second level cache for sharing; otherwise the second approach is preferred.
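The equals()/hashCode() advice above can be sketched as follows. This is a minimal sketch, not code from the article: the Trader entity and its traderCode business key are assumed names, but the pattern (compare on the business key, never on the surrogate database id) is the one recommended.

```java
// Sketch: equals()/hashCode() based on a business key ("traderCode"),
// not on the surrogate database id. Two detached copies of the same
// logical trader then compare equal across sessions and in Sets/Maps.
class Trader {
    private Long id;                 // surrogate primary key; may be null before save
    private final String traderCode; // business key, assumed unique and non-null

    Trader(Long id, String traderCode) {
        this.id = id;
        this.traderCode = traderCode;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Trader)) return false;
        return traderCode.equals(((Trader) other).traderCode);
    }

    @Override
    public int hashCode() {
        return traderCode.hashCode();
    }
}
```

Note that the surrogate id is deliberately ignored: a transient instance has no id yet, while its business key is already stable.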

4.3.2 Tuning Associations between POJOs

  • If the association can be one-to-one, one-to-many or many-to-one, don’t use many-to-many.
    A many-to-many association needs an extra mapping table. Even though your Java code only deals with POJO at both ends, your database still needs extra joins on the mapping table for queries, and extra deletion and insertion for modifications.
  • Preferring unidirectional to bidirectional associations.
    With a bidirectional many-to-many association, loading one side can trigger loading of the other side, which can in turn trigger extra loading of the original side, and so on.
    A similar argument applies to bidirectional one-to-many and many-to-one associations when you navigate from the many side (the child entities) to the one side (the parent entity).
    This back and forth loading takes time and may not be what you want.
  • Don’t define an association for the sake of association; do so only when you need to load both ends together, which should be decided by your business rules and design (please see Example 5 for details).
    Otherwise, either define no association at all, or define a value-typed property in the child POJO to hold the parent POJO’s ID (and similarly for the other direction).
  • Tuning collections.
    Use the “order-by” attribute instead of “sort” if your collection sorting logic can be implemented by the underlying database because the database usually does a better sorting job than you.
    Collections can either model value types (element or composite-element) or entity reference types (one-to-many or many-to-many associations). Tuning the collection of reference types is mainly tuning fetch strategy. For tuning collections of value types, Section 20.5 “Understanding Collection Performance” in HRD [1] already has good coverage.
  • Tuning fetch strategy. Please see Section 4.7.

Example 5

We have a core POJO called ElectricityDeal that captures electricity deals. From a business perspective, it has dozens of many-to-one associations with reference POJOs such as Portfolio, Strategy and Trader, just to name a few. Because the reference data is pretty stable, it is cached at the front end and can be quickly looked up based on its ID properties.

In order to have good loading performance, the ElectricityDeal mapping metadata only defines the value-typed ID properties of those reference POJOs, because the front end can quickly look up, say, the portfolio from its cache based on portfolioKey if needed:

<property name="portfolioKey" column="PORTFOLIO_ID" type="integer"/>

This implicit association avoids database table joins and extra selects, and cuts down the data transfer size.

4.4 Tuning the Connection Pool

Because making a physical database connection is time-consuming, you should always use a connection pool. Furthermore, you should always use a production-grade connection pool instead of Hibernate’s rudimentary built-in pooling algorithm.

You usually provide Hibernate with a datasource that provides the pooling function. A popular open source, production-grade datasource is Apache DBCP’s BasicDataSource [13]. Most database vendors also implement their own JDBC 3.0-compliant connection pools. For example, you can get connection load balancing and failover using the Oracle-provided JDBC connection pool [14] along with Oracle Real Application Clusters [15].

Needless to say, you can find plenty of connection pool tuning techniques on the web. Accordingly we will only mention the common tuning parameters shared by most pools:

  • Min pool size: the minimum number of connections that can remain in the pool.
  • Max pool size: the maximum number of connections that can be allocated from the pool.
    If your application has high concurrency and your maximum pool size is too small, your connection pool will often experience waiting. On the other hand, if your minimum pool size is too large, you may have allocated unnecessary connections.
  • Max idle time: the maximum time a connection may sit idle in the pool before being physically closed.
  • Max wait time: the maximum time the pool will wait for a connection to be returned. This can prevent runaway transactions.
  • Validation query: the SQL query used to validate a connection before returning it to the caller. This is needed because some databases are configured to kill long-idle connections, and network or database-related exceptions may also kill connections. To reduce this overhead, a connection pool can run the validation while connections are idle.
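With DBCP, these parameters map onto BasicDataSource properties; here is a Spring-style bean definition sketch (all values are illustrative assumptions, not recommendations):

```xml
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"
      destroy-method="close">
    <property name="driverClassName" value="oracle.jdbc.driver.OracleDriver"/>
    <property name="url" value="jdbc:oracle:thin:@localhost:1521:orcl"/>
    <property name="username" value="app"/>
    <property name="password" value="secret"/>
    <property name="initialSize" value="5"/>   <!-- connections created up front -->
    <property name="maxActive" value="20"/>    <!-- max pool size -->
    <property name="maxWait" value="10000"/>   <!-- max wait time, ms -->
    <property name="minEvictableIdleTimeMillis" value="300000"/> <!-- max idle time -->
    <property name="validationQuery" value="select 1 from dual"/>
    <property name="testWhileIdle" value="true"/> <!-- validate while idle -->
</bean>
```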

4.5 Tuning Transactions and Concurrency

Short database transactions are essential for any high-performing, scalable application. You deal with transactions using a session, which represents a conversation with the database to process a single unit of work.

Regarding the scope of a unit of work and transaction boundary demarcation, there are 3 patterns:

  • Session-per-operation. Each database call needs a new session and transaction. Because a true business transaction usually encompasses several such operations, and a large number of small transactions generally incurs more database activity (primarily because the database must flush changes to disk on each commit), application performance suffers. Accordingly it is an anti-pattern and shouldn’t be used.
  • Session-per-request-with-detached-objects. Each client request gets a new session and a single transaction. You use Hibernate’s “current session” feature to associate the two.
    In a multi-tier system, users usually initiate long conversations (or application transactions). Most of the time we use Hibernate’s automatic versioning and detached objects to achieve optimistic concurrency control and high performance.
  • Session-per-conversation-with-extended (or long) session. You keep the session open for a long conversation which may span several transactions. Although this saves you from reattachment, the session may grow too big to fit in memory, and it probably holds stale data in high concurrency systems.

You should also be aware of the following points:

  • Use local transactions if you don’t need JTA, because JTA requires many more resources and is much slower than local transactions. Even when you have more than one datasource, you don’t need JTA unless you have transactions spanning more than one of them. In that last case you can consider using local transactions on each datasource with a technique similar to the “Last Resource Commit Optimization” [16] (see Example 6 below for details).
  • Mark your transaction as read-only if it doesn’t involve data changes, as mentioned in Section 4.3.1.
  • Always set up a default transaction timeout. It ensures that no misbehaving transaction can tie up resources while returning no response to the user. It even works for local transactions.
  • Optimistic locking will not work if Hibernate is not the sole database user, unless you create database triggers to increment the version column for data changes made by other applications.
Example 6

Our application has several service layer methods which in most instances only deal with database “A”; occasionally they also retrieve read-only data from database “B”. Because database “B” only provides read-only data, we still use local transactions on both databases for those methods.

The service layer does have one method involving data changes on both databases. Here is the pseudo-code:

//Make sure a local transaction on database A exists
@Transactional (readOnly=false, propagation=Propagation.REQUIRED)
public void saveIsoBids() {
  //it participates in the above annotated local transaction
  insertBidsInDatabaseA();
  //it runs in its own local transaction on database B
  insertBidRequestsInDatabaseB(); //must be the last operation
}

Because insertBidRequestsInDatabaseB() is the last operation in saveIsoBids(), only the following scenario can cause data inconsistency:

The local transaction on database “A” fails to commit when the execution returns from saveIsoBids().

However, even if you used JTA for saveIsoBids(), you would still get data inconsistency if the second commit phase failed in the two-phase commit (2PC) process. So if you can deal with the above data inconsistency and really don’t want the complexity of JTA for just one or a few methods, you should use local transactions.

4.6 Tuning HQL

4.6.1 Tuning Index Usage

HQL looks very much like SQL. From the WHERE clause in HQL you can usually guess the corresponding SQL WHERE clause. The columns in the WHERE clause determine which index the database will select.

A common mistake among Hibernate developers is to create a new index whenever a new WHERE clause is needed. Because each index incurs extra data update overhead, the goal should be to create a small number of indexes that cover as many queries as possible.

Section 4.1 asks you to collect all possible data search criteria. If this is impractical, you can use a backend profiling tool to collect all the SQL your application emits. Based on a categorization of those search criteria, you come up with a small set of indexes. In the meantime, you can also try to add extra predicates to one WHERE clause to make it match others.

Example 7

There are two UI searches and one backend daemon search on a table called iso_deals. The first UI search always has predicates on the properties unexpectedFlag, dealStatus, tradeDate and isoId.

The second UI search is based on a user-entered filter that includes tradeDate and isoId, among others. Initially all of these filter properties were optional.

The backend search is based on the properties isoId, participantCode and transactionType.

Further business analysis revealed that the second UI search actually selects data based on some implicit unexpectedFlag and dealStatus values. We also made tradeDate mandatory on the filter (every search filter should have some mandatory properties in order to use database indexes).

In view of this we made a composite index on unexpectedFlag, dealStatus, tradeDate and isoId, in that order, which both UI searches can share. (The order is important, because the database will not select the composite index if your predicate specifies these properties in a different order or lists some other property before them.)

The backend search is too different from the UI searches, so we had to make another composite index for it, on isoId, participantCode and transactionType in that order.
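In DDL, the two composite indexes from this example would look roughly like this (index, table and column names are assumed mappings of the properties above):

```sql
-- shared by both UI searches; the leading columns match their common predicates
create index iso_deals_ui_idx
  on iso_deals (unexpected_flag, deal_status, trade_date, iso_id);

-- the backend daemon search gets its own composite index
create index iso_deals_daemon_idx
  on iso_deals (iso_id, participant_code, transaction_type);
```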

4.6.2 Binding Parameters vs. String Concatenation

You can build an HQL WHERE clause using either binding parameters or plain string concatenation; the decision impacts performance. The reason for using binding parameters is to let the database parse your SQL once and reuse the resulting execution plan for subsequent repeated requests, which saves both CPU time and memory. However, different binding values may need different SQL execution plans for optimal data access.

For example, a narrow data range may return less than 5% of your total data while a large data range may return almost 90%. Using an index is optimal in the former case, while a full table scan is optimal in the latter.

The recommendation is to use binding parameters for OLTP and string concatenation for data warehouses, because OLTP usually inserts and updates data repeatedly and only retrieves a small amount of data per transaction, while a data warehouse usually runs a small number of SQL selections for which an exact execution plan is more important than saved CPU time and memory.

What if you know your OLTP searches should use the same execution plan for different binding values?

Oracle 9i and later can peek at the values of binding parameters on the first invocation and generate an execution plan. Subsequent invocations don’t peek; the previous execution plan is reused.
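The difference is visible in the SQL the database receives; a sketch with assumed table and column names:

```sql
-- binding parameter: parsed once, the cached execution plan is reused
select * from iso_deals where trade_date = :1;

-- string concatenation: every distinct literal is a brand-new statement to parse
select * from iso_deals where trade_date = date '2010-10-06';
```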

4.6.3 Aggregation and Order By

You can do aggregation and “order by” either in the database, or at your application’s service layer after first loading all the data. The former approach is recommended because the database usually does the job much better than your application, and you save network bandwidth as well. This approach is also portable across databases.

The only exception is when your application has special business rules for data aggregation and ordering that HQL doesn’t support.

    4.6.4 Fetch Strategy Override

    Please see Section 4.7.1 below.

    4.6.5 Native Query

    Native query tuning doesn’t really relate directly to HQL, but HQL does enable you to pass native queries to your underlying database. We don’t recommend this due to its lack of portability across databases.

    4.7 Tuning Fetch Strategy

    A fetch strategy determines how and when Hibernate retrieves associated objects if the application needs to navigate the association. “Chapter 20 Improving performance” in HRD has pretty good coverage. We will focus on usages here.

    4.7.1 Fetch Strategy Override

    Different users may have different data fetching requirements. Hibernate allows you to define a fetching strategy at two locations. One is to declare it in the mapping metadata; the other is to override it in either the HQL or the Criteria.

    A common approach is declaring the default fetching strategy in the mapping metadata based on the majority of fetching use cases, and overriding it in HQL and Criteria for occasional fetching use cases.

    Suppose pojoA and pojoB are parent and child entities, respectively. If business rules need to load data from both entities only occasionally, you can declare a lazy collection or proxy fetching. This can be overridden with an eager fetching, such as join fetching using HQL or Criteria, when you need data from both entities.

    On the other hand, if business rules need to load data from both entities most of the time, you can declare an eager fetching and override it with a lazy collection or proxy fetching using Criteria (HQL doesn’t support such an override yet).
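    For instance, with the lazy default declared in the mapping metadata, an occasional eager use case can override it per query (a sketch reusing the hypothetical pojoA/pojoBs names from above):

```xml
<!-- mapping default: lazy collection fetching -->
<set name="pojoBs" lazy="true">
    <key column="pojoa_id"/>
    <one-to-many class="pojoB"/>
</set>
```

    The HQL override is "from pojoA a left join fetch a.pojoBs where a.id = :id"; the Criteria equivalent is criteria.setFetchMode("pojoBs", FetchMode.JOIN).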

    4.7.2 N+1 Pattern or Anti-Pattern?

    Select fetching incurs an N+1 problem. If you know you always need to load data from an association, you should always use join fetching. In the following two scenarios you may treat N+1 just as a pattern instead of an anti-pattern.

    The first is when you don’t know whether a user will navigate an association. If he/she doesn’t, you win; otherwise you still need the extra N select SQL statements to load the association. This is kind of a catch-22.

    The second is pojoA has one-to-many associations with many other POJOs such as pojoB and pojoC. Using eager inner or outer join fetching will repeat pojoA many times in the result set. When pojoA has many non-nullable properties, you have to load a large amount of data to your persistence layer. This loading takes a long time both because of network bandwidth overhead and session cache overhead (memory consumption and GC pause) if your Hibernate session is stateful.

    You can make similar arguments if you have a long chain of one-to-many associations, say from pojoA to pojoB to pojoC, and so on.

    You may be tempted to use the DISTINCT keyword in HQL or distinct function in Criteria or the Java Set interface to eliminate duplicated data. But all of them are implemented in Hibernate (at the persistence layer) instead of in the database.

    If tests based on your network and memory configurations show N+1 performs better, you can use batch fetching, subselect fetching or second level cache for further optimization.

    Example 8

    Here is a sample HBM file excerpt using batch fetching:

    <class name="pojoA" table="pojoA">
    …
    <set name="pojoBs" fetch="select" batch-size="10">
    <key column="pojoa_id"/>
    …
    </set>
    </class> 

     

    Here is the generated SQL on the one-side pojoB:

    select … from pojoB where pojoa_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?);

    The number of question marks equals the batch-size value. So the N extra SQL select statements on pojoB were cut down to N/10.

    If you replace the fetch="select" with fetch="subselect", here is the generated SQL on pojoB:

    select … from pojoB where pojoa_id in (select id from pojoA where …);

    Although the N extra selects were cut to 1, this is only good when re-running the query on pojoA is cheap.

    If pojoA’s set of pojoBs is quite stable or pojoB has a many-to-one association to pojoA and pojoA is read-only reference data, you can also use second level cache on pojoA to eliminate the N+1 problem (Section 4.8.1 gives an example).

    4.7.3 Lazy Property Fetching

    Unless you have a legacy table that has many columns you don’t need, this fetching strategy usually can’t be justified since it involves extra SQL for the lazy property group.

    During business analysis and design, you should put different data retrieval or change groups into different domain entities instead of using this fetching.

    If you can’t redesign your legacy table, you can use projection provided in HQL or Criteria to retrieve data.
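    For example, a projection retrieves just the columns you need from a wide legacy table without defining lazy property groups (the properties here are illustrative):

```sql
select d.id, d.status from IsoDeal d
```

    Hibernate returns each row as an Object[], or you can map the row to a DTO with a "select new" constructor expression.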

    4.8 Tuning Second Level Cache

    The coverage in Section 20.2 “The Second Level Cache” in HRD is just too succinct for most developers to really know how to make a choice. Adding to the confusion, version 3.3 and later deprecated the “CacheProvider”-based cache in favor of a new “RegionFactory”-based cache. However, even the latest 3.5 reference documentation doesn’t mention how to use the new approach.

    We’ll still focus on the old approach due to the following considerations:

    • It seems that of all the popular Hibernate second level cache providers only JBoss Cache 2, Infinispan 4 and Ehcache 2 support the new approach. OSCache, SwarmCache, Coherence and Gigaspaces XAP-Data Grid still only support the old approach.
    • The two approaches still share the same <cache> configuration. For example, they still use the same usage attribute values “transactional|read-write|nonstrict-read-write|read-only”.
    • The old approach is still supported internally by several cache-region adapters. Understanding the old approach helps you to quickly understand the new approach.

    4.8.1 CacheProvider-based Cache Mechanism

    Understanding the mechanism is the key to making a sound selection. The key classes/interfaces are CacheConcurrencyStrategy and its implementation classes for the 4 different cache usages, and EntityUpdate/Delete/InsertAction.

    For cache access concurrency, there are 3 implementation patterns:

    • Read-only for the “read-only” cache usage.
      Neither locks nor transactions really matter because the cache never changes once it has been loaded from the database.
    • Non-transaction-aware read-write for the “read-write” and “nonstrict-read-write” cache usages.
      Updates to the cache occur after the database transaction has completed. Cache needs to support locks.
    • Transaction-aware read-write for the “transactional” cache usage.
      Updates to the cache and the database are wrapped in the same JTA transaction so that the cache and database are always synchronized. Both database and cache must support JTA. Hibernate doesn’t explicitly call any cache lock function although cache transactions internally rely on cache locks.

    Let’s take a database update for example. EntityUpdateAction will have the following call sequences for transaction-aware read-write, non-transaction-aware read-write for “read-write” usage and non-transaction-aware read-write for “nonstrict-read-write” usage, respectively:

    • Updates database in a JTA transaction; updates cache in the same transaction.
    • Softlocks cache; updates database in a transaction; updates cache after the previous transaction has completed successfully; otherwise releases the softlock.
      A soft lock is just a special cache value invalidation representation that stops other transactions from reading or writing to the cache before it gets the new database value. Instead those transactions go to the database directly.
      Cache must support locks; transaction support is not needed. If the cache is clustered, the “updates cache” call will push the new value to all replicates, which is often referred to as a “push” update policy.
    • Updates database in a transaction; evicts cache before the previous transaction completes; evicts cache again for safety after the previous transaction has completed successfully or not.
      Neither cache lock nor cache transaction support is needed. If the cache is clustered the “evicts cache” call will invalidate all replicates, which is often referred to as a “pull” update policy.

    For entity delete or insert action or collection changes, there are similar call sequences.

    Actually the last two asynchronous call sequences can still guarantee database and cache consistency (basically the “read committed” isolation level): the second sequence thanks to the softlock and to the “updates cache” call coming after the “updates database” call, and the last sequence thanks to its pessimistic “evicts cache” calls.

    Based on the above analysis, here are our recommendations:

    • Always use a “read-only” strategy if your data is read-only, such as reference data, because it is the simplest and best performing strategy and also cluster-safe.
    • Don’t use a “transactional” strategy unless you really want to put your cache updates and database updates in one JTA transaction, since this is usually the worst performing strategy due to the lengthy 2PC process needed by JTA.
      In the authors’ opinion, the second level cache is not really a first-class datasource, and therefore using JTA can’t be justified. Actually the last two call sequences are good alternatives in most cases thanks to their data consistency guarantee.
    • Use the “nonstrict-read-write” strategy if your data is read-mostly or concurrent cache access and update is rare. Thanks to its light “pull” update policy it is usually the second best performing strategy.
    • Use the "read-write" strategy if your data is read-write. This is usually the second worst performing strategy, due to its cache lock requirement and the heavy “push” update policy for clustered caches.

    Example 9

    Here is a sample HBM file excerpt for ISO charge type:

    <class name="IsoChargeType">
       <property name="isoId" column="ISO_ID" not-null="true"/>
       <many-to-one name="estimateMethod" fetch="join" lazy="false"/>
       <many-to-one  name="allocationMethod" fetch="join" lazy="false"/>
       <many-to-one name="chargeTypeCategory" fetch="join" lazy="false"/>
    </class> 

    Some users only need the ISO charge type itself; some users need both the ISO charge type and some of its three associations. So the developer just eagerly loaded all three associations for simplicity. This is not uncommon if nobody is in charge of Hibernate tuning in your project.

    The best approach is mentioned in Section 4.7.1. Because all the associations are read-only reference data, an alternative is to use lazy fetching and turn on the second level cache on the associations to avoid the N+1 problem. Actually the former approach can also benefit from the reference data caches.
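    A sketch of that alternative, using the EstimateMethod association target from Example 9 (the property names are illustrative): mark the reference entity immutable and cacheable, and let the association itself stay lazy:

```xml
<class name="EstimateMethod" mutable="false">
    <cache usage="read-only"/>
    <id name="id" column="ID"/>
    <property name="name" column="NAME"/>
</class>
```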

    Because most projects have a lot of read-only reference data which are referenced by lots of other data, the above two approaches can improve your overall system performance a lot.

    4.8.2 RegionFactory

    Here are the main corresponding class/interface mappings between the two approaches:

    New Approach                      Old Approach
    --------------------------------  ------------------------
    RegionFactory                     CacheProvider
    Region                            Cache
    EntityRegionAccessStrategy        CacheConcurrencyStrategy
    CollectionRegionAccessStrategy    CacheConcurrencyStrategy

    The first improvement is that the RegionFactory builds specialized regions, such as EntityRegion and QueryResultsRegion, rather than using one general-purpose region. The second improvement is that regions are asked to build their own access strategies for the specified cache “usage” attribute value, instead of always using the 4 CacheConcurrencyStrategy implementations for all regions.

    To use this new approach, you should set the factory_class instead of provider_class configuration property. Take Ehcache 2.0 for example:

        <property name="hibernate.cache.region.factory_class">
            net.sf.ehcache.hibernate.EhCacheRegionFactory  
     </property>

    Other related Hibernate cache configurations remain the same as in the old approach.

    The new approach is also backward compatible with the legacy approach. If you still only configure CacheProvider, the new approach will use the following self-explanatory adapters and bridges to implicitly call the old interfaces/classes:

    RegionFactoryCacheProviderBridge, EntityRegionAdapter, CollectionRegionAdapter, QueryResultsRegionAdapter, EntityAccessStrategyAdapter, CollectionAccessStrategyAdapter

    4.8.3 Query Cache

    The second level cache can also cache your query result. This is helpful if your query is expensive or runs repeatedly.
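    Query caching is off by default; using it requires both a global switch and a per-query opt-in. A minimal sketch:

```xml
<!-- hibernate configuration -->
<property name="hibernate.cache.use_query_cache">true</property>
```

    Then mark the individual query with query.setCacheable(true). Note that the query cache stores only identifiers; the entities themselves are resolved from the second level cache or the database, so the returned entity class usually needs its own cache region as well.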

    4.9 Tuning Batch Processing

    Most of Hibernate’s functions fit OLTP systems very well where each transaction usually deals with a small amount of data. However if you have a data warehouse or your transactions need to handle a lot of data, you need to think differently.

    4.9.1 Non-DML-Style Using Stateful Session

    This is the most natural approach if you already use the regular Session. You need to do three things:

    • Turn on the batch feature by configuring the following 3 properties:
        hibernate.jdbc.batch_size 30
        hibernate.jdbc.batch_versioned_data true
        hibernate.cache.use_second_level_cache false

      A positive batch_size enables JDBC2’s batch update. Hibernate recommends a value from 5 to 30. Based on our testing, both extremely low and extremely high values performed the worst. As long as the value is in a reasonable range, the difference is within a few seconds. This is especially true if you have a fast network.

      Setting the second property to true requires your JDBC driver to return correct row counts from executeBatch(). For Oracle users, you can’t set it to true for batch updates. Read the section “Update Counts in the Oracle Implementation of Standard Batching” in Oracle’s “JDBC Developer’s Guide and Reference” for details. Because it is still safe for batch inserts, you may have to create a separate, dedicated datasource for them.

      The last configuration is optional because you can explicitly disable the second level cache on a session.

    • Flush and clear your first-level session cache periodically as in the following batch insert sample:
       Session session = sessionFactory.openSession();
       //if hibernate.cache.use_second_level_cache is true, disable caching for this session:
       session.setCacheMode(CacheMode.IGNORE);
       Transaction tx = session.beginTransaction();

       for (int i = 0; i < 100000; i++) {
           Customer customer = new Customer(.....);
           session.save(customer);
           if (i % 50 == 0) { //50, same as the JDBC batch size
               //flush a batch of inserts and release memory:
               session.flush();
               session.clear();
           }
       }
       tx.commit();
       session.close();

      Batch processing usually doesn’t need data caching, otherwise you may run out of memory and dramatically increase your GC overhead. This is especially true if you have a limited amount of memory.

    • Always embed your batch inserts into a transaction.

    A smaller number of changed objects per transaction means more commits to your database, each of which incurs some disk-related overhead, as mentioned in Section 4.5.
    On the other hand, a larger number of changed objects per transaction means you lock changes for a longer time and your database also needs a bigger redo log.

    4.9.2 Non-DML-Style Using Stateless Session

    A stateless session performs even better than the previous approach because it is a thin wrapper over JDBC and bypasses many of the operations required by a regular session. For example, it has no session cache, nor does it interact with any second level or query cache.
    However it is not easy to use. In particular, its operations don’t cascade to associated instances; you must handle those yourself.

    4.9.3 DML-Style

    Using DML-style insert, update or delete, you manipulate data directly in your database instead of in Hibernate as is the case in the previous two approaches.

    Because one DML-style update or delete is equivalent to many individual updates or deletes in the previous two approaches, using DML-style operations saves network overhead and it should perform better if the WHERE clause in the update or delete hits a proper database index.

    It is strongly recommended that DML-style operations be used along with a stateless session. If you use a stateful session, don’t forget to clear the cache before you execute the DML otherwise Hibernate will update or evict the related cache (See the Example 10 below for details).
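    As a sketch, the DML-style replacement for updating each loaded object in a loop is a single HQL statement executed via Query.executeUpdate() (the entity and the elided chunk clause follow Example 10 below):

```sql
update IsoDeal d set d.status = :status where chunk-clause
```

    One such statement replaces the N individual UPDATE statements a stateful session would otherwise flush.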

    4.9.4 Bulk Loading

    If your HQL or Criteria returns a lot of data, you should take care of two things:

    • Turn on the batch fetch feature with the following configuration:
      hibernate.jdbc.fetch_size 10

      A positive fetch_size turns on the JDBC batch fetch feature. This is more important to a slow network than to a fast one. Oracle recommends an empirical value of 10. You should test based on your environment.

    • Turn off caching using any of the above mentioned methods, because bulk loading is usually a one-off task. Loading large amounts of data into your cache also usually means that it will be evicted quickly due to your limited memory capacity, which increases GC overhead.

    Example 10

    We have a background job to load a large amount of IsoDeal data by chunks for downstream processing. We also update the chunk data to an In-process status before handing over to the downstream system. The biggest chunk can have as many as half a million rows. Here is an excerpt of the original code:

    Query query = session.createQuery("FROM IsoDeal d WHERE chunk-clause");
    query.setLockMode("d", LockMode.UPGRADE); //for Inprocess status update
    List<IsoDeal> isoDeals = query.list();
    for (IsoDeal isoDeal : isoDeals) { //update status to Inprocess
       isoDeal.setStatus("Inprocess");
    }
    return isoDeals; 

    The method containing the above lines was annotated with a Spring 2.5 declarative transaction. It took about 10 minutes to load and update half a million rows. We identified the following problems:

    • It ran out of memory frequently due to the session cache and second level cache.
    • Even when it didn't actually run out of memory, the GC overhead was just too high when memory consumption was high.
    • We hadn't turned on the fetch_size.
    • The FOR loop created too many update SQL statements even though we had turned on the batch_size.

    Unfortunately Spring 2.5 doesn’t support Hibernate Stateless sessions, so we just turned off the second level cache; turned on fetch_size; and used a DML-style update instead of the FOR loop for improvement.

    However execution time was still about 6 minutes. After turning Hibernate’s log level up to trace, we found it was the updating of the session cache that caused the delay. By clearing the session cache before the DML update, we cut the time to about 4 minutes, all of which was needed to load the data into the session cache.

    4.10 Tuning SQL Generations

    This section will show you how to cut down the number of SQL generations.

    4.10.1 N + 1 Fetching Problem

    The “select fetching” strategy incurs the N + 1 problem. If the “join fetching” strategy is appropriate to you, you should always use it to completely avoid the N + 1 problem.

    However if the “join fetching” strategy doesn’t perform well then, as argued in Section 4.7.2, you can use “subselect fetching”, “batch fetching”, or “lazy collection fetching” to significantly reduce the number of extra SQL statements required.

    4.10.2 Insert + Update Problem

    Example 11

    Our ElectricityDeal has a unidirectional one-to-many association with DealCharge as shown in the following HBM file excerpt:

    <class name="ElectricityDeal"
           select-before-update="true" dynamic-update="true"
           dynamic-insert="true">
        <id name="key" column="ID">
            <generator class="sequence">
                <param name="sequence">SEQ_ELECTRICITY_DEALS</param>
            </generator>
        </id>
        …
        <set name="dealCharges" cascade="all-delete-orphan">
            <key column="DEAL_KEY" not-null="false" update="true"
                 on-delete="noaction"/>
            <one-to-many class="DealCharge"/>
        </set>
    </class>

    In the “key” element, the default values for “not-null” and “update” are false and true, respectively. The above code shows them explicitly for clarity.

    If you want to create one ElectricityDeal and ten DealCharges, the following SQL statements will be generated:

    • 1 insert on ElectricityDeal;
    • 10 inserts on DealCharge which don’t include the foreign key column “DEAL_KEY”;
    • 10 updates on the “DEAL_KEY” column of DealCharge.

    In order to eliminate the 10 extra updates by including the “DEAL_KEY” in the 10 DealCharge inserts, you have to change “not-null” and “update” to true and false respectively.

    An alternative is to use a bidirectional or many-to-one association and let DealCharge take care of the association.
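    With those two attribute values flipped, the “key” element from Example 11 becomes:

```xml
<key column="DEAL_KEY" not-null="true" update="false"
     on-delete="noaction"/>
```

    Hibernate then includes DEAL_KEY in each DealCharge insert, and the 10 extra update statements disappear.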

    4.10.3 Select Before Update

    In example 11, we used “select-before-update” for ElectricityDeal, which incurs an extra select for your transient or detached object. But it does avoid unnecessary updates to your database.

    You should weigh the trade off. If your object has few properties and you don’t need to prevent a database update trigger from being called unnecessarily, don’t use this feature because your limited data will cause neither too much network transfer overhead nor much database update overhead.

    If your object has many properties, as for a big legacy table, you may need to turn on this feature along with “dynamic-update” to avoid too much database update overhead.

    4.10.4 Cascade Delete

    In example 11, if you want to delete 1 ElectricityDeal and its 100 DealCharges, Hibernate will issue 100 deletes on DealCharge.

    If you change the “on-delete” attribute to “cascade”, Hibernate will not issue any deletes on DealCharge; instead the database will automatically delete the 100 DealCharges based on the ON CASCADE DELETE constraint. However you do need to ask your DBA to turn on the ON CASCADE DELETE constraint. Most DBAs are reluctant to do so in order to prevent accidental deletion of a parent object from unintentionally cascading to its dependents. Also be aware that this feature bypasses Hibernate’s usual optimistic locking strategy for versioned data.
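    Concretely, the change is a one-attribute edit in the mapping plus a matching database constraint (the table and column names in the DDL below are assumed from Example 11’s mapping; the actual constraint belongs to your DBA):

```xml
<key column="DEAL_KEY" not-null="false" update="true"
     on-delete="cascade"/>
```

```sql
ALTER TABLE DEAL_CHARGE
    ADD CONSTRAINT FK_DEALCHARGE_DEAL FOREIGN KEY (DEAL_KEY)
    REFERENCES ELECTRICITY_DEAL (ID) ON DELETE CASCADE;
```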

    4.10.5 Enhanced Sequence Identifier Generator

    Example 11 uses an Oracle sequence as the identifier generator. Suppose we save 100 ElectricityDeals; Hibernate will issue the following SQL 100 times to retrieve the next available identifier value:

    select SEQ_ELECTRICITY_DEALS.NEXTVAL from dual; 

    If your network is not fast, this is definitely inefficient. Release 3.2.3 and later added an enhanced generator “SequenceStyleGenerator” along with 2 optimizers: hilo and pooled. Although they are covered in Chapter 5 “Basic O/R Mapping” in HRD the coverage is limited. Both optimizers use the HiLo algorithm. The algorithm generates an identifier equal to the Hi value plus the Lo value where the Hi value represents a group number and the Lo value sequentially and repeatedly iterates from 1 to the maximum group size. The group number will increase by 1 whenever the Lo value “clocks over” to 1.

    Suppose the group size is 5 (specified by either the max_lo or the increment_size parameter); here is an example:

    • Hilo optimizer
      The group number comes from the database sequence’s next available value. The Hi value is defined in Hibernate as the Group Number multiplied by the increment_size parameter value.
    • Pooled optimizer
      The Hi value comes directly from the database sequence’s next available value. Your database sequence should increment by the increment_size parameter value.

    Neither optimizer will hit the database until it has exhausted its in-memory group values. The above example hits the database every 5 identifier values. With the hilo optimizer, your sequence can’t be used by other applications unless they also employ the same logic as in Hibernate. With the pooled optimizer, it is perfectly safe for other applications to use the same sequence.

    Both optimizers have a shortcoming: if Hibernate crashes, some identifier values in the current group may be lost. However most applications don’t require consecutive identifier values. (If your database, such as Oracle, caches sequence values, you also lose identifier values when it crashes.)
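    The hilo arithmetic just described can be modeled in a few lines of plain Java (this is a sketch of the description above, not Hibernate’s actual HiLoOptimizer source; a group size of 5 is assumed):

```java
import java.util.function.IntSupplier;

public class HiLoDemo {
    private final int groupSize;  // the max_lo / increment_size value
    private int groupNumber;      // comes from the database sequence
    private int lo;               // iterates 1..groupSize, then clocks over

    public HiLoDemo(int groupSize) {
        this.groupSize = groupSize;
        this.lo = groupSize + 1;  // force a sequence hit on the first call
    }

    // nextSequenceValue stands in for "select SEQ.NEXTVAL from dual"
    public int nextId(IntSupplier nextSequenceValue) {
        if (lo > groupSize) {     // group exhausted: hit the database once
            groupNumber = nextSequenceValue.getAsInt();
            lo = 1;
        }
        // identifier = Hi + Lo, where Hi = group number * group size
        return groupNumber * groupSize + lo++;
    }

    public static void main(String[] args) {
        HiLoDemo gen = new HiLoDemo(5);
        int[] seq = {0};          // in-memory stand-in for the sequence
        for (int i = 0; i < 10; i++) {
            System.out.print(gen.nextId(() -> seq[0]++) + " ");
        }
        // generates 1 through 10 while hitting the "sequence" only twice
    }
}
```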

    If we use the pooled optimizer in example 11, here is the new id configuration:

    <id name="key" column="ID">
        <generator class="org.hibernate.id.enhanced.SequenceStyleGenerator">
            <param name="sequence_name">SEQ_ELECTRICITY_DEALS</param>
            <param name="initial_value">0</param>
            <param name="increment_size">100</param>
            <param name="optimizer">pooled</param>
        </generator>
    </id>

    5 Summary

    This article covers most of the tuning skills you’ll find helpful for tuning your Hibernate applications. It devotes more space to tuning topics that are very efficient but poorly documented, such as inheritance mapping, the second level cache and enhanced sequence identifier generators.

    It also mentions some database insights which are essential for tuning Hibernate.
    Some examples also contain practical solutions to problems you may encounter.

    Beyond this, it should be noted that Hibernate can work with an In-Memory Data Grid (IMDG), such as Oracle Coherence or GigaSpaces IMDG, which can scale your application to millisecond-level latency.

    6 Resources

    [1] Latest Hibernate Reference Documentation on jboss.com

    [2] Oracle 9i Performance Tuning Guide and Reference

    [3] Performance Engineering on Wikipedia

    [4] Program Optimization on Wikipedia

    [5] Pareto Principle (the 80/20 rule) on Wikipedia

    [6] Premature Optimization on acm.org

    [7] Java Performance Tuning by Jack Shirazi

    [8] The Law of Leaky Abstractions by Joel Spolsky

    [9] Hibernate’s StatisticsService Mbean configuration with Spring

    [10] JProbe by Quest Software

    [11] Java VisualVM

    [12] Column-oriented DBMS on Wikipedia

    [13] Apache DBCP BasicDataSource

    [14] JDBC Connection Pool by Oracle

    [15] Connection Failover by Oracle

    [16] Last Resource Commit Optimization (LRCO)

    [17] GigaSpaces for Hibernate ORM Users

    About the authors

    Yongjun Jiao is a technical manager at SunGard Consulting Services.  He has been a professional software developer for the past 10 years. His expertise covers Java SE, Java EE, Oracle and application tuning. His recent focus has been on high performance computing including in-memory data grid, parallel and grid computing.

     Stewart Clark is a principal at SunGard Consulting Services.  He has been a professional software developer and project manager for the past 15 years. His expertise covers core Java, Oracle and energy trading.

 


Sorting order-by or Collections.sort? by Dominik Kunz

First: Great article! It was a pleasure to read!

So now to the comment. As you stated, the DB usually does a better sorting job than I do, agreed. But it should be mentioned that in some cases the DB uses a sorting based on the locale of the JVM in which the JDBC driver is loaded. This leads to some confusion regarding German umlauts (ä, ö, ü, ...).

If the JVM has an "en" locale, the order-by will not deliver the same results as if the JVM had a "de" locale. Thus in a system that has both German and English users connected, you cannot rely on the DB sorting; you have to implement it on the "Collection side". (Or ignore the minor difference with German umlauts.)

I stumbled upon this using an Oracle 11g DB and an application deployed on JBoss AS.

Kind regards,
Dominik

Re: Sorting order-by or Collections.sort? by yongjun jiao

You raised an important question for which none of the currently popular query languages, such as Hibernate HQL or EJB-QL, has a solution.
In Java you can sort strings either lexicographically, using class String's compareTo() method, or based on a dictionary, using class Collator's compare() method.

However neither HQL nor EJB-QL has any syntax for you to specify your intended sorting order.
The plain "order by" support in HQL and EJB-QL just passes the "order by" clause to the back-end database. In other words, your sorting order in "order by" is decided by your database session.

In Oracle, you can also sort strings either lexicographically (the binary setting in Oracle term) or based on dictionary (linguistic in Oracle term).
When you create a session using JDBC, the JVM decides your Oracle session's sorting configuration.

So you are right: if the JVM has a different locale than your intended sorting logic, you have to resort to application-side sorting.
However, if HQL or EJB-QL supports locale-specific "order by" syntax with the help of the DB in the future, your problem will be easily resolved using the DB "order by".
