BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Articles Profiling java.util.concurrent locks

Profiling java.util.concurrent locks

Bookmarks

IBM’s Yao Qi, Raja Das, and Zhi Da Luo describe jucprofiler, an alphaWorks tool designed to profile multicore applications that make use of the java.util.concurrent classes introduced in Java 5.

1. Introduction

Performance analysis is an important aspect of the application development process. It is typically done by a specialist whose main goal is to improve the code performance on a given platform. This problem becomes even more difficult when dealing with concurrent/multi-threaded applications running on multicore platforms. In such cases, one not only has to worry about code performance but also about the scalability of the code.

With the introduction of the java.util.concurrent (JUC) package in Java 5, a new type of lock was introduced into the Java language. Usage of the JUC package is becoming more and more popular as more applications are either developed or fine tuned to run better on multicore systems. However whilst detailed contention information is available through the JLM for conventional Java locks, no equivalent exists for the java.util.concurrent.locks package; no tool is supplied by Sun/Oracle with the JDK, and neither IBM nor other Java vendors have one. This absence of a JUC lock profiling tool is the motivation behind the development of our lock tool, jucprofiler, as a part of Multicore SDK.

2. Short Overview of jucprofiler

When using a JUC lock, a thread will “stop” execution in the following two cases:

  1. When a thread A tries to acquire a JUC lock, while this lock has been acquired by another thread, then thread A has to “stop” its execution and wait until this lock is released or times out.
  2. When a thread A invokes one of the “wait” APIs of java.util.concurrent.locks.Condition, thread A “stops” its execution until the other thread notifies it or times out.

We identify these two cases as Contention Time and Wait Time respectively.

Jucprofiler is designed and implemented to capture the time usage of these two cases.

2.1 Instrumentation

In order to capture the JUC lock runtime data, several JUC classes are instrumented offline, and replaced with new ones in the JRE. Before jucprofiler is used for the first time, users have to run a command to generate PreInstrument.jar. This step need be done only once, assuming the JRE is not changed. (If users change to another JRE, they have to remove PreInstrument.jar, and re-run this command to generate PreInstrument.jar again.)

2.1.1. Contention Time

For contention time, jucprofiler records the allocation of java.util.concurrent.locks.AbstractQueuedSynchronizer and java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject, and assigns unique ids to them.

Class

Methods

Call Site

java.util.concurrent.locks.LockSupport

park (Object);

parkAndCheckInterrupt() in class AbstractQueuedSynchronizer

 

parkNanos(Object blocker, long nanos)

doAcquireNanos(int arg, long nanosTimeout)

doAcquireSharedNanos(int arg, long nanosTimeout) in class AbstractQueuedSynchronizer

2.1.2. Wait Time

For the time usage on a lock, jucprofiler captures the time usage of invoking park(blocker) and parkNanos(blocker, nanos) in class java.util.concurrent.locks.LockSupport, when these two methods are invoked in different places:

Class

Methods

Call Site

java.util.concurrent.locks.LockSupport

park (Object);

Other methods except parkAndCheckInterrupt() in class AbstractQueuedSynchronizer

 

parkNanos(Object blocker, long nanos)

Other methods except doAcquireNanos(int arg, long nanosTimeout)

doAcquireSharedNanos(int arg, long nanosTimeout) in class AbstractQueuedSynchronizer

3. How to use the Tool

In this section we use a real-world application to explore how the tool can be used to find problems in the code.

3.1. Running jucprofiler

The following command is used:

$ java -Xbootclasspath/p:$JUCP/BCIRuntime.jar:$JUCP/PreInstrument.jar
 -javaagent:$JUCP/BCIAgent.jar=callStackDepth=10:allocationStackDepth=0 
:,msdk.idp=com.ibm.msdk.bciagent.JUCInstrumentDecisionProvider,:libPath=$JUCP:traceJUC=on
 -cp .:derby.jar JavaDBDemo 

jucprofiler can be used for any java program on JDK 6. Assuming the jucprofiler is installed in directory $JUCP. After jucprofiler finishes running your program, a trace file named "BCIAgent.***.bci.trace" will be generated, where "***" is a unique time stamp for this execution.

3.2. Getting the Result

Run the command shown below to get the Juc profiling result.

$ java -Xmx1000m -jar $JUCP/BCITraceReader.jar {tracefile} {resultOutputFile} 

Where:

  1. {tracefile} is the full path of the file or directory containing trace files, such as BCIAgent.***.bci.trace.
  2. {resultOutputFile} is an optional parameter to set the file in which to store the analysis results. If this option is omitted, the analysis results will be printed to the console.

Note: Since the post analysis process to trace file may suffer from some memory overhead, it's better to increase process heap size via the -Xmx Java option. In our experiment, analyzing a 160M trace may consume 800M memory.

3.3. Understanding the Profiling Result

As shown in the figure below, the plain text output includes different kinds of information, such as lock name, lock contention count and time, lock hold time and count, lock allocation stack trace, duration and stack trace of each lock contention, etc. The result can help users find out the JUC lock contention bottleneck in the program.

Before the “LEGEND” section, the profiling result report first summarizes all JUC lock contentions in the program, in descending order by lock contention count, then contention time. Each summary row item belongs to one of two different types, “AQS” for individual JUC locks and “CHM” for ConcurrentHashMap. Since a ConcurrentHashMap is divided into several Segment(s) for element storage and each Segment is protected by a different JUC lock, a ConcurrentHashMap can be viewed as a collection of JUC locks from a lock perspective. For example “CHM@8” below has a contention count of 276 and a contention time of 3,945,7000, meaning that the total contention count of all segment locks in “CHM@8” is 276, and the total contention time of all segment locks in “CHM@8” is 3,945,7000 nanoseconds. This grouping of locks helps programmers identity which ConcurrentHashMap object has the most serious JUC lock contentions. In contrast JUC lock “AQS@1790” doesn’t belong to any ConcurrentHashMap object and this lock is used explicitly in program.

Note: Because lock hold events are not enabled in the example trace, 0 is put in columns HOLD-COUNT and HOLD-TIME.

After the “LEGEND” section, the profiling result reports details of each JUC lock contention. In the result snippet below, for ConcurrentHashMap “CHM@8” lock contention happens in two segment locks (Lock [AQS@135]” and “Lock [AQS@146]). For “Lock [AQS@135]”, it contends in one place, showing contention count, contention time, and full stack back trace of the contention. So does “Lock [AQS@146]”. These details help programmers to locate the lock contention in the program and clearly understand which segments of ConcurrentHashMap contend most.

Multicore Software Development Toolkit Version_2.1

j.u.c Lock Profiler Report

                NAME    CONTD-COUNT          CONTD-TIME     HOLD-COUNT           HOLD-TIME
               CHM@8            276            39457000              0                   0
            AQS@1790             36             4029000              0                   0
             AQS@131             17              630000              0                   0
=================================================================================================
 LEGEND:
                NAME : Name of juc lock(AQS) or ConcurrentHashMap(CHM), format: <Type>@<Idgt;
         CONTD-COUNT : Total count of lock contention
          CONTD-TIME : Total time of lock contention in nanosecond
          HOLD-COUNT : Total count of lock hold
           HOLD-TIME : Total time of lock hold in nanosecond
==================================================================================================

ConcurrentHashMap [CHM@8]:

-----------------------------------------------------------------------------------------------------------
Lock [AQS@135]:

-----------------------------------------------------------------------------------------------------------
Lock Contention 1
CONTD-COUNT: 25
CONTD-TIME: 10827000
Call Stack:
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:758)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:789)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1125)
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:197)
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:273)
java.util.concurrent.ConcurrentHashMap$Segment.remove(ConcurrentHashMap.java:530)
java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:934)
org.apache.derby.impl.services.locks.ConcurrentLockSet.unlock(ConcurrentLockSet.java:740)
org.apache.derby.impl.services.locks.ConcurrentLockSet.unlockReference(ConcurrentLockSet.java:784)
org.apache.derby.impl.services.locks.LockSpace.unlockReference(LockSpace.java:275)


Lock [AQS@146]:

-----------------------------------------------------------------------------------------------------------
Lock Contention 1
CONTD-COUNT: 22
CONTD-TIME: 2009000
Call Stack:
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:758)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:789)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1125)
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:197)
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:273)
java.util.concurrent.ConcurrentHashMap$Segment.remove(ConcurrentHashMap.java:530)
java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:934)
org.apache.derby.impl.services.locks.ConcurrentLockSet.unlock(ConcurrentLockSet.java:740)
org.apache.derby.impl.services.locks.ConcurrentLockSet.unlockReference(ConcurrentLockSet.java:784)
org.apache.derby.impl.services.locks.LockSpace.unlockReference(LockSpace.java:275)

3.4. Opening the Trace File in Visual Analyzer

There are some views developed in Eclipse and supplied as part of the multicore SDK, referred to as Visual Analyzer, which show jucprofiler trace files with tables and figures,, Currently, there are two views for jucprofiler, one is referred to as the “J.U.C statistics” view, and the other is “J.U.C synchronization view”.

The “J.U.C statistics” view is shown as below. The two columns on the right-hand side are “Contention Times” and “Contention Counts”. The “Allocation Stack” column is about the allocation call site of JUC locks.

The “J.U.C synchronization” view is shown as below. The first lane is Time, indicating when this lock contention occurs. The second lane is Thread, indicating in which thread lock contention occurs. The third lane is Monitor, indicating which JUC lock is contented. The last lane is Method, indicating where the lock contends.

3.5. Online Control

During execution, jucprofiler will create a ControlServer listening on port 2009. Users can use ControlClient to connect to that port and control the behavior of jucprofiler, e.g. the trace can be turned on and off on-the-fly:

$ java -cp BCIRuntime.jar 
com.ibm.msdk.bciruntime.control.ControlClient HOST -m [b|i] –b 
START -e END

Where:

HOST: The host name that ControlClient wants to connect to, defaulted to localhost.

-m [b|i]: Mode of ControlClient - b means batch mode, while i means interactive mode. The default is interactive mode.

-b START: if mode is set to batch mode, START is the time(seconds) you want to start profiling.

-e END: END is the duration time(seconds) you want to cease profiling.

3.5.1. Interactive Mode

A simple shell is provided, and the user can type the commands juc.on and juc.off to turn on and off jucprofiler. For example, java -cp BCIRuntime.jar com.ibm.msdk.bciruntime.control.ControlClient, ControlClient will connect to localhost, and open a shell to control jucprofiler.

$ java -cp BCIRuntime.jar com.ibm.msdk.bciruntime.control.ControlClient
jucprofiler control> juc.on
juc.on
jucprofiler t control> start
start
jucprofiler control> stop
stop
jucprofiler control> juc.off
juc.off
jmit control> quit
quit
$ java -cp BCIRuntime.jar com.ibm.msdk.bciruntime.control.ControlClient localhost -m b -b 2 -e 10
Start tracing in 2 seconds
Start tracing
Stop tracing in 10 seconds
Stop tracing
quit

3.5.2. Batch Mode

The jucprofiler also supports a batch mode. For example, java -cp BCIRuntime.jar com.ibm.msdk.bciruntime.control.ControlClient mtrat-test.dyndns.org -m b -b 10 -e 10, means ControlClient will connect to machine mtrat-test.dyndns.org, start profiler after 10 seconds, and cease profiling 10 seconds later.

4. Conclusion

With the popularity of multicores, more and more concurrent/multi-threaded Java applications will be developed. We need better tools for profiling such concurrent applications. The jucprofiler described in this article fulfills one of the key gaps in Java profiling tools.

About the Authors

Yao Qi is a staff software engineer at IBM's China Emerging Technology Institute, China Development Lab. He joined IBM in 2006. Mr. Qi has experience in JVM internals, bytecode instrumentation, and Java programming . He is currently working on a run-time analysis tool for Java. Mr. Qi holds a Masters degree in computer science from Beijing Institute of Technology, China.

Raja Das is a software architect in IBM's Software Group. Currently, he is developing libraries and frameworks for multicore/manycore systems. Previously, he was the WebSphere® Partner Gateway product architect. Mr. Das's interests include programming languages, parallel software, and systems.

Zhi Da Luo is a software engineer at IBM’s China Emerging Technology Institute, China Development Lab. Mr. Luo joined IBM in 2008. He has experience in embedded software development and is currently working on a run-time analysis tool. Mr. Luo holds a Masters degree in software engineering from Peking University in Beijing, China.

Rate this Article

Adoption
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Community comments

  • Does it monitor the key methods

    by Ma Mercy,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Does this framework monitor the key methods by cut-cross or bytecode parse, and Does not want to modify the JVM?

    java.util.concurrent.locks.AbstractQueuedSynchronizer provides the programing approach to implement thread wait queue.

  • Re: Does it monitor the key methods

    by Qi Yao,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Mercy,
    It is done by byte code instrumentation. We don't modify JVM.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

BT