BT

Effective Java Profiling With Open Source Tools

Posted by Joachim Haagen Skeie on Sep 28, 2011 |

Time and again, the need to take a plunge under the hood of your running application arises. The reasons driving this can range from a slow service, JVM crashes, hangs, deadlocks, frequent JVM pauses, sudden or persistent high CPU usage or even the dreaded OutOfMemoryError (OOME). The good news is that there exist many tools you can use to extract various bits of preliminary information from your running JVM, allowing you to get that “under the hood” view you need to diagnose such situations.

In this article I will be going through some of the open source tools that are available. Some of these tools come with the JVM itself, while some are third party tools. I will start out with the simplest of the tools, and gradually move on to more sophisticated tools as the article progresses. The main objective is to enable you to extend your diagnostic toolbox, something that will most definitely come in handy when you application starts to perform strange, slow or not at all.

So, let’s get started.

If your application is experiencing unusually high memory load, frequent freezes or OOME, its often most useful to start by understanding just what kind of objects are present in memory. Luckily the JVM comes out-of-the-box with the ability to do so, with the built-in tool ‘jmap’.

Jmap (with a little help from JPM)

Oracle describes jmap as an application that “prints shared object memory maps or heap memory details of a given process or core file or remote debug server”. In this article we will use jmap to print out a memory-histogram.

In order to run jmap, you need to know the PID of the application that you want to run jmap against. One easy way to obtain this is to use the JVM-provided tool jps, which will list out each and every JVM process running on your machine, along with each process’ PID. The output of jps will look like the following:

Figure 1: Terminal output of the jps command

To print out a memory histogram, we will invoke the jmap console app passing in our application’s PID and the “-histo:live” option . Without this option, jmap will make a complete dump of the application’s heap memory (which is not what we are after in this scenario). So, if we wanted to retrieve a memory histogram for the “eureka.Proxy” application from the figure above, we invoke jmap as:

jmap –histo:live 45417

The output of the above command might look something like the following:

(Click on the image to enlarge it)

Figure 2:Output of the jmap -histo:live command, showing live object counts from the heap

The resulting report shows us a row for each class type currently on the heap, with their count of allocated instances and total bytes consumed.

For this example I had let a colleague deliberately add a rather large memory leak to the application. Take specific note on the class at line 8, CelleData, compared to the snapshot below, taken just 4 minutes later:

(Click on the image to enlarge it)

Figure 3: jmap output showing the increased object count for the CelleData class

Note that the CelleData class now has become the second-largest class in the system, having added 631,701 additional instances in those 4 minutes. When we fast forward another hour or so, we observe the following:

(Click on the image to enlarge it)

Figure 4: jmap output an hour after application startup, showing over 25 million instances of the CelleData class

There are now more than 25 million instances of the CelleData class, occupying over 1GB of memory! We’ve identified a leak.

What’s nice about this data is that it’s not only useful, but also very quick to get hold of, even for very large JVM heaps. I’ve tried it with a busy application using 17GB of Heap memory and jmap was able to produced its histogram in about one minute.

Note that jmap isn’t a profiling tool, and that the JVM might halt during the histogram generation, so be sure its acceptable for your application to pause for the time it takes to generate the histogram. In my experience though, generating a histogram is something you do more often when debugging a serious bug, and so a few one minute pauses of the application is likely acceptable during such circumstances. This brings me nicely along to the next topic – the semi-automatic profiling tool VisualVM.

VisualVM

Another tool currently built into the JVM is VisualVM, described by its creators as “a visual tool integrating several command line JDK tools and lightweight profiling capabilities”. As such, VisualVM is another tool that you are most likely to fire up post-mortem, or at least after a serious error or performance issue has been identified through more traditional means (customer complaints being the most prominent in this category).

We’ll continue with the previous example application and its serious memory leak problem. After about 30 minutes of operation VisualVM has built up the following chart:

Figure 5: VisualVM memory chart of the initial application run

With this graph we can clearly see that by 7:00pm, after only 10 minutes of operation, the application is already consuming over 1GB of Heap space. After another 23 minutes the JVM had reached the ceiling of its –Xmx3g threshold, resulting in a very slow application, a slow system (constant Garbage Collection) and an incredible abundance of OOME.

Having identified the cause of this climb with jmap, then fixing it, we give the application another run under the strict supervision of VisualVM and observe the following:

Figure 6: VisualVM memory chart after fixing the memory leak

As you can see, the memory curve of the application (still having been launched with –Xmx3g) looks much, much better.

In addition to the memory graphing tool, VisualVM also provides a sampler and a lightweight profiler.

The VisualVM Sampler lets you sample your application periodically for CPU and Memory usage. It’s possible to get statistics similar to those available through jmap, with the additional capability to sample your method calls’ CPU usage. This enables you to get a quick overview of your methods execution times, sampled at regular intervals:

(Click on the image to enlarge it)

Figure 7: VisualVM method execution time table

The VisualVM Profiler will give you the same information as the sampler, but rather than sampling your application for information at regular intervals, it can gather statistics from your application throughout normal application execution (via byte-code instrumentation of your application’s source). The statistics you’ll get from this profiler will be more accurate and more frequently updated than the data gathered by the sampler.

(Click on the image to enlarge it)

Figure 8: Output from the VisualVM Profiler

The flipside you must consider though is that the profiler is a “brute-force” profiler of sorts. It’s instrumentation approach will essentially redefine most of the classes and methods that your application is executing, and as a result is likely to slow down your application significantly. For example, running part of a normal analysis with the application used above, the application finished in about 35 seconds. Turning on VisualVMs memory profiler caused the application to finish the same analysis in 31 minutes.

Understand that VisualVM is not a full-featured profiler, as its not capable of constantly running against your production JVM. It won’t persist its data, nor will it be able to specify thresholds and send alerts when these thresholds are breached. To be able to get closer to the goal of a full-featured profiler, let’s look next at BTrace, a full-featured open source Java agent.

BTrace

Imagine being able to tell your live production JVM exactly what type of information you want to gather (and as result what to ignore). What would that list look like? I suppose the answer to that question will differ from person to person, and from scenario to scenario. Personally, I am frequently most interested in the following statistics:

  • The Memory usage for the applications Heap, Non Heap, Permanent Generation and the different memory pools that the JVM has (new generation, tenured generation, survivor space, etc.)
  • The number of threads that are currently active within your application, along with a drilldown of what type of threads that are in use (with individual counts)
  • The JVMs CPU load
  • The Systems load average / Total CPU usage of the system
  • For certain classes and methods in my application I want to see the invocation counts, the average self execution time and the average wall clock time.
  • The invocation counts and the execution times for SQL calls
  • The invocation counts and the execution times for disk and network operations

BTrace can gather all of the above information, and it lets you specify just what you want to gather using BTrace Scripts. The nice thing about BTrace script is that it is just a normal Java class containing some special annotations to specify just where and how BTrace will instrument your application. BTrace Scripts are compiled into standard .class files by the BTrace Compiler - btracec.

A BTrace script consists of multiple parts, as shown in the diagram below. For a more thorough walkthrough of the script shown below, please visit this link, as well as the BTrace project’s website.

As BTrace is only an agent, its job is finished when it has logged its results. BTrace doesn’t have any functionality to dynamically present you with the information it has gathered in any other form than a textual output. By default the output of a BTrace script will end up in a text file alongside the btrace .class file, called YourBTraceScriptName.class.btrace.

Its possible to pass in an extra parameter to BTrace to make it log-rotate its log files at specified intervals. Keep in mind that it will only log rotate across 100 files, after reaching *.class.btrace.99 it will overwrite the *.class.btrace.00 file. If you keep your rotation intervals to a sensible amount (say, every 7.5 seconds) that should give you plenty of time to process the output. To enable this log rotation, add the fileRollMilliseconds=7500 parameter to the java agents input parameters.

A big downside of BTrace is its primitive, difficult to navigate output format. You’ll really want a better way to process the BTrace output and data, preferably in a consistent graphical user interface. You’ll also benefit from the ability to compare data from different points in time, as well as the ability to send alerts when thresholds are breached. This is where the new open source tool EurekaJ comes in.

(Click on the image to enlarge it)

Figure 9: The required parts of a BTrace script to enable method profiling

EurekaJ

I started writing EurekaJ back in 2008. At the time, I was looking for an open source tool that had the functionality that I was looking for in a Profiler but couldn’t find it so I began writing my own. During the development process I looked into a range of different technologies and have visited a number of architectural models before I ending up with what is the first release of EurekaJ. You can read more about the history of EurekaJ, have a look at its source code or download and try your own copy yourself from the project’s website.

EurekaJ generally provides two main applications:

  1. a java-based “Manager” application which will accept incoming statistics and provide a view to visualize the statistics in a consistent manner, and
  2. a proxy application which will parse the BTrace output, convert it to JSON and forward it to the EurekaJ Manager application’s REST interface.

EurekaJ accepts two data-formats for its input data. EurekaJ Proxy expects the BTrace script output to be formatted as a comma-separated file (which is easy to get hold of from BTrace), while EurekaJ Manager expects its input to adhere to its JSON REST interface format. This means that you are able to communicate and pass in metrics either via the Proxy application, or directly via the REST interface.

With EurekaJ Manger, it is possible to group multiple statistics into a single chart, to define thresholds, and email alert receivers. The data collected can be viewed in real time, as well as historical.

All collected data is sorted into a logical tree structure, a structure that the author of the BTrace scripts decides upon. Its advisable that the BTrace script author consider grouping related statistics to make them more accessible and discoverable later on in EurekaJ. For instance, I personally like to group the statistics in logical groups, as below.

Figure 10: Example of statistics groups from the EurekaJ demonstration application

Chart Examples

One important metric to gather is the average load on the system where your application is executing. If you are experiencing a slow application, the fault might not actually lie within that application, but can be hidden elsewhere on the machine that hosts the application. I have tried to debug periodically slow applications where the real culprit has shown to be the virus scanner. These things can be hard to pinpoint if you don’t have any metrics to go by. Consider being able to have access to a chart that displays both the average system load, as well as the load your process is generating on your system. Below is a chart that displays the average load over 2 hours for the Amazon EC2 virtual server that is currently running the EurekaJ Demonstration Application (You can log into this application using the username and password ‘user’).

(Click on the image to enlarge it)

Figure 11: EurekaJ chart showing the average system load

The yellow and red line on the chart above represents the alert thresholds. Once the chart creeps above the yellow line for at least the minimum amount of time as specified by the alert, the metric reaches the “warning” state. Similarly, when the red line is breached, the metric reaches its “critical” or “error” state. Each time an alert transition between states, EurekaJ will fire off an email to the registered recipients.

In the scenario above it seems that there is a periodic event happening every 20 minutes that cause a spike in the load average chart. The first thing that you might want to ensure is that this spike is indeed generated by your application, and not from somewhere else. We can verify this by measuring the process CPU load as well. EurekaJ will allow you quickly visualize this by selecting both metrics from the tree menu, where EurekaJ will display both charts, one underneath the other.

(Click on the image to enlarge it)

Figure 12: Showing multiple charts at once

In the example above, its clear that there is a meaningful correlation between the process’ CPU usage and the overall system load.

Many applications require that someone be alerted as soon as the application becomes unresponsive or unavailable. The chart shown below is an example taken from an installation of Confluence, the enterprise Wiki software from Atlassian. In this example the applications memory usage increased rapidly, to a point where the application simply ran out of memory. At that point, Confluence was unable to process incoming requests and the log was filled with all kinds of strange errors.

You might expect the application to throw an OOME immediately after the application ran out of memory, however the JVM in fact doesn’t throw an OOME until the JVM determines that Garbage Collection is slow. As a result the application did not crash completely, and Java did not report an OutOfMemoryError for another two hours, and even after two hours, the application was still “running”( meaning that the JVM process was still active). Obviously, any process-monitoring tool won’t identify the application as being “down”.

(Click on the image to enlarge it)

Figure 13: EurekaJ Heap chart showing one possible scenario for Out Of Memory Error

Focusing in on the last couple of hours of operation, the chart reveals the following metrics.

(Click on the image to enlarge it)

Figure 14: Zooming in on the previous chart

With EurekaJ its possible, and advisable, to define alerts on an applications heap memory. If an application consistently uses more than 95% - 98% (depending on your overall heap size), your application most definitely has an issue with memory and you either have to allocate more heap to the application using the –Xmx parameter, or otherwise figure out how to code your application to use less memory. Also, the ability to alert upon missing statistics is planned for a future release of EurekaJ.

The last chart example displays a chart group showing the memory usage of four different applications. This type of chart grouping makes it easy to compare data from different parts of an application, or even across applications and servers. The four applications charted below all have very different memory needs, and different memory usage patterns.

As the chart below shows, different applications have different memory curves. These curves are very dependent on a number of factors ranging from which frameworks are used, the amount of caching utilized, the number of users, and load on the application etc. I have included the chart below to illustrate how important it is for you to know how your application behaves both during normal and high loads, as this will very much influence how you define your alert thresholds.

(Click on the image to enlarge it)

Figure 15: EurekaJ Chart Groups to overlay charts on top of each other

Mind the performance hits – keep your cold spots cold!

Each measurement that you gather is likely to incur a performance hit on your system in some way. Some metrics can be considered “free” (or “ignorably cheap”), while others can be quite expensive to gather. It is very important that you know what you are asking of BTrace, so that you will minimize the performance hit profiling will cause your application. Consider the following three principles regarding this:

  • “Sample”-based measurements can generally be considered “free”. If you sample your CPU load, process CPU load, Memory Usage and Thread count once every 5-10 seconds, the added millisecond or two incurred is negligible. In my opinion, you should always gather statistics for these types of metrics, being they cost you nothing.
  • Measurements of long-running tasks can also be considered “free”. Generally, you will incur 1700-2500 nanoseconds for each method being profiled. If you are measuring the execution times of SQL-queries, network traffic, disk access or the processing being done in a Servlet where one might expect a performance ranging from 40 milliseconds (disc access) to up to a second (servlet processing), adding about 2500 nanoseconds to each of these methods is also negligible.
  • Never add measurements for methods that are executed within a loop

As a general rule you want to find your application’s hot spots, while being mindful to keep your cold spots cold. For example, consider the following class:

(Click on the image to enlarge it)

Figure 16: Example of a simple Data Access Object class that we might want to profile

Depending on the execution time of the SQL defined on line 5, the readStatFromDatabase method could be a hot-spot. It will become a hot-spot if the query returns a significant amount of rows, having a negative effect both at line 13 (network traffic between the application and the database server) and at lines 14-16 (the processing required for each row in the resultset). The method will also become a hot spot (at line 13) if it takes the database a long time to return the results

The method buildNewStat however will likely never become a hot spot, in and of itself. Even though it might be executed many times, it will complete in just a couple of nanoseconds for each invocation. On the other hand, adding a 2500 nanosecond metric-gathering hit for each invocation will most definitely make this method look like a hot spot, whenever the SQL is executed. Therefore we want to avoid measuring it.

(Click on the image to enlarge it)

Figure 17: Showing which parts of the above class to profile and which to avoid

Setting it all up

A complete Profiling setup using EurekaJ has the following main parts:

  • The application you wish to monitor/instrument
  • BTrace – the Java agent
  • BTrace scripts that tell BTrace what to measure
  • A filesystem where you can store your BTrace output
  • EurekaJ Proxy to transfer the BTrace output to EurekaJ Manager
  • An installation of EurekaJ Manager (either locally on your LAN, or remotely, accessed over the internet)

(Click on the image to enlarge it)

Figure 18: An overview of one possible architecture that allows application profiling using the tools described in this article

For full installation guides for each of these products, please visit the EurekaJ project website.

Summary

Throughout this article we’ve gone over the strength of some of the open source tools that are available for use both when you need to perform a deep-dive analysis into a running JVM, as well as when the time comes to employ a wider and continuous surveillance of your development/test/production application deployments using a profiler tool.

Hopefully you have started to see the benefits of continuously gathering metrics, and the ability to be alerted whenever your defined thresholds that you set for your application is breached.

Thank you!

References

[1] jmap documentation

[2] BTrace Script Concepts

[3] BTrace project site

[4] EurekaJ Live Demonstration

[5] EurekaJ Project website 

About the Author

Joachim Haagen Skeie is employed as a Senior Consultant within Java and Web technologies at Kantega AS in Oslo, Norway, with a keen interest in both application profiling and open source software. He can be contacted at his Twitter account.

 

 

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

DTrace by Ryan Eccles

DTrace also has some support that can be enabled in the JDK if your OS supports it (Solaris, OSX).

Eclipse Memory Analyzer? by Chris Aniszczyk

No mention of MAT?

It's a beast when dealing with large heaps... eclipse.org/mat

Re: DTrace by Joachim Haagen Skeie

Hello Ryan,

DTrace is a very good tool, but sadly its not available in all Linux distributions. BTrace does have its own DTrace integration, which lets you execute DTrace scripts via BTrace.

A sample of using DTrace from BTrace can be found on the BTrace project site: kenai.com/projects/btrace/sources/hg/content/sa...

Very Best Regards,
Joachim Haagen Skeie

Re: Eclipse Memory Analyzer? by Joachim Haagen Skeie

Hello Chris,

You are right. MAT is also a very good tool to perform a post-morten analysis of a heap dump. I've used it successfully on one occation. I didn't, however, feel I had enough experience with MAT to do a writeup about its features. Also, I'm sure there are other tools that I have left out as well :)

Very Best Regards,
Joachim Haagen Skeie

Re: DTrace by William Louth

DTrace is not at all suitable for Java performance profiling even when it is doing effectively nothing. Just enabling the probes will impact performance before even registering a script/listener.

If you’re not metering you’re not trying hard enough to be the best – Part 2 of 3
opencore.jinspired.com/?p=5459

With JXInsight/OpenCore available for FREE as a development only license (30-40 mins per process exec) it beats me why anyone would ever consider such a solution or any other one for that matter.

DTrace does have a place but as a counter/meter provider for resulting C code executions.

OpenCore Metering Runtime Actors
opencore.jinspired.com/?p=1888

The near zero overhead claims for DTrace outside of certain provider impls is a myth at best. Most of the time the lang/runtime integrations using static providers have already factored this cost into the code base itself giving DTrace a somewhat easy ride.

Re: DTrace by William Louth

The problem with BTrace is that is tries too hard to be like DTrace whose adhoc approach and anemic domain/measurement model I consider dated at least in terms of application/activity behavioral tracking and resource consumption/costing metering.

Re: DTrace by Ryan Eccles

William,

The Java probes due tend to suck, but to say DTrace as a whole is never suitable for any profiling I can't agree with. I'm in the financial industry where we operate at sub-millisecond latencies (and not C, this is Java). We use DTrace all of the time to analyze the interaction between thread schedulers, the TCP stack, and execution path. It's been invaluable at figuring out how Java interacts with the OS. As for the *nix issues, yes this is a problem. I believe Gregg et. al. are working at porting the tool to other distros.

As for the overhead, the insight we have gained using DTrace have matched the behavior we are seeing in production. I agree that as a tool it's tricker to understand then some other tools. But your statements don't match the evidence I see every day.


-Ryan

Re: DTrace by William Louth

Ryan one of the links I referenced did call out the value in DTrace but not at the activity/context level but as a source of low level sensors (signals) which using activity based metering could be associated back in the thread/call/chain/path accurately.

opencore.jinspired.com/?p=1888

DTrace is adhoc and not terribly intelligent outside of its instrumentation focus. The real art in performance analysis and management is in using the data collected to make more informed decisions on what to continue measuring and collecting and in this regard DTrace has very little to offer - no model, no reuse, no....

opencore.jinspired.com/?p=3887

Re: DTrace by Prakash Kash

Hi Sir,
Let me know DTrace is available for Ubuntu version???
Because im trying to get the performance in ubuntu but i can't..

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

9 Discuss

Educational Content

General Feedback
Bugs
Advertising
Editorial
InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT