
Performance Analysis and Monitoring with Perf4J

Posted by Alex Devine on Apr 23, 2009

It's a particular example of Murphy's Law that many developers are familiar with: after spending much time ensuring that an application is speedy and scalable in a staging environment, performance inexplicably falls off a cliff when released to production. Even worse is the case where things normally run fast, except for the boss or that important customer who complains that their operations always seem to run at a snail's pace. Detailed logging and analysis are crucial when it comes to tracking down these intermittent performance bottlenecks.

However, in today's world of service-oriented architectures and widely distributed applications, detecting the component responsible for a performance bottleneck can be exceedingly difficult. Consider what happens on the server in the relatively common scenario of a typical web 2.0-style application:

  1. An incoming web request is dispatched to the component responsible for generating the response.
  2. The request may need to be authenticated against an LDAP server.
  3. The controller component makes queries against a database.
  4. The controller component also makes calls to a third-party web service.
  5. The controller component aggregates all of the retrieved data into a set of business objects for display.
  6. The business objects are rendered and the response is sent to the user's browser.
  7. AJAX code running in the browser causes additional requests to be sent back to the server.

Answering the question "Why is my web page slow?" requires investigating multiple components and execution paths, and it requires detailed performance data from all of the application components in production.

Perf4J is an open source toolset for adding Java server-side code timing statements and for logging, analyzing and monitoring the results. For those developers who are familiar with logging frameworks such as log4j or java.util.logging, an analogy helps to describe Perf4J:

Perf4J is to System.currentTimeMillis() as log4j is to System.out.println()

How is this relevant to understanding Perf4J? Consider how most of us used to add logging statements back in the bad old days before good logging frameworks for Java were widely available. We used System.out.println() as a "poor-man's debugger", a quick and dirty way to output logging information. We quickly realized this was insufficient. We wanted logging statements to go to a dedicated log file (or possibly multiple different files), perhaps with those log files rolled on a daily basis. We wanted the ability to assign different levels of importance to different logging statements, and the option to have only certain log statements written in certain environments without changing any code, or the option to change the formatting for log statements in different environments. Thus, the rich set of features that log4j provides comes out of the original desire to have a "better" System.out.println() for logging statements.

Similarly, when new Java developers discover they need to add performance logging to code, they often do something like this:

long start = System.currentTimeMillis();
// execute the block of code to be timed
log.info("ms for block n was: " + (System.currentTimeMillis() - start));

Later on, though, these developers find they want more information, such as aggregated performance statistics like mean, minimum, maximum, standard deviation and transactions per second over a set time span. They would like graphs of these values in real time to detect problems on a running server, or they would like to expose performance metrics through JMX so that monitors can be set up to alert when performance degrades. In addition, they want their timing statements to work with the common logging frameworks like log4j.
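
To make "aggregated statistics" concrete, here is a small self-contained roll-up of raw millisecond timings into the statistics mentioned above. This is an illustration only, not Perf4J's implementation; the class and field names are hypothetical:

```java
import java.util.Arrays;

// Hypothetical sketch (not Perf4J code): rolling up raw millisecond
// timings from one time slice into aggregate statistics.
public class TimingStats {
    final double mean, min, max, stdDev, tps;
    final int count;

    TimingStats(long[] timingsMs, double timeSliceSeconds) {
        count = timingsMs.length;
        min = Arrays.stream(timingsMs).min().orElse(0);
        max = Arrays.stream(timingsMs).max().orElse(0);
        mean = Arrays.stream(timingsMs).average().orElse(0);
        // population standard deviation of the samples
        double sumSq = Arrays.stream(timingsMs)
                .mapToDouble(t -> (t - mean) * (t - mean)).sum();
        stdDev = Math.sqrt(sumSq / count);
        // transactions per second over the time slice
        tps = count / timeSliceSeconds;
    }

    public static void main(String[] args) {
        TimingStats stats = new TimingStats(new long[]{100, 200, 300}, 10.0);
        System.out.printf("mean=%.1f min=%.1f max=%.1f stdDev=%.1f tps=%.1f count=%d%n",
                stats.mean, stats.min, stats.max, stats.stdDev, stats.tps, stats.count);
    }
}
```

Perf4J computes exactly these kinds of aggregates for you, grouped by tag and time slice, as shown in the log parser section below.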

The goal of Perf4J is to provide these features in an easily integrated (and easily extended) open source package. Perf4J was developed here at HomeAway.com, where the widely distributed nature of our infrastructure and the high availability and performance requirements of our websites require in-depth performance analysis. Some highlights of Perf4J include:

  • A simple stop watch mechanism for succinct timing statements.
  • A command line tool for generating aggregated statistics and performance graphs from raw log files.
  • Custom log4j appenders to generate statistics and graphs in a running application, with java.util.logging and logback support scheduled for subsequent releases.
  • The ability to expose performance statistics as JMX attributes, and to send notifications when statistics exceed specified thresholds.
  • A @Profiled annotation and a set of custom aspects that allow unobtrusive timing statements when coupled with an AOP framework such as AspectJ or Spring AOP.

The examples given below highlight how you can take advantage of some of these features with very little effort. For more details regarding how to integrate Perf4J into your code base, you'll want to review the Perf4J Developer's Guide.

Timing Code Blocks with a StopWatch

The org.perf4j.LoggingStopWatch class is used to add timing statements to code and print those statements to an output stream or log file:

StopWatch stopWatch = new LoggingStopWatch();
//... execute code here to be timed
stopWatch.stop("example1", "custom message text");

The call to the stop() method records the execution time and prints a log message. By default, the base LoggingStopWatch class prints output to the System.err stream. In most cases, though, you'll want to use a subclass of LoggingStopWatch that integrates with one of the popular Java logging frameworks or logging facades, such as a Log4JStopWatch, CommonsLogStopWatch or Slf4JStopWatch. Here is some sample stop watch output:

start[1233364397765] time[499] tag[example1] message[custom message text]
start[1233364398264] time[556] tag[example1] message[custom message text]
start[1233364398820] time[698] tag[example1] message[custom message text]
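
A common pattern, visible in the statistics output below, is to choose the tag at stop() time based on the outcome of the timed block, yielding separate .success and .failure statistics. The sketch below uses a minimal stand-in for LoggingStopWatch so it runs without the Perf4J jar; the real class lives in org.perf4j and logs in the same start[..] time[..] tag[..] format:

```java
// Minimal stand-in for org.perf4j.LoggingStopWatch so this sketch is
// self-contained; the real class integrates with your logging framework.
class SimpleStopWatch {
    private final long startTime = System.currentTimeMillis();

    public String stop(String tag) {
        long elapsed = System.currentTimeMillis() - startTime;
        String message = "start[" + startTime + "] time[" + elapsed + "] tag[" + tag + "]";
        System.err.println(message);
        return message;
    }
}

public class SuccessFailureTiming {
    // Tag the timing differently depending on whether the block succeeded,
    // producing distinct codeBlock2.success / codeBlock2.failure statistics.
    static String timedOperation(Runnable operation) {
        SimpleStopWatch stopWatch = new SimpleStopWatch();
        try {
            operation.run();
            return stopWatch.stop("codeBlock2.success");
        } catch (RuntimeException e) {
            stopWatch.stop("codeBlock2.failure");
            throw e;
        }
    }

    public static void main(String[] args) {
        timedOperation(() -> { /* work that succeeds */ });
    }
}
```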

Creating Statistics and Graphs Using the Log Parser

While the default stop watch output isn't a huge improvement over calling System.currentTimeMillis() directly, the real benefit comes from being able to parse this output to produce aggregated statistics and graphs. The LogParser groups stop watch output by tag and by time slice, generating detailed statistics information and, optionally, time series graphs using the Google Chart API. Here is some sample output using the default text formatting (csv formatting is also supported):

Performance Statistics   20:32:00 - 20:32:30
Tag                                                  Avg(ms)         Min         Max     Std Dev       Count
codeBlock1                                             249.4           2         487       151.3          37
codeBlock2.failure                                     782.9         612         975       130.8          17
codeBlock2.success                                     260.7           6         500       159.5          20

Performance Statistics   20:32:30 - 20:33:00
Tag                                                  Avg(ms)         Min         Max     Std Dev       Count
codeBlock1                                             244.0           7         494       150.6          41
codeBlock2.failure                                     747.9         531         943       125.3          21
codeBlock2.success                                     224.1          26         398       106.8          21

Performance Statistics   20:33:00 - 20:33:30
Tag                                                  Avg(ms)         Min         Max     Std Dev       Count
codeBlock1                                             289.3          10         464       141.1          22
codeBlock2.failure                                     781.1         599         947       135.1           8
codeBlock2.success                                     316.2         115         490       112.6          13

Graphs for average execution time and transactions per second are generated as URLs to the Google Chart Server.

Also, since by default the LogParser reads from standard input, you can pipe a log file from a running server to the LogParser to get output generated in real time:

tail -f performance.log | java -jar perf4j-0.9.8.1.jar

Integrating Directly With Log4J

Much of Perf4J's extended functionality is made available through a set of custom log4j appenders. This allows developers to add analysis and monitoring capabilities piecemeal at deployment time through a logging framework that is familiar to many Java developers (and future versions of Perf4J will provide custom logback appenders and java.util.logging handlers). One such valuable piece of functionality allows Perf4J to expose performance data as attributes on JMX MBeans, and to send JMX notifications when performance degrades below acceptable thresholds. Since JMX has become the standard interface for managing and monitoring Java applications, exposing a Perf4J MBean opens up a wide range of functionality provided by third party monitoring applications. For example, here at HomeAway our IT department has standardized on Nagios and Cacti for system monitoring; both of these tools support querying MBeans as a data source.

The sample log4j.xml file below shows how to enable Perf4J appenders for JMX:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">

<log4j:configuration debug="false" xmlns:log4j="http://jakarta.apache.org/log4j/">

    <!-- Perf4J appenders -->
    <!--
      This AsyncCoalescingStatisticsAppender groups StopWatch log messages
      into GroupedTimingStatistics messages which it sends on the
      file appender and perf4jJmxAppender defined below.
    -->
    <appender name="CoalescingStatistics"
              class="org.perf4j.log4j.AsyncCoalescingStatisticsAppender">
        <!-- The TimeSlice option means timing logs are aggregated every 10 secs. -->
        <param name="TimeSlice" value="10000"/>
        <appender-ref ref="fileAppender"/>
        <appender-ref ref="perf4jJmxAppender"/>
    </appender>

    <!--
      This file appender is used to output aggregated performance statistics
      in a format identical to that produced by the LogParser.
    -->
    <appender name="fileAppender" class="org.apache.log4j.FileAppender">
        <param name="File" value="perfStats.log"/>
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%m%n"/>
        </layout>
    </appender>

    <!--
      This JMX appender creates an MBean and publishes it to the platform
      MBean server by default.
    -->
    <appender name="perf4jJmxAppender"
              class="org.perf4j.log4j.JmxAttributeStatisticsAppender">
        <!-- The tag names whose statistics should be exposed as MBean attributes. -->
        <param name="TagNamesToExpose" value="firstBlock,secondBlock"/>
        <!--
          The NotificationThresholds param configures the sending of JMX notifications
          when statistic values exceed specified thresholds. This config states that
          the firstBlock max value should be between 0 and 800ms, and the secondBlock max
          value should be less than 1500 ms. You can also set thresholds on the Min,
          Mean, StdDev, Count and TPS statistics - e.g. firstBlockMean(<600).
        -->
        <param name="NotificationThresholds" value="firstBlockMax(0-800),secondBlockMax(&lt;1500)"/>
    </appender>

    <!-- The console appender used by the root logger. -->
    <appender name="console" class="org.apache.log4j.ConsoleAppender">
        <layout class="org.apache.log4j.SimpleLayout"/>
    </appender>

    <!-- Loggers -->
    <!-- The Perf4J logger. -->
    <logger name="org.perf4j.TimingLogger" additivity="false">
        <level value="INFO"/>
        <appender-ref ref="CoalescingStatistics"/>
    </logger>

    <!--
      The root logger sends all log statements EXCEPT those sent to the perf4j
      logger to System.out.
    -->
    <root>
        <level value="INFO"/>
        <appender-ref ref="console"/>
    </root>
</log4j:configuration>

In addition to the JMX appender, Perf4J also provides graphing appenders that generate performance graphs that can be exposed through a web front-end using a Perf4J graphing servlet. A custom csv layout allows for easy import into Excel or another spreadsheet program.
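
As a sketch, a graphing appender could be wired under the CoalescingStatistics appender shown above along these lines (class and parameter names are taken from the Perf4J 0.9.x log4j package; verify the exact options against the Developer's Guide):

```xml
<!-- Sketch: graphs mean execution times for the named tags.
     Add a matching <appender-ref ref="graphExecutionTimes"/> under
     the CoalescingStatistics appender to feed it data. -->
<appender name="graphExecutionTimes"
          class="org.perf4j.log4j.GraphingStatisticsAppender">
    <!-- Which statistic to graph, e.g. Mean or TPS -->
    <param name="GraphType" value="Mean"/>
    <!-- Which tags to include on the chart -->
    <param name="TagNamesToGraph" value="codeBlock1,codeBlock2"/>
</appender>
```

The graphing servlet (org.perf4j.log4j.servlet.GraphingServlet as of the 0.9.x releases) is then mapped in web.xml and pointed at the graphing appenders by name.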

Using the @Profiled Annotation for Unobtrusive Performance Logging

One of the downsides to adding performance logging statements in code (and logging statements in general) is that it reduces the "signal-to-noise" ratio of the code, making it more difficult to follow the critical business logic of a particular code block. To help alleviate this, Perf4J provides a @Profiled annotation that can be added to methods whose execution times should be logged, allowing the method body to remain free of StopWatch code:

@Profiled(tag = "dynamicTag_{$0}")
public void profiledExample(String tagSuffix) {
    // business logic only here
}

Once the @Profiled annotations have been added, a Perf4J custom timing aspect can be enabled with an aspect-oriented programming framework such as AspectJ or Spring AOP. This aspect surrounds method calls with bytecode that creates and stops a StopWatch instance. The timing aspect can even be selectively enabled on specific classes at runtime using AspectJ's load-time weaving functionality. Therefore, by using load-time weaving you incur absolutely no overhead for those methods that don't have profiling turned on.
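
For load-time weaving, a minimal META-INF/aop.xml might look like the following. The aspect class name is Perf4J's log4j-backed timing aspect; the com.example package filter is a placeholder for your own code:

```xml
<!-- META-INF/aop.xml: enables the Perf4J timing aspect via AspectJ
     load-time weaving. The package filter below is an example only. -->
<aspectj>
    <aspects>
        <!-- Perf4J's log4j-backed timing aspect -->
        <aspect name="org.perf4j.log4j.aop.TimingAspect"/>
    </aspects>
    <weaver>
        <!-- Only weave the classes you actually want profiled -->
        <include within="com.example.myapp..*"/>
    </weaver>
</aspectj>
```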

A Simple Example: A Web-Based Prime Number Generator

This example walks you through the steps of creating a real application that demonstrates most of the features of the Perf4J library. You can run the application yourself by downloading the perf4jPrimes.war file and running it in a servlet container. Note that the war file also contains the source code used to create the application.

For simplicity, the example contains just two main code files: a primes.jsp file that is used to display the generated prime numbers and accept user-specified parameters, and the PrimeNumberGenerator class that contains the actual generation code (which, for the most part, is delegated to the java.math.BigInteger class). There are three places where Perf4J is used to time code blocks:

  1. A Log4JStopWatch is created in code to time the entire page generation in the primes.jsp file.
  2. The PrimeNumberGenerator.generatePrime() method is annotated with @Profiled.
  3. The PrimeNumberGenerator.randomFromRandomDotOrg() method is also annotated with @Profiled.

If you look at the WEB-INF/classes/log4j.xml file, you can see that the following Perf4J features have been enabled:

  1. All of the individual stop watch logs will be written to standard output (note that your servlet container may pipe standard output to a file on disk). If desired you can use the LogParser directly against this output.
  2. An AsyncCoalescingStatisticsAppender has been created to aggregate stop watch logs and send them to downstream GraphingStatisticsAppenders and a JmxAttributeStatisticsAppender.
  3. Two GraphingStatisticsAppenders have been created, one to output mean execution times and one to output transactions per second.

Once you start up the web app in your webserver, you can access the primes generation page at http://servername/<rootContextLocation>/primes.jsp, where the rootContextLocation is determined by your webserver configuration (also, for easy access, you can view the PrimeNumberGenerator source code at http://servername/<rootContextLocation>/PrimeNumberGenerator.java). On the primes.jsp page you’ll see a number of different parameters for determining how to generate prime numbers. You can change the parameters and click the “Generate Prime Number” button to see how the parameters affect how long it takes to generate the number. After generating a bunch of numbers you should be able to see Perf4J output in three places:

  1. Raw log output is sent to standard output.
  2. The graphing servlet has been enabled at http://servername/<rootContextLocation>/perf4jGraphs. You should be able to see the mean execution time and transactions per second graphs by hitting this servlet.
  3. JMX monitoring has also been enabled. You can use the jconsole application that comes with the Java SDK to view the Perf4J MBean values. You will need to start your webserver with the -Dcom.sun.management.jmxremote Java command line option. Then, if you start jconsole on the same machine running the server, you should be able to see the “org.perf4j.StatisticsExposingMBean.Perf4J” data values under the “MBeans” tab.

Since at this point you haven’t enabled any TimingAspects to support the @Profiled annotations, the only stop watch output you will see is for the “fullPageGeneration” tag. To enable the TimingAspects you can use AspectJ load-time weaving. To do this you’ll need to start your webserver with the AspectJ weaving agent command line parameter like so:

-javaagent:/path/to/aspectjweaver-1.6.1.jar

You can download the jar here: http://mirrors.ibiblio.org/pub/mirrors/maven2/org/aspectj/aspectjweaver/1.6.1/aspectjweaver-1.6.1.jar

When the agent is enabled, you should then see stop watch output for the “generatePrime” and “randomFromRandomDotOrg” tag names.

Pitfalls and Best Practices

Many attempts at application monitoring, whether for performance, stability, or semantic correctness, fail to optimally deliver their intended benefits. Either too much information is gathered in a way that makes it difficult to wade through a flood of data, or not enough crucial information is gathered at the point where it is needed most. And while all monitoring requires some amount of overhead, performance monitoring should be particularly sensitive to the additional overhead it incurs. The tips below should help you get the most out of Perf4J while minimizing any unintended effects:

  1. When deciding which methods and code blocks to profile, focus on the big fish first. In enterprise applications there is a cast of “usual suspects” when it comes to performance bottlenecks: database calls, remote method calls, disk I/O, and heavy computational operations that act on large data sets. Thus, you should focus on these types of operations when deciding what to profile. Also, since these operations typically take tens to hundreds of milliseconds, the overhead that Perf4J adds to these operations is orders of magnitude smaller than the native execution time and can be ignored for practical purposes. In fact, this is one reason why Perf4J purposefully uses System.currentTimeMillis (as opposed to System.nanoTime) under the hood to time code blocks: nanosecond granularity should not be needed when timing the types of code blocks in enterprise applications that are likely to cause problems.
  2. Perf4J is designed to offload performance analysis to a separate thread or process. When using the AsyncCoalescingStatisticsAppender, the executing thread pushes a log event onto an in-memory queue that is drained by a separate thread that sends the log message to downstream appenders. Thus, even if those downstream appenders do lots of intensive work like creating graphs or updating MBean attributes or saving to a database, the effect on the code block being timed is minimal and is unrelated to the amount of work done by those downstream appenders. Similarly, when writing timing logs to a file intended to be parsed and analyzed later (for example, using the Unix tail command as described above), the effect on the process being timed is solely related to the amount of time it takes to write the log statement, not the amount of time taken by the log parser.
  3. Don’t forget the benefits of performance regression testing. In addition to detecting performance bottlenecks at runtime, Perf4J is very useful for creating performance regression tests to determine if code changes have a significant effect (either positive or negative) on performance. By creating a baseline with the initial version of your code you can quickly determine how a new version is going to impact performance.
  4. Take advantage of the @Profiled annotation and AspectJ’s load-time weaving to decide which methods should be timed at deployment time. If you use the @Profiled annotation, you can liberally add timing around method calls and then decide which methods to actually time when you deploy using AspectJ’s aop.xml configuration file. There is zero hit taken for methods that end up not being timed. While methods that are timed have a slightly higher overhead than when using StopWatches directly in code (AspectJ essentially creates a closure around the method being timed), this additional overhead should still be inconsequential when timing methods that take at least a couple of milliseconds.
  5. Don’t forget about parts of your application that execute outside of the JVM. For example, large parts of many web 2.0 applications are implemented in JavaScript that runs in the client’s browser. While Perf4J can be used to measure methods on the server that are executed in response to AJAX remote calls, you will need additional client-side debugging tools if the slowdown is in the JavaScript execution itself.
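
The queue-and-drain design described in point 2 can be sketched with plain java.util.concurrent types. This illustrates the pattern only, not Perf4J's actual appender code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration of the async-appender pattern: timed threads only pay the
// cost of an offer() onto an in-memory queue; a background thread drains
// the queue and does the expensive downstream work.
public class AsyncDrainSketch {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
    final AtomicInteger processed = new AtomicInteger();
    private final Thread drainer;

    public AsyncDrainSketch() {
        drainer = new Thread(() -> {
            try {
                while (true) {
                    String event = queue.take();          // blocks until an event arrives
                    if (event.equals("POISON")) return;   // shutdown sentinel
                    // ...expensive work here: update graphs, MBeans, files...
                    processed.incrementAndGet();
                }
            } catch (InterruptedException ignored) { }
        });
        drainer.setDaemon(true);
        drainer.start();
    }

    // Called from the timed thread: cheap and non-blocking; drops on overflow.
    public boolean log(String stopWatchMessage) {
        return queue.offer(stopWatchMessage);
    }

    public void shutdown() {
        try {
            queue.put("POISON");
            drainer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncDrainSketch sketch = new AsyncDrainSketch();
        for (int i = 0; i < 100; i++) sketch.log("start[0] time[5] tag[codeBlock1]");
        sketch.shutdown();
        System.out.println("processed " + sketch.processed.get() + " events");
    }
}
```

Because the timed thread does no more than an enqueue, the cost it pays is fixed regardless of how much work the drain thread performs downstream.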

Future Directions for Perf4J

Perf4J is undergoing active development and new features are frequently added. For example, future Perf4J versions will allow you to specify methods to be profiled using a separate config file, thus allowing you to time method executions even when you don't have access to the original source code. All features are prioritized based on user feedback and need, so try out Perf4J today if you would like to help shape its future capabilities. Better yet, as Perf4J is fully open source under the Apache 2 license, and the code is fully documented, you can extend it as desired.

Download Perf4J today and let us know what you think!


Great Tool! by Doug Ly

Hi,
This is a great tool to keep track of the performance of our service layer.
My boss definitely likes the statistics and how heavy the app is being used :-)
One bad thing is that the Google Chart API gets nuked when the request URI is too long!
Doug

Nice Graph by Srinivas Venkatappa

Tool looks very promising for performance analysis. Very useful tool for developers to test the performance, especially for developers practicing Agile.
I would like the tool to progress in the direction of analyzing the database calls, as most of the performance problems are seen in the database areas.

Nagios support of JMX? by Keith Doyle

Thx Alex for the great overview of Perf4J.
I am curious though about the comment regarding Nagios support of JMX. My investigation shows that this is particularly difficult, unstable and 'hacky'. See www.agilejava.com/blog/?p=149 (not written by me, btw). Would it be possible to elaborate on the approach used to tie Perf4J JMX logs to Nagios that has been adopted by HomeAway?
Keith

Re: Nagios support of JMX? by Alex Victoria

Hi Keith,

The basic Nagios/Cacti usage that we have at HomeAway relies on MX4J's HTTP JMX Adaptor. The server exposes the Perf4J data as described in this article using MX4J.

We then use simple command-line bash scripts to ping the XML MX4J interface from Nagios and Cacti. From there it's simple to integrate into both of these products. Thanks to Perf4J, we have performance related alerts and historical charts (above and beyond the ones provided by Perf4J) in these other applications.

Hope this helps.
Alex

Perfect tool by Srinivas Reddy

We have just finished a big project and we had to add all the boilerplate code to get some numbers for DB calls and other places where there is a delay. This would have been a perfect fit for that. But I am going to get my feet wet with this in my new project (small one though).

Thanks for such a nice tool.

Re: Nagios support of JMX? by Philip Gardner

Hi Keith,

Alex is right; MX4J exposes pretty much all the data we need in an easily consumable format. By adding "&template=identity" to the end of any MX4J url everything is output into XML. We then use a number of simple Perl scripts that consume that XML and run checks against our thresholds specified in Nagios. It's simple, but it works. We do the same thing for Cacti.

I've taken a look at some of the Nagios plugins you've linked to before and basically came to the same conclusion as you, as far as their usefulness.

If you have any other questions let me know.

Philip

