Book Review and Interview: Java Performance, by Charlie Hunt and Binu John
Java Performance, by Charlie Hunt and Binu John, provides performance tuning advice for both Java SE and EE applications. Specifically, it provides information on performance monitoring, profiling, tuning HotSpot, and Java EE application performance tuning (the latter section was written by Binu John). It is suitable as a guide for developers new to the topic who want to get a better understanding of what is going on under the hood, and also provides excellent reference material for performance specialists.
Whilst there are a number of other books that cover similar ground, this is the first new one on the topic for some time, and as such is able to cover some of the state-of-the-art Java performance tuning techniques that weren't available when, for example, Steve Wilson et al wrote "Java Platform Performance". As well as reflecting changes in the underlying technology, and available optimisation techniques, the book reflects changes in common software engineering practices, for example advocating the inclusion of performance testing as part of the continuous build cycle, and involving project stakeholders in setting performance tuning metrics early.
The book is particularly good on low level details, with the myriad of different HotSpot command line options well represented, and an excellent, step-by-step guide to JVM tuning. The two chapters covering profiling (one focused on using Oracle's Solaris Studio Performance Analyzer and the NetBeans Profiler, and the other on resolving common issues), are also first-rate, and I was particularly pleased to find an appendix with source code examples for common, but hard to resolve, problems such as lock contention, resizing Java collections, and increasing parallelism. In addition, the section on writing effective benchmarks, and the sort of issues that a smart JIT compiler can cause, is very strong, showing you how you can take a look at the generated assembly code from the JIT compiler to make sure your microbenchmark is doing what you think it is doing.
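To give a flavour of the kind of microbenchmark pitfall mentioned above, here is a minimal sketch (ours, not taken from the book). It assumes that one of the issues in question is the JIT compiler discarding work whose result is never used, and it guards against that by warming up first and by consuming a checksum. The class name SumBenchmark and the iteration counts are hypothetical. On HotSpot, the generated code can usually be inspected with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, provided the hsdis disassembler plugin is installed.

```java
// Hypothetical example, not from the book: a hand-rolled microbenchmark
// that warms up before timing and keeps its results live.
final class SumBenchmark {

    /** Work being measured: sums the integers 0..n-1. */
    static long sum(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    /** Runs the workload repeatedly and returns a checksum so the result is used. */
    static long runIterations(int iterations, int n) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            checksum += sum(n);
        }
        return checksum;
    }

    public static void main(String[] args) {
        // Warm-up pass: gives the JIT compiler a chance to compile sum() before timing.
        long warmup = runIterations(20_000, 1_000);

        long start = System.nanoTime();
        long measured = runIterations(20_000, 1_000);
        long elapsedNs = System.nanoTime() - start;

        // Printing the checksums means the computed values are actually used.
        System.out.println("checksums: " + warmup + ", " + measured);
        System.out.println("elapsed ms: " + elapsedNs / 1_000_000.0);
    }
}
```

Even with these precautions a clever compiler may still transform the loop in ways that distort the measurement, which is why checking the generated assembly, as the book suggests, is worthwhile.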
As well as detailed information on the common top-down (i.e. application centric) approach to tuning, the book provides plenty of information on bottom-up (hardware/OS centric) approaches, with solid coverage of Windows, Linux and Solaris. For example, the book talks about how to reduce TLB misses (perhaps the most common form of CPU cache miss) and how to reduce involuntary context switches. For reducing other types of CPU stalls, there is a side-bar in Chapter 5, Java Application Profiling, that talks about the -h option for Oracle Solaris Studio Performance Analyzer, which can be used to incorporate hardware counter information in the profile, such as CPU cache misses. The same side-bar also talks about how alternative implementations, which change how and when a Java object field is accessed, can reduce CPU cache misses. It also, though, states that this type of performance tuning activity is generally recommended for compute bound applications only, and the book itself, not unreasonably, stops short of providing full details on this approach. "We actually find it quite interesting that folks think it's necessary to have to tune an operating system," Charlie & Binu told us. "There are some that rarely require tuning. While it may be considered by many to be 'fun' to tune an operating system, we advocate not tuning an operating system unless you have evidence, via the data you've collected, that suggests you need to."
It should perhaps be noted that whilst the book provides a lot of useful, generally applicable, information, it does focus a great deal on Oracle's tools and hardware. For instance, a section in chapter 1 entitled “Choosing the Right CPU Architecture” focuses almost entirely on Oracle's SPARC chips, rather than commodity hardware. The intention in this section is to demonstrate that the common approach of running a subset of a production load as a means to evaluate a system's performance has flaws, but that perhaps doesn't come over as clearly as it should. Elsewhere the two profilers featured are the (admittedly excellent) Oracle Solaris Studio Performance Analyzer tool, and the NetBeans profiler. This may be partially due to space constraints (at 550 pages plus appendices it is already quite a lengthy book), but it would have been nice to see an alternative, such as Intel's VTune, given more coverage. I would also have liked to have seen some discussion around alternative garbage collectors, such as Oracle's JRockit, IBM's Balanced GC (which to be fair may have appeared too late for the book's production schedule) or Azul's C4, which don't get a mention. In the Java EE section GlassFish, the Java EE reference implementation, is used for all the examples. Of course much of the advice here applies equally well to other containers, but it would have been good to see more reference made to them in the text.
One other nitpick: the lack of colour plates within the text is occasionally problematic. For example the use of monochrome screen shots throughout the text reduces their usefulness, notably in chapter 2 (Operating System Performance Monitoring) where lack of colour makes reading the graphs more difficult than it should be. This seemed slightly disappointing given the cover price of $59.99 (US).
Despite these criticisms, however, the book is an excellent, if Oracle-centric, guide to the subject. It doesn't provide a recipe for solving every problem, but does provide enough information for non-performance specialist developers, and others involved in application performance tuning work, to solve the majority of commonly encountered performance problems.
Given that the book advocates a slightly different approach to many others on the topic - recommending a use case centric, rather than code-centric, approach to tuning, and also suggesting that software professionals think about performance tuning much earlier in the development cycle than is commonly done - InfoQ spoke to Charlie Hunt and Binu John to find out more about their thinking here.
InfoQ: What prompted you to write the book?
Charlie & Binu: We realized that much of the information available on Java performance was out of date, or quickly becoming out of date. In addition, we recognized that there was a lot of interest in knowing more about the Java HotSpot VM internals along with some structure around how to go about tuning it. Also, we noticed there were folks who were using combinations of Java HotSpot VM command line options that just didn't make sense. There also didn't seem to be material that pulled together the combination of JVM, Java SE and Java EE performance information into one volume. And, mostly, we wanted to offer what we have learned over the years doing Java performance. Our hope is that readers will find at least one thing in the book that helps them.
InfoQ: In general the methodology you describe, at least in terms of the language you use, seemed to me to be a closer fit for an RUP approach to software development than, say, one of the agile approaches. Is that a valid comment, or do you think the approach you describe should adapt well to any software development approach?
In our opinion, we think the methodology can be adapted to agile approaches and in general most software development approaches. The "take-away" we'd like to see readers come away with is that performance of their application is something that should be considered throughout the entire software development process. In particular, the expected performance of the application is something that should be well understood as early as possible. If there are concerns about whether that performance can be met early in the software development process, then you have the luxury to mitigate that risk by conducting some experiments to identify whether those risks are "real" along with whether you may need to make some alternative decisions, which may require a major shift in the choice of software architecture, design or implementation. The key here is the ability to "catch" the performance issue as early as possible regardless of the software development methodology.
InfoQ: What would you say are the key things that need to be in place before you start performance tuning?
The first and foremost thing to have in place is to understand exactly what it is you are trying to accomplish. Without having a clear understanding of that, you may still learn some things, but you are at risk of not accomplishing what you really wanted to accomplish.
Having a clear understanding of what you want to accomplish will help identify what you need in the way of hardware, software and other environmental needs. In short, in our opinion, the more explicit you are about what you want to accomplish, the clearer it becomes what you need to reach that goal/objective.
Within a performance testing environment, a key thing to have in place is an environment that has the ability to produce consistent results with as little variability as possible between results / runs. The larger the variability, the more difficult it is to identify whether you're really observing improvements (or regressions) in your performance tuning efforts or it's just random noise (variability) in running the experiments. How much variability is acceptable depends on how much of an improvement you are looking for.
By the way, there are times when it's useful to understand why a given environment, setup, etc. introduces wide variability between test runs, especially when you're looking for small percentage improvements in performance. If you spend some time investigating statistical methods and their equations, you can understand the reasoning behind the "variability" discussion here. This topic is also touched upon in the book, in the sections of Chapter 8 on "Design of Experiments and Use of Statistical Methods".
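To make the variability point concrete, here is a minimal sketch (ours, not taken from the book) that summarises a set of run times using the mean, the sample standard deviation and the coefficient of variation. The class name RunStats and the sample timings are hypothetical.

```java
// Hypothetical helper, not from the book: summarises run-to-run variability.
final class RunStats {

    /** Arithmetic mean of the samples. */
    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Sample standard deviation (n - 1 in the denominator); needs at least two samples. */
    static double stdDev(double[] samples) {
        double m = mean(samples);
        double squares = 0.0;
        for (double s : samples) {
            squares += (s - m) * (s - m);
        }
        return Math.sqrt(squares / (samples.length - 1));
    }

    /** Coefficient of variation: standard deviation as a fraction of the mean. */
    static double coefficientOfVariation(double[] samples) {
        return stdDev(samples) / mean(samples);
    }

    public static void main(String[] args) {
        // Made-up elapsed times in milliseconds for five runs of the same test.
        double[] runsMs = {102.0, 98.0, 101.0, 99.0, 100.0};
        System.out.printf("mean=%.1f ms, stddev=%.2f ms, cv=%.2f%%%n",
                mean(runsMs), stdDev(runsMs), 100.0 * coefficientOfVariation(runsMs));
    }
}
```

If the improvement you are hoping to detect is smaller than the spread this reports, a single before-and-after comparison tells you very little; you would need more runs, a quieter environment, or both.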
Another important thing if your testing environment deviates from production, is that you understand the differences, and more importantly, whether those differences will impact or impede what you are trying to accomplish in your performance tuning.
InfoQ: Do you think it is possible to run meaningful performance tests if your test hardware isn't an exact replica of your target or production environment?
It depends on what you want to learn from the performance test and it also depends on how the environment deviates from the production environment. If you have a good understanding of what you want to learn from your performance tests and you don't have the luxury of an exact replica of a production environment, you need to know how your testing environment deviates from the production environment. If you can justify that the environment deviations will not impact what you want to learn, then you can use an environment that's not an exact replica.
Ideally, to ensure the highest probability of success with your performance goals, it is important to ensure that the test machines use the same CPU architecture as the production machines. For example, the UltraSPARC T-series CPU architecture is very different from an Intel Xeon CPU or an AMD CPU. Any data generated on Intel Xeon test systems, for example, cannot be easily translated to an UltraSPARC T-series production machine. Also, it is important to account for differences in architecture between the different CPU families from the same manufacturer, e.g. Intel Xeon vs. Intel Itanium. More on chip differences below.
However, keep in mind that you need to be able to convince yourself, and your stakeholders, that the differences between your testing environment and the production environment do not introduce performance differences. That can be a difficult task. It would be wise to document any assumptions you are making and be able to show that the assumptions you are making don't introduce differences. Again, that can be a difficult task.
There are two points we should also make here. One, what we're trying to describe here is the notion of "designing an experiment" around the questions you want to have answered, or what you want to learn. Then, identifying a test environment that can satisfy the "design of experiment" without introducing bias or variability that puts into jeopardy what you want to learn. The second point is, (back to the chip differences), one of the reasons for the latter part of Chapter One's sections on "Choosing the Right Platform and Evaluating a System", "Choosing the Right CPU Architecture" and "Evaluating a System's Performance" is so readers understand that, often times, the commonly used traditional approach of evaluating the performance of a system where you take a subset of a production system, put it on a new system, run it at much less than its full capacity, is a practice that has flaws when evaluating a system using UltraSPARC T-series CPUs. The reason this assumption and approach has flaws is its difference in CPU architecture. The motivation for including these sections is so that folks who are evaluating systems understand that this traditional approach has flaws and to have readers understand why it is flawed. In addition, our hope is that readers will also question whether any other differences in their testing environment versus production may introduce some unexpected or unforeseen flaw(s).
Another topic that is applicable is scalability. Testing on different hardware will most often not show scalability issues if the test hardware has fewer virtual processors than the production hardware. This can be illustrated with some of the example programs used in Chapter 6, "Java Application Profiling Tips and Tricks". If you happen to run some of those example programs on hardware with a small number of virtual processors, it is likely you may not observe the same performance issues as you may see on a test system that has a large number of virtual processors. This is also pointed out in Appendix B where the example source code listings for those programs exist. So, if what you are wanting to learn from performance testing is related to how well a Java application scales, having a testing environment that replicates (as close as possible) the production environment is important.
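As a rough illustration of why the number of virtual processors matters, the sketch below (ours, not one of the book's Appendix B listings) has every available processor increment a shared counter, first through a single lock and then through java.util.concurrent.atomic.LongAdder. LongAdder arrived in Java 8, after the book was written, so it is our substitution rather than a technique the book describes. The class name ContentionDemo and the iteration counts are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical example, not from the book: one contended lock versus a striped counter.
final class ContentionDemo {
    private long lockedCount = 0;
    private final LongAdder adder = new LongAdder();

    /** Every caller competes for the same monitor. */
    synchronized void incrementLocked() { lockedCount++; }
    synchronized long lockedCount() { return lockedCount; }

    /** LongAdder spreads updates over several cells to reduce contention. */
    void incrementStriped() { adder.increment(); }
    long stripedCount() { return adder.sum(); }

    /** Runs task perThread times on each of the given number of threads; returns elapsed ns. */
    static long timeInParallel(int threads, int perThread, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    task.run();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        int perThread = 200_000;
        ContentionDemo demo = new ContentionDemo();
        long lockedNs = timeInParallel(threads, perThread, demo::incrementLocked);
        long stripedNs = timeInParallel(threads, perThread, demo::incrementStriped);
        System.out.println("virtual processors: " + threads);
        System.out.println("single lock ms: " + lockedNs / 1_000_000.0);
        System.out.println("LongAdder ms:   " + stripedNs / 1_000_000.0);
    }
}
```

On a machine with only a couple of virtual processors the two timings may look similar, which is the trap described above: the contention only becomes visible when enough threads genuinely run in parallel.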
InfoQ: A common approach to performance tuning (it's advocated for example in "Java Performance Tuning" by Jack Shirazi) is to set-up a performance test, run with a profiler to identify, say, the 2 or 3 worst performing functions, address those problems and repeat until the performance criteria you set have been met. Reading your book, you seem to advocate a somewhat different approach, that is more use case centric. So you suggest taking a look at what use case is being executed that includes this particular method, and consider if there are alternative approaches or algorithms that could be used to implement that particular use case, that might perform better. Is that a fair summary, and if so why do you favour this approach?
It's a very quick summary, but fairly accurate. To be a little more specific, we advocate first identifying whether you need to profile the application. It's through monitoring the application via JVM monitoring tools, looking at GC logs, looking at application logs, and capturing operating system statistics that you will observe symptoms or clues as to the next step that will point in the direction of finding a resolution to your performance issue.
It's worth mentioning that one thing we have noticed with folks who are investigating a performance issue, is they tend to gravitate towards what they know best. For example, a Java developer tends to go immediately look at the code, some will profile the code immediately, systems administrators will look at operating system data, attempt to tune the operating system, or communicate to the Java developers that his or her application is behaving badly and proceed to tell him or her what is being observed at the operating system, and a person with JVM knowledge will tend to want to tune the JVM first. I think that's kind of understandable since we all have our "comfort zones". However, we should let the data that's collected drive the performance tuning effort. It's the data that will lead you to the performance resolution the quickest.
What we advocate is using monitoring tools at the operating system, JVM and application level. Then analyze that data to formulate a hypothesis as to the next step. Sometimes it may be application profiling, sometimes it may be JVM tuning and sometimes it may be tuning an operating system (our experience is that it's rarely operating system tuning).
If we assume that we have sufficient evidence to suggest the next step is to do application profiling, then we advocate the idea of first stepping back at the call path level, which often times maps to a use case, and asking yourself what is it that the application is really trying to do. Most modern profilers offer a "hottest call path" view. You will almost always be able to realize greater improvement by changing to, for example, a better algorithm in the "hottest call path" than you will by improving the performance of the "hottest method".
If, however, you only need a small improvement to meet your performance goals, then looking at the hottest method and improving it will likely offer a quicker means to your end goal than the "hottest call path" approach.
So the reason we advocate the "hottest call path" approach is we are assuming you're looking for peak performance. We think most folks would agree that stepping back and looking at alternative algorithms or data structures offers the potential for a bigger performance improvement than making changes to the implementation of a method or several methods.
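A small sketch (ours, not from the book) may help show the difference between the two approaches. Suppose a profiler reports List.contains as the hottest method. Rather than trying to speed up that method, stepping back to the call path suggests changing the data structure so that the expensive lookup disappears. The class name, method names and data here are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example, not from the book: same use case, two call paths.
final class CallPathExample {

    /** Original call path: List.contains does a linear scan for every order id. */
    static int countKnownLinear(List<String> orderIds, List<String> knownIds) {
        int count = 0;
        for (String id : orderIds) {
            if (knownIds.contains(id)) {
                count++;
            }
        }
        return count;
    }

    /** Alternative call path: build a HashSet once, then each lookup is a hash probe. */
    static int countKnownHashed(List<String> orderIds, List<String> knownIds) {
        Set<String> known = new HashSet<>(knownIds);
        int count = 0;
        for (String id : orderIds) {
            if (known.contains(id)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> orders = Arrays.asList("a", "b", "c", "a");
        List<String> known = Arrays.asList("a", "c", "x");
        System.out.println("linear: " + countKnownLinear(orders, known));
        System.out.println("hashed: " + countKnownHashed(orders, known));
    }
}
```

Both methods return the same answer; the second simply does far less work per lookup once the lists grow, which is the sort of gain that tuning the hottest method in isolation is unlikely to deliver.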
InfoQ: You also advocate thinking about performance during the requirements gathering phase. Again this is much earlier than is common in my experience. Why do you think it should be done there?
Several reasons ... it offers you a chance to identify potential risk areas which you can start mitigation efforts on immediately, and the sooner you identify potential performance issues, the lower the cost of dealing with those issues later in the software development cycle. It follows from the well understood adage, "The earlier a bug is found in the software development lifecycle, the less costly it is to fix it". And, we consider a performance issue a bug. ;-)
Additionally, thinking about performance and talking about performance early on at requirements gathering time helps set expectations from both the folks who are building the application and the folks who will be using it. It can also potentially be used or incorporated as part of an acceptance test plan with the users of the application.
InfoQ: You suggest integrating performance tuning into a continuous integration cycle, in addition to the unit and other functional testing that is typically automated. Given that, would you advocate hiring performance specialists, or does your recommendation that performance tuning be addressed early and as part of the build cycle, push the task towards developers?
First, we wouldn't suggest hiring performance specialists unless you have a need for the expertise. The reason for recommending performance testing to also be included as part of unit and other functional testing is to catch performance issues as soon as they are introduced. The earlier you can catch performance issues, the less time and money you'll spend finding and fixing them.
The need for hiring performance specialists should come out of not being able to find a performance issue, or perhaps with advising on how to go about making performance testing part of unit and functional testing.
Again, the motivation here is to minimize the amount of time and effort in tracking down when a performance issue is introduced. Ideally, you'd like to catch performance issues before they ever get integrated into a project's source code repository. Ideally, you'd like the developer to catch his or her performance issue before his or her changes are committed.
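One way such a check might look in practice is sketched below (ours, not from the book): a small guard that times an operation several times, takes the median, and fails if it exceeds an agreed budget. The class name PerfGuard, the workload and the 50 ms budget are all hypothetical; a real budget would come from the performance requirements agreed with stakeholders.

```java
import java.util.Arrays;

// Hypothetical example, not from the book: a time-budget check for a build pipeline.
final class PerfGuard {

    /** Times the task for the given number of runs and returns the median in nanoseconds. */
    static long medianNanos(Runnable task, int runs) {
        long[] times = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            times[i] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[runs / 2]; // middle element; the upper median when runs is even
    }

    /** Throws if the measured time exceeds the agreed budget. */
    static void assertWithinBudget(long measuredNs, long budgetNs) {
        if (measuredNs > budgetNs) {
            throw new AssertionError(
                    "measured " + measuredNs + " ns exceeds budget of " + budgetNs + " ns");
        }
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        long median = medianNanos(() -> {
            sink.setLength(0);
            for (int i = 0; i < 10_000; i++) {
                sink.append(i);
            }
        }, 11);
        assertWithinBudget(median, 50_000_000L); // hypothetical budget of 50 ms
        System.out.println("median ns: " + median + ", chars built: " + sink.length());
    }
}
```

Using the median rather than a single run makes the check a little less sensitive to the run-to-run noise discussed earlier, although a budget set too tightly will still produce spurious failures on a busy build machine.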
InfoQ: Are there other books on the subject that you would recommend as a companion volume to yours?
Jack Shirazi's book certainly has many useful tips in it. There are also many concepts of Steve Wilson's & Jeff Kesselman's "Java Platform Performance" which remain applicable today. For best practices that go well with performance, check out Josh Bloch's "Effective Java" and Brian Goetz's "Java Concurrency in Practice".
If you're interested in low-level details, Darryl Gove's "Solaris Application Programming" book is one to consider. Although it's Solaris-specific, there are many good general concepts in Darryl's book that carry over to Java performance and performance optimization opportunities that a modern JVM often times just takes care of for you. Many of the types of optimizations Darryl talks about in his book are types of optimizations automatically done by the Java HotSpot VM's JIT compiler. Another book that also fits into the low-level details area is "Solaris Internals" by Jim Mauro and Richard McDougall. Again, although Solaris-specific, generally speaking, if you understand the most important pieces of a modern operating system, those concepts apply to other modern operating systems too. After all, they're usually trying to solve similar problems.
About the Book Authors
Charlie Hunt is the JVM performance lead engineer at Oracle. He is responsible for improving the performance of the HotSpot JVM and Java SE class libraries. He has also been involved in improving the performance of the Oracle GlassFish and Oracle WebLogic Server. A regular JavaOne speaker on Java performance, he also coauthored NetBeans™ IDE Field Guide (Prentice Hall, 2005).
Binu John is a senior performance engineer at Ning, Inc., where he focuses on improving the performance and scalability of the Ning platform to support millions of page views per month. Before that, he spent more than a decade working on Java-related performance issues at Sun Microsystems, where he served on Sun’s Enterprise Java Performance team. John has contributed to developing industry standard benchmarks such as SPECjms2007 and SPECjEnterprise2010; published several performance whitepapers; and contributed to java.net's XMLTest and WSTest benchmark projects.
Srini Penchikala Aug 21, 2014