Virtual Panel: Performance Tuning Face-Off
In the world of application delivery, performance tuning still seems to elude the mainstream. InfoQ spoke to five luminaries of the performance monitoring space about why and what can be done. The result was quite an active debate.
Members of the virtual panel:
- Ben Evans is the CEO of jClarity, a startup which delivers performance tools to help development and ops teams. He is also an Oracle Java Champion and best selling author.
- Charlie Hunt is the Architect of Performance Engineering at Salesforce.com and the lead author of the best selling book "Java Performance".
- Kirk Pepperdine is a world-famous performance tuning consultant, trainer, Oracle Java Champion, and contributing author of the book "97 things every programmer should know".
- Martin Thompson is a high-performance and low-latency specialist, with over two decades of experience working on large-scale transactional and big-data systems.
- Monica Beckwith is the Oracle performance lead for the Garbage First Garbage Collector.
InfoQ: Most organizations tend to overlook performance tuning and testing. Perhaps you can share your experiences on how to overcome these difficulties. What practices, tools, and resources should a firm implement to bring performance tuning into the mainstream?
Martin: "Tend to overlook"? Now that is an understatement! Most just seem to rush product out and don't think at all about quality, or any non-functional requirements such as performance, availability, security, etc.
InfoQ: I agree; most telling is that I have been interviewing candidates for some fairly senior Java work recently and it is incredible to me how many of these experienced developers have never used a profiler and know nothing about performance tuning or GC. Is this a consequence of the inherent difficulty? Or is performance just treated as an afterthought? What can firms do to correct this?
Martin: Now the real surprise for me going back into consulting has been how little people know about performance testing and tuning even when performance *is* a key requirement for their domain.
Ben: I think that some of the problems that you are mentioning are connected, and have to do with how we develop our performance engineers. I saw this quote recently, which I think is relevant:
"Training teaches a person how to carry out a specific task more efficiently and reliably. Education, on the other hand, opens and enriches a person’s mind."
When dealing with performance issues, both aspects are required. We need to impress on students the importance of empirical data, statistics and repeatable runs in performance analysis and tuning (this is a training aspect). However, we also need them to understand how to apply their own experience across the whole application stack to the performance problem (which is much more of an education aspect).
Not only that, but we need to lead them away from the "tips and tricks" approach to performance (which is easier to teach, as it is essentially a training technique rather than education).
Lastly, when dealing with education or training, the student must make regular use of the material and new skills; otherwise they will simply atrophy again. Going on a performance course that isn't used as part of core duties for 12 months is essentially useless.
Kirk: A while back I saw a performance talk given by Eric Smart. In that talk he pointed out all of the classic mistakes that they made. First, the focus was on features, features, and more features. They were a bunch of smart guys and performance would sort itself out when it needed to be sorted out. Trouble was, when performance finally needed to be sorted out it turned into an exercise where every day they were almost there, just a little bit longer and we'll be done, one more day, one more week, then a month, two, three, and then four. At that point they put the brakes on and asked “ok, what's going on here?” They realized that by not making performance an everyday part of their routine they had ignored problems that couldn't be seen in the small. It was precisely those problems that they were fighting, to the point where it almost failed the project and by extension the company.
That story had a happy ending because they were able to turn a slow-burn failure into a success. But it took an epiphany to recognize that the traditional development path wasn't serving them all that well.
Monica: There is also this other aspect where a company has someone with the title of "performance engineer" who pretty much collects data (raw, logs or profiles) for comparison but is unable to spend quality time (due to either lack of knowledge or lack of organization-level commitment) analyzing and "deep diving" into the data to discern any surreptitious performance issues. The other end of the spectrum is a performance engineer who is not directly invested in the product because he or she is working in a disconnected group, and always suggests major changes that may be too expensive for the organization to invest resources in. I think the problem in such cases can be chalked up to a lack of understanding or clear-cut definition of a well-rounded performance plan.
Having said that, I have seen organization-level commitment throughout my career, but I think quite a few organizations have to work on the problem of disconnected groups, and a few "performance engineers" have yet to work on building their knowledge. That said, I have also worked with many developers who are performance savvy. And that's like hitting the jackpot for an organization.
Performance tuning is an art. Some practitioners might be more gifted than others, but whoever is tasked with the role must practice and perfect this art. And then it’s incumbent on us performance engineers to impart our knowledge onto the others. This can easily be accomplished through open forums and conferences, and by consulting directly with teams.
Kirk: I guess this all depends on what you mean by tuning. Trying to sort out why your application isn't performing up to expectations is a diagnostic problem. Knowing what to do about it once you've diagnosed the condition may fall between two poles:
- We've seen this before and the solution is well understood
- We need to come up with a novel way to get more out of the hardware.
So the solution not only depends on diagnostic abilities, it also depends on what needs to be done to improve performance. But since in my experience most performance problems once detected are easily solvable I think the biggest difficulties are seen in diagnostics.
When I first heard Martin ask the question “how many people have been taught how to use a profiler” at Devoxx London my first thought was “what a simple but brilliant question!” It's one that I'd never thought to ask because I knew the answer. But still it was one that needed to be asked to point out the obvious. Here is what should be an important tool in our diagnostic arsenal and yet I don’t know of a single school or other institution that covers that topic. I have been contacted by professors from a couple of Universities interested in exploring how the subject of tuning might be taught but it's never really gone beyond that.
Getting back to this idea that tuning is difficult, I think one of the reasons is that it's a diagnostic activity and as such is fundamentally different from development. By its nature it's investigative and by its nature it requires that one obtain measurements. Not just any measurements but the right measurement, the one that exposes the problem. And therein lies the problem: How is one supposed to get the right measurement if all one knows how to use is an execution profiler running in an IDE? Yet without the right measurement you're left to your intuition and other forms of guessing. The problem with intuition is that one can't possibly imagine all of the interactions that happen when software finally meets hardware and real users. This realization led to the mantra “Measure, Don't Guess™” that Jack Shirazi and I coined years ago. It's also led to the Performance Diagnostic Model™ that we teach in my course. PDM has helped me help developers repackage what they already know into something that can ease the difficulties in diagnosing performance problems.
Yet even though PDM has helped, it's only slightly improved the rate of success with my example exercise. This is because diagnostics require yet another skill and that is the ability to lift your thinking out of the details to understand overall purpose. Diagnostics, like any form of troubleshooting, require that you dig into the details and that sometimes makes it difficult to refocus in order to gain a broader understanding. How one fixes this is beyond my pay grade. But it's a problem that certainly has an effect on one's ability to follow a diagnostic process.
Martin: We have observed that "most don't measure or use a profiler". But I've observed an even more interesting phenomenon. For the teams that do profile and do collect system metrics, when they look at the data it seems to mean nothing to them. I often go into clients and they tell me they have a performance issue. I ask if they have profiled the application in various ways. In my space they often do have profiler licenses, GC logging turned on, System Activity Report, etc. I'm then amazed that when they show me the data, the answer just jumps out at me and they don't seem to be able to see it. What it took me a while to realize is that this is not because they are not smart. They just don't know how to interpret the data because they have not honed this skill. It’s similar to what we find with using a debugger; when your working practices have you doing this every day it just becomes second nature. Barring that it is almost an impediment.
Charlie: I agree with Monica that the definition of “performance engineer” seems to vary depending on both the company and whom you talk to there. At some companies a performance engineer is someone who does performance testing, a quality type function where he or she runs performance workloads against the product under development and reports the results of the workload execution: improvement, regression, or inconclusive. In other contexts performance engineers take the next step and do some analysis to identify the source of an improvement or regression. And there's yet another type of performance engineer that goes beyond the analysis and looks for optimization opportunities. So we observe quite a wide spectrum of roles, responsibilities and skills required. I suspect everyone on this panel falls into the last category, and I also believe that anyone who is developing an application should have an interest in the performance of what he or she is developing, and not merely look at "performance" (or quality) as someone else's job. I once heard a development engineer say, "It's not my job to test my code. That's why we have quality and performance engineering teams." In my humble opinion, that engineer should have been fired on the spot!
Martin: The separation of performance engineers from "regular engineers" is a huge anti-pattern. It fails for many reasons. I totally agree that quality of functional and non-functional requirements is the responsibility of all engineers/programmers. But consider this: In an interview I once asked, "What is the difference between a HashMap and a TreeMap?" I would have been happy if he told me that a HashMap was O(1) whereas a TreeMap was O(log n). I would have been thrilled if he even explained the implementation. But the guy’s response struck quite a sour note when he said "Programmers don't need to know that sort of thing anymore!" Yes, it is great to have specialists but they need to coach the rest of the team to bring up the standard.
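As an editorial illustration of what the interview question is probing, here is a minimal Java sketch. It assumes the standard `java.util` collections; the class name `MapComparison` and the sample keys are hypothetical. A `HashMap` offers expected constant-time lookups but no defined iteration order, while a `TreeMap` pays O(log n) per operation in exchange for sorted keys and navigation methods such as `ceilingKey`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MapComparison {
    // Inserts the same (hypothetical) sample entries into whichever map is passed in.
    static Map<String, Integer> fill(Map<String, Integer> map) {
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("mango", 2);
        return map;
    }

    // Returns the keys in the map's own iteration order.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // HashMap: hashing gives expected O(1) get/put, but the order is unspecified.
        Map<String, Integer> hashed = fill(new HashMap<>());

        // TreeMap: a red-black tree gives O(log n) get/put and keeps keys sorted.
        TreeMap<String, Integer> sorted = new TreeMap<>();
        fill(sorted);

        System.out.println("HashMap order (unspecified): " + keysInIterationOrder(hashed));
        System.out.println("TreeMap order (sorted):      " + keysInIterationOrder(sorted));

        // Only the sorted map can answer range-style questions directly.
        System.out.println("Smallest key >= \"b\": " + sorted.ceilingKey("b"));
    }
}
```

The choice between the two is a trade-off: pick `HashMap` when only point lookups matter, and `TreeMap` when ordered traversal or range queries are needed.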
Charlie: I think the difficulty extends beyond tuning, to analysis and to identifying optimization opportunities. Things become difficult when you begin to leave the "science" of performance engineering to the "art" of performance engineering. I'd describe the discipline of statistics in a similar manner. There's the "science" of statistics and statistical methods, and there's also the "art" of applying appropriate statistical methods. You can teach folks the science of statistics, formulas for all sorts of different types of calculations, but teaching folks the art of using appropriate statistical methods to use in a given situation is a totally different thing. In a similar way we can teach folks the science of performance engineering; but the art of performance engineering is a totally different thing. I'm of the opinion that those who really struggle with performance engineering struggle with the art of performance engineering, and unfortunately that's not something easily taught or learned.
Kirk: Depending on your definitions I might disagree. When I think of art I think of something that isn't measurable or can't easily be quantified. When I think of performance I'm thinking of something that is measurable and can be quantified. In my world if you say "art" what you really mean is “I'm guessing”. Some people may be better at guessing than others but in my experiences this isn't because they are able to divine an answer from thin air; rather it's that they have a well-developed mental model in which they can plug in the available data and arrive at a higher quality answer. I see this in my course where I offer a problem that has minimal amounts of data and I watch developers struggle. Yet after they've been given a mental model to work with all of a sudden this minimal amount of data just screams out the answer to the problem. The clues are all around us; we just need to be able to recognize them. I don't believe this is art; it's about understanding what the measurements represent. “Art” is coming up with new and novel ways to get more out of the hardware that we currently have to work with.
Charlie: When I said art I did not mean guessing. One of the forms of art in performance engineering is pattern matching. What you're describing in your course is offering the science of performance engineering and then teaching the art of performance engineering. Knowing what to measure or monitor, knowing the threshold at which a metric warrants further investigation, knowing what tools to use, and assimilating all of that into a root cause is what I refer to as an art. It's knowing how to apply the right performance engineering methodologies in the right situation. If it was science it could be modeled mathematically. The "mental model", as you describe it, is what I call the "art". The aspect of knowing what to guess or speculate, narrowing down the possibilities of what to guess, or making a well informed educated guess, or knowing what additional data you need to make an informed decision falls into what I'd call the art of performance engineering.
Kirk: Yes, I find another bad habit is that people often draw conclusions that aren’t necessarily derived from the data. They’re hypothesizing or speculating or, in my books, just guessing! I can give you an example. An upward sloping curve in a heap graph after a garbage collection implies a memory leak, right? I say that is an overreaching conclusion based on speculation. I can send you GC logs where you'd be wrong. How do I know the difference? It's not art; I simply ask about intent. Is all of this algorithmic? Yes, I believe it is.
Charlie: Perhaps a question to consider is: why do most universities teach software engineering students a programming language before teaching them the mathematics and theory of computing? Contrast that with other engineering disciplines such as mechanical engineering or industrial engineering where students are taught the mathematics and theory of their discipline first.
Kirk: I don't want to downplay the importance of statistics but quite often the things that happen in a computer system are better described by queuing theory. From what I can tell, very few software developers or testers know much about queuing theory. I don't think you need a deep understanding of the subject but it is necessary to have some understanding and know how to apply it.
Charlie: I've always considered queuing theory as part of the study / discipline of statistics. My first exposure to queuing theory was in a statistics course. Remember that much of what you're modeling with queuing theory is based on probability density functions, i.e. Poisson, exponential, and other distributions.
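To make the queuing-theory point concrete, here is a minimal sketch of the simplest textbook model, the M/M/1 queue (Poisson arrivals, exponential service times, one server). It is an editorial illustration, not something the panelists presented; the class name `MM1Queue` and the rates used are hypothetical. For arrival rate λ and service rate μ, utilization is ρ = λ/μ and mean response time is 1/(μ − λ).

```java
final class MM1Queue {
    // Utilization rho = lambda / mu (fraction of time the server is busy).
    static double utilization(double arrivalRate, double serviceRate) {
        return arrivalRate / serviceRate;
    }

    // Mean response time (queueing plus service) of a stable M/M/1 queue: 1 / (mu - lambda).
    static double meanResponseTimeSeconds(double arrivalRate, double serviceRate) {
        if (arrivalRate >= serviceRate) {
            throw new IllegalArgumentException("unstable queue: arrival rate must be below service rate");
        }
        return 1.0 / (serviceRate - arrivalRate);
    }

    public static void main(String[] args) {
        double serviceRate = 100.0; // hypothetical server: completes 100 requests per second
        for (double arrivalRate : new double[] {50.0, 90.0, 99.0}) {
            System.out.printf("utilization %.0f%% -> mean response time %.0f ms%n",
                    utilization(arrivalRate, serviceRate) * 100.0,
                    meanResponseTimeSeconds(arrivalRate, serviceRate) * 1000.0);
        }
    }
}
```

Under these assumptions the mean response time is about 20 ms at 50% utilization, 100 ms at 90%, and 1000 ms at 99%. That non-linear growth near saturation is the kind of behavior intuition alone tends to miss.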
Martin: While largely I agree with Charlie and his 3-legged stool of performance tradeoffs, I find it is damaging in some cases. When thinking about runtimes and GC, yes, it applies well. But when thinking about data structures, a small footprint often means low-latency and high-throughput. To provide some context I've worked on systems that ingest the entire North American financial markets feeds, that can sustain periods at over 11 million messages per second. Given the architecture of modern x86 servers we only have TLB cache support for large pages (2MB) on the L1 cache. L2 cache only has a TLB cache for 4K pages. Intel’s soon-to-be-released Haswell chip will support 2MB pages on the L2 cache when it comes out for servers next year. When the rest of the code is efficient this can become a major issue. We don't notice this scanning GC card tables because of the pre-fetcher but for most other data structures it is a major issue. I guess I'm saying there are good general rules for the mainstream, but at the higher end other things can become much more important.
InfoQ: What tools and techniques should organizations invest in to bring performance analysis into the mainstream?
Monica: First, as I mentioned, is performance planning. Second is building the performance infrastructure. Performance testing should be a part of the product lifecycle. Hence, investing in a robust performance infrastructure is a must. Many organizations trap themselves by not being ahead of the game and before they know it, they need to invest in more resources to "patch things up". This behavior is unhealthy, non-productive and not resource efficient for either the organization or its employees.
Charlie: I've been giving this quite a bit of thought recently. Several of you have probably heard me talk about performance in terms of throughput, latency and footprint. When it comes to talking about meeting performance goals for an application, there are eight questions I've started asking product teams and sponsoring execs:
- What's the expected throughput?
- What's the lowest throughput you can live with (nothing lower than this value), how frequently can the throughput drop to that level, and for how long?
- What's the throughput metric, and how will it be measured?
- What's the expected latency, and how often can we drift above that value, and for how long?
- What's the latency that you can never go above?
- What's the latency metric, and how will it be measured?
- What's the maximum amount of memory that can be used?
- How is memory usage measured?
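One way to keep the answers to such questions from staying informal is to record the hard limits explicitly and check measurements against them. The following is a minimal sketch under stated assumptions: the class `PerformanceGoals`, its field names, and the units are all hypothetical, and it covers only the "never go beyond" limits (minimum throughput, maximum latency, maximum memory), not the "how often and for how long" parts of the questions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the hard limits a product team has agreed on.
final class PerformanceGoals {
    final double minThroughputPerSec; // lowest throughput the team can live with
    final double maxLatencyMillis;    // latency that must never be exceeded
    final long maxMemoryBytes;        // maximum memory the application may use

    PerformanceGoals(double minThroughputPerSec, double maxLatencyMillis, long maxMemoryBytes) {
        this.minThroughputPerSec = minThroughputPerSec;
        this.maxLatencyMillis = maxLatencyMillis;
        this.maxMemoryBytes = maxMemoryBytes;
    }

    // Compares one set of observed measurements with the limits and lists every breach.
    List<String> violations(double throughputPerSec, double worstLatencyMillis, long peakMemoryBytes) {
        List<String> result = new ArrayList<>();
        if (throughputPerSec < minThroughputPerSec) {
            result.add("throughput " + throughputPerSec + "/s is below the minimum " + minThroughputPerSec + "/s");
        }
        if (worstLatencyMillis > maxLatencyMillis) {
            result.add("latency " + worstLatencyMillis + " ms exceeds the maximum " + maxLatencyMillis + " ms");
        }
        if (peakMemoryBytes > maxMemoryBytes) {
            result.add("memory " + peakMemoryBytes + " bytes exceeds the maximum " + maxMemoryBytes + " bytes");
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical limits: at least 1000 requests/s, at most 50 ms, at most 1 GiB.
        PerformanceGoals goals = new PerformanceGoals(1000.0, 50.0, 1L << 30);
        System.out.println(goals.violations(800.0, 60.0, 2L << 30));
    }
}
```

How each input is measured (the metric questions in the list) still has to be agreed separately; the sketch only makes the thresholds explicit.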
I've noticed some companies and organizations fail to capture the appropriate metrics. Here's an example that I saw recently (not at Salesforce, by the way). Suppose you're interested in lowering your memory footprint. But the performance criteria are not only a reduction in memory footprint; they also require no increase in CPU utilization, no reduction in throughput and no increase in response times. If that's the case, then what's really being said is you're trying to realize a reduction in memory footprint, but not willing to sacrifice anything in throughput or latency. Good luck with that one! Emphasizing an improvement in one of these attributes (memory footprint in this example) while keeping throughput and latency constant is going to require an enormous, non-trivial amount of development effort. Rarely can you realize an improvement in one of these performance attributes without sacrificing something in one or both of the other attributes. I'm quite astonished how few folks understand this relationship.
A similar question is how is the application going to be tested? How do you capture the expected and projected production use? I think you have to understand and characterize the worst-case performance behavior of the application, especially if what you are building is a platform for others to build applications on top of. In addition, you also need to capture the performance of expected and projected production usage of the application. You also need to close the loop between what has been developed as a workload, and how the application performs in production. In other words, how effective is the workload at predicting production performance in terms of those three performance attributes? I've rarely observed folks who close this loop and measure the effectiveness of their workloads.
Kirk: Great list... I'd add that we are in an industry that suffers from "CPU envy" and quite often CPU just isn't the problem. So, understanding the fundamental limits of the hardware we're using and relating that to the resources that we actually need is a huge problem that very few have a handle on.
I think we're going to see more and more performance issues that are far from being CPU related. If we haven't experienced or observed it already, I think we can expect issues such as memory bus capacity, or more generally the speed at which we can get data to the CPU cache from memory, to become the limiting factor. Memory is the new disk. A big challenge with a large number of hardware threads per CPU socket is having enough memory bus capacity along with sufficient CPU cache. Two years ago I had already observed a multi-core system that saturated the memory bus capacity before it could reach anywhere near peak CPU usage. How many folks today would know how to measure memory bus capacity, or even think of measuring memory bus capacity?
I can tell you the answer to the last question because I ask that one all the time... nobody!! And yet I've been running into apps that are bound by that resource for a few years now.
Martin: We have reached the point where the industry has swung too far in one direction. We rightly have been pushing for some time to increase developer productivity and delivery predictability but this has come at a price. We now produce software that is so inefficient that I cannot think of any other industry where these levels of inefficiency would be tolerated. So much of our software could achieve 10-100x greater throughput or lower latency by employing a cleaner design and some tuning driven by profiling. I've seen 1000x improvements in some cases. Our data centre costs are now becoming significant and 2-4% of CO2 emissions are now computing related. We also must consider the explosion of mobile devices, on which efficient software directly translates into power savings, and into fewer servers to provide the mobile services that are mostly server compute based.
About the Panelists
Ben Evans is the CEO of jClarity, a startup which delivers performance tools to help development & ops teams. He is an organizer for the LJC (London JUG) and a member of the JCP Executive Committee, helping define standards for the Java ecosystem. He is a Java Champion; JavaOne Rockstar; co-author of “The Well-Grounded Java Developer” and a regular public speaker on the Java platform, performance, concurrency, and related topics.
Charlie Hunt is the Architect of Performance Engineering at Salesforce.com and the lead author of the recently released book "Java Performance". Prior to joining Salesforce.com, he was the Java HotSpot VM Performance Architect at Oracle and Sun Microsystems. He is a regular speaker on the subject of Java performance at many world-wide conferences including the JavaOne Conference.
Kirk Pepperdine works as an independent consultant specializing in the field of Java performance tuning. In addition he has authored a performance tuning seminar that has been well received worldwide. That seminar presents a performance tuning methodology that has been used to improve teams' effectiveness in troubleshooting Java performance issues. Named a Java Champion in 2006, Kirk has written about performance for many publications and has spoken about performance at many conferences including Devoxx and JavaOne. He helped found Java Performance Tuning, a site well known as a resource for performance tuning information. Kirk is also a contributing author of "97 things every programmer should know".
Martin Thompson is a high-performance and low-latency specialist, with experience gained over two decades working on large-scale transactional and big-data systems. He believes in Mechanical Sympathy, i.e. applying an understanding of the hardware to the creation of software as being fundamental to delivering elegant high-performance solutions. The Disruptor framework is just one example of what his mechanical sympathy has created. Martin was the co-founder and CTO of LMAX. He blogs, and can be found giving training courses on performance and concurrency, or hacking code to make systems better.
Monica Beckwith is the Oracle performance lead for the Garbage First Garbage Collector. She has worked in the performance and architecture industry for over 10 years. Prior to Oracle and Sun Microsystems, Monica led the performance effort at Spansion Inc. Monica has worked with many industry standard Java based benchmarks with a constant goal of finding opportunities for improvement in Oracle's HotSpot VM.
Errata : HashMap is O(1) and TreeMap is O(log N)