Defining Performance Analysis
There are many definitions of “performance analysis”, but in my opinion one of the most useful is:
A measurement-driven approach to understanding how an application behaves under load.
The merit of this definition is that it calls attention to measurement as being key to the entire process, and by simple extension, also draws attention to statistics and data analysis as activities likely to be important to the performance engineer.
Going further, it helps to position performance analysis as a fundamentally empirical activity that resembles an experimental science, with inputs and outputs.
These outputs can then be framed as quantitative answers to questions such as the following (a small measurement sketch appears after the list):
At 10x customers, will the system have enough memory to cope?
What is the average response time customers see from the application?
What does the rest of that distribution look like?
How does that compare to our competitors?
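As a minimal illustration of what a quantitative answer looks like, the sketch below computes the mean and the 99th percentile of a set of response times. The sample values are invented; in a real system they would be harvested from access logs or a metrics library rather than hard-coded.

```java
import java.util.Arrays;

public class ResponseTimeStats {
    public static void main(String[] args) {
        // Invented response times in milliseconds - in practice these would be
        // gathered from access logs or a metrics library.
        double[] samplesMs = {12.1, 14.8, 13.2, 11.9, 250.0, 13.7, 15.3, 12.6, 14.1, 980.5};

        double mean = Arrays.stream(samplesMs).average().orElse(Double.NaN);

        // Sort a copy and read off the 99th percentile (nearest-rank method).
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(0.99 * sorted.length) - 1;
        double p99 = sorted[rank];

        System.out.printf("mean = %.1f ms, p99 = %.1f ms%n", mean, p99);
    }
}
```

Even in this tiny invented sample, the mean and the tail of the distribution tell very different stories - which is exactly why the second and third questions above are worth asking separately.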
In this formulation, performance analysis is more science than art: a fundamentally quantitative activity with a direct relationship to business activities.
However, despite these attributes, performance has often languished in a state where even well-known best practices lag behind the reality of practitioners.
There are a number of different models that might explain this, but one interesting possibility is provided by Carey Flichel in the superb piece "Why Developers Keep Making Bad Technology Choices".
In the post, Carey specifically calls out five main reasons that cause developers to make bad choices:
- Boredom
- Resume (or “CV” if you're British) Padding
- Peer Pressure
- Lack of Understanding of Existing System
- Misunderstood / Non-Existent Problem
In this article, we present some of the most common performance analysis antipatterns found in enterprise systems, and try to express them in terms of the basic causes enumerated by Carey. The specific examples that led to these distillations are drawn from the Java ecosystem, but similar remarks apply to many other types of enterprise system.
Each basic cause corresponds to a common cognitive bias. For example, Boredom and Resume Padding both stem from a developer's desire to escape the tech they use in their day job, and an aspirational desire for a better tomorrow.
The antipatterns are presented below, in a style and format that should be reminiscent of the Gang of Four, as well, of course, as the antipattern format pioneered by Brown et al.
AntiPattern Catalogue
Distracted By Shiny
Description
Newest or coolest tech is often first tuning target
Example Comment
It's teething trouble - we need to get to the bottom of it
Reality
- This is just a shot in the dark
- Developer does not really understand the new tech
Root causes
- Boredom
- Resume Padding
Discussion
This antipattern is most often seen with younger teams. Eager to prove themselves, or to avoid becoming tied to what they see as 'legacy' systems, they are often advocates for newer, "hotter" technologies - which may, coincidentally, be exactly the sort of technologies which would confer a salary uptick in any new role.
Therefore, the logical subconscious conclusion to any performance issue is to first take a look at the new tech - after all, it's not properly understood, so a fresh pair of eyes would be helpful, right?
Resolutions
- Measure to determine real location of bottleneck
- Ensure adequate logging around new component (see the sketch below)
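As a minimal sketch of both resolutions (the NewTechClient type and its fetch method are hypothetical stand-ins for whatever the new technology actually exposes), wrapping calls into the new component with timing and logging produces hard numbers that can be compared against measurements from the rest of the system, rather than a shot in the dark:

```java
import java.util.logging.Logger;

public class ShinyComponentClient {
    private static final Logger LOG = Logger.getLogger(ShinyComponentClient.class.getName());

    // 'NewTechClient' is a hypothetical wrapper around the newly adopted technology.
    private final NewTechClient delegate;

    public ShinyComponentClient(NewTechClient delegate) {
        this.delegate = delegate;
    }

    public String fetch(String key) {
        long start = System.nanoTime();
        try {
            return delegate.fetch(key);
        } finally {
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            // Logging the elapsed time lets us compare this component against
            // measurements taken elsewhere in the system, rather than guessing.
            LOG.info(() -> "NewTech fetch(" + key + ") took " + elapsedMicros + " us");
        }
    }
}

// Hypothetical interface standing in for the real client of the new technology.
interface NewTechClient {
    String fetch(String key);
}
```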
Distracted By Simple
Description
The simplest parts of the system are targeted first
Example Comment
Let's get into this by starting with the parts we understand
Reality
- Dev understands how to tune (only?) that part of the system
Root causes
- Lack of Understanding of Existing System
Discussion
The dual of "Distracted by Shiny", this antipattern is often seen in an older, more established team, which may be more used to a maintenance/keep-the-lights-on role. If their application has recently been merged or paired with newer technology, the team may feel intimidated or not want to engage with the new systems.
Under these circumstances, developers may feel more comfortable by only profiling those parts of the system that are familiar, hoping that they will be able to achieve the desired goals without going outside of their comfort zone.
Of particular note is that both of these first two antipatterns are driven by a reaction to the unknown; in "Distracted by Shiny" this manifests as a desire by the developer (or team) to learn more and gain advantage - essentially an offensive play. By contrast, "Distracted by Simple" is a defensive reaction - to play to the familiar rather than engage with a potentially threatening new technology.
Resolutions
- Measure to determine real location of bottleneck
- Ask for help from domain experts if problem is in an unfamiliar component
UAT is My Desktop
Description
UAT environment differs significantly from PROD
Example Comment
A full-size UAT environment would be too expensive
Reality
- Outages caused by differences in environments are almost always more expensive than a few more boxes
Root causes
- Misunderstood / Non-Existent Problem
Discussion
"UAT is My Desktop" stems from a different kind of cognitive bias than those we have seen previously. This bias insists that doing some sort of UAT must be better than doing none at all. Unfortunately, this hopefulness fundamentally misunderstands the complex nature of enterprise environments. For any kind of meaningful extrapolation to be possible, the UAT environment must be production-like.
In modern adaptive environments, the runtime subsystems will make best use of the available resources. If these differ radically from the target deployment, they will make different decisions under the differing circumstances - rendering our hopeful extrapolation useless at best.
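To make this concrete, a modern JVM's ergonomics are driven by what it observes about the machine it runs on. The sketch below simply prints some of the values that feed decisions such as heap sizing, GC selection and default thread-pool sizes - on an undersized UAT box these numbers, and therefore the runtime's behavior, can differ radically from PROD:

```java
import java.lang.management.ManagementFactory;

public class EnvironmentReport {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // The number of CPUs visible to the JVM - used to size GC worker
        // threads, the common ForkJoinPool, and many library defaults.
        System.out.println("Available processors: " + rt.availableProcessors());

        // The maximum heap the JVM is prepared to use - by default derived
        // from the physical memory of the machine.
        System.out.println("Max heap (MB): " + rt.maxMemory() / (1024 * 1024));

        // The garbage collector(s) actually selected on this machine.
        ManagementFactory.getGarbageCollectorMXBeans()
            .forEach(gc -> System.out.println("GC: " + gc.getName()));
    }
}
```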
Resolutions
- Track the cost of outages and opportunity cost related to lost customers
- Invest in a UAT environment that is identical to PROD
- In most cases, the cost of the first far outweighs that of the second
PROD-like Data is Hard
Description
Data in UAT looks nothing like PROD
Example Comment
It's too hard to keep PROD and UAT in synch
Reality
- Data in UAT must be PROD-like for accurate results
Root causes
- Lack of Understanding of Existing System
Discussion
This antipattern also falls into the trap of "something must be better than nothing". The idea is that testing against even out-of-date and unrepresentative data is better than not testing.
As before, this is an extremely dangerous line of reasoning. Whilst testing at scale against something (even if it is nothing like PROD data) can reveal flaws and omissions in the system-testing process, it provides a false sense of security.
When the system goes live, and the usage patterns fail to conform to the expected norms that have been anchored by UAT data, the development and ops teams may well find that they have become complacent due to the warm glow that UAT has provided, and are unprepared for the sheer terror that can quickly follow an at-scale production release.
Resolutions
- Consult data domain experts and invest in a process to migrate PROD data back into UAT
- Over-prepare for at-scale go-lives.
- Wherever possible, have dedicated "worst-case" teams or tools (e.g. Chaos Monkey)
Performance Tips (aka Tuning By Folklore)
Description
Code and parameter changes are being applied blind
Example Comment
I found these great tips on Stack Overflow. This changes everything.
Reality
- Developer does not understand the context or basis of performance tip and true impact is unknown
Root causes
- Lack of Understanding of Existing System
- Peer Pressure
Discussion
A performance tip is a workaround for a known problem - essentially a solution looking for a problem. Tips have a shelf life and usually date badly - someone will eventually fix the underlying problem, rendering the tip useless (at best) in a later release of the software or platform.
One source of performance advice that is usually particularly bad is admin manuals. They contain general advice devoid of context - advice and "recommended configurations" that are often insisted upon by lawyers as an additional line of defense in case the vendor is sued.
Java performance happens in a specific context, with a large number of contributing factors. If we strip away that context, what is left is almost impossible to reason about, due to the complexity of the execution environment.
Resolutions
- Only apply well-tested and well-understood techniques, which directly affect the most important aspects of a system.
Blame Donkey
Description
Certain components are always identified as the issue
Example Comment
It's always JMS / Hibernate / A_N_OTHER_LIB
Reality
- Insufficient analysis has been done to reach this conclusion
Root causes
- Peer Pressure
- Misunderstood / Non-Existent Problem
Discussion
This antipattern is often displayed by management or the business, as in many cases they do not have a full understanding of the technical stack and so proceed by pattern matching or from unacknowledged cognitive biases. However, technologists are also far from immune to this antipattern.
Resolutions
- Resist pressure to rush to conclusions
- Perform analysis as normal
- Communicate the results of analysis to all stakeholders (in order to try to encourage a more accurate picture of the causes of problems).
Fiddle With Switches
Description
Team becomes obsessed with JVM switches
Example Comment
If I just change these settings, we’ll get better performance
Reality
- Team does not understand impact of changes
Root causes
- Lack of Understanding of Existing System
- Misunderstood / Non-Existent Problem
Discussion
The JVM has literally hundreds of switches. This provides a highly configurable runtime, but gives rise to a great temptation to make use of all of that configurability. Doing so is usually a mistake - the defaults and self-management capabilities are normally sufficient. Some of the switches also combine with each other in unexpected ways, which makes blind changes even more dangerous.
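One practical safeguard is to record exactly which switches a JVM is actually running with, in both UAT and PROD, and before and after any proposed change. The sketch below uses the standard RuntimeMXBean to do this; running with -XX:+PrintFlagsFinal gives an even fuller picture of the effective flag values:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

public class JvmSwitchAudit {
    public static void main(String[] args) {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();

        // The switches this JVM was actually started with - useful to diff
        // between UAT and PROD, and before and after a proposed change.
        System.out.println("JVM: " + runtime.getVmName() + " " + runtime.getVmVersion());
        runtime.getInputArguments().forEach(arg -> System.out.println("  " + arg));
    }
}
```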
Resolutions
Before putting any change to switches live:
- Measure in PROD
- Change one switch at a time in UAT
- Test change in UAT
- Retest in UAT
- Have someone recheck your reasoning
Microbenchmarking
Description
Tuning effort is focused on some very low-level aspect of the system
Example Comment
If we can just speed up method dispatch time...
Reality
- Overall system-level impact of micro-changes is utterly unknown
Root causes
- Lack of Understanding of Existing System
- Misunderstood / Non-Existent Problem
- Resume Padding
- Peer Pressure
Discussion
Performance tuning is a statistical activity, which relies on a highly specific execution context. This implies that larger systems are usually easier to benchmark than smaller ones, because with larger systems the law of large numbers works in the engineer's favor, helping to correct for effects in the platform that distort individual events.
By contrast, the more we try to focus on a single aspect of the system, the harder we have to work to unweave the separate subsystems (e.g. threading, GC, scheduling, JIT compilation, etc.) of the complex environment that makes up the platform (at least in the Java / C# case). This is extremely hard to do, the statistical handling required is delicate, and it is not a skillset that most software engineers have acquired along the way. This makes it very easy to produce numbers that do not accurately represent the behavior of the system aspect the engineer believed they were benchmarking.
This has an unfortunate tendency to combine with the human bias to see patterns, even when none exist. Together, these effects lead us to the spectacle of a performance engineer who has been deeply seduced by bad statistics or a poor control - an engineer arguing passionately for a performance benchmark or effect that their peers are simply not able to replicate.
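As a cautionary sketch (the workload is invented), the naive benchmark below shows how easy it is to measure the wrong thing: the timed loop runs partly in the interpreter before JIT compilation kicks in, and because the result is never used, the optimizer is free to eliminate the work entirely. Harnesses such as JMH exist precisely to control for these effects:

```java
public class NaiveBenchmark {
    public static void main(String[] args) {
        // WARNING: this is a deliberately flawed benchmark.
        long start = System.nanoTime();
        for (int i = 0; i < 10_000_000; i++) {
            compute(i);   // result discarded - dead-code elimination may remove the work
        }
        long elapsed = System.nanoTime() - start;
        // This number mixes interpreter time, JIT compilation, and possibly
        // nothing at all - it says little about the cost of compute() in a
        // real, warmed-up system under load.
        System.out.println("Elapsed ms: " + elapsed / 1_000_000);
    }

    private static long compute(int i) {
        // Hypothetical stand-in for the 'low-level aspect' being tuned.
        return (long) i * i + 31;
    }
}
```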
Resolutions
- Do not microbenchmark unless you have a known use case for it; and when you do, do so publicly and in the company of your peers.
- Be prepared to be wrong a lot, and have your thinking challenged repeatedly.
Conclusion
Why has performance tuning acquired these antipatterns? What is it about the tuning process that encourages cognitive biases which lead to such incorrect conclusions?
Key to these questions is understanding that software engineering is fundamentally different from other engineering disciplines. In a wide range of mechanical engineering systems, the physical properties of small components are well understood, and the composition effects lead to only small amounts of (often well-studied) emergent behavior.
Software is different. The systems we build are far more elaborate than those typically found elsewhere in human endeavor. This is both because we work with very simple basic parts, and also because we have built tools which enable us to work with very large numbers of basic parts. Unfortunately, (or fascinatingly, depending on your point of view) as software has grown more complex, we have discovered that it has a highly emergent nature. This means that unexpected phenomena have manifested as our complexity has increased - and as we have discussed in this article, not all of them are positive.
Acknowledgements
Special thanks to Martijn Verburg, Kirk Pepperdine, Trisha Gee and James Gough (and others) for elucidating (and in several cases, naming) these antipatterns for me.
About the Author
Ben Evans is the CEO of jClarity, a Java/JVM performance analysis startup. In his spare time he is one of the leaders of the London Java Community and holds a seat on the Java Community Process Executive Committee. His previous projects include performance testing the Google IPO, financial trading systems, writing award-winning websites for some of the biggest films of the 90s, and others.