
9 Fallacies of Java Performance

Posted by Ben Evans on Apr 23, 2013

Java performance has the reputation of being something of a Dark Art. Partly this is due to the sophistication of the platform, which makes it hard to reason about in many cases. However, there has historically also been a trend for Java performance techniques to consist of a body of folk wisdom rather than applied statistics and empirical reasoning. In this article, I hope to address some of the most egregious of these technical fairytales.

1. Java is slow

Of all the most outdated Java Performance fallacies, this is probably the most glaringly obvious.

Sure, back in the 90s and very early 2000s, Java could be slow at times.

However, we have had over 10 years of improvements in virtual machine and JIT technology since then, and Java's overall performance is now screamingly fast.

In six separate web performance benchmarks, Java frameworks took 22 out of the 24 top-four positions.

The JVM's strategy of using profiling to optimize only the commonly used code paths, but to optimize those heavily, has paid off. JIT-compiled Java code is now as fast as C++ in a large (and growing) number of cases.

Despite this, the perception of Java as a slow platform persists, perhaps due to a negative historical bias from people who had experiences with early versions of the Java platform.

We suggest remaining objective and assessing up-to-date performance results before jumping to conclusions.

2. A single line of Java means anything in isolation

Consider the following short line of code:

MyObject obj = new MyObject();

To a Java developer, it seems obvious that this code must allocate an object and run the appropriate constructor.

From that we might begin to reason about performance boundaries. We know that there is some finite amount of work that must be going on, and so we can attempt to calculate performance impact based on our presumptions.

This is a cognitive bias that can trap us into thinking that we know, a priori, that any work will need to be done at all.

In actuality, both javac and the JIT compiler can optimize away dead code. In the case of the JIT compiler, code can even be optimized away speculatively, based on profiling data. In such cases the line of code won't run at all, and so it will have zero performance impact.

Furthermore, in some JVMs, such as JRockit, the JIT compiler can even decompose object operations so that allocations can be avoided even if the code path is not completely dead.
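To make this concrete, here is a minimal sketch of code in which the allocation may never actually happen. The Point class and the method names are hypothetical, invented purely for illustration. Because the object never leaves the method, a JIT compiler that performs escape analysis is free to avoid the heap allocation altogether; whether it does so depends on the JVM, its version and the runtime profile.

```java
public class AllocationSketch {

    // Hypothetical value type used only for illustration.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // The Point created here never leaves this method, so a JIT compiler
    // that performs escape analysis may keep x and y in registers and skip
    // the heap allocation entirely. Nothing in the source code tells us
    // whether that optimization was applied on a given run.
    static long sumOfCoordinates(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            Point p = new Point(i, i + 1);
            total += p.x + p.y;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfCoordinates(1_000_000));
    }
}
```

Reading the loop, we might estimate a million allocations; the running program may perform none. The source line alone cannot tell us which.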

The moral of the story here is that context is significant when dealing with Java performance, and premature optimization can produce counter-intuitive results. For best results, build your code first, then use performance tuning techniques to locate and correct the hot spots that measurement actually reveals.

3. A microbenchmark means what you think it does

As we saw above, reasoning about a small section of code is less accurate than analyzing overall application performance.

Nonetheless developers love to write microbenchmarks. The visceral pleasure that some people derive from tinkering with some low-level aspect of the platform seems to be endless.

Richard Feynman once said: "The first principle is that you must not fool yourself - and you are the easiest person to fool". Nowhere is this truer than when writing Java microbenchmarks.

Writing good microbenchmarks is profoundly difficult. The Java platform is sophisticated and complex, and many microbenchmarks only succeed in measuring transient effects, or other unintended aspects of the platform.

For example, a naively written microbenchmark will frequently end up measuring the timing subsystem or perhaps garbage collection rather than the effect it was trying to capture.

Only developers and teams that have a real need for them should write microbenchmarks. These benchmarks should be published in their entirety (including source code), and should be reproducible and subject to peer review and deep scrutiny.

The Java platform's many optimizations imply that the statistics of individual runs matter. A single benchmark must be run many times and the results aggregated to get a really reliable answer.
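As a rough illustration of what "run many times and aggregate" means, the sketch below times a placeholder workload repeatedly after some untimed warm-up runs, then reports the mean and sample standard deviation. The workload, the class name and the warm-up and run counts are all assumptions chosen for the example. This is not a substitute for a proper harness or for the statistical treatment a real benchmark needs.

```java
public class RepeatedRuns {

    // Arithmetic mean of the samples.
    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // Sample standard deviation (divides by n - 1); needs at least two samples.
    static double sampleStdDev(double[] samples) {
        double m = mean(samples);
        double squares = 0.0;
        for (double s : samples) {
            squares += (s - m) * (s - m);
        }
        return Math.sqrt(squares / (samples.length - 1));
    }

    // Hypothetical workload; replace with the code under investigation.
    static long workload() {
        long acc = 0;
        for (int i = 0; i < 100_000; i++) {
            acc += i % 7;
        }
        return acc;
    }

    // Runs the workload `warmups` times untimed, then `runs` times timed,
    // returning each timed measurement in microseconds.
    static double[] measure(int warmups, int runs) {
        long sink = 0; // accumulate results so the work is not trivially dead
        for (int i = 0; i < warmups; i++) {
            sink += workload();
        }
        double[] micros = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += workload();
            micros[i] = (System.nanoTime() - start) / 1_000.0;
        }
        if (sink == 42) { // never true here; keeps `sink` observable
            System.out.println(sink);
        }
        return micros;
    }

    public static void main(String[] args) {
        double[] micros = measure(20, 30);
        System.out.printf("mean=%.1fus stddev=%.1fus over %d runs%n",
                mean(micros), sampleStdDev(micros), micros.length);
    }
}
```

A large standard deviation relative to the mean is a hint that the numbers are dominated by something other than the code being studied, such as JIT compilation or garbage collection.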

If you feel you must write microbenchmarks, then a good place to start is by reading the paper "Statistically Rigorous Java Performance Evaluation" by Georges, Buytaert, Eeckhout. Without proper treatment of the statistics, it is very easy to be misled.

There are well-developed tools and communities around them (for example, Google's Caliper) - if you absolutely must write microbenchmarks, then do not do so by yourself - you need the viewpoints and experience of your peers.

4. Algorithmic slowness is the most common cause of performance problems

A very familiar cognitive fallacy among developers (and humans in general) is to assume that the parts of a system that they control are the important ones.

In Java performance, this manifests itself by Java developers believing that algorithmic quality is the dominant cause of performance problems. Developers think about code, so they have a natural bias towards thinking about their algorithms.

In practice, when dealing with a range of real-world performance problems, algorithm design was found to be the fundamental issue less than 10% of the time.

Instead, garbage collection, database access and misconfiguration were all much more likely to cause application slowness than algorithms.

Most applications deal with relatively small amounts of data, so that even major algorithmic inefficiencies don't often lead to severe performance problems. To be sure, we are acknowledging that the algorithms were suboptimal; nonetheless the amount of inefficiency they added was small relative to other, much more dominant performance effects from other parts of the application stack.

So our best advice is to use empirical, production data to uncover the true causes of performance problems. Measure; don't guess!

5. Caching solves everything

"Every problem in Computer Science can be solved by adding another level of indirection"

This programmer's aphorism, attributed to David Wheeler (and thanks to the Internet, to at least two other Computer Scientists), is surprisingly common, especially among web developers.

Often this fallacy arises due to analysis paralysis when faced with an existing, poorly understood architecture.

Rather than deal with an intimidating extant system, a developer will frequently choose to hide from it by sticking a cache in front and hoping for the best. Of course, this approach just complicates the overall architecture and makes the situation worse for the next developer who seeks to understand the status quo of production.

Large, sprawling architectures are written one line, and one subsystem at a time. However, in many cases simpler, refactored architectures are more performant - and they are almost always easier to understand.

So when you are evaluating whether caching is really necessary, plan to collect basic usage statistics (miss rate, hit rate, etc.) to prove that the caching layer is actually adding value.
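One way to gather such statistics is to have the cache count its own hits and misses. The following is a minimal, single-threaded sketch; the CountingCache class and its Loader interface are hypothetical names introduced for this example, and a production cache would also need eviction and thread safety.

```java
import java.util.HashMap;
import java.util.Map;

public class CountingCache<K, V> {

    // Hypothetical loader interface standing in for the slow backing system.
    public interface Loader<K, V> {
        V load(K key);
    }

    private final Map<K, V> entries = new HashMap<K, V>();
    private final Loader<K, V> loader;
    private long hits;
    private long misses;

    public CountingCache(Loader<K, V> loader) {
        this.loader = loader;
    }

    // Not thread-safe and never evicts; illustration only.
    public V get(K key) {
        if (entries.containsKey(key)) {
            hits++;
            return entries.get(key);
        }
        misses++;
        V value = loader.load(key);
        entries.put(key, value);
        return value;
    }

    public long hits() {
        return hits;
    }

    public long misses() {
        return misses;
    }

    // Fraction of lookups served from the cache; 0.0 before any lookup.
    public double hitRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

If the hit rate observed in production stays low, the cache is adding complexity without removing much load from the system behind it.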

6. All apps need to be concerned about Stop-The-World

A fact of life of the Java platform is that all application threads must periodically stop to allow Garbage Collection to run. This is sometimes brandished as a serious weakness, even in the absence of any real evidence. 

Empirical studies have shown that human beings cannot normally perceive changes in numeric data (e.g. price movements) occurring more frequently than once every 200ms. 

Consequently, for applications that have a human as their primary user, a useful rule of thumb is that a Stop-The-World (STW) pause of 200ms or under is usually of no concern. Some applications (e.g. streaming video) need lower GC jitter than this, but many GUI applications will not.

There is a minority of applications (such as low-latency trading, or mechanical control systems) for which a 200ms pause is unacceptable. Unless your application is in that minority, it is unlikely your users will perceive any impact from the garbage collector.

It is also worth mentioning that in any system where there are more application threads than physical cores, the operating system scheduler will have to intervene to time-slice access to the CPUs. Stop-The-World sounds scary, but in practice, every application (whether JVM or not) has to deal with contended access to scarce compute resources.

Without measurement, it isn't clear that the JVM's approach has any meaningful additional impact on application performance.

In summary, determine whether pause times are actually affecting your application by turning on GC logs. Analyze the logs (either by hand, or with scripting or a tool) to determine the pause times. Then decide whether these really pose a problem for your application domain. Most importantly, ask yourself one pointed question: have any users actually complained?
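GC logs remain the place to look for individual pause times. As a quick complementary check, the standard java.lang.management API exposes cumulative collection counts and times from inside a running JVM. The sketch below (the GcTotals class name is ours) prints them per collector. Note that these are accumulated totals, and for concurrent collectors not all of the reported time is Stop-The-World time, so this is a first sanity check rather than a pause analysis.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTotals {

    // Sums the approximate accumulated collection time, in milliseconds,
    // across all collectors. A collector may report -1 when the value is
    // undefined, so non-positive values are skipped.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalMillis=" + gc.getCollectionTime());
        }
        System.out.println("all collectors: " + totalCollectionMillis() + " ms");
    }
}
```

Comparing the total against the uptime of the process gives a crude sense of how much time is going to collection overall, which is often enough to decide whether a detailed log analysis is worth doing.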

7. Hand-rolled Object Pooling is appropriate for a wide range of apps

One common response to the feeling that Stop-The-World pauses are somehow bad is for application groups to invent their own memory management techniques within the Java heap. Often this boils down to implementing an object pooling (or even full-blown reference-counting) approach and requiring any code using the domain objects to participate.

This technique is almost always misguided. It often has its roots in the distant past, where object allocation was expensive and mutability was deemed inconsequential. The world is very different now.

Modern hardware is incredibly efficient at allocation; the bandwidth to memory is at least 2 to 3GB per second on recent desktop or server hardware. This is a big number; outside of specialist use cases it is not that easy to make real applications saturate that much bandwidth.

Object pooling is generally difficult to implement correctly (especially when there are multiple threads at work) and has several negative requirements that render it a poor choice for general use:

  • All developers who touch the code must be aware of pooling and handle it correctly
  • The boundary between "pool-aware" and "non-pool-aware" code must be known and documented
  • All of this additional complexity must be kept up to date, and regularly reviewed
  • If any of this fails, the risk of silent corruption (similar to pointer re-use in C) is reintroduced
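The last point deserves a small demonstration. In the sketch below, a deliberately simplistic pool hands the same object to two callers, and the first caller, which kept its reference after releasing the object, silently sees the second caller's data. The Order and OrderPool classes are hypothetical and exist only for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class NaivePoolSketch {

    // Hypothetical mutable domain object.
    static final class Order {
        String customer;
    }

    // Deliberately simplistic, single-threaded pool.
    static final class OrderPool {
        private final Deque<Order> free = new ArrayDeque<Order>();

        Order acquire() {
            Order o = free.poll();
            return o != null ? o : new Order();
        }

        void release(Order o) {
            o.customer = null; // reset state before reuse
            free.push(o);
        }
    }

    // Returns what a caller sees when it keeps using an object after release.
    static String staleRead() {
        OrderPool pool = new OrderPool();

        Order first = pool.acquire();
        first.customer = "alice";
        pool.release(first);           // first goes back to the pool ...

        Order second = pool.acquire(); // ... and is handed out again
        second.customer = "bob";

        return first.customer;         // the stale reference now sees "bob"
    }

    public static void main(String[] args) {
        System.out.println(staleRead());
    }
}
```

No exception is thrown and no warning is logged. The only defence is discipline at every call site, which is exactly the burden the list above describes.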

In summary, object pooling should only be used when GC pauses are unacceptable, and intelligent attempts at tuning and refactoring have been unable to reduce pauses to an acceptable level.

8. CMS is always a better choice of GC than Parallel Old

By default, the Oracle JDK will use a parallel, stop-the-world collector for collecting the old generation.

An alternative choice is Concurrent-Mark-Sweep (CMS). This allows application threads to continue running throughout most of the GC cycle, but it comes at a price, and with quite a few caveats.

Allowing application threads to run alongside GC threads invariably results in application threads mutating the object graph in a way that would affect the liveness of objects. This has to be cleaned up after the fact, and so CMS actually has two (usually very short) STW phases.

This has several consequences:

  1. All application threads have to be brought to safe points and stopped twice per full collection;
  2. Whilst the collection is running concurrently, application throughput is reduced (usually by 50%);
  3. The overall amount of bookkeeping (and CPU cycles) in which the JVM engages to collect garbage via CMS is considerably higher than for parallel collection.

Depending on the application circumstances these prices may be worth paying or they may not. But there’s no such thing as a free lunch. The CMS collector is a remarkable piece of engineering, but it is not a panacea.

So before concluding that CMS is your correct GC strategy, you should first determine that STW pauses from Parallel Old are unacceptable and can't be tuned. And finally (and I can’t stress this enough), be sure that all metrics are obtained on a production-equivalent system.

9. Increasing the heap size will solve your memory problem

When an application is in trouble and GC is suspected, many application groups will respond by just increasing the heap size. Under some circumstances, this can produce quick wins and allow time for a more considered fix. However, without a full understanding of the causes of the performance problem, this strategy can actually make matters worse.

Consider a badly coded application that is producing too many domain objects (with a typical lifespan of say two to three seconds). If the allocation rate is high enough, garbage collections could occur so rapidly that the domain objects are promoted into the tenured (old) generation. Once in tenured, the domain objects die almost immediately, but they would not be collected until the next full collection.

If this application has its heap size increased, then all we're really doing is adding space for relatively short-lived domain objects to propagate into and die. This can make the length of Stop-The-World pauses worse for no benefit to the application.

Understanding the dynamics of object allocation and lifetime before changing heap size or tuning other parameters is essential. Acting without measuring can make matters worse. The tenuring distribution information from the garbage collector is especially important here.
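The premature promotion scenario described above can be approximated with some back-of-the-envelope arithmetic. The sketch below uses a deliberately simplified model: a young collection happens each time eden fills, and an object is promoted once it has survived more young collections than a tenuring threshold allows. The class, the method names and every number in it are assumptions for illustration; real collectors adapt their thresholds and also promote when survivor spaces overflow, which is why the measured tenuring distribution matters more than any such estimate.

```java
public class PromotionEstimate {

    // Seconds between young collections in a simplified model where a
    // collection is triggered each time eden fills up.
    static double secondsBetweenYoungGcs(double edenMb, double allocMbPerSec) {
        return edenMb / allocMbPerSec;
    }

    // Approximate number of young collections an object lives through.
    static long collectionsSurvived(double lifetimeSec, double gcIntervalSec) {
        return (long) Math.floor(lifetimeSec / gcIntervalSec);
    }

    // True if, under this model, the object outlives the tenuring threshold
    // and so ends up in the old generation shortly before it dies.
    static boolean likelyPromoted(double edenMb, double allocMbPerSec,
                                  double lifetimeSec, int tenuringThreshold) {
        double interval = secondsBetweenYoungGcs(edenMb, allocMbPerSec);
        return collectionsSurvived(lifetimeSec, interval) > tenuringThreshold;
    }

    public static void main(String[] args) {
        // Hypothetical figures: 256MB eden, objects living 2.5 seconds,
        // and an assumed tenuring threshold of 6.
        System.out.println("at 1024 MB/s: promoted="
                + likelyPromoted(256, 1024, 2.5, 6));
        System.out.println("at  100 MB/s: promoted="
                + likelyPromoted(256, 100, 2.5, 6));
    }
}
```

Under these assumed figures, the high allocation rate forces a young collection every quarter of a second, so a 2.5-second object lives through about ten of them and is promoted; at the lower rate it dies in the young generation.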

Conclusion

When it comes to Java performance tuning, intuition is often misleading. We require empirical data and tools to help us visualize and understand the platform's behavior.

Garbage Collection provides perhaps the best example of this. The GC subsystem has incredible potential for tuning and for producing data to guide tuning, but for production applications it is very hard to make sense of the data produced without resorting to tools.

The default should always be to run any Java process (in development or production) with at least these flags:
-verbose:gc (print the GC logs)
-Xloggc: (for more comprehensive GC logging)
-XX:+PrintGCDetails (for more detailed output)
-XX:+PrintTenuringDistribution (displays the tenuring thresholds assumed by the JVM)

and then to use a tool to analyze the logs - either handwritten scripts and some graph generation, or a visual tool such as the (open-source) GCViewer or jClarity Censum.

About the Author

Ben Evans is the CEO of jClarity, a startup which delivers performance tools to help development & ops teams. He is an organizer for the LJC (London JUG) and a member of the JCP Executive Committee, helping define standards for the Java ecosystem. He is a Java Champion; JavaOne Rockstar; co-author of “The Well-Grounded Java Developer” and a regular public speaker on the Java platform, performance, concurrency, and related topics.

 

 


Great article by Joel Frederick

Thanks for these. I recently had a time dealing with the last 4 of these issues. I'm still not convinced that JVM tuning isn't a black art.

Re: Great article by Joseph King

Knock, knock.
Who's there?

very long pause.

Java.


Sorry guys, couldn't resist

as fast as C++ by Isaac Gouy

JIT-compiled Java code is now as fast as C++ in a large (and growing) number of cases.


Please share those examples.

Good points all by Mathhew Miller

Any opt task starts with evidence, preferably an end to end profiling. Guessing about performance, even if you're an old saw at Java, rarely results in the most effective use of developer resources.

As an example: we had a very experienced engineer who was convinced that business logic was so expensive it was going to mire the application if we performed it on every object. So we adopted a process where we would collect similar work from the data store, perform the business logic once, and persist the result to every object.

Years later when evaluating the performance "loss" to a conceptually simpler replacement that didn't "clump and stamp", we were a bit surprised to find that performing the logic on every object was about 50% faster. The compiler was optimizing the frequently executed business code so well that the cost of moving data in and out of the VM for evaluation of "similarity" was the real cost of processing.

Had we known how expensive it was to move data in our platform, we might have chosen to do more work at each compute node and less storing and routing of data. Which is what we've been doing ever since. To date every time we've had a "performance event" that somebody's had the temerity to blame on increased business object complexity, it's turned out to be something else. Too small or too frequent trips over the network being the most common case.

Culture clash by Chris Adams

This is a good review of the current situation and how it's changed, and I heartily applaud the “measure, don't guess mentality”. I think, however, that many of the critiques you're defending against are actually right but only from a broader perspective of Java the language versus Java as practiced by enterprise developers. At least #4, 6, 8 and 9 are frequently true in my experience and they're held against Java because system administrators and other technical users observe the problem as endemic to a class of apps which do all run on the JVM but miss that the underlying cause are the baroque (rococo?) coding patterns used by [usually] enterprise software development where performance is an afterthought and architecture is valued for the illusion of complexity rather than simplicity.

When I supported web applications developed by Oracle and similar companies, I thought Java was horrible because every stack trace was a labyrinth and debugging required a mind-meld with code and configuration stored in dozens of widely separated locations. Now that the Java apps I use most are very stable, high-performance projects like Solr or ElasticSearch my impression is far more favorable because I'm not employing things like garbage-collector tweaks as a coping tactic for frameworks or applications which have metastasized beyond the capabilities of their maintainers.

Re: Good points all by Jai Shankar K

This is nothing but Java Hot Code replacement... Before taking any major decisions on performance tuning.. delve more into the theory aspects of the programming language you are working on which in conjunction with the application domain knowledge will give you a better clarity on how things should be vs. how things could be (shooting in the dark)... :)

Caching often indicates deeper problems by Robert Annett

Point (5) "Caching solves everything" is a problem I see a lot for larger/growing systems. I consider the introduction of lots of caches as an architectural smell (sorry for the self-linking but it's better than copying) and indicates a re-evaluation of the structure is probably in order.

www.codingthearchitecture.com/2011/10/02/is_cac...

Re: as fast as C++ by Ben Evans

Isaac: Start by googling "Java intrinsics" & "monomorphic dispatch". Also, gaining an understanding of how Java's inlining works will really help - but there aren't very many good detailed descriptions of it on the web, unfortunately.

Re: as fast as C++ by Isaac Gouy

@Ben Evans
Start by googling "Java intrinsics" & "monomorphic dispatch"...

I was hoping you would point to some Java applications that were as fast as corresponding C++ applications. If you can't do that then please share some examples where small programs are shown to be as fast as C++.

Please don't say "as fast as C++" and then tell others they have to show that's true.

While what is said is generally true... by John Goode

To have web apps, you still need that resource hog called an app server. These babies are slow compared to apache or IIS.

Re: While what is said is generally true... by Victor Grazi

Is that really so? We did some benchmarking last year on WebLogic and there was virtually no significant performance or memory impact running an application vs an equivalent stand alone application. Are there any published benchmarks?

Re: as fast as C++ by Ben Evans

If you're genuinely interested in performance then you know that apples-to-apples comparison of non-trivial apps is pretty much impossible.

I provided you with two important optimizations that the JIT compiler does - one of which (intrinsics) is very difficult for a C++ compiler to perform, and the other (monomorphic dispatch) requires extensive runtime support - which to the best of my knowledge (& if a reader knows otherwise, please link me to a source) no C++ runtime provides.

These are not isolated or trivial optimizations. Some estimates of monomorphic dispatch are as high as 85% of all method dispatches.

I'm afraid your comments so far come across more as someone with a language axe to grind rather than being interested in the topic.

Re: as fast as C++ by Fernando Rubbo

Actually, I agree with Ben when he said "apples-to-apples comparison of non-trivial apps is pretty difficult", but there is an interesting benchmark comparing a very old version of Java with the same code written in C. And I think Java did pretty well :-)

bytonic.de/html/benchmarks.html

Another imperfect comparison, but an interesting one to discuss, is Visual Studio against IDEA, Eclipse and NetBeans. As far as I know, Visual Studio is written in C++ and the others in Java, and all of them have similar performance.

Besides this, I've listened to an interview with AMD Java Labs where they make "jokes" about C++ being faster than Java. I've tried to find the link. I think this is the one (I haven't listened to it again, so there is a possibility this is not the correct link): javaposse.com/java_posse_243_interview_with_amd...


My two cents in this discussion:

Java has a slower startup time and uses too MUCH memory when compared to C/C++. Besides this, Java still has a GC (which is very good for enterprise development but takes time to run), and developers in the C/C++ world are used to caring a lot about memory usage and performance (which is not so common in the Java enterprise world, unfortunately).

On the other hand, the JIT compiler is very impressive when it comes to performance (there are some compile-time customizations which cannot be done statically), and with each update Java gains support for new processor-specific instructions, so if you test your code with an older JVM it will be slower than on newer ones (in C/C++ you have to recompile the code to take advantage of these newer instructions). The JVM parameter -server, which is used in enterprise, also helps to improve Java performance.

A lot of this is true, but not specific to Java by Angelo Hulshout

Good article, as far as the 9 issues are concerned. However, 3-6, and to some extent also 2, are not specific to Java, which makes the title of the article somewhat void.

The benchmark itself is a fallacy by Li Chen

The benchmark compares Java to mostly scripting languages. Of course Java is faster.

Are 8 and 9 redundant? by Osvaldo Andrade

I think 8 and 9 are redundant. Because, according to Appel, the cost of Mark and Sweep is (c1 R + c2 H) / (H−R) where R is the size of words used in the heap, and H the heap size. So, it's easy to check this function grows when R approaches H. In other words the cost of GC is always proportional to the size of the heap.

Anyone disagree?

Re: as fast as C++ by Osvaldo Andrade

I do not agree with the term "optimization", because there is no optimal solution in this kind of problem (undecidable) and for me the best term would be something like "improving"... :-D

By the way, it's true that the JIT can build Java code that runs faster than statically compiled C / C++ (even with -O3 in GCC). The main issue in static optimization is that it needs to predict the future, but at runtime the JIT knows the hotspots.

OK, I know it runs faster, but could Java be applied to any kind of application? For example, what happens if I need to solve a very large NP-Hard problem like SAT, Knapsack 0-1 or photo mosaic generation? Supposing this is linear, so that we cannot distribute the processing among other VMs, and we have 64GB available to compute, would it be possible to set -Xmx64GB? Of course not, unless you make some workaround using the stack as a heap.

My conclusion is that Java can run faster than statically compiled apps, but it's limited to very low memory complexity. And sometimes I think load balancing among processes on the same machine sounds like a real big hammer, but unfortunately this is the only way to scale using Java.
:-)

Re: Great article by Dan Sutton

LOL! -- You can also optimize Java implementations by rewriting them in C#.

Re: as fast as C++ by kamran usman

Just to correct you, Visual Studio's recent iterations are developed in .net.. i am talking about the main IDE / UI.. not the compilers which are a separate process.. earlier versions of Visual Studio that were pure MFC / C/ C++ were blazingly fast...

Re: Are 8 and 9 redundant? by Ben Evans

I'm not sure I fully understand your point here - it's somewhat brief.

However, I'd point out that 8 & 9 are not purely about computational cost. In terms of pure number of CPU cycles consumed for GC, Parallel Old is always going to be cheaper than CMS - as CMS has a large amount of additional book-keeping which needs to be done.

8 & 9 are really about throughput vs pause time on real-world, concurrent systems, rather than purely algorithmic concerns.

resource wasting by Carlos Quijano

What I have seen always related to Java is that for solving a weight load of work Java consumes 10 times more resources than C or simple scripting languages (say perl + memcached). Perhaps this is wrong. But then this raises an interesting question: Is Java worse in these cases because of the language's overwhelming architecture or because of the systems and software architects being unable to understand how to fit it into a particular limited amount of resources? I bet the second is the correct answer. When developing with Java, it is very easy to get entangled and produce very redundant / complicated code. Then every new step / functionality ends up being 10 times slower than it should be; eventually one finishes wasting resources x10. Once you are there, changing the code to be lighter and more efficient is too difficult and you can only grow using iron. And to me, you may find here a marketing relation between this scenario and the need of Sun (now Oracle) to sell big mainframes or clusters for practically everything. Java is not bad, but if you are into Java, you should take care of these things. This is not a Fallacy...
What I really mean is that performance tests are not as important as stress tests. These tests are the ones in which big Java applications fail...

Re: resource wasting by Justin Hancock

Carlos you need to back up your claims. Have a look at the LMAX disruptor, 85 million TPS on a single small machine.

Hadoop, HBase, Cassandra are all examples of very large high performance applications written in Java.

You're effectively proving the author's assertions, that there is a fallacy about java performance perpetuated by a vested clique.

Bad programmers will always be bad programmers, doesn't matter what they code in, I've seen dog slow C++, even worse Visual Basic and plenty of crap java, it's the programmers that suck not the languages.

Re: resource wasting by Carlos Quijano

Totally agree with you Justin. Hadoop is the perfect example of how Systems Architecture turns weird concepts like Software Architecture into an arrogant fallacy. The Hadoop system is architected in such a way that the way an application is developed changes drastically (for the better). Software Architecture for me is a weird concept and Java is full of it. I think the day the JVM becomes similar to Infrastructure as a Service virtualization (Why not?) Java will be a true overpowered language. Before that, we only have well architected things like hadoop coded in Java for convenience, that's ok, but painfully...

InfoQ.com and all content copyright © 2006-2014 C4Media Inc.