Plumbr Introduces New Java Performance Monitoring Tool

Plumbr has added a new Application Performance Management product to its performance monitoring toolkit. The product automatically reports which technical errors are the root cause of performance incidents that affect the end user.

Plumbr is a Java performance monitoring solution that detects the root cause of performance issues and alerts the user with a link to the issue and its root cause. Plumbr provides end user experience monitoring by instrumenting incoming transactions at JVM boundaries. For example, transactions arriving as incoming HTTP traffic are monitored for duration and success. If a particular transaction doesn’t meet the response time criterion, it is flagged as ‘Slow’; if it returns a 500-series response code, it is flagged as ‘Failed’. Plumbr also provides information on potential sources of performance bottlenecks (e.g. lock contention or JVM memory related issues). To detect these problem areas it uses JVMTI (the Java Virtual Machine Tool Interface), a programming interface introduced in Java Platform Standard Edition 5.0 and described in JSR 163, along with some other interfaces.
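The flagging rule described above can be sketched in a few lines. This is an illustrative sketch only, not Plumbr's implementation: the names `Transaction`, `Status` and `classify` are hypothetical, the threshold is passed in per call, and checking for failure before slowness is an assumption the article does not confirm.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "OK"
    SLOW = "Slow"
    FAILED = "Failed"


@dataclass(frozen=True)
class Transaction:
    """Hypothetical record of one monitored HTTP transaction."""
    service: str
    duration_ms: int
    http_status: int


def classify(tx: Transaction, threshold_ms: int) -> Status:
    """Flag a transaction as Failed, Slow or OK.

    Assumption: a 500-series response takes precedence over a slow one.
    """
    if 500 <= tx.http_status <= 599:
        return Status.FAILED
    if tx.duration_ms > threshold_ms:
        return Status.SLOW
    return Status.OK
```

For instance, `classify(Transaction("/invoice/pay/", 7200, 200), threshold_ms=5000)` evaluates to `Status.SLOW`, because the response succeeded but exceeded the threshold.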

Earlier this year, InfoQ covered Plumbr and the various root cause detection features that were added to its repertoire. The article worked through an example showcasing how Plumbr drilled down to the root cause of an expensive JDBC operation.

A typical APM (Application Performance Management) solution monitors the user experience and exposes the services that are not performing as expected. There are also troubleshooting tools (profilers, for example) that can expose information from within the JVM as evidence for troubleshooting a particular performance incident. What makes Plumbr stand out is its ability to bind APM with root cause detection. The key is being able to reproduce the performance issue in the environment where profiling takes place and to interpret the evidence the profiler exposes. For example, if a potential performance bottleneck is found in the context of a transaction, the root cause is linked to that incoming transaction, which makes it possible to track the total (potential) impact of a particular root cause.

InfoQ asked Plumbr’s co-founder and head of product, Ivo Mägi, to explain the ‘Plumbr difference’, and Ivo replied with an example:

The best way to understand it is via the following question-and-answer dialogue. The answers in the dialogue are all provided by Plumbr:

  • What is slow? /invoice/pay/ service deployed on the JVMs artemis.public & zeus.public
  • How slow is it? During the past hour, 1,000 invoice payment transactions have been detected. Out of the 1,000 transactions, 250 completed under the 5,000ms threshold set for the particular service. The latency distribution for the service over this period shows that 10% of transactions took more than 10,000ms to complete, 1% took more than 15,000ms, and in the worst case the transaction completed only after 33,000ms.
  • What is causing this? Checking the list of root causes affecting the transaction, it is immediately visible that 240 of the slow transactions were caused by a single SQL query taking most of the time during the transaction. The query itself is exposed, along with parameters and exact call stack through which the query was executed.
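The figures in the "How slow is it?" answer are the kind of summary that can be derived from raw transaction durations. The sketch below shows one way to compute them; it is not how Plumbr calculates its numbers, and the function name `latency_summary`, its parameters and the sample durations are all hypothetical.

```python
def latency_summary(durations_ms, threshold_ms, cutoffs_ms):
    """Summarise transaction durations against a service threshold.

    Returns the total count, how many transactions completed under the
    threshold, the share of transactions slower than each cutoff, and
    the worst-case duration.
    """
    total = len(durations_ms)
    if total == 0:
        raise ValueError("at least one duration is required")
    under = sum(1 for d in durations_ms if d < threshold_ms)
    share_over = {
        cutoff: sum(1 for d in durations_ms if d > cutoff) / total
        for cutoff in cutoffs_ms
    }
    return {
        "total": total,
        "under_threshold": under,
        "share_over": share_over,
        "worst_ms": max(durations_ms),
    }


# Made-up sample data, for illustration only.
summary = latency_summary(
    [1000, 4000, 6000, 12000, 33000],
    threshold_ms=5000,
    cutoffs_ms=(10000, 15000),
)
```

With real data, the durations list would hold one entry per transaction observed for the service during the reporting window.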

Ivo concluded with the following statement:

So, using Plumbr, your entire operations and development team can share the same information to triage and fix the detected issues. The lengthy troubleshooting process is removed from the picture and the mean time to resolution (MTTR) of performance incidents is reduced by more than 10x.

InfoQ also discussed Plumbr overhead with Ivo, who directed us to an FAQ that discusses the overhead in detail. The takeaways from the FAQ were as follows:

There are two types of memory overhead due to internal data structures: overhead to the total Java heap usage, which is usually less than 2% for a 4GB heap and could be up to 8% for smaller heaps; and overhead due to native memory usage, which is typically less than 25MB and rarely reaches 400MB. The CPU overhead is more application dependent: applications that create a lot of short-lived objects can feel more of it, since Plumbr tracks all object creation and destruction events.

Plumbr launched on October 8, 2015 and offers a 14-day trial program. For further reading please refer to the Plumbr blog.