An in-depth overview of modern Application Performance Management

Nicholas Whitehead, a Senior Technology Architect with ADP, published a three-part article series on IBM's developerWorks entitled Java run-time monitoring. The series introduces the reader to Application Performance Management (APM) as follows:

In part 1 he explores the attributes of APM systems, describes anti-patterns in system monitoring, presents methods for monitoring the performance of JVMs, and offers techniques for efficiently instrumenting application source code
In part 2 he reviews post-compilation instrumentation, specifically through interception, class wrapping, and bytecode instrumentation
In part 3 he concludes by discussing performance and availability monitoring of an application's ecosystem

Whitehead begins his series with a review of APM anti-patterns, which identify key issues that companies may face when piecing together a monitoring solution. He includes the following anti-patterns:

Blind spots: monitoring some, but not all of an environment leads to inconclusive results during analysis
Black boxes: similar to blind spots, but scoped to applications or components. A black box is a component whose internal performance is not visible to the monitoring solution
Disjointed and disconnected monitoring systems: this anti-pattern contrasts siloed monitoring with consolidated monitoring. Deep but disjointed monitoring of specific layers of the application stack (e.g. an operating system, JVM, or database) can make it difficult to identify the true root cause of a performance problem. Whitehead's article includes a figure that illustrates this point.

After-the-fact reporting and correlation: attempting to extract data from disparate monitoring tools and correlate their results into something meaningful can be very challenging
Periodic or on-demand monitoring: many monitoring solutions have high enough overhead that they are configured to run only after a problem occurs. In this scenario, the monitoring may start too late to identify the root cause of the problem
Non-persistent monitoring: live displays of performance metrics are great, but unless the data can be persisted, it is difficult to establish historical context when reviewing current performance metrics
Reliance on preproduction monitoring: monitoring in preproduction is a good thing, but relying solely on preproduction monitoring is insufficient because user behavior cannot be fully anticipated

After reviewing anti-patterns, Whitehead presents the following attributes of an ideal APM system (extracted directly from the author's article):

Pervasive: It monitors all application components and dependencies.
Granular: It can monitor extremely low-level functions.
Consolidated: All collected measurements are routed to the same logical APM supporting a consolidated view.
Constant: It monitors 24 hours a day, 7 days a week.
Efficient: The collection of performance data does not detrimentally influence the target of the monitoring.
Real-time: The monitored resource metrics can be visualized, reported, and alerted on in real time.
Historical: The monitored resource metrics are persisted to a data store so historical data can be visualized, compared, and reported.

Next, Whitehead defines a technical solution that meets these requirements. He defines a set of "tracers" that are responsible for obtaining data from a monitored component and sending it to a "performance data source". He then describes the characteristics of those tracers: whether a metric is based on interval sampling, reported as a delta, sticky (meaning that it does not change frequently), or incident based, along with smart tracers that automatically discover their type based upon the nature of the collected data. Then he reviews common collector patterns, such as polling, listening, and interception.
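
As a rough illustration of the delta idea, the sketch below keeps the last absolute value seen for each metric and reports only the change since the previous reading. The names `DeltaTracer` and `traceDelta` are hypothetical and chosen for this example; they are not taken from Whitehead's tracer API, and the reset handling is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: class and method names are illustrative only.
class DeltaTracer {
    // Last absolute value observed for each metric name.
    private final Map<String, Long> lastValues = new ConcurrentHashMap<>();

    /**
     * Records an ever-increasing absolute value and returns the change since
     * the previous reading. Returns null when there is no usable baseline.
     */
    Long traceDelta(String metricName, long absoluteValue) {
        Long previous = lastValues.put(metricName, absoluteValue);
        if (previous == null) {
            return null; // first observation: only establishes the baseline
        }
        if (absoluteValue < previous) {
            return null; // assumed policy: a drop means the counter was reset
        }
        return absoluteValue - previous;
    }
}
```

A collector that reads a cumulative counter, such as a total commit count, could pass each reading through `traceDelta` and forward only the non-null results.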

Whitehead then dives into monitoring specifics: he reviews the core JVM MBeans and constructs a monitoring framework for gathering those and application-specific JMX metrics. He subsequently turns his attention to monitoring classes and methods and reviews four common techniques:
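
For readers unfamiliar with the core JVM MBeans, the following minimal sketch polls a few of them through the standard `java.lang.management` API. The class name `JvmMetricsPoller` and the metric names are assumptions made for this example; the sketch does not reproduce the framework built in the article.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: class and metric names are illustrative only.
class JvmMetricsPoller {
    /** Takes one snapshot of a few platform MXBean attributes. */
    static Map<String, Long> poll() {
        Map<String, Long> metrics = new LinkedHashMap<>();

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("heap.used.bytes", heap.getUsed());
        metrics.put("heap.committed.bytes", heap.getCommitted());

        metrics.put("threads.live",
                (long) ManagementFactory.getThreadMXBean().getThreadCount());
        metrics.put("classes.loaded",
                (long) ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());

        // One entry pair per collector; names vary by JVM and GC settings.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            metrics.put("gc." + gc.getName() + ".count", gc.getCollectionCount());
            metrics.put("gc." + gc.getName() + ".time.ms", gc.getCollectionTime());
        }
        return metrics;
    }
}
```

Calling `poll()` on a fixed schedule is one simple form of the polling collector pattern; the same attributes are also reachable remotely over JMX.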

Source code instrumentation: manually adding instrumentation to your application
Interception: intercepting calls as they are made, such as through AOP, and capturing instrumentation metrics
Bytecode instrumentation: modifying the bytecode of an application at runtime to inject performance collectors
Class wrapping: wrapping or replacing a target class with another class that contains instrumentation logic
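
To make the interception approach from the list above concrete, here is a minimal sketch that uses a JDK dynamic proxy to time calls made through an interface. This is one possible interception mechanism, not necessarily the one used in the article, and `TimingInterceptor` and its fields are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: class and field names are illustrative only.
class TimingInterceptor implements InvocationHandler {
    private final Object target;
    // Per-method invocation counts and total elapsed time, keyed by method name.
    final Map<String, LongAdder> invocationCounts = new ConcurrentHashMap<>();
    final Map<String, LongAdder> elapsedNanos = new ConcurrentHashMap<>();

    TimingInterceptor(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the target's own exception
        } finally {
            // Record the call whether it returned normally or threw.
            String name = method.getName();
            invocationCounts.computeIfAbsent(name, k -> new LongAdder()).increment();
            elapsedNanos.computeIfAbsent(name, k -> new LongAdder())
                    .add(System.nanoTime() - start);
        }
    }

    /** Creates a proxy for iface that routes every call through the handler. */
    @SuppressWarnings("unchecked")
    static <T> T proxy(Class<T> iface, TimingInterceptor handler) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

The calling code only sees the interface, so the target class needs no source changes; the trade-off is that only calls made through the proxy are measured.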

After demonstrating how to implement source code instrumentation (in part 1), he establishes rules and thresholds with which to evaluate incoming metrics.

In part 2 he turns his attention to post-compilation instrumentation. He reviews how to use EJB3 interceptors, servlet filter interceptors, EJB client-side interceptors and context passing, and Spring interceptors to capture application performance metrics. He describes how to wrap the JDBC driver, connection, statement, and result set classes to instrument JDBC, and hence database, calls. Finally, he describes how bytecode instrumentation (BCI) works and how the JVM provides a standard mechanism for integrating BCI through the -javaagent JVM startup parameter. To illustrate why APM vendors choose BCI over class wrapping, he presents a performance chart in the article.
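
The sketch below shows the general shape of a Java agent built on the standard `java.lang.instrument` API. It only counts the classes it would instrument and returns `null` so the JVM keeps the original bytecode; actual rewriting would typically rely on a bytecode library, which is left out here. The names `SketchAgent` and `PrefixTransformer` and the default `com/example/` prefix are assumptions made for this example.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch. When packaging a real agent, declare the class public
// in its own file and name it in the agent JAR's Premain-Class attribute.
class SketchAgent {
    // Number of loaded classes that matched the configured prefix.
    static final AtomicInteger matchedClasses = new AtomicInteger();

    /** Called by the JVM before main() when started with -javaagent:agent.jar[=args]. */
    public static void premain(String agentArgs, Instrumentation inst) {
        // Assumed convention: the agent argument is an internal-name prefix.
        String prefix = (agentArgs == null || agentArgs.isEmpty())
                ? "com/example/" : agentArgs;
        inst.addTransformer(new PrefixTransformer(prefix));
    }

    static class PrefixTransformer implements ClassFileTransformer {
        private final String internalNamePrefix;

        PrefixTransformer(String internalNamePrefix) {
            this.internalNamePrefix = internalNamePrefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // className uses slashes, e.g. "com/example/OrderService".
            if (className != null && className.startsWith(internalNamePrefix)) {
                matchedClasses.incrementAndGet();
                // A real agent would return modified bytes here.
            }
            return null; // null means "leave the class unchanged"
        }
    }
}
```

Because the transformer sees every class as it is loaded, instrumentation can be applied without touching source code or wrapping classes by hand.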

Whitehead concludes his series by reviewing monitoring strategies for the ecosystem in which a Java application resides, namely the operating system and host environment, which includes databases and messaging infrastructure. He discusses the challenges and benefits of agent-based and agentless monitoring and then dives deeply into monitoring Linux/Unix and Windows systems. The next challenge he addresses is database monitoring and contextual tracing. He describes JMS and messaging systems and illustrates how to monitor them through a combination of synthetic messages and JMX. At the end of part 3 he discusses visualization and reporting and presents sample screenshots of visualization techniques, including dashboards.

In short, this article series presents an in-depth overview of performance monitoring, with enough detail for the reader to understand many of the technologies that may be taken for granted in off-the-shelf monitoring solutions.

For more information on performance and scalability, see InfoQ's Performance and Scalability page.
