Jonah Kowall on Application Performance Monitoring and Management
Application Performance Management (APM) focuses on monitoring and managing the performance and availability of software applications running in an enterprise. The goal is to monitor, analyze, and report on application performance so that IT teams can quickly identify issues, resolve production problems, and ensure quality of service (QoS).
With the emergence of cloud computing technologies and more and more business services hosted in the cloud, it has never been more critical to monitor and manage all components of a typical application. Charles Babcock recently wrote about why APM systems matter now. He said that a modern APM system requires the ability to visualize an application and its dependencies, compile statistics, perform real-time analysis to detect anomalies, and perform diagnostics to troubleshoot production problems.
Jonah Kowall of Gartner co-authored the Gartner Magic Quadrant report on Application Performance Monitoring last year.
InfoQ spoke with Jonah about the APM space, its techniques and tools, and emerging trends in this area.
InfoQ: Application Performance Management (APM) is more than just monitoring the applications and systems. Can you talk about some features that an ideal APM solution should provide?
Jonah: Gartner defines APM as a five-dimensional model. Each buyer may not need all of the dimensions to meet their requirements, and there are solutions which meet a subset of these dimensions better than solutions which meet all five. In order to be included in the APM Magic Quadrant, however, a product must meet all five. Here are the dimensions:
- End-user experience monitoring (EUM) — The capture of data about how end-to-end application availability, latency, execution correctness and quality appear to the end user.
- Application topology discovery and visualization — The discovery of the software and hardware infrastructure components involved in application execution and the array of possible paths across which these components communicate to deliver the application.
- User-defined transaction profiling — The tracing of user grouped events, which comprise a transaction as they occur within the application as they interact with components discovered in the second dimension; this is generated in response to a user's request to the application.
- Application component deep-dive — The fine-grained monitoring of resources consumed by, and events occurring within, the components discovered in the second dimension (runtime application architecture discovery). This may include server-side components, and client-side devices and interfaces.
- IT operations analytics — The combination or usage of techniques, including complex operations event processing, statistical pattern discovery and recognition, unstructured text indexing, search and inference, topological analysis, and multidimensional database search and analysis to discover meaningful and actionable patterns in the typically large datasets generated by the first four dimensions of APM.
InfoQ: Cloud computing is becoming a popular way of deploying applications. How does monitoring applications running in the cloud differ from traditional monitoring? How is it different when using a hybrid cloud environment?
Jonah: First we have to define and understand that there are three types of public cloud: IaaS, PaaS, and SaaS. Most products can be deployed in remotely connected locations, which covers IaaS or even your own distributed data centers. The issue comes when you start looking at PaaS and SaaS. There are products which integrate with PaaS providers; those solutions are normally delivered via SaaS themselves, but that is not necessary. When it comes to monitoring SaaS services, the current products are very limited. There will be innovation in this area through 2013, and we should have some solutions on the market which can monitor SaaS performance. Currently we have to use synthetic tests, which are a measure of availability and not performance.
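As a rough illustration of the synthetic tests mentioned above, the following Python sketch runs a scripted probe repeatedly and reports the fraction of runs that succeeded. The names `ProbeResult`, `run_synthetic_checks`, and `availability` are hypothetical and do not come from any APM product; the probe is passed in as a plain callable so that the sketch stays independent of any particular SaaS service. The result mostly says whether the service answered at all, which is why such tests are better read as an availability measure than as a view of real user performance.

```python
import time
from typing import Callable, List, NamedTuple


class ProbeResult(NamedTuple):
    ok: bool          # did the scripted request succeed?
    seconds: float    # how long the probe itself took to run


def run_synthetic_checks(probe: Callable[[], bool], runs: int) -> List[ProbeResult]:
    """Run a scripted probe several times and record the outcome of each run."""
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        try:
            ok = bool(probe())
        except Exception:
            ok = False  # any error counts as the service being unavailable
        results.append(ProbeResult(ok, time.perf_counter() - start))
    return results


def availability(results: List[ProbeResult]) -> float:
    """Fraction of probes that succeeded, between 0.0 and 1.0."""
    if not results:
        return 0.0
    return sum(r.ok for r in results) / len(results)
```

In practice the probe would wrap a scripted login or page request against the monitored service and would be scheduled from one or more locations; those details are left out here.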
InfoQ: Another area that's getting a lot of attention in software development and management is DevOps. How do the recent trends and innovations in the APM space help developers and operations teams work together and maintain a continuous performance monitoring environment in their organizations?
Jonah: The sharing of information and collaboration are two key areas of DevOps. APM provides the developers, line of business, and quality assurance organizations with data. This data can then be used to troubleshoot, or to answer other questions. Some APM products do very granular performance comparisons between releases, allowing for a better understanding of code quality as code is rolled out. APM products can be used in development, testing (performance and regression), as well as production environments. Most Gartner clients tend to use APM primarily in production, but also in other parts of the code lifecycle. This is not only because of cost, which is a factor, but also reflects the reactive and firefighting maturity level of most IT organizations.
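To make the idea of comparing performance between releases concrete, here is a minimal Python sketch. It assumes that response times in milliseconds have already been collected for an old and a new release, and it flags a regression when a chosen percentile grows by more than a tolerance. The function names, the nearest-rank percentile, and the 10% default tolerance are illustrative assumptions, not the method of any specific APM product.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def compare_releases(old_ms: List[float], new_ms: List[float],
                     pct: float = 95.0, tolerance: float = 0.10) -> Dict[str, object]:
    """Compare one latency percentile between two releases.

    'change' is the relative difference (0.05 means 5% slower);
    'regressed' is True when the change exceeds the tolerance.
    """
    old_p = percentile(old_ms, pct)
    new_p = percentile(new_ms, pct)
    if old_p == 0:
        raise ValueError("old release percentile is zero; relative change is undefined")
    change = (new_p - old_p) / old_p
    return {"old": old_p, "new": new_p, "change": change, "regressed": change > tolerance}
```

Comparing a high percentile rather than the mean is a design choice made here because a small number of slow requests can hide behind a healthy average.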
InfoQ: Monitoring of applications and systems creates a large amount of data that needs to be parsed and analyzed in near real-time to detect anomalies in the application's run-time behavior. With technologies like Big Data and Analytics getting a lot of attention, how can APM take advantage of these technologies?
Jonah: There are not really tools which can “fix” problems in the runtime today (aside from two or three examples, which are tied to specific platforms). APM tools normally use various aspects of IT operations analytics capabilities to better understand and analyze the large volumes of data collected from application and network instrumentation. We break these capabilities into the following categories:
- Complex operations event processing (COEP)
- Statistical pattern discovery and recognition (SPDR)
- Unstructured text indexing, search and inference (UTISI)
- Topological analysis (TA)
- Multidimensional database search and analysis (MDSA)
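As a very small illustration of the statistical pattern discovery category, the Python sketch below flags observations that sit far from a baseline, using a plain z-score. This is a deliberately simple stand-in chosen for this example; the function name `find_anomalies` and the threshold of three standard deviations are assumptions, and commercial analytics engines are likely to use considerably more sophisticated models.

```python
import statistics
from typing import List


def find_anomalies(baseline: List[float], observed: List[float],
                   threshold: float = 3.0) -> List[int]:
    """Return indexes of observed values that deviate strongly from the baseline.

    A value is flagged when it lies more than `threshold` sample standard
    deviations away from the baseline mean. The baseline needs at least two
    values so that a standard deviation can be computed.
    """
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: any different value is unusual.
        return [i for i, value in enumerate(observed) if value != mean]
    return [i for i, value in enumerate(observed)
            if abs(value - mean) / stdev > threshold]
```

For example, with a baseline of response times clustered around 100 ms, a single 140 ms observation would be flagged while ordinary jitter of a few milliseconds would not.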
InfoQ: What are some limitations of APM tools and products you would like to see improved?
Jonah: Ease of use has been improved across the board, but there are still far too many tools which are difficult to implement and use. Getting agents and other capabilities further embedded within product offerings from the application server or infrastructure perspective will make implementation easier and telemetry data far more accessible. There are companies with this opportunity, but such offerings will likely not be released until 2014.
InfoQ: Can you talk about the upcoming trends in the APM space?
Jonah: We expect to see significant growth in the mobile APM market, and we expect it to become a true APM market rather than the synthetic and limited implementations we have seen to date. I also mentioned SaaS monitoring; this is another area which will be attacked in 2013. Additionally, we expect to see analytics continue to rise and differentiate APM offerings. Those with analytics will continue to distance themselves and provide strong advantages.
About the Interviewee
Jonah Kowall is a research director in Gartner's IT Operations Research group. He focuses on application performance monitoring (APM), event correlation and analysis (ECA), network management systems (NMS), network performance management (NPM), network configuration and change management (NCCM), and general system and infrastructure monitoring technologies. These technologies are the foundation of operations, and they exist to make incident, problem and change management possible for these teams.
oversight in what makes cloud so different
This requires a whole rethink of what it means to monitor and manage applications. That is not something an analyst is going to report on, because if they knew the answer they would probably be doing something very different on the other side of the table.
Google has already shown in a recent article on performance variability that performance can't be treated as an afterthought and that it is in fact integral to the design and operation of an application.
The cloud is so different from typical enterprise data center management in scale and agility that monitoring needs a far more radical overhaul for APM to have a shelf life (or half life).
Re: oversight in what makes cloud so different
The big difference here is that we are talking about management via control, not monitoring via whatever operations thinks it can do, which usually means kill and restart.
RUM, RUE for Mobile
One of the reasons for this emphasis in mobile is the nature of Wi-Fi networks: real devices experience dropped packets, extra latency, and similar conditions that are hard to emulate, although WAN emulation software is advancing.