Bio: Before joining AppDynamics, Jim spent years on the user side of APM: solving problems, fighting fires, and trying to convince all of his APM vendors that they could (and should) do better. His passion for performance tuning led him from systems and application administration to working as an APM Architect.
Software is changing the world. QCon aims to empower software development by facilitating the spread of knowledge and innovation in the enterprise software development community. To achieve this, QCon is organized as a practitioner-driven conference designed for people influencing innovation in their teams: team leads, architects, project managers, and engineering directors.
Monitoring is evolving. It’s had problems for a long time. Somebody actually asked me a question this week (at QCon London): “Does monitoring still suck?” In a lot of ways, yes, monitoring still does suck; however, there are a few vendors out there that are changing that for the better.
AppDynamics is certainly one of those. One of the major differentiators of AppDynamics is ease of use. Traditionally, monitoring tools have been difficult to use, difficult to deploy, and difficult to manage, and that management and deployment overhead is really prohibitive for most environments. I struggled with it myself when I was running monitoring for an investment bank, and thankfully things are changing. Tools are getting smarter and easier to deploy. It should be like consumer software: the software should just work, and we hear that comment a lot when we go and talk to our customers. They tell us, “Thank you, it just works; finally, enterprise software that just works.”
Absolutely. Traditional applications are very simple. They’re typically three-tiered architectures, or even service-oriented architectures, which are much easier to monitor than cloud-based applications. What we typically see with cloud-based applications is that they are quite expansive; they scale horizontally.
We have some customers with gigantic cloud-based implementations, Netflix being one of them, with about ten thousand JVMs for a single application. At that scale, it’s really difficult to monitor and manage so many individual nodes. A few years ago, monitoring and managing one hundred to two hundred nodes was difficult; even five hundred nodes, within the past couple of years, has been difficult.
So we’re seeing this explosion in the horizontal scale of applications, and it makes it much more difficult to collect the metrics and to scale up the collection mechanism so that you can actually understand all the information that’s contained within and identify issues. It also makes it much more difficult to manage the deployment of agents. You have to have intelligent technologies: you can’t tell a monitoring tool what to monitor; it has to know automatically and scale itself with cloud implementations. So it’s much more difficult than the simple architectures of just a couple of years ago.
The typical problems that we see are issues of silos. Everybody has their own sphere of influence and their own set of tools that they look at. Most of them happen to be infrastructure monitoring tools: operations teams only have access to infrastructure tools, development teams only have access to their development tools, and they are not speaking a common language. So when there’s a problem that requires both developers and operations to resolve, they don’t have a common platform they can use to solve it quickly. With AppDynamics, we have both infrastructure and application monitoring in a single pane of glass. We detect any anomalies we see, all of the data points from an infrastructure perspective, all the host-level metrics from the machines hosting the applications, and then we actually drill down into the deep levels of code execution within the application. So we bring development and operations together in a single pane of glass.
Some of our customers are definitely maturing rapidly with regard to DevOps, but I don’t see it happening quickly enough. I think it’s a great philosophy and a great strategy. I actually asked a room full of people today whether they are implementing DevOps or are part of a DevOps strategy, whether they have ever participated in such a strategy, and only a few hands went up in a large room. So it’s certainly not the level of adoption I would like to see, but I expect to see much stronger adoption in the next year.
Sure. Big data is not in and of itself enterprise data bloat, but enterprises keep a lot of data around. They keep log files, e-mails, business data, infrastructure data, monitoring metrics. I believe there’s too much data being kept in most organizations, and it’s prohibitive to solving problems quickly, particularly when it comes to monitoring. “Big Monitoring Data,” to me, is a bad thing. When you keep around too much information, it actually clouds the picture of what’s really going on and makes it harder to resolve the business impact in a timely manner.
Absolutely. It’s not a matter of affecting the performance of the application itself; it’s a matter of restoring the actual application performance within a reasonable time frame. You want to restore it as quickly as possible. You’d love to avoid the problem in the first place, but it’s going to happen at some point, and with too much data you have too much noise: too many things getting in the way of the relevant data points that you need access to.
Smart Data is the concept of being intelligent about what data you keep. In order to resolve problems quickly, you need data that gets transformed into information, and there’s a big difference between data and information. Data is the raw metrics: things like transaction response times and CPU utilization, and there are lots of different metrics we can look at. Turning those raw metrics into information requires analytics, and that’s really what Smart Data is. Smart Data occurs at many different levels. It occurs at the agents collecting the data: they need to be smart and ramp themselves up and down automatically. We don’t want to incur extra overhead when there are no problems, and we don’t want to collect too much data when there are no issues, but we do want to collect enough data to have a baseline. So Smart Data is all about applying correlational analytics to our data points and turning them into actual information.
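The ramp-up/ramp-down idea can be sketched in a few lines of Python. This is a hedged illustration, not AppDynamics code: all class and parameter names here are hypothetical. The agent keeps a rolling baseline of response times and raises its sampling rate only when current behavior deviates sharply from that baseline, keeping overhead low during normal operation.

```python
import statistics

class AdaptiveAgent:
    """Toy sketch of baseline-driven instrumentation ramping.

    Hypothetical names throughout, not an AppDynamics API: the agent
    tracks a rolling baseline of response times and increases its
    sampling rate only when a measurement deviates from the baseline.
    """

    def __init__(self, window=100, min_rate=0.01, max_rate=1.0, threshold=3.0):
        self.window = window        # samples kept in the rolling baseline
        self.min_rate = min_rate    # sampling rate when behavior is normal
        self.max_rate = max_rate    # sampling rate during an anomaly
        self.threshold = threshold  # deviation (in std devs) that triggers ramp-up
        self.samples = []
        self.rate = min_rate

    def record(self, response_ms):
        """Record one response time and adjust the sampling rate."""
        if len(self.samples) >= self.window:
            self.samples.pop(0)
        self.samples.append(response_ms)
        if len(self.samples) < 10:
            return self.rate  # not enough data for a baseline yet
        mean = statistics.mean(self.samples)
        stdev = statistics.pstdev(self.samples) or 1.0  # avoid divide-by-zero
        if abs(response_ms - mean) / stdev > self.threshold:
            self.rate = self.max_rate  # ramp up: collect detailed diagnostics
        else:
            self.rate = self.min_rate  # ramp down: minimize overhead
        return self.rate
```

A real agent would use percentile baselines per business transaction and per time-of-day, but the core loop, baseline, compare, ramp, is the same.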
Let’s talk about monitoring solutions in general to get to that answer. When you’re monitoring an application, you really need to monitor from the end user all the way down to wherever the application terminates inside your data center and your infrastructure. You need to monitor that entire flow and be able to identify any issues along that transaction path. For real user monitoring, we look at the pages that the user is accessing. We also look at business transactions; a page and a business transaction are not the same thing. The user can hit a web page and have to log in to an application. When they click the login button, that kicks off a business transaction, and that business transaction is really following that login through its entire life cycle through your infrastructure, timing it every step of the way. So we need those end-user experience metrics, the real user monitoring metrics; we need to understand the response time that the end user sees, the response time within the data center, and every point along the way. There are a ton of different metrics we could talk about that are related to all those different segments, but when it comes right down to it, if you don’t have the core pieces in place then you’re not monitoring properly.
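The business-transaction idea above, timing one logical operation at every hop rather than just measuring a page load, can be illustrated with a minimal sketch. All tier names and the `TransactionTrace` class are hypothetical stand-ins, not an AppDynamics API:

```python
import time
from contextlib import contextmanager

class TransactionTrace:
    """Hypothetical sketch: time one business transaction (e.g. 'login')
    across each tier it touches, from the web tier down to the database."""

    def __init__(self, name):
        self.name = name
        self.segments = []  # (tier_name, elapsed_seconds) for each hop

    @contextmanager
    def segment(self, tier):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.segments.append((tier, time.perf_counter() - start))

    def total(self):
        return sum(elapsed for _, elapsed in self.segments)

# Simulated 'login' transaction flowing through three tiers.
trace = TransactionTrace("login")
with trace.segment("web-tier"):
    time.sleep(0.01)   # stand-in for rendering the login page
with trace.segment("auth-service"):
    time.sleep(0.02)   # stand-in for the credential check
with trace.segment("database"):
    time.sleep(0.005)  # stand-in for the user-lookup query
```

Per-segment timings like these are what let you say not just "login is slow" but "login is slow because of the auth service."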
There can be, yes. I really believe that if you’re doing monitoring properly, you are monitoring the application from the end-user perspective and from the IT perspective. In order to do that, you need to be inside the application itself, so we deploy agents into the application server nodes. Once you’ve deployed an agent there, if you are over-instrumented, if you are looking at too much information too often, you can absolutely negatively impact the performance of that application. Our system doesn’t do that; it has intelligence and analytics built in to avoid that. We are constantly running our algorithms to determine our level of instrumentation, and we automatically baseline performance. Once performance deviates from normal behavior, we ramp up our instrumentation level to get you the information you need to solve the problem and remediate as quickly as possible, and then we dial it back down. So it depends on whose product you are talking about: the traditional products in the space didn’t have any type of intelligence or analytics built into their agents for data collection. That was one of the big problems with APM of the past: you could really do a lot of damage by over-instrumentation; you could do more harm than good. But with modern technologies like ours, we’ve solved that issue.
We do very different things than Splunk does. Splunk is a log monitoring and analysis tool: distributed log monitoring and analysis. Splunk collects logs from all over your infrastructure, and you can do a distributed search on them, look for errors, and set up events and alerts if certain things are detected in the logs, and that’s great. It’s a very useful technology, particularly for leading-edge technologies: there are usually no monitoring tools in place when a new technology comes onto the market, so you’re left with log monitoring, and it’s fantastic that you can monitor whatever you need to. Where that falls short is in actually figuring out what your problems are from within the code. Application performance monitoring tools are actually inside the application; they understand the runtime execution of the code, and they put together the picture of what your application looks like at any given time. You can’t do that with a log monitoring tool. They definitely work well together; they’re great tools that, when you integrate them, are very powerful. If you can drill down from the APM tool into the log monitoring tool in context, you can dig up some extra data, or vice versa. The integration that we have right now with Splunk is our events console: all of our messages are inserted into Splunk, and when errors or events show up in Splunk, you can click on them and open them in context in AppDynamics right away. So you don’t actually have to go through the whole logging process; it’s a much faster way to get straight to where you need to be.
AppDynamics 3.7 is actually a big release for us; we’ve got a lot of great new functionality coming out in this release. I’ll just hit on a few of the key pieces, the first one being mobile application monitoring. In 3.7, we’ll actually be able to monitor the mobile web browser. We’ll see all the activity coming from that mobile web browser, we’ll understand exactly which browser it is and which versions of the mobile operating systems you’re using, and in future releases we’ll go into native application monitoring. Our initial release sets up the mobile monitoring space for us, but in 2013 we should have full coverage for mobile native applications and mobile web browsing. That’s one of the features. The second major feature is fantastic; it’s the one I’m really excited about. It’s Application Runbook Automation (RBA). RBA is being able to automatically run scripts and processes to take some sort of action. Application RBA takes those actions on individual application nodes, based on what we know about the actual applications themselves. So AppDynamics is in a unique position compared to any other RBA vendor out there: we’re actually in the application, and we understand exactly what nodes are being used and what nodes are impacted at any given time. With Application RBA we can identify an issue, isolate it to the given nodes, and then fix it, potentially before an end user experiences any problems whatsoever. With AppDynamics, we’ve reduced our customers’ MTTR from hours to minutes; with Application RBA we’re going to reduce the minutes to seconds. The last major thing I want to talk about with 3.7 is our PHP support. We’re going to announce that support soon. PHP is going to be supported just like Java and .NET: the same AppDynamics infrastructure, the same features and functionality, just applied to PHP.
PHP has become the third most important language in the enterprise, and we need to support it.
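The application-aware runbook automation described above can be sketched roughly as follows. This is a hypothetical illustration, not AppDynamics code: the `remediate` function, the status strings, and the node names are all invented for the example. The key idea is that remediation actions run only on the specific nodes the application view reports as unhealthy:

```python
def remediate(app_nodes, health_check, runbook_actions):
    """Hypothetical sketch of application-aware runbook automation:
    check each node the application actually uses, and run the mapped
    remediation action only on the nodes reporting a problem."""
    fixed = []
    for node in app_nodes:
        status = health_check(node)   # e.g. "healthy", "high-heap"
        if status == "healthy":
            continue                  # leave healthy nodes untouched
        action = runbook_actions.get(status)
        if action:
            action(node)              # e.g. restart a service, clear a cache
            fixed.append(node)
    return fixed

# Example run: one node in a three-node tier shows heap pressure.
restarted = []
nodes = ["web-01", "web-02", "web-03"]
status_of = lambda n: "high-heap" if n == "web-02" else "healthy"
actions = {"high-heap": lambda n: restarted.append(n)}
fixed = remediate(nodes, status_of, actions)
```

In this toy run, only `web-02` gets acted on: the runbook never touches the nodes that are behaving normally, which is the node-level precision the interview is describing.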
I’d say so, not so much at a fundamental level but at a very specific level. Mobile devices are the end-user experience; they’re the real users accessing systems, and we see that broken down into three main categories. We see mobile web browsers accessing server-side applications; we see native applications that have most of the logic contained within the native application; and then we see hybrids, native applications with embedded mobile browsers, so there’s logic potentially out at the mobile device and within the data center that needs to be accounted for. So tracking that end-user activity and correlating all of it, from the mobile device all the way through the end point in the data center, is critical to really understanding the end-user experience and to solving problems that those end users see on their devices.
On our website, www.appdynamics.com. We have a lot of great information there. Visit our blog on appdynamics.com for thought leadership in general; I tend to write posts that don’t get too specific about our product but are based more on my experience with enterprise monitoring and management. We’ve also got a user community that you can access there as well.