Jon Travis on Hyperic HQU and Monitoring with Spring Insight


1. My name is Ryan Slobojan and I am here with Jon Travis, principal engineer at SpringSource. Jon, can you tell us a little bit about HQU?

Sure. Probably it would be good to go back and talk a little bit about the foundations of HQ before talking about HQU. So Hyperic HQ is a management platform that was originally written on JBoss, using Struts and Tiles and a mixture of a lot of different technologies. It was written a long time ago and since then there have been a lot of advances: we've moved it on to Hibernate, it's moved over to Spring, so a lot of work has been done to make Hyperic HQ a little bit more of an agile product. But in the middle of all that we needed to solve some customer issues. We had customers coming to us with extremely large deals and as a startup it’s kind of difficult to say No to those customers because you need their money. So what we came up with is kind of a stolen idea from the Rails camp, which was a really simple model-view-controller framework.

So what we did was we embedded Groovy inside of HQ and created a plug-in architecture called HQU. What the plug-in architecture provides us is things like being able to dynamically create new screens in the application, and it provides really nice facilities for being able to query the status of the application, things like: "What is the metric data for this box, how much memory is on this box currently, what is the CPU load over on this box?" Traditionally we had to ask session beans for that information: we had to get a handle on a ton of different objects, pass them into a session bean and then have the session bean spit out the result. Using the power of Groovy’s categories and being able to dynamically add methods to those POJOs at runtime, we were able to make it really easy and obvious to answer those questions. So we were able to say: "Alert definition, give me all of your alerts". Ordinarily that’s something we would have had to ask the session beans. This way it reads naturally in the code, it’s very easy for people to understand, and we could basically create a suite of these plug-ins, give them to customers and they can modify them themselves. So really what HQU did for us is it allowed us to say Yes to a lot of our biggest customers, because the work that we needed to do for them was isolated in a very small chunk of code and it was kept apart from the rest of the code base. So obviously that meant it would scale a lot better.

So HQU provides a lot of things. It provides web services: there is a new initiative that happened over the past few years called HQ API. HQ API is web-service-based access into the HQ management platform, and it uses an HQU plugin on the server side to handle those requests, so if you are writing your own custom web services that would be a good way to do it. It handles its own rendering, so it can attach at different points in the UI, add new menu items and add new places to display content. So for users that want to display a dashboard of things that are very specific to their architecture, for instance if they don’t like the red light-green light, maybe they like red light-blue light-green light and they have more states or they have different ways of looking at their data, HQU will allow them to create their own screens. And then finally we use it for its templating capabilities, so when a user decides they want to get an alert when they are running out of disk space they may want to know additional information like what rack the box is on. So what we allow them to do is use Groovy GSPs to template all the alert emails that come out. So I’d say HQU and Groovy... I mean, it's not synonymous with Groovy but it uses Groovy exclusively and to great benefit. It really did add a lot of benefit to the HQ team.

   

2. What are some examples of HQU plug-ins that you see your community creating?

We’ve had plug-ins for people doing things like this: they’ve got their own view into their data center, something that’s grown in house, and they want to embed content from HQ in their own Wiki or their own dashboard. One thing that they can do if they develop an HQU plugin is to dig out the information that they want so that they can expose it more easily in their dashboard. So we’ve seen a bunch of those things. SpringSource originally OEM'd the Hyperic HQ software under the name AMS. AMS was basically HQ plus a lot of functionality around managing tc Server, so being able to deploy tc Server instances, restart them, give thread dumps, these kinds of things, and it’s all through this central HQ console. The way they were able to do this was by adding HQU plug-ins: they were able to create new screens that dived into HQ’s inventory model, navigate to those boxes and interact via the agent with those tc Server instances, and so I think they used it to great effect.

   

3. Hyperic HQ uses an agent model, if I recall correctly, to do a lot of the communication and the automatic discovery. Can you describe in a little more detail how that works?

Generally a lot of the functionality you need when dealing with provisioning or monitoring low level details requires you to have an agent on a machine. In some cases network devices like a switch can be monitored remotely and there is no need to install an agent on them, but in cases where you need to install software or you need to get low level information that is only available by running a process on the machine, you need to have an agent. The typical process for a user to get started with HQ is pretty painless. Once you’ve installed HQ server and you want to start monitoring some things, you just fire up the agent, the agent asks you for the location of HQ server, and once you give it that information it will start running an auto discovery scan of the things on the system that it can locate. So as soon as you start it up it’s going to try and locate things, it will find MySQL and Tomcat and JBoss and all kinds of things, and then ship that information back over to HQ and start collecting metric data on it.

   

4. What kinds of data is provided by the agents?

The agent provides a vast amount of data. There is a plug-in architecture that we use to accommodate that; I think there are more than 100 different products in there, many different flavors of WebSphere, and we’ve had requests for all kinds of things, so we can send back anything from the count of disk accesses and the number of page faults to how many users are logged in. It also sends a bunch of non-numeric data: it'll send back log tracking events or configuration change events. So when someone logs onto the box you can get a notification on your central console and get alerted to that if you want. If someone changes a configuration file you can get alerted about that as well, so it really is our foothold into being able to give the users everything that they are going to need in that central console.

   

5. Do you see some of the agent’s capabilities that you described as being replacements or as supplementing existing Unix administration tools like Tripwire?

Yes. I definitely think it replaces that; really what you want to have is a central console that can give you the overall health of your application and your infrastructure and can alert you when things are going down. Really you want one central place where you can configure your roles based on who wants to receive alerts when a configuration file changes or who wants to receive alerts when we’re out of disk space. Are these the same people? HQ has a follow-the-sun model where different roles can come into activity in different times of the day, so if you are running a global operation this is something you are going to need. I think it’s good to have it all definitely in a central console for sure.

   

6. What is Spring Insight?

Spring Insight is a technology we’ve recently developed. It comes with tc Server developer edition, which was just announced at the SpringOne2GX conference, and it’s really targeted at developers and QA who are interested in getting low level information about what their application is really doing under the hood. So the easiest way to describe it is: say I am developing an application and the screen that I am working on deals with finding a bunch of books in the library that match a query. So this screen is going to run, the request is going to hit the server, the controllers are going to work, make some queries, render the result. And what if it’s slow? Say I am a QA developer and I am accessing the application and I see that this page is really slow. It took half a second or a second to render. That’s a long time. What Spring Insight allows you to do is it gives you a rear view mirror; you could say: "What just happened, why was the screen slow?" So behind the scenes Spring Insight is recording all the information about what your application is actually doing under the hood.

So if I go to the library application and I say: "Find all books written by Mark Twain" and I find that it takes a second, I can go back into the Spring Insight dashboard and what that will show me is: "Well, here are all the exact JDBC queries that occurred, here’s how long each one of them took, here are all the controller methods that were executed and the transactions that occurred" and it allows you to isolate exactly what the problem is with any specific request. So it provides really fine grained details about what effect a request had: it will give you the transaction commit status, it will give you the demarcation of the transactions, it will give you a full breakdown of all your HTTP headers and response information, the size of the response. So the first bit that it does is give you this fine grained, request based level of information. The second thing that it does is give you the 10000 foot view into your application. Typically it’s not so simple to find out what is slow. Typically you are not just browsing along and you find one page is slow, right? Because when people develop pages they develop the page and it’s fast. So typically people see this in production or they see it in a stress testing environment when they are doing performance testing. So we give them a 10000 foot view into the application.

What this means is, say you start up your application and you fire a JMeter load at it, serve up thousands of requests, and then you go into the Spring Insight dashboard and you say: "Spring Insight, what was slow?" And it will say here are the controllers that were quick or slow; it ranks them by health, and it will give you a color coded health as well as 99% response times, that kind of stuff. Here is what most of your users experienced, here is the trend of the response time. So it will give you aggregate level information at the 10000 foot view. With that you can look at the response time and see that the response time of one particular controller is slow. Once you drill down into that it will say here is the distribution of all the requests that occurred for this specific controller method, and from there you can drill down into the trace that we talked about before, which is the fine detailed information, and say: "Ok, here is a specific request that exhibited the characteristics that we're describing here, that caused it to be slow." So you can run a load test and go from the most broad 10000 foot view all the way down to the most granular view in only a few clicks, so it’s really going to be pretty valuable for developers.
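To make the "10000 foot view" concrete, here is a minimal sketch of that kind of aggregation: response times are grouped per controller and a high percentile is used to rank them. It is not Spring Insight's data model or API; the class `ResponseTimeSummary`, its methods, and the reading of "99% response times" as a nearest-rank 99th percentile are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Spring Insight.
final class ResponseTimeSummary {
    // Controller (endpoint) name -> recorded response times in milliseconds.
    private final Map<String, List<Long>> samples = new TreeMap<>();

    void record(String endpoint, long millis) {
        samples.computeIfAbsent(endpoint, k -> new ArrayList<>()).add(millis);
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // pct percent of the samples are less than or equal to it.
    long percentile(String endpoint, int pct) {
        if (pct < 1 || pct > 100) {
            throw new IllegalArgumentException("pct must be in 1..100");
        }
        List<Long> sorted =
                new ArrayList<>(samples.getOrDefault(endpoint, Collections.emptyList()));
        if (sorted.isEmpty()) {
            throw new IllegalArgumentException("no samples for " + endpoint);
        }
        Collections.sort(sorted);
        int rank = (pct * sorted.size() + 99) / 100; // integer ceiling of pct*n/100
        return sorted.get(rank - 1);
    }

    // Endpoint with the highest 99th percentile, or null if nothing was recorded.
    String slowest() {
        String worst = null;
        long worstP99 = -1;
        for (String endpoint : samples.keySet()) {
            long p99 = percentile(endpoint, 99);
            if (p99 > worstP99) {
                worstP99 = p99;
                worst = endpoint;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        ResponseTimeSummary summary = new ResponseTimeSummary();
        for (long ms = 1; ms <= 100; ms++) {
            summary.record("findBooks", ms);
        }
        summary.record("home", 5);
        summary.record("home", 7);
        System.out.println("slowest controller: " + summary.slowest());
    }
}
```

Ranking by a high percentile rather than the mean keeps a handful of very slow requests visible, which is the starting point for the drill-down described above.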

   

7. How does Spring Insight collect information on running Java applications and what information is collected?

We have a plug-in architecture for Spring Insight that uses a variety of ways to collect information. The primary way is AspectJ weaving. We use the load time weaver in Tomcat so that when the user’s web application starts up, we weave our logic into their application to collect the data. What gets woven in is dictated by plug-ins. The other way that we collect information is through Tomcat filters; a lot of the HTTP request information is collected just by a Tomcat filter. Once you set up an Insight instance, which basically just comes with tc Server, this is all prepared for you; it just uses AspectJ to pull that data out. The data that it pulls out is decided on a per-plugin basis. Each plug-in can decide, I am interested in the transaction commit status, or whatever it is actually interested in.
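The idea of wrapping application calls to record what happened can be sketched without AspectJ. The example below deliberately swaps in a different technique, a JDK dynamic proxy from `java.lang.reflect`, because it needs no weaver; it is not how Spring Insight is implemented, and `TimingProxy`, `Call` and `BookFinder` are hypothetical names chosen for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical application interface, standing in for a controller or DAO.
interface BookFinder {
    List<String> findByAuthor(String author);
}

// Illustration only: times calls with a JDK dynamic proxy instead of AspectJ weaving.
final class TimingProxy {
    // One recorded call: the method name and the elapsed time in nanoseconds.
    static final class Call {
        final String method;
        final long nanos;

        Call(String method, long nanos) {
            this.method = method;
            this.nanos = nanos;
        }
    }

    // Returns a proxy for target; every call made through iface is timed
    // and appended to sink, whether it returns normally or throws.
    static <T> T wrap(Class<T> iface, T target, List<Call> sink) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the application's own exception
            } finally {
                sink.add(new Call(method.getName(), System.nanoTime() - start));
            }
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        List<Call> calls = new ArrayList<>();
        BookFinder real = author -> Collections.singletonList("The Adventures of Tom Sawyer");
        BookFinder timed = wrap(BookFinder.class, real, calls);
        System.out.println(timed.findByAuthor("Mark Twain"));
        for (Call c : calls) {
            System.out.println(c.method + " took " + c.nanos + " ns");
        }
    }
}
```

A proxy only sees calls made through the interface, whereas load time weaving can instrument classes the application never wrapped explicitly, which is one reason a weaver suits a tool that has to work on unmodified applications.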

   

8. There is usually a cost associated with collecting information from an application while it’s running. What kind of overhead does Spring Insight cause on the application that is monitored?

There are two ways that Insight can impact your application. The target user for Insight is the developer and we really want to be as little of a burden as possible on the developer, so with this requirement we don’t require them to set up a database to store this information, and we don’t require anything to be written to disk. We are actually collecting all this data in memory, so one bottleneck is that we are going to be filling our memory with a lot of stuff. One way to alleviate that is we’ve got some pretty smart algorithms for purging a lot of the data from memory, and there are also some configuration options that you can tune to keep the memory footprint low. So that is one way it impacts your application: it eats up additional memory, so you are going to need additional memory to actually use the application. The other way that it can impact the performance of your application is with the AspectJ weaving and the actual work we do behind the scenes to record things. If your application is strictly CPU intensive and doesn’t really do any IO operations, it doesn’t access the disk or the database, then it’s possible that the overhead could be significant because it’s all CPU bound. But for 99.9% of the people that are developing applications on tc Server you are going to be hitting a database, you're going to be writing a request to a socket, you are going to be doing IO operations, and that is going to make the impact of AspectJ pretty much negligible. We’ve run it through the profiler and really the AspectJ code and the Spring Insight code pretty much never show up as any kind of issue.
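The interview does not say how Insight decides what to purge, so the following is only a sketch of the simplest possible policy: keep traces in memory up to a fixed capacity and drop the oldest one when a new one arrives. `BoundedTraceStore` and its capacity parameter are assumptions for illustration and stand in for the tunable memory options mentioned above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration; the real purge algorithms are not described in the interview.
final class BoundedTraceStore {
    private final int capacity;
    private final Deque<String> traces = new ArrayDeque<>();

    BoundedTraceStore(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Adds a trace; when the store is full the oldest trace is purged first.
    synchronized void add(String trace) {
        if (traces.size() == capacity) {
            traces.removeFirst();
        }
        traces.addLast(trace);
    }

    // Copy of the retained traces, oldest first.
    synchronized List<String> snapshot() {
        return new ArrayList<>(traces);
    }

    public static void main(String[] args) {
        BoundedTraceStore store = new BoundedTraceStore(2);
        store.add("GET /books?author=Twain");
        store.add("GET /books/42");
        store.add("POST /checkout");
        System.out.println(store.snapshot());
    }
}
```

A fixed capacity bounds the number of retained traces rather than the bytes they occupy; a smarter policy might, for example, weigh trace size or keep the slowest requests longer, but that is speculation beyond what the interview states.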

   

9. You had mentioned that Spring Insight has both the drill down and the 10000 foot view capability. Is it possible to separate those out and can Spring Insight be run on a production server?

Spring Insight is a new product and definitely needs more maturity before we are going to recommend people run it in production, and right now we are pretty much strictly saying it’s not a very good idea to run it in production. Spring Insight has access to very sensitive information such as any HTTP headers; if there are passwords or anything like that, that is all going to be available to Spring Insight. On top of that there is no login to Spring Insight, so if that dashboard is deployed on your production instances you are going to have issues. So basically it’s not that we don’t recommend it, we say do not run it in production. That being said it’s something that people ask for, people want to see this in production, people like this information and they need to see it in production. So it’s definitely on the road map, it’s something that we’d like to target, but the bar for production is high whereas the bar for developers is low, and Spring Insight really gives useful information right now, so we wanted to get this out to the community and tell them production is not something that is possible right now but we still think that you can get good value out of the product.

   

10. How does Spring Insight integrate with the SpringSource tool suite?

Sure. The SpringSource Tool Suite is the developer’s central console for interacting with the entire Spring framework and family. So Grails, Roo and any of the Spring beans, all of these are available through STS, the SpringSource Tool Suite. One thing that is new to SpringSource Tool Suite is Spring Insight and what you can do there is, let’s say you are editing your application in SpringSource Tool Suite, which is an Eclipse based editor and very convenient for editing your application. When you want to try it out you need to deploy it to a web container, and STS comes with an embedded tc Server instance. It only takes you 2 clicks to run your application on the server and one additional click if you want to run it with Insight. So if you run it with Spring Insight enabled you have the web browser pop up right inside of your STS on your application, and you can have a web browser pulled up in STS on the Spring Insight dashboard, all at the same time. And what is cool about this is that when you are diagnosing issues inside of Spring Insight it will have a little link that says: "Go to STS". As you navigate through the frame stack you decide this JDBC query is a problem or this transaction is a problem. Once you’ve isolated that you can click "Go to STS" and all of a sudden your STS will load up that file and take you right to the place that you are pointing at. So it tries to complete the whole roundtrip of what we are promising for developers. The developer edits the code in STS, they deploy it with Spring Insight, they evaluate performance conditions and then they roundtrip it and come back to STS to actually fix those issues.

   

11. What are the future plans for Spring Insight?

We definitely want to get it out to the community and get feedback on issues people are having. We want to make sure it runs on as many applications as possible so that as many users as possible are able to use it, and as soon as we introduced it the first question was: "How do I write my first plug-in for Spring Insight? How do I customize it and get information that is particular to my application and compose it or provide this information to my users?" A lot of the framework developers are asking us for access to a PTCase so they can expose things; for instance Spring Batch, how long does it take to do a job. So I think in the immediate term we really need to get community feedback and develop additional plug-ins so that we can support the things that people are really requesting. Longer term we would like to see integration with the Hyperic HQ product. People use HQ for monitoring their production instances from the central console, and they also use HQ for monitoring their performance testing instances or the rest of the infrastructure around their regular business. So one thing that we can do is integrate so that Hyperic HQ can get that information from Spring Insight or allow you to navigate to the console; we want to do some Hyperic HQ integration work there. And then eventually, long term, it would be good if we got it into a production capacity mode. Obviously that is a little ways off and right now we are targeting the community, but that will be the long term goal.

Jun 07, 2010
