
Billy Newport explains Virtualization


1. This is Floyd Marinescu here with Billy Newport. Billy, can you introduce yourself and tell us what you are up to these days?

Hi, I am Billy Newport and I work at IBM. I am a Distinguished Engineer and the main architect for high availability, clustering, persistence (JPA, CMP and technologies like that), connector technologies (JDBC and JCA), and ObjectGrid, which I run. ObjectGrid is a network-attached cache/XTP component that's available as part of WebSphere XD. I am also involved in WebSphere XD itself, WebSphere eXtended Deployment, which is the application virtualization platform that we sell.


2. So Billy, for developers and architects who have heard of virtualization but haven't really been following it, what does it mean to them?

Well, virtualization is an interesting topic because we are now seeing it at many levels: from request virtualization, down to application virtualization, down to machine virtualization with hypervisors.


3. What are the different levels of virtualization, and who is playing in those areas in the industry right now?

Well, at the top level, closest to the application developer, there is request virtualization, meaning things like request reordering and request-based SLAs, and data virtualization, with products like IBM DataGrid/ObjectGrid, GigaSpaces, Coherence and so on.

Below that you have what I'd call virtual machine virtualization: deciding which applications should be started and stopped based on the current load and the SLAs that are in place. People don't want to manage this as a manual process; they want it automated. They want middleware that starts and stops virtual machines on demand, so that if application A is slowing down, something automatically stops application B on a different box to free up some CPU and then starts A on that box, so that the requests for A get more CPU time and get processed.

Beneath that we have operating systems. Operating systems are virtualized by hypervisors, which, as we said before, present standard machines with a standard reference architecture, standard everything. The operating system image is then completely standardized, because it only has to run on one type of server, the virtualized one, and that simplifies testing and so on. In that space you have the IBM pSeries hypervisor, which iSeries uses as well, and IBM mainframes with z/OS. On the distributed, microprocessor-based side you have VMware, you have XenSource, and you have various licensees of Xen; Oracle is in the game now, for example.
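To make the "stop B, start A" decision concrete, here is a minimal Java sketch of one SLA-driven rebalancing step. It assumes each application reports an average response time against a response-time goal and that one instance can be moved at a time. The names `SlaRebalancer`, `App` and `rebalance` are hypothetical and are not the WebSphere XD API.

```java
import java.util.List;

// Hypothetical sketch of SLA-driven placement; not the WebSphere XD API.
class SlaRebalancer {
    static final class App {
        final String name;
        final double slaMillis;   // response-time goal
        double observedMillis;    // latest measured average response time
        int instances;            // running instances (e.g. JVMs)

        App(String name, double slaMillis, double observedMillis, int instances) {
            this.name = name;
            this.slaMillis = slaMillis;
            this.observedMillis = observedMillis;
            this.instances = instances;
        }

        boolean violatesSla() { return observedMillis > slaMillis; }

        // Above 1.0 the SLA is missed; below 1.0 there is headroom.
        double load() { return observedMillis / slaMillis; }
    }

    // If some app misses its SLA, stop one instance of the app with the most
    // headroom (keeping at least one instance) and start one of the worst
    // offender. Returns a description of the action, or null if none taken.
    static String rebalance(List<App> apps) {
        App violator = null;
        for (App a : apps) {
            if (a.violatesSla() && (violator == null || a.load() > violator.load())) {
                violator = a;
            }
        }
        if (violator == null) return null;

        App donor = null;
        for (App a : apps) {
            if (a != violator && !a.violatesSla() && a.instances > 1
                    && (donor == null || a.load() < donor.load())) {
                donor = a;
            }
        }
        if (donor == null) return null;

        donor.instances--;
        violator.instances++;
        return "stop " + donor.name + ", start " + violator.name;
    }
}
```

A real controller would also weigh memory, startup time and some hysteresis so that it does not thrash between applications; this sketch ignores all of that.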


4. Can you also explain what a hypervisor is?

Well, in the beginning people had computers with processors, memory, ROM and RAM, and you wrote directly to the hardware. Then came BIOSes: different serial cards had different register sets, so a BIOS made them all look the same, and that made writing operating systems easier. We have been in that mode probably since the days of CP/M until very recently. What we have started to do now, at least in the distributed world (by distributed I mean the microprocessor-based world, not mainframes, iSeries and things like that), is introduce hypervisor technology.

The hypervisor pretends to be a computer. It pretends to have a particular type of CPU and certain emulated I/O devices such as serial cards, video cards, network cards and storage controllers for disk drives, and it can run that pretend machine as a process, like an application on a box with an operating system. That is useful because programs running inside the virtual machine believe they are running on a real machine with exactly that set of hardware. The hypervisor's job is then to share the physical machine among one or more of these virtual machines, so that you can consolidate servers that used to run on different physical boxes onto one box, hopefully saving power and money and possibly lowering the TCO of the data center. So a hypervisor creates pretend machines, and you can run several of them, isolated from each other, on a single physical box.


5. Isn't there a performance hit when you virtualize the hardware stack? And how do you tune performance when you don't really know what you are running on underneath?

Absolutely, because there is now software in the way where there used to be direct access to hardware. The other thing a lot of people have overlooked is tuning. In the old days, and by the old days I mean last year, people knew they were going to run on a 4-way box with L2 caches of a certain size, and they would optimize the JVM for that box: figure out how many threads they needed, whether to pin the JVM to a specific processor, or whether to run several JVMs on it to get it to scale up. That is hard to do when you are running on a hypervisor and the amount of CPU you have can change from millisecond to millisecond. How do you tune for that? In the beginning, the thought was that the hypervisor would dynamically take processors away and give them back. The problem comes when it adds processors back. Take an application server running in a JVM on a box, tuned for a certain number of threads and so on, and suppose the hypervisor is running on, say, a 128-way box.

So it decides to give it 100 CPUs. Who told the JVM that it is now running on 100 processors? Who told the JVM to increase the size of all its thread pools? Who knows whether the JVM will even scale to the number of threads it would take to keep 100 CPUs busy? It complicates things, because in the past we had absolutes when it came to tuning for a specific box. In the virtualized world those absolutes are not absolutes any more, because we have no idea what we are running on: we might be on the latest Intel quad-core box, then on a different box with an AMD dual-core, then on something completely different or older. So it is very hard to tune software now. I think what will happen is what we have already seen with Amazon's Elastic Compute Cloud. They don't give you arbitrarily sized virtual machines; they give you a machine with one virtual CPU equivalent to a single-core 3 GHz Intel CPU, with 2 GB of memory and a 160 GB disk, and they sell units of that size. That is good because you know what you are getting and you can tune for it. It may actually be running on a 10 GHz processor or something else, but they give you the same number of MIPS. In the same way, they sell 2-way versions of the same thing.
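The question "who told the JVM to increase the size of all the thread pools" can be illustrated with standard `java.util.concurrent` calls. Here is a minimal sketch, assuming the application itself polls `Runtime.availableProcessors()` and resizes its own pool; nothing does this automatically for an application-created `ThreadPoolExecutor`, and whether `availableProcessors()` reflects hypervisor changes at all depends on the JVM and OS. The class name `CpuAwarePool` and the threads-per-CPU factor are hypothetical.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper that keeps a pool's size proportional to visible CPUs.
class CpuAwarePool {
    final ThreadPoolExecutor pool;
    final int threadsPerCpu;

    CpuAwarePool(int threadsPerCpu) {
        this.threadsPerCpu = threadsPerCpu;
        int size = targetSize(Runtime.getRuntime().availableProcessors());
        this.pool = new ThreadPoolExecutor(size, size, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
    }

    int targetSize(int cpus) {
        return Math.max(1, cpus * threadsPerCpu);
    }

    // Resize for a given CPU count. The order matters: the maximum must never
    // drop below the core size, or the setters throw IllegalArgumentException.
    void resizeFor(int cpus) {
        int size = targetSize(cpus);
        if (size > pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(size);   // grow: raise the ceiling first
            pool.setCorePoolSize(size);
        } else {
            pool.setCorePoolSize(size);      // shrink: lower the core first
            pool.setMaximumPoolSize(size);
        }
    }

    // Call periodically (e.g. from a scheduler) to follow CPU changes.
    void refresh() {
        resizeFor(Runtime.getRuntime().availableProcessors());
    }
}
```

Growing the pool only answers the first question. Whether the application and the JVM actually scale to hundreds of threads is a separate matter, which is the harder part of the problem described above.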

Again, with some virtualized notion of the number of cores and other metrics, and they have 4-ways as well. So I think what you will probably see is that these larger, multi-core boxes will be divided up into standard-size units that you can fit together like a jigsaw: if I have 32 cores, I can have, say, two 4-ways, an 8-way and a couple of 2-ways. People would then still be able to tune both the operating systems, which is important, and the virtual machines running on them, so that they can make use of the virtual resources they are given and perform at an acceptable level. The way to scale up won't be to add CPUs to a given virtual image; it will be to start another virtual image with another 4 cores, or another core, something like that.

There are a lot of complexities if you try to choreograph it the other way. When the hypervisor adds a virtual CPU, it would have to tell the operating system, and operating systems don't typically expect the number of CPUs on the box to change on the fly. The OS would then need to inform the processes running on the box, and the JVM would have to inform WebSphere. And if that happened on a database box, would the database then have to send an event to the application server saying "make your connection pool bigger because I am faster" or "change your lock timeouts"? Choreographing the tuning of a system to run really fast is very complex. So having these standardized sizes of virtual machines is going to be important, to get deterministic performance without having to reinvent rocket science to make something work. I think that's what will happen performance-wise.
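The "jigsaw" of standard-size units can be illustrated as a first-fit-decreasing packing of VM sizes onto hosts. This is a minimal sketch that assumes core count is the only resource; real placement would also consider memory and I/O. `VmPacker` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative first-fit-decreasing packing of standard-size VMs (in cores).
class VmPacker {
    // Returns, for each host, the list of VM sizes placed on it.
    static List<List<Integer>> pack(List<Integer> vmCores, int coresPerHost) {
        List<Integer> sorted = new ArrayList<>(vmCores);
        sorted.sort(Comparator.reverseOrder());      // place the largest VMs first
        List<List<Integer>> hosts = new ArrayList<>();
        List<Integer> free = new ArrayList<>();      // free cores per host
        for (int vm : sorted) {
            if (vm > coresPerHost) {
                throw new IllegalArgumentException("VM larger than host: " + vm);
            }
            int target = -1;
            for (int h = 0; h < hosts.size(); h++) {
                if (free.get(h) >= vm) { target = h; break; }   // first fit
            }
            if (target < 0) {                        // nothing fits: open a new host
                hosts.add(new ArrayList<>());
                free.add(coresPerHost);
                target = hosts.size() - 1;
            }
            hosts.get(target).add(vm);
            free.set(target, free.get(target) - vm);
        }
        return hosts;
    }
}
```

Packing the sizes from the example (two 4-ways, an 8-way and two 2-ways) onto 32-core hosts fits everything on one host with 12 cores to spare, while packing `16, 16, 8, 4, 4, 4` needs two hosts.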


6. So is this the future? Is the datacenter of the future going to be all virtual, or is this something that is useful in some contexts but not in others?

Well, I think if VMware has its way, it is absolutely going to be virtual. They are already pushing the hypervisor as the equivalent of a modern BIOS. In the past, the BIOS was what insulated the bootstrapping from the various kinds of hardware the box was running on. These days we're seeing more and more about Xen, and VMware doing deals with various box vendors to ship a simplified version of the hypervisor with the boxes, so that when you get a box it's a standard virtual box that you install your virtual machines on. Then, obviously, they will upsell you to embedded hypervisors and management software later on.

I think that's going to happen because customers are demanding simplification so they can reduce cost and improve quality, and standardizing the hardware by giving every virtual machine a fixed set of hardware resources is a good thing. It means customers can move virtual machines from one box to another without having to worry about reinstalling Linux or relinking the kernel or anything like that. It also means they don't need as many types of images, which reduces the amount of testing by a huge amount: they can build one Linux image, or say a WebSphere image with WebSphere ND 6.1.9 optimized to run on a virtual 4-core machine, and that's what they deploy. It is pre-tuned for four cores; it might need some tweaking depending on the kind of applications it is running, but in general it is all set to go for a four-core VM. Then they just run as many of those four-core VMs as they want, because these boxes keep getting bigger as well.

You are seeing quad-core CPUs with two sockets now, so 8 cores per blade in a mid-range blade, and by the end of next year it might be quad-core with four sockets, so you've got 16 cores in a blade. In the past, most application servers have been written to run on the sweet spot, and up until this year the sweet spot was a two-way, or around a 4-core box. Now it is certainly going to move to something like a 16-way, and that changes things a lot as far as tuning goes.


7. Okay, so we talked about the hardware or hypervisor level. Let's move up the stack a little. Tell us more about what is going on in the JVM virtualization space: what are IBM and other vendors doing, and what's the opportunity there for Java developers?

Well, it's actually complicated. Besides the fact that you don't know what you are running on, which can be addressed to a degree by standardized virtual machines with a certain number of cores and a certain amount of memory, the other problem is that multi-core is coming in a big way and the processors themselves are basically slowing down, and that has all kinds of interesting consequences. It's one thing to make a JVM max out a two-way box; that's a piece of cake, and you don't need to be super skilled to do it. But maxing out a 16-way is another matter, and remember that 16-core boxes will be here sooner than you think, maybe even by the end of next year, and the year after there might be 32 cores. Up until this year, the operating systems and databases that could run on 32-ways were tuned by rocket-scientist-level developers to get the scaling out of them, because of the kinds of things you need to do to run 600 threads on a 32-way box and get linear scaling.

It's a phenomenal amount of work and optimization, so there are a lot of challenges coming for developers running VMs on boxes with more and more cores. In a way, virtualization is going to save them, to a degree. Rather than worrying "my physical box is going to have 32 cores next year and 64 cores the year after, so how am I going to perform on that box?", the picture changes: your physical box might have 32 cores, but it might actually be broken up into eight four-core virtual machines, which we know how to write software for. Conventional clustering techniques, like round-robin distribution of web traffic through routers, might give us a way to run multiple JVMs on standardized 4-core units and so lower the cost of figuring out how to take advantage of all these cores. As I said, writing software that runs on large numbers of processors is an art, not a science, and it gets very expensive to do.
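The conventional clustering approach mentioned above, splitting a large box into several identical small JVMs behind a round-robin router, can be sketched in a few lines. This is a minimal sketch; `RoundRobinRouter` and the endpoint names are hypothetical, and a real router would also track endpoint health.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical round-robin router over a fixed set of JVM endpoints.
class RoundRobinRouter {
    private final List<String> endpoints;
    private final AtomicLong counter = new AtomicLong();

    RoundRobinRouter(List<String> endpoints) {
        if (endpoints.isEmpty()) throw new IllegalArgumentException("no endpoints");
        this.endpoints = List.copyOf(endpoints);
    }

    // Thread-safe: each call returns the next endpoint in turn; floorMod keeps
    // the index valid even if the counter eventually wraps around.
    String next() {
        long n = counter.getAndIncrement();
        return endpoints.get((int) Math.floorMod(n, (long) endpoints.size()));
    }
}
```

Round robin treats every JVM as interchangeable, which is exactly the property that standard-size 4-core units are meant to provide.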


8. So moving up the stack: you talked about hypervisors and hardware- or operating-system-level virtualization. What about virtualization at the Java virtual machine level? We hear about things like what BEA is doing with running the JVM without an operating system. Can you tell us a little bit about that space, who is playing there and what's going on?

There are really only two vendors in that space. One is Azul, who run their VM on their massively parallel boxes. I have used those boxes within IBM myself; they are really cool boxes with great diagnostics. We worked with them to run ObjectGrid on a 192-way box, and it was awesome, because once you run on it you get full-speed hardware profiling, so straight away they could tell me that line 124 of file X was the bottleneck. We would go and fix it and run it again, then line 260 was the bottleneck and we would fix that; it's an extremely rapid way to fix things. We came across some interesting issues with parallelizing the software to run faster. In the beginning you think you are clever when you use synchronized blocks for multi-threaded programming, but of course synchronized blocks are critical sections, and threads stack up in front of them waiting to get in.

The next level of sophistication, for most people, is to switch to reader/writer locks, because now you can have concurrent readers but only one writer, and for some data structures that works well for increasing concurrency. But as you go up to extreme numbers of cores, like on an Azul box, even that doesn't work, because there is still a critical section that lets the readers in and arbitrates between readers and writers, and that critical section slows you down. So then you have to use tricks like arrays of reader/writer locks. You might have five of them, and a reader can take a read lock on any one of the five, so there are five critical sections and the contention is split five ways. Writers, though, have to get the write lock on all five before they can enter, so writers are penalized. But writers are in the minority, and readers get five times more concurrency because five locks are governing the readers.
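The array-of-locks technique described above can be sketched with `java.util.concurrent.locks.ReentrantReadWriteLock`. This is a minimal sketch, assuming readers are spread across stripes by thread ID; the class name `StripedReadWriteLock` and the stripe-selection rule are illustrative, not ObjectGrid's actual implementation. Writers take every write lock in a fixed order so that two writers cannot deadlock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Illustrative striped reader/writer lock: each reader contends on one of N
// stripes, while a writer must hold the write lock on all N stripes.
class StripedReadWriteLock {
    private final ReentrantReadWriteLock[] stripes;

    StripedReadWriteLock(int n) {
        stripes = new ReentrantReadWriteLock[n];
        for (int i = 0; i < n; i++) stripes[i] = new ReentrantReadWriteLock();
    }

    // Spread threads over stripes so concurrent readers rarely share a lock.
    private ReentrantReadWriteLock stripeForCurrentThread() {
        long id = Thread.currentThread().getId();
        return stripes[(int) Math.floorMod(id, (long) stripes.length)];
    }

    <T> T read(Supplier<T> body) {
        ReentrantReadWriteLock.ReadLock lock = stripeForCurrentThread().readLock();
        lock.lock();
        try {
            return body.get();
        } finally {
            lock.unlock();
        }
    }

    // Note: calling write() while holding a read stripe would deadlock,
    // because ReentrantReadWriteLock does not upgrade read to write.
    void write(Runnable body) {
        int acquired = 0;
        try {
            for (ReentrantReadWriteLock s : stripes) {   // always in index order
                s.writeLock().lock();
                acquired++;
            }
            body.run();
        } finally {
            for (int i = acquired - 1; i >= 0; i--) stripes[i].writeLock().unlock();
        }
    }
}
```

With five stripes, this matches the trade-off described: readers contend on one lock out of five, and each writer pays for five lock acquisitions.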

So we had to do all kinds of tricks to get ObjectGrid to scale up. In the end we spent a couple of days on it and stopped at 128 cores, figuring that the number of customers with 128 cores would be minimal, so we moved on to the next optimization.

The other vendor is BEA with LiquidVM, where they basically eliminated the operating system and run a JVM directly on top of the VMware hypervisor. But even there, although they say they removed the operating system, the file system and TCP still go through a Linux instance on the box, so there is still an operating system there. We tried to do this internally, and the problem with that approach is skills. There are a lot of skills out there for managing Linux, but when LiquidVM doesn't work, how do you do diagnostics, how do you do tuning? Where are those skills? BEA could argue that no skill is needed, but the bottom line is that they are writing an operating system integrated with the VM, so those skills don't exist in the market, and that is an inhibitor to adopting that kind of technology. And given that they need an operating system instance anyway, it complicates things like failover: it is not just moving one virtual machine from one box to another, it is a collection, because you have the Linux image and you have to move it along with the LiquidVM. So it's a novel approach, at least among systems you can actually buy, but it has challenges as far as skills are concerned. There are a lot of people who know how Linux works and can diagnose and tune it, but their skill sets don't apply here, even though customers will still need people like that once they deploy it.

They will ask: how do I tune this, how do I tune the TCP stack, how do I tune the file system? It will end up becoming another operating system. Do you need all that stuff? It's not as simple as you think: it's running on a box, it needs tuning and diagnostics, and where is that coming from?


9. So moving further up the stack, you mentioned data virtualization with ObjectGrid, Coherence, Terracotta and GigaSpaces. Tell us a bit about that space and what's going on there.

Well, I think that space is probably the fastest-advancing part of virtualization. At the data level, it's broken up into two niches right now. The majority niche is what I characterize as network-attached caches: groups of machines organized collectively as one logical cache, so they can hold as much data as fits in all the boxes in the cache. And it's coherent, to quote Cameron's words, meaning that everybody who looks at the cache sees one version of every record, because each record is stored in only one place. They have qualities of service like fault tolerance through replication, and that's very useful. There are standards around this, like JCache (JSR 107), which I am not a huge fan of. But actually I think the API is not that important, because caching with a network-attached model is as much about quality of service as anything else, and quality of service should not be something you have to code. It should be something you buy as an add-on, without having to change your application to get it.

If the quality of service was more CPUs, you wouldn't expect to radically change your code to go from a one-way to a two-way. If I told you that you had to change your application so it could be failed over with conventional HA software like Veritas, you wouldn't be happy. People are looking for payback when they put in caching as a quality of service, and memory costs money as well. So which kinds of requests do you want caching for, and which applications? Wouldn't it be cool if there were a way to have an SLA, notice that the response time for an application was beyond what was acceptable, and then dynamically turn on caching to speed it up, as another optimization that can be done automatically? What I am really interested in right now is ways to plug caching non-intrusively into existing software with no changes, and I am looking for plug points. Take mediations in an ESB, for example: if I am invoking a service and I notice that a given service is requested three times, say a service that looks up a customer profile, maybe I want to start caching that service call so that I don't look it up the second and third times, because I already have it.
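Plugging caching in at an interception point, without touching the caller or the backend, can be sketched with a JDK dynamic proxy (`java.lang.reflect.Proxy`) that memoizes calls on an existing service interface. This is a minimal sketch, assuming the service methods are side-effect-free lookups keyed by their arguments. `CustomerProfileService` and `CachingProxy` are hypothetical names, and a real deployment would use a network-attached cache with eviction rather than a local `ConcurrentHashMap`.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical existing service contract; callers keep using it unchanged.
interface CustomerProfileService {
    String lookupProfile(String customerId);
}

// Wraps any implementation of an interface with a memoizing cache.
final class CachingProxy {
    private CachingProxy() {}

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target) {
        Map<List<Object>, Object> cache = new ConcurrentHashMap<>();
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (proxy, method, args) -> {
                    // Pass toString/equals/hashCode straight through.
                    if (method.getDeclaringClass() == Object.class) {
                        return method.invoke(target, args);
                    }
                    // Cache key: the method plus its argument values.
                    Object argKey = (args == null) ? List.of() : Arrays.asList(args);
                    List<Object> key = Arrays.asList(method, argKey);
                    Object cached = cache.get(key);
                    if (cached != null) {
                        return cached;             // hit: the backend is not called
                    }
                    Object result;
                    try {
                        result = method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause();        // rethrow the service's own exception
                    }
                    if (result != null) {
                        cache.put(key, result);    // ConcurrentHashMap rejects nulls
                    }
                    return result;
                });
    }
}
```

The caller only ever sees `CustomerProfileService`, so caching can be switched on by swapping the instance it is handed, which is the "no application changes" property described above.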

Plugging a cache in there gives you a way to do this without changing any application, without changing the back end and without changing the front end, and for selling to a customer that's obviously powerful, because they can just drop it in and go faster, which is what you want. There are other architected points where we can put in this kind of quality of service. For example, most object-relational mappers have second-level cache plug-ins, like OpenJPA, which BEA open-sourced as an Apache project and which IBM also uses, or products like Hibernate, so we can provide drop-in plug-ins for those. A big one that people tend to overlook is HTTP session management. There is already a standard interception point for that, as simple as a servlet filter, and you get the benefit of a virtualized data infrastructure storing your sessions, rather than storing them on a file system, which doesn't scale that well, or in a database, which is an expensive place to keep session data. More and more, if you listen to the Amazon guys, like their CTO, people are saying that, architecturally, writing applications that connect directly to the data, i.e. through JDBC, is a bad way to design things. A better way is to have a service that you call to get the data, and the advantage is that you can scale that service up, rather than being tied to a connection to a specific database.


10. Like memcached, right?

That was an approach from the PHP world, but memcached has its limitations as well. It is not easy to make the cluster holding the cache bigger without rehashing all the data, although I have seen some people using consistent hashing and approaches like that to add servers without rehashing everything. Products like ObjectGrid, GigaSpaces and Coherence already have consistent hashing, so adding a new box only affects about 1/N of the data, which gets migrated onto the new box, and you're going again. Something like memcached rehashes the whole thing, which, as you can imagine, is atrocious performance-wise: if you have 100 GB of data, it is not going to work that well.
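The difference between rehashing everything and consistent hashing can be shown with a small experiment. This is a minimal sketch of a hash ring with virtual nodes built on `TreeMap`; the names `HashRing`, `serverFor` and `modPlacement` are illustrative and do not reflect how any of the products named above implement placement. The roughly 1/N movement is an expectation, not an exact guarantee, because ring balance depends on the number of virtual nodes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative consistent-hash ring with virtual nodes.
final class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRing(List<String> servers, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String s : servers) addServer(s);
    }

    // Murmur3-style finalizer: spreads String.hashCode bits around the ring.
    static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Each server owns many points on the ring, which evens out the load.
    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    // A key belongs to the first server point at or after its hash,
    // wrapping around to the start of the ring.
    String serverFor(String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Naive placement for comparison: hash modulo the server count.
    static int modPlacement(String key, int servers) {
        return Math.floorMod(hash(key), servers);
    }
}
```

With four servers of 100 virtual nodes each, adding a fifth server should move roughly a fifth of the keys, and only onto the new server. With `modPlacement`, going from 4 to 5 servers moves about four fifths of the keys, since a key keeps its owner only when its hash has the same remainder modulo 4 and modulo 5.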

For me, the API is obviously important, but in this market segment what I think matters most is that the technology has to be consumable very cheaply, so it can be put into existing apps that have existing performance problems to alleviate them, potentially with caching turned on and off based on SLAs, so that you only cache what you need to cache rather than caching everywhere. Network-attached caching can also help save money in virtualized environments. The problem with boxes today is that they are getting lots and lots of cores, but memory is not going up at the same rate. Before, you might have had two cores with 16 GB of memory, which would be a big blade; now you have 8 cores with the same 16 GB, because memory is still very expensive to buy. Once you start consolidating, there is plenty of CPU power to run multiple virtual machines on that box, but not plenty of memory. So if you have applications that already do caching and need maybe a gigabyte of memory each to have an effective cache, you might have enough CPU to put 10 of them on a box, but you may not have enough memory.

So with a network-attached cache, rather than giving each process its own full-size cache, you might put, say, 100 MB in each process on each virtual machine and then use network-attached technology like ObjectGrid to join those 100 MB pieces into one gigabyte or ten gigabytes. That reduces the total amount of memory you need overall and lets you consolidate more applications on the same box, because the constraint is usually memory, not CPU. There are all kinds of nuances, but I think network-attached caching is the biggest segment as far as the opportunity for technologies like ObjectGrid, GigaSpaces and Coherence goes, and the other segment in the market is XTP.


11. So the second type of data virtualization is XTP. Tell us more about that.

Well, XTP is a style of architecture that is becoming more apparent now, and it tends to be intrusive, whereas with network-attached caching we try to be unobtrusive: ideally you don't have to change your applications, you plug caching into existing, architected interception points and things like that. XTP is different because the goal is to design software that scales in an unlimited manner, and you can't just take an existing application and suddenly make it scale to a million transactions a second; it doesn't work like that. You have to design the application with a certain paradigm in mind, and the XTP paradigm is a partitioned data model. You need to be able to split your data model into partitions, independent chunks. You need to be able to route requests and events to the specific partition, and within that partition they should be processed at high speed against the local memory that holds that partition's data. The changes may then be replicated to a backup that holds a copy, and if the primary fails, we tell the backup "you're now the primary", and requests start going there instantly. As soon as we can detect the failure, it's ready to go, because the data is already replicated there. So it's a different way of designing applications. Not every application can be broken up and partitioned like this, but it's safe to say that the ones that need to scale will have to be, because there's no way around it. You can hide it, and what we have done in the past is hide it by telling everybody to make their applications stateless and put all their state in the database. Then you can have lots of application servers in front, and they can fail over, because they are all interchangeable.
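The XTP paradigm described above (partition the data, route each request to the partition that owns the key, replicate to a backup, promote the backup on failure) can be modeled in-process. This is a minimal sketch, assuming synchronous replication and in-memory maps standing in for separate JVMs; `PartitionedStore`, `Partition` and `failPrimary` are hypothetical names, not ObjectGrid's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-process model of a partitioned, replicated store.
class PartitionedStore {
    static final class Partition {
        Map<String, String> primary = new HashMap<>();
        Map<String, String> backup = new HashMap<>();

        void put(String key, String value) {
            primary.put(key, value);
            backup.put(key, value);   // synchronous replication to the backup
        }

        // Promote the backup to primary. In a real grid a new backup would be
        // rebuilt on another JVM; here it is simply copied from the new primary.
        void failPrimary() {
            primary = backup;
            backup = new HashMap<>(primary);
        }
    }

    private final Partition[] partitions;

    PartitionedStore(int count) {
        partitions = new Partition[count];
        for (int i = 0; i < count; i++) partitions[i] = new Partition();
    }

    // Route by key, so every request for a given key lands on one partition.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.length);
    }

    void put(String key, String value) {
        partitions[partitionFor(key)].put(key, value);
    }

    String get(String key) {
        return partitions[partitionFor(key)].primary.get(key);
    }

    void failPrimary(int partition) {
        partitions[partition].failPrimary();
    }
}
```

Because every key lives in exactly one partition, requests never coordinate across partitions, which is what allows this model to scale out by adding partitions.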

They're all the same. The problem is that you're just pushing the problem somewhere else, namely into the database, so now the problem is how to scale the database. If the database is down, the whole cluster in front is down, because they all need it. And databases are typically expensive to scale up; even Oracle RAC, for example, doesn't scale that well past a handful of nodes, because of locking, network contention and a lot of other factors. So you really need a way to consolidate the application and data tiers into one tier, partition it, route each request to where its data is so it can be processed at the fastest rate, and replicate the changes somewhere else for fault tolerance. You may write a subset of the data back to the database with write-behind technology, in write-back mode, where you write chunks back and basically use the database almost as a log, because you're doing all the processing in the front, in replicated in-memory partitions. The database is really there for when something super bad happens, so that you have a copy, and you can use it for reporting because it has SQL, as in a big warehouse. Storing stuff in memory is great for minutes, hours, maybe a day or a couple of days, but for storing data for posterity you need it on a disk.
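Write-behind, where changes are applied in memory immediately and written back to the database in chunks later, can be sketched as a buffer of dirty entries flushed in batches. This is a minimal sketch, assuming the newest value for a key supersedes earlier ones within a batch. `WriteBehindBuffer` and the `BatchSink` interface (standing in for something like a JDBC batch update) are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the database; a real one might use JDBC batching.
interface BatchSink {
    void writeBatch(Map<String, String> rows);
}

// Illustrative write-behind buffer: reads and writes hit memory, and the
// database sees coalesced changes in chunks when flush() runs.
class WriteBehindBuffer {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private Map<String, String> dirty = new LinkedHashMap<>();
    private final BatchSink sink;
    private final int batchSize;

    WriteBehindBuffer(BatchSink sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    synchronized void put(String key, String value) {
        data.put(key, value);
        dirty.put(key, value);       // a later write to a key replaces the earlier one
        if (dirty.size() >= batchSize) flush();
    }

    String get(String key) {
        return data.get(key);        // served from memory, never from the database
    }

    // Could also be driven by a timer so changes reach the database within
    // a bounded delay even when a batch never fills up.
    synchronized void flush() {
        if (dirty.isEmpty()) return;
        Map<String, String> batch = dirty;
        dirty = new LinkedHashMap<>();
        sink.writeBatch(batch);
    }
}
```

Anything still in the dirty map is lost if the process dies before `flush()` runs, which is why, in the architecture described above, the in-memory partitions are themselves replicated before the database ever sees the data.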

So databases are by no means going away, but for XTP-type workloads we have reached the limits of what's possible in terms of scaling the database up while maintaining response times. So now we are seeing persistence evolve into a split tier, if you like: the application and data tiers are co-located and partitioned, requests are routed to get the speed, and all the transactions happen between a primary and its replica, which lets it scale out linearly with no cross-partition cost. But we still have to write back to a database for reporting and posterity, so you still see two tiers; it's the way the tiers work together that is different. We are moving to a stateful tier for running applications, with technologies like ObjectGrid underneath that virtualize the fact that you're running in this kind of unusual environment: hiding the fact that you have to do replication, hiding the fact that things fail over, and hiding the fact that the application might be running across multiple data centers. ObjectGrid's job is to make sure that for every primary copy of a partition running in data center 1, the replica is not in the same data center; it round-robins the primaries across all the data centers and puts the replicas somewhere else, and the application developer shouldn't have to worry about it at all.
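The placement rule mentioned above (round-robin the primaries across data centers and never put a partition's replica in the same data center as its primary) can be sketched as follows. This is a minimal sketch, assuming at least two data centers and one replica per partition; `DcPlacement` and `Assignment` are illustrative names and this is not how ObjectGrid's placement is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative placement rule, not ObjectGrid's actual algorithm.
final class DcPlacement {
    static final class Assignment {
        final int partition;
        final String primaryDc;
        final String replicaDc;

        Assignment(int partition, String primaryDc, String replicaDc) {
            this.partition = partition;
            this.primaryDc = primaryDc;
            this.replicaDc = replicaDc;
        }

        @Override
        public String toString() {
            return "P" + partition + ": primary=" + primaryDc + ", replica=" + replicaDc;
        }
    }

    // Primaries round-robin across data centers; each replica goes to the
    // next data center in the list, so it never shares one with its primary.
    static List<Assignment> place(int partitions, List<String> dataCenters) {
        int n = dataCenters.size();
        if (n < 2) throw new IllegalArgumentException("need at least two data centers");
        List<Assignment> out = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            out.add(new Assignment(p, dataCenters.get(p % n), dataCenters.get((p + 1) % n)));
        }
        return out;
    }
}
```

For six partitions across `dc1`, `dc2` and `dc3`, partition 0 gets primary `dc1` and replica `dc2`, partition 1 gets `dc2` and `dc3`, partition 2 gets `dc3` and `dc1`, and the pattern repeats, so each data center hosts two primaries and two replicas.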

So ObjectGrid is middleware in the application virtualization space. What we're trying to do is take a complex environment, like multiple data centers with different network latencies between them, and let you build an application through simple policies, so that it can be deployed on something that starts off small, on your laptop, and scales up to a thousand-machine cluster running across three data centers, with the same application. So again it's virtualization hiding complexity; it's the same theme.


12. You mentioned that in the XTP case, your apps are written to a particular paradigm that can get you almost infinite scalability. That sounds like there is potential for a lot of competing programming paradigms here. GigaSpaces has the Processing Unit, and you talked about partitioning. What are the commonalities between these paradigms, and are there opportunities here for standardization?

Well, there are. Concept-wise, the same concepts are there in most of the products. There is a notion of a partition or a processing unit, and that partition holds a subset of the data exclusively. Requests that deal with that subset of the data are routed to wherever that processing unit or partition is currently hosted, on the set of JVMs that host the applications. The middleware, whether it is ObjectGrid or a competitor, is responsible for placing that partition on a given box and then routing requests to it. When we add more boxes, we scale out by adding more partitions, or by redistributing the existing partitions so that there are fewer per box, which gives each one more processing power.

As far as what happens within the partition, you need a couple of events, for example "this partition is now the primary" and "this partition is no longer the primary". Beyond that, I don't think you need many new APIs. We already have APIs for accessing data, like JPA and JDBC, and we already have JMS for messaging. So apart from some APIs around events and things like that, I think the existing APIs can already address the needs of an application running within a partition. That said, not all the vendors have standard APIs for getting at the data.
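The two events mentioned above might look something like this to an application. This is a hypothetical callback interface, not ObjectGrid's or any vendor's actual API. It assumes the middleware calls `on_become_primary` and `on_become_replica` when it moves the primary role during a failover.

```python
class PartitionListener:
    """Hypothetical callback interface; the no-op defaults can be overridden."""

    def on_become_primary(self, partition_id):
        pass

    def on_become_replica(self, partition_id):
        pass


class WorkerListener(PartitionListener):
    """Tracks which partitions this JVM should actively serve."""

    def __init__(self):
        self.active = set()

    def on_become_primary(self, partition_id):
        # Start serving requests and background work for this partition.
        self.active.add(partition_id)

    def on_become_replica(self, partition_id):
        # Stop active work; the new primary elsewhere takes over.
        self.active.discard(partition_id)


def fail_over(partition_id, old_primary, new_primary):
    """Simulate a failover by firing the two events on the two JVMs."""
    old_primary.on_become_replica(partition_id)
    new_primary.on_become_primary(partition_id)
```

Everything else inside the partition can keep using ordinary data access and messaging code. The events just tell it when to start and stop.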

Coherence would be one example, and ourselves with ObjectGrid. The APIs are very simple, map-based APIs, and they give us more powerful capabilities on top. Arguably, the APIs are designed around the parallel nature of the grid: sometimes you want to do things in parallel across the grid, and the normal standards don't really work for that. We have proprietary APIs that let you do that, as do all the vendors. Sometimes you might be able to get away with using something like OpenJPA as your data access layer, if you weren't using write-behind, and then use the local data in the partition as the second-level cache for JPA.
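A rough sketch of a map-based API with a parallel "run this against every partition" operation might look like this. It assumes each partition is a plain dict, uses Python's built-in `hash` to pick a partition, and uses `concurrent.futures.ThreadPoolExecutor` for the scatter-gather. The `Grid` class and `run_agent` method are hypothetical. They illustrate the pattern and are not the ObjectGrid or Coherence API.

```python
from concurrent.futures import ThreadPoolExecutor


class Grid:
    """Hypothetical partitioned map: each partition is a plain dict."""

    def __init__(self, num_partitions):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _part(self, key):
        return self.partitions[hash(key) % len(self.partitions)]

    # Simple map-style access: one key, one partition.
    def put(self, key, value):
        self._part(key)[key] = value

    def get(self, key):
        return self._part(key).get(key)

    def run_agent(self, agent, combine):
        # Scatter: run the agent against every partition concurrently.
        # Gather: fold the per-partition results into one answer.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(agent, self.partitions))
        return combine(results)
```

For example, `grid.run_agent(lambda part: sum(part.values()), sum)` totals every value in the grid with one call per partition. That parallel shape is what standard APIs like JPA or JDBC don't express directly.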

So you would access and change the data through JPA, but the data is managed by ObjectGrid, which writes it through to the database if that is appropriate. I don't think a radical shift is needed in how you write the APIs for applications. The radical technologies are around how the application is hosted, how the requests are routed, and how the data that the application uses in a particular partition or Processing Unit is managed.
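The difference between writing through to the database and write-behind can be sketched as follows. The grid holds the working copy and a dict stands in for the database. The class name and the explicit `flush()` call are hypothetical. A real product would flush write-behind batches on a timer or size threshold.

```python
class GridBackedStore:
    """Hypothetical sketch: the grid holds the data, the database is the system of record."""

    def __init__(self, database, write_behind=False):
        self.database = database   # any dict-like backing store
        self.cache = {}            # the grid's in-memory copy
        self.write_behind = write_behind
        self.pending = {}          # writes not yet sent to the database

    def read(self, key):
        # Load from the database on a miss, then serve from memory.
        if key not in self.cache and key in self.database:
            self.cache[key] = self.database[key]
        return self.cache.get(key)

    def write(self, key, value):
        self.cache[key] = value
        if self.write_behind:
            self.pending[key] = value     # deferred; sent later in a batch
        else:
            self.database[key] = value    # write-through: synchronous

    def flush(self):
        # Push all deferred writes to the database in one batch.
        self.database.update(self.pending)
        self.pending.clear()
```

With write-through, the database is always current, which is why a JPA-style second-level cache can sit on top of it. With write-behind, the database lags until `flush()` runs.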


13. So, moving further up the stack: you mentioned application virtualization. Can you tell us more about that?

button can't meet its half-second response time, and I have maxed out the two JVMs that were running application A, then the application virtualization layer might decide to stop one instance of B and start an instance of A there.
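The placement decision in this example can be sketched as a simple rule: if an important application is missing its response-time goal and its JVMs are CPU-bound, stop one JVM of a less important application and start one for the struggling application. The 90% CPU threshold for "maxed out", the rule of never removing an application's last instance, and the data shapes are all assumptions. This is not how WebSphere XD's placement controller is actually implemented.

```python
def rebalance(placement, stats, priority, cpu_max=0.9):
    """Hypothetical one-step placement adjustment.

    placement: app -> number of JVM instances running it
    stats:     app -> (avg_response_s, sla_s, cpu_utilization 0..1)
    priority:  app -> int, higher = more important
    Returns a new placement, moving at most one JVM.
    """
    new = dict(placement)
    for app in sorted(new, key=lambda a: priority[a], reverse=True):
        resp, sla, cpu = stats[app]
        if resp <= sla or cpu < cpu_max:
            continue  # meeting its goal, or not yet out of CPU
        # Donors: less important apps that would keep at least one instance.
        donors = [d for d in new
                  if d != app and priority[d] < priority[app] and new[d] > 1]
        if donors:
            donor = min(donors, key=lambda d: priority[d])
            new[donor] -= 1  # stop one JVM of the less important app...
            new[app] += 1    # ...and start one for the app missing its SLA
            return new
    return new
```

With A at 0.8 s against a 0.5 s goal and 95% CPU on two JVMs, and B comfortably within its goal on two JVMs, the result is three JVMs for A and one for B.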

So now we have more resources for A, which means the checkout button might meet its half-second requirement. That is what XD does, and conventional hypervisors and hypervisor managers can't do that because they don't have the necessary information at that level. So application virtualization is almost like a fractal. The first level of the fractal provisioning is flow control on individual requests. That gets you so far, and then we send an event to the placement controller, which is the next level in the fractal provisioning architecture.
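The first, request-level tier can be pictured as an SLA-aware queue that lets requests from more important service classes jump ahead. This sketch assumes each request is tagged with a service class and that importance is a simple integer. It uses Python's `heapq`, with a counter as a first-in, first-out tie-breaker within a class. It shows only the reordering part of flow control, and the names are hypothetical.

```python
import heapq
import itertools


class RequestQueue:
    """Hypothetical SLA-aware queue: more important classes are dequeued first."""

    def __init__(self, importance):
        self.importance = importance     # service class -> int, higher = more important
        self._heap = []
        self._seq = itertools.count()    # keeps FIFO order within a class

    def submit(self, service_class, request):
        # heapq is a min-heap, so negate importance to pop the most important first.
        priority = -self.importance[service_class]
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def next_request(self):
        return heapq.heappop(self._heap)[2]
```

If a "browse" request, then a "checkout" request, then another "browse" request arrive, and checkout is more important, the checkout request is served first. The two browse requests follow in arrival order.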

It will start and stop JVMs to change the ratio of which JVMs are running which applications. Eventually that gets you only so far: you max out the boxes, and then it's time for the next level. That is where the hypervisor comes in, because something like XD can send up a cry for help: "For the set of virtual or physical machines I was given, I can do no more. It's maxed out and I still can't meet my SLAs." The next level up can then decide: "I have an XD system here and a Lotus Notes system there." The hypervisor manager might stop one of the Lotus Notes virtual machines and give that capacity to XD. Now XD has more processors to play with, because its pool of machines has grown by taking from a pool that was deemed less important.
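The hypervisor-level decision can be sketched the same way, one level up. When a pool asks for help, take a VM from the least important pool that ranks below it. If there is no such pool, return `None`, which signals that this tier is also out of options. The pool names, integer importance values, and function name are hypothetical.

```python
def grant_vm(pools, importance, requester):
    """Hypothetical hypervisor-manager step.

    pools:      pool name -> number of VMs it currently owns (modified in place)
    importance: pool name -> int, higher = more important
    Returns the donor pool's name, or None if no VM can be taken.
    """
    donors = [p for p, n in pools.items()
              if p != requester and n > 0 and importance[p] < importance[requester]]
    if not donors:
        return None  # nothing left to take: escalate further, e.g. buy hardware
    donor = min(donors, key=lambda p: importance[p])
    pools[donor] -= 1       # stop a VM in the less important pool...
    pools[requester] += 1   # ...and give that capacity to the requester
    return donor
```

With XD ranked above the database pool, and the database pool above Notes, a request from XD takes a VM from Notes first.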

So this is the next level of virtualization, and even after that there is another level. Eventually whatever is managing the hypervisors runs out of virtual machines it can take. Then you need to send a note to someone in purchasing, to tell your hardware vendor (IBM, say) to send you another blade chassis. The chassis arrives, you install the hypervisor on it and add it to your hypervisor pool. Suddenly the hypervisor manager has an extra 14 boxes to put virtual machines on, and it can start giving those to the Notes pool, the database pool, and the XD pool.

So it is really the same thing at different levels, with different triggers for moving up to the next level, the next layer in the onion. You start at the finest grain, with flow control of the requests. Next you start and stop instances of JVMs or applications. Then you move virtual machines around between the machines hosting hypervisors. Finally you physically order machines, install hypervisors on them, add them to your list of computers, and add virtual machines to the various farms running on them. So it is a tiered, onion-like, fractal architecture. But if you are at the hypervisor level, only one of those tiers is really visible to you: everything within the virtual machine is opaque. That is where products like XD excel. They have visibility into that world, so they can make decisions at the request level, start and stop at the JVM level, and then send events to start and stop virtual and physical machines as needed.
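The onion of tiers described above can be reduced to a simple escalation chain. Each tier tries to fix the shortfall, and only if it can't does the problem move to the next, coarser tier. This sketch assumes each tier is a handler that returns `True` when it resolved the problem. The tier names in the usage note come from the answer, but the handlers are placeholders.

```python
from typing import Callable, List, Tuple

# Each tier is (name, handler); a handler returns True if it resolved the shortfall.
Tier = Tuple[str, Callable[[], bool]]


def escalate(tiers: List[Tier]) -> str:
    """Try each tier from finest-grained to coarsest; stop at the first that succeeds."""
    for name, handler in tiers:
        if handler():
            return name
    return "unresolved"
```

For example, with tiers for request flow control, JVM placement, hypervisor VM reallocation, and ordering hardware, suppose only the hypervisor tier can free up capacity. Then `escalate` returns "hypervisor VM reallocation", and the hardware tier is never consulted.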


14. So you mentioned ObjectGrid and WebSphere XD a bit in your previous answer. Can you tell us more about what they do and how they fit into the virtualization story?

ObjectGrid fits into the network-attached cache scenarios and the XTP scenarios we talked about. It's a platform, and what makes it unique is that it is embeddable. Some products have their own server runtime, their own JVMs and their own consoles. You have to make a new investment to deploy them, because they are not something you already had. We designed ObjectGrid to be embeddable. If you have WebSphere ND, you can run ObjectGrid inside an ND cell and use your existing ND skills to deploy ObjectGrid applications and to start, stop, and manage servers. If you are running a competitor's application server, it works the same way.

So we don't force you to adopt new infrastructure for provisioning, starting, and stopping applications. We work with what you've got, or with plain J2SE if that's what you want. On the XD side, there is Data Grid, which contains ObjectGrid. The naming is unfortunate, it's just the way it worked out: the way to look at it is that Data Grid equals ObjectGrid. The other side of XD is operations management. It manages application servers: web servers and web application servers, which could be PHP, J2EE like WebSphere ND, or competitors like JBoss, Geronimo, or Tomcat.

It does all the things we have talked about elsewhere in this interview. It measures, flow-controls, and reorders requests at the request level. It starts and stops JVMs, whether they host WebLogic, WebSphere, Tomcat, or something else. It also has a set of technologies around health management. For example, we have agents that measure things like heap utilization and report them periodically.

XD can watch, for example, for the Java heap filling slowly: we notice that over the past 10 minutes the heap has filled past 85%. The customer can define a trigger at 85% because they reckon the server is going to run out of memory. Rather than wait for out-of-memory errors, we put the server into what we call maintenance mode, which means we stop routing requests to it. Then maybe we run a customer-provided script to dump the heap and some diagnostic information from the web application. We leave the server there so a developer can look at it and stop it later. Alternatively, the customer could define an action that simply cycles the server when we notice it's running out of memory. They don't care about diagnosing it, they just want to keep it running. Maybe they know there is a memory leak and can't fix it right now, so they use XD to stop and restart the server whenever the heap reaches 85%.

So XD has a fairly sophisticated mechanism for defining what we call predicates. A predicate might say: if the load is above this, the response time goes over this, the heap is over this, the server has been running for this long, and it has processed a million requests, then perform this action. The actions are completely customizable. They could send an e-mail, cycle the server, run a customer-provided script to dump state to disk, or put the server into maintenance mode so we don't route requests to it. It could be anything you want. So we have a very open way of defining predicates. They can use the PMI metrics and the metrics we gather from the environment, such as the JVM, and then you plug in user-defined actions. We have a stock set of actions, like cycling the server, putting it in maintenance mode, or sending an e-mail, and customers can define anything else they want.

So XD wears three hats. The Operations Optimization side is basically application virtualization. ObjectGrid, aka Data Grid, is where we do data virtualization, network-attached cache, and XTP-type work. And then we have batch computing and high-performance computing, with a set of containers for that as well.
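The predicate-and-action model can be sketched as a list of policies. Each policy has a predicate over a dictionary of metrics and an ordered list of actions to run when it matches. The metric names (`heap_used_pct`, `load`, `response_ms`) and the actions, which here just return log strings, are hypothetical stand-ins. They are not XD's PMI metric names or its configuration format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class HealthPolicy:
    name: str
    predicate: Callable[[Metrics], bool]
    actions: List[Callable[[str], str]]


def evaluate(policies, server, metrics):
    """Run every action of every matching policy, in order; return what ran."""
    log = []
    for policy in policies:
        if policy.predicate(metrics):
            for action in policy.actions:
                log.append(action(server))
    return log


# Stand-in actions: a real system would call scripts or the server manager.
def maintenance_mode(server): return f"{server}: stop routing requests"
def heap_dump(server): return f"{server}: run heap-dump script"
def restart(server): return f"{server}: restart JVM"
def send_email(server): return f"{server}: e-mail the administrator"


memory_leak = HealthPolicy(
    name="heap over 85%",
    predicate=lambda m: m.get("heap_used_pct", 0) > 85,
    actions=[maintenance_mode, heap_dump, restart],
)

slow_under_load = HealthPolicy(
    name="slow under heavy load",
    predicate=lambda m: m.get("load", 0) > 0.8 and m.get("response_ms", 0) > 500,
    actions=[send_email],
)
```

For a server reporting 90% heap use, `evaluate([memory_leak], "server1", ...)` returns three entries in order: stop routing, heap dump, restart. Because actions are ordered, a diagnostic script runs before the recycle, which matches the heap-leak scenario above.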

The thing that's cool about XD is that it can manage all these environments on the same set of boxes. Using SLAs, it trades off how many boxes run each type of workload. So customers can run their batch and online workloads on the same boxes and have rules for them. For example, online is less important after 9 pm, so batch gets more resources. Then at 6 am online becomes important again, and we move resources back to online.
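A time-of-day rule like the one above could be expressed as follows. Online work is favored from 06:00 to 21:00 and batch is favored otherwise. The 75/25 split and the function name `allocate` are assumptions chosen for illustration. In XD the actual split would come from the SLA policies, not a fixed ratio.

```python
from datetime import time


def allocate(total_boxes, now):
    """Hypothetical schedule: online is favored 06:00-21:00, batch overnight."""
    daytime = time(6) <= now < time(21)
    online_share = 0.75 if daytime else 0.25   # assumed split, not an XD default
    online = round(total_boxes * online_share)
    return {"online": online, "batch": total_boxes - online}
```

With 8 boxes, `allocate(8, time(10, 0))` gives 6 to online and 2 to batch. At 22:30 the split flips to 2 online and 6 batch.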


15. So tell us more about WebSphere XD: how can it manage different application stacks?

It's a big deal. As much as we would love customers to have only WebSphere in their shop, very few do. Companies get acquired or are built up through mergers, and before the merger each company had a different technology base. So we frequently see customers with WebLogic, JBoss, and WebSphere in the same account. If you are going to approach data center folks about simplifying their lives, you have to address the heterogeneity that is inherently going to be there. Then you simplify it and virtualize it so they can lower their TCO and manage it more cheaply. The days when you could ignore that are gone. Even with something like ObjectGrid, we could have done the usual thing and made it part of WebSphere 7 or something. But then only customers on WebSphere 7 could have used it, and that doesn't make sense. We have tons of customers on WebSphere 6.0, 6.1, and 5.1 who would love to have that kind of technology and use it in their apps. Nothing upsets a customer in dire straits with a performance issue more than telling them, "Oh yes, we are fixing that in the next version, and you have to upgrade to get there." So we are starting to take the view that, at least for some types of middleware, we have to try our best to make it pervasive. That means it has to work with what you already have. As I said, ObjectGrid works on 5.1, 6.0, and 6.1, and on competitors' application servers as well.


16. You mentioned a set of actions that WebSphere XD can take to manage different stacks, like starting and stopping. Is that the level of granularity you can manage at, or is there more?

No, there is more; it is open-ended. Some actions come predefined with the product: cycle the server, stop the server, put the server in maintenance mode. By server I mean JVM. Maintenance mode means we stop routing work to it and leave it alone, so it can finish its work and wind down, and later a developer who knows what to do with it can go look at it. Customers can also define actions that are just scripts, which can do anything. For example, you might have an MBean declared within that JVM, and the script might invoke the MBean to dump diagnostic information, such as why the heap grew like that or why the response time degraded. Maybe the application developer has built some diagnostics into the application.

So the developer might want that information dumped before you cycle the server, so he can see what is going on. The triggered scripts or actions could be: script 1 takes a heap dump so we can see what was in the heap, script 2 calls the developer-supplied MBean that dumps the application state, and script 3 does a recycle. That all happens automatically, and the output ends up in a directory. The administrator can then go to the developer and say, "Your application went berserk, the response time degraded by a factor of 2, and here is the diagnostic you wanted when this happened." The developer can look at it, see what is wrong, and fix it. Meanwhile the server is cycled and starts up again. If the problem is a Heisenbug, it might work fine for a while, and when the problem recurs, it gets recycled again. We find that many problems are Heisenbugs, and the easiest way to make a Heisenbug go away is to restart. But restarting blindly frustrates the developer, because there is no evidence and no information about why it happened.

So a way to plug in scripts that call customer-defined diagnostic routines is useful. That information can be priceless for diagnosing Heisenbugs, because they happen so infrequently. "I tried it on my machine, it works fine, and I don't know what the problem is." But it might happen only on the second Tuesday of the month, maybe because a scheduled backup chews up the network bandwidth that day, and that is why the response time degraded. We have seen everything at IBM when we get called in. We recently had a customer with exactly that situation. We would go there and look, everything was working fine and fast, and we would leave. Sure enough, the following Tuesday we would get a call, because the backup had started again and chewed up the network. You had to be there on a Tuesday to see why the ping time had jumped from microseconds to around 600 milliseconds, and it was because the backup was congesting the network. So we have seen all kinds of weird situations that look completely random, but there is nearly always some kind of event associated with them. It is tough. You might have a standard diagnostic script that, for example, checks the network ping time to the database and tries to open a connection to it. That is what the developer would have done if they had been there when it slowed down. You can package that up into a script. Then, whenever XD's monitoring notices something odd, like consistently missing the SLA for a few minutes, the heap getting big, or the response time going up, we can run that script at that moment on that box and collect the information. Hopefully you can then figure out why it happened. Without that kind of automatic diagnostics, it is very hard to diagnose problems in a big system when they happen.
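A standard diagnostic probe of the kind described, "try to open a connection to the database and time it", might look like this. It uses only the standard library (`socket.create_connection` and `time.perf_counter`). It measures a TCP connect rather than an ICMP ping, because ping usually needs elevated privileges. The function name and the result format are hypothetical. On failure, it records the error instead of raising, so a larger diagnostic script can keep collecting evidence.

```python
import socket
import time


def probe_tcp(host, port, timeout=2.0):
    """Time a TCP connect to host:port and return a diagnostics record.

    Errors are captured in the record rather than raised, so the rest of a
    diagnostic script still runs when the target is unreachable.
    """
    target = f"{host}:{port}"
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
        elapsed_ms = (time.perf_counter() - start) * 1000
        return {"target": target, "reachable": True, "connect_ms": elapsed_ms}
    except OSError as exc:
        return {"target": target, "reachable": False, "error": str(exc)}
```

A health policy could run this against the database host whenever the SLA is missed and write the record to the diagnostics directory. A connect time that is normally sub-millisecond but spikes on backup nights is exactly the kind of evidence the Tuesday story needed.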


17. What about provisioning? If you connect more servers to your cluster, can XD automatically provision the whole stack on them?

Not on its own. What XD does is virtualize XD cells, ND cells, or your own application servers. If you give us a zip file of JBoss or Tomcat, we can unzip it onto your box, and we know which scripts to run to start and stop it. If you use it with ND, you make a new box and add it to the cell or to the node group that XD is using, and XD notices it and takes advantage of it. With ObjectGrid, you start your ObjectGrid application on that server, and ObjectGrid immediately notices it. It starts using the server's memory, CPU, and network to make the grid larger and more scalable automatically. How the machine got there is the other part of your question: how do I get the operating system onto it? IBM has products like Tivoli Provisioning Manager for that. Its job is to take raw boxes delivered from the factory and plugged into a rack, and flash the BIOS on them automatically. It has a database that knows a given box is in a given pool and should have a particular version of Linux and of WebSphere. Then it runs a script to add that WebSphere into the XD cell, and lo and behold, it appears in the XD cell. It handles provisioning the whole OS. Virtual machines have the same issue. If I plug in a new server with 8 cores, that means, say, four 2-way virtual machines. Virtual machines have to be stopped on the machines where they are currently running and restarted on the new one to spread things out. Then they are provisioned and added to the XD cell, where they become visible as resources that XD can consume.


18. So finally, Billy, what is your favorite computer book?

Probably Structure and Interpretation of Computer Programs, from MIT Press, by Abelson and Sussman. When I was in college, it covered all the main programming styles, like imperative, logic, and functional, and how to do things in each of them. It is all written in Scheme, but the algorithms and the thinking models you get from reading that book are awesome. It is my favorite programming book of all time. All right, thank you.

Jul 14, 2008