I guess the first thing I would say is that, having been to EclipseCon for four or five years now, it is quite interesting to see the rate of growth in adoption. The first year there was a talk on OSGi and few people turned up; it was "what is this strange thing over in the corner?" This year many sessions have been OSGi-centric with packed audiences, and the level of questions people are asking shows that adoption is moving. So that is very interesting. One of the things the Alliance has been interested in over the last two years, and we started those conversations in OSGi in 2010, is the relationship between OSGi, modularity and cloud, and there is a fundamental relationship between those concepts. Cloud 1.0, if you want to call it that, is well known by everybody these days, made famous by Amazon EC2, and consisted of the idea of taking your software as it existed, putting it into some virtual machine and shipping it to some other place. It's a bit like outsourcing, a new name for outsourcing. The advantage, of course, was that the software didn't change; you just packed it into this deployable unit, the virtual machine, and threw it offshore. That attracted interest because of the low impedance: it's easy to use.
The problem was that it didn't really address the fundamental problems in large enterprise environments, and those are not so much about standing up machines or resource utilization; they are about the maintainability and maintenance of the applications. That feeds much more into: how do I make maintainable applications, how do I keep applications modular? If I need to modularize to drive down maintenance cost I want to do that, but I also want cloud, so what is the convergence? For the last two years we have been looking at how the present OSGi specifications feed into what we might want to do in the cloud area: the Remote Services and Remote Service Admin specifications, and the new R5 resolver, which is quite interesting because we can inject a component into a runtime environment and it can pull in environment-specific bundles; I'm on Amazon so I am going to use this particular bundle, that sort of thing. We've also got some new stuff in the pipeline: the cloud ecosystem RFC that David Bosschaert from Red Hat presented on earlier today, and a new specification on distributed eventing that is making its way through the Alliance, again because we want a number of programming models for developers. We have a very strong service model, which is great for some sorts of application types; we want an equally strong asynchronous eventing model, and all this feeds into the general modular cloud application story that we are trying to develop.
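The resolution step Richard mentions is driven by requirements whose LDAP-style filters are matched against the attributes of capabilities offered by candidate bundles. A minimal sketch of that underlying filter matching, using a made-up "cloud.provider" attribute purely for illustration, might look like this:

```java
import java.util.HashMap;
import java.util.Map;

import org.osgi.framework.Filter;
import org.osgi.framework.FrameworkUtil;
import org.osgi.framework.InvalidSyntaxException;

public final class EnvironmentMatch {

    public static void main(String[] args) throws InvalidSyntaxException {
        // The same LDAP-style filter syntax a Require-Capability directive
        // would carry; "cloud.provider" is an illustrative attribute, not
        // a standard namespace.
        Filter wanted = FrameworkUtil.createFilter("(cloud.provider=amazon)");

        // Attributes an environment-specific bundle might advertise as a capability.
        Map<String, Object> offered = new HashMap<>();
        offered.put("cloud.provider", "amazon");

        System.out.println("resolves against this bundle: " + wanted.matches(offered));
    }
}
```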
I think that is very much the case. Your environment starts with the minimum shim possible, so your virtual machine is basically being used to partition your physical resources; you slice and dice that and say "OK, small, medium and large VMs", and on each node you have an agent or a framework running that says "I'm here, perhaps within this particular grouping", the ecosystem, as David refers to it, or zone: "small VM, I am available for use". Then the general idea is that you inject the description, the root requirement, say "you are an analytics server", and the resolver says "I need these bundles", pulls them in, wires them, creates the services, and so it is all assembled dynamically, on demand, as required. The advantage of that is that you are not shipping large virtual machine images across the network; you pull in small bundles.
Alex: And does that help in terms of updating the application and keeping it fresh?
Very much so, because then you just bump the description and you may only need to pull one bundle, the one that has changed relative to the structure you already have. In a large cloud environment that means your traffic loading could drop dramatically, which is a cost saving on the infrastructure side. More importantly, it comes back to this issue of maintainability: systems are complex because they need to be complex, they are doing complex things, especially in the financial services industry, heavy industry, big data and these sorts of areas, but you still want some way of dealing with that, and modularity gives you a mechanism to deal with it, make it manageable and update incrementally. So what the Alliance is doing is in some ways far more incremental than big data or cloud, to be honest; it's just that you have to plan ahead and you have to care about how your system is going to work in two or five years' time. Obviously, there is always pressure to do things now and do them quickly.
Alex: And you were talking about doing this quickly; I guess there is an increase in agility that you get from these kinds of systems, or indeed from continuous deployment, where you are always bringing it up to speed.
Very much so. I've been working on a whitepaper at the moment looking at the relationship between structural modularity and Agile processes, and the reason I started doing this was that I find it difficult to understand how you can have a conversation about Agile without mentioning structural modularity. It's a bit like saying you are applying Kanban to car production before the production line, where the engineers are shaving the wood and handcrafting the coachwork to fit the engine in; it doesn't make sense. Agile processes, Scrum, Kanban, really only start to make sense when you have a production-line type capability, continuous integration; both of those are basically based on modularity, on units and components that you put together, that you can cycle around pretty quickly from a Scrum perspective, and to limit work in progress you need fine-grained pieces. So what is the modularity format we are using? It's OSGi. I am a bit surprised that the two communities haven't really come together very tightly, so this is an attempt to try to stimulate that conversation; we'll be sending a copy to you guys and pushing it out everywhere else and saying we are both talking about the same thing, you need both elements: you need the structural code side and you need the processes. At the moment the industry has been focusing a lot on the processes, with almost an attitude of "structural modularity is too hard to bother with", and if you skip the first, the second won't really work.
Alex: So, being able to install new software and new revisions of modules is fine if you can take down your VM and bring it back up again, I guess OSGi adds to the dynamism of that process.
It does. The whole service layer in the OSGi stack is built on the premise of Deutsch's fallacies of distributed computing: even though we are in the same JVM, the idea is that services can come and services can go, and I'll dynamically find and rewire them. We then extend that across the network with the RSA implementation, so for remote services, which are much more likely to fail or come back again because of network problems, OSGi understands that. In a way, I see the OSGi remoting model as a natural successor to some of the concepts in the old Java Jini world, which very much brought to the scene the idea that things change, things break, so let's build Java systems that can cope with that; the OSGi RSA implementations take all those ideas and do it in a much more elegant way.
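As a rough illustration of that local dynamism, a Declarative Services component can declare a dynamic, multiple-cardinality reference so it keeps functioning as providers appear and disappear; `QuoteSource` here is a hypothetical interface used only for this sketch:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.osgi.service.component.annotations.ReferenceCardinality;
import org.osgi.service.component.annotations.ReferencePolicy;

// Hypothetical service interface, for illustration only.
interface QuoteSource {
    double latestPrice(String symbol);
}

@Component
public class QuoteAggregator {

    // Providers may come and go at any time; the component is rewired
    // dynamically rather than being restarted.
    private final List<QuoteSource> sources = new CopyOnWriteArrayList<>();

    @Reference(cardinality = ReferenceCardinality.MULTIPLE,
               policy = ReferencePolicy.DYNAMIC)
    void addQuoteSource(QuoteSource source) {
        sources.add(source);
    }

    void removeQuoteSource(QuoteSource source) {
        sources.remove(source);
    }
}
```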
Alex: How do they discover and talk to each other in a distributed cloud environment?
The nice thing about OSGi and the way this specification has been written is that the stack itself is modular. An RSA implementation has three components: a discovery layer, a topology manager and a distribution provider, and the implementation is pluggable at each layer of that stack. So you will find that some implementations use something centralized like ZooKeeper, where nodes register and consume endpoint information; others will use SLP for discovery; in our products at Paremus we use a distributed peer-to-peer messaging protocol called DDS, which is actually used in command and control systems for the US Air Force. And we can change them, which is the nice thing about OSGi: it gives you flexibility at the infrastructure layer. The distribution provider layer is also pluggable, so you can do things like use an RMI implementation if you are Java to Java, or Avro if you are trying to move serialized data across the wire, or even things like JSON-RPC, so you have an architecture but you still have the freedom to use the right tool for your particular application.
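For context, the distribution-provider-agnostic part of this is expressed through standard Remote Services service properties. A minimal sketch, assuming a hypothetical `QuoteService` interface, exports a service and leaves the choice of provider (RMI, Avro, JSON-RPC and so on) to whatever is installed in the framework:

```java
import java.util.Hashtable;

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

// Hypothetical service interface, for illustration only.
interface QuoteService {
    double quote(String symbol);
}

public class QuoteActivator implements BundleActivator {

    @Override
    public void start(BundleContext context) {
        Hashtable<String, Object> props = new Hashtable<>();
        // Standard OSGi Remote Services property: ask whichever
        // distribution provider is present to export all interfaces.
        props.put("service.exported.interfaces", "*");
        context.registerService(QuoteService.class,
                symbol -> 42.0,   // toy implementation
                props);
    }

    @Override
    public void stop(BundleContext context) {
        // Services registered by this bundle are unregistered automatically.
    }
}
```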
Alex: We've seen a few environments here at EclipseCon this week: people with dynamic websites, webpages using websockets to communicate the state of live things, or even embedded hardware, with things like MQTT being used to transport data back again. Do you think that, with all these things, it's possible to have a dashboard that shows you where all of your cloud nodes are, their utilization, what has been happening, and gives you the decision tools as to how to further resource the cloud if nodes are CPU intensive?
Yes, the Alliance is working on this area, and David presented this afternoon on where the specification process has got to and a simple architectural model they are playing with. There are solutions in the industry which have taken this sort of idea to its logical conclusion, where you can dynamically deploy OSGi-based applications, assembling them on the nodes based on the loading of those nodes and co-location considerations. So I need this functionality to be deployed to a node that has, for example, a market data cable connected to the back of it, because there is no point putting the pricer in the back office; or I need this functionality on machines with sufficient memory, or on the least loaded out of the set that I have, all these sorts of things. These sorts of advanced, distributed provisioning behaviors do exist in products that are out there, but they tend to be next-generation cloud products rather than the virtual-machine shipping we've seen previously.
Alex: I guess another thing you could do is monitor not just the load of the system in terms of hard-and-fast CPU utilization, but also start gathering statistics about how long services take to be invoked and monitor the real-time health of the system as well.
Yes. For example, one of the things we do in our world, the Paremus world, is that we have this idea of replication handlers, and you can feed business logic into those: you can look at queue depths, you can look at historical trends, triple witching days when you know you will need more resource and can pre-scale based on that; basically, high-level metrics that are of interest to the business rather than just the raw loading on the machine. I think that is where things are going to go in general; it's just a really good standardized architecture that allows us to build that very sophisticated, truly distributed cloud-type environment. So I think the Alliance is going in the right direction; it's just a lot of threads we are pulling together.
Alex: I think that kind of environment would scale quite well, both in terms of adding new nodes and in response to external changes or external impacts, like the chaos monkey that Netflix has, or any sort of weather-related disaster or similar event.
Taleb, who did "Fooled by Randomness" and "The Black Swan", recently published a book called "Antifragile". He has quite an interesting writing style; he's an ex New York trader and he says it as it is, but I think a very important message comes out through "Antifragile". He basically argues that systems are one of three types: the fragile, where any change in the environment causes damage and they break; the robust, which can survive change in the environment; and the antifragile, where changes in the environment can actually allow them to reconfigure and possibly improve. The way we assemble OSGi systems, based on environmental inputs coming into them and dependency management, really leads us towards this antifragile approach: as I roll new machines into the environment, my application says "gee, that's a better resource than the one I'm on, I am going to redeploy, I am going to reassemble and use it; that's got an FPGA on there, so I am going to use the libraries that can exploit that instead", that type of behavior. And if something unforeseen happens, we lose a large part of a data centre, it will still try to rewire itself and keep going. So I think we are looking towards systems which, as part of the PaaS layer, try to behave in this sort of way; they will function in whatever environment applies to them. Whereas for the last decade operational data centres have been "don't touch it, it's working, leave it alone": time doesn't run, time has stopped, and that builds very fragile systems and very fragile businesses. We've always been the other way: it's going to break, so let's break it, let's make sure it recovers, and now we are going to break it again.
There are always ways of doing it; I can only talk about the way that we do it. I don't know if you went to Neil Bartlett's talk earlier this week, but what we were demonstrating there was taking Nginx, which I think is actually written in C, and wrapping it so it looked like an OSGi bundle. What we are saying is "OSGi has such a strong set of lifecycle management, configuration and metadata standards, let's use them more generally". So we had this bundle, and when we deployed it into the environment it basically installed Nginx, started it, and sat there waiting; as the endpoints came up, the REST endpoints for the web tier, that was communicated over RSA, "there's a new endpoint there", the Nginx config was rewritten and a reload was signalled, so Nginx dynamically configured itself in response to the cloud environment expanding or contracting.
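A minimal sketch of that wrapping idea, not the actual Paremus bundle, is an activator that ties a native process to the OSGi lifecycle; the paths and flags below are illustrative assumptions:

```java
import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

public class NginxWrapperActivator implements BundleActivator {

    private Process nginx;

    @Override
    public void start(BundleContext context) throws Exception {
        // Starting the bundle starts the native process in the foreground,
        // so its lifetime is tied to the bundle's lifecycle.
        nginx = new ProcessBuilder("/usr/sbin/nginx", "-g", "daemon off;")
                .inheritIO()
                .start();
    }

    @Override
    public void stop(BundleContext context) {
        // Stopping the bundle stops the process; a fuller version would also
        // rewrite the config and trigger a reload as remote endpoints
        // appear and disappear.
        if (nginx != null) {
            nginx.destroy();
        }
    }
}
```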
Alex: And that was being done essentially on the fly by reconfiguring Nginx: one node goes down, you remove it from the configuration; the node comes back up, you add it back.
Yes, and you can put in some sort of replication behavior which monitors the load on Nginx and triggers expansion or contraction of the backend web services, so you get that sort of feedback loop as well. But it's this sort of dynamism, along with modularity, that I think is part and parcel of where cloud is going to go: the next stage of it, the grand vision.
That's a really good question. Of course, if we wrap a component, any artifact, as an OSGi bundle, we have a set of metadata there which describes requirements and capabilities. So the logical next step for us, and the thing we are looking at, is to actually pull in those libraries and capabilities using the OSGi requirements and capabilities metadata.
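To give a flavour of that metadata, a small sketch using the standard wiring API can list whatever requirements and capabilities a bundle declares, whether its payload is Java or a wrapped non-Java artifact:

```java
import org.osgi.framework.Bundle;
import org.osgi.framework.wiring.BundleRevision;
import org.osgi.resource.Capability;
import org.osgi.resource.Requirement;

public final class MetadataDump {

    // Print the generic requirements/capabilities metadata a bundle declares;
    // passing null asks for all namespaces.
    public static void dump(Bundle bundle) {
        BundleRevision revision = bundle.adapt(BundleRevision.class);
        for (Requirement req : revision.getRequirements(null)) {
            System.out.println("requires " + req.getNamespace()
                    + " " + req.getDirectives());
        }
        for (Capability cap : revision.getCapabilities(null)) {
            System.out.println("provides " + cap.getNamespace()
                    + " " + cap.getAttributes());
        }
    }
}
```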
Alex: And then you will be able to use the resolver to materialize all of this, set it up in a standalone runtime environment and just let it run.
Yes. And the important thing is that it's a coherent approach across your whole estate. Another talk tomorrow, I believe, is by Simon Kaegi, I think his name is; he is from IBM and he is looking at JavaScript in OSGi, again how JavaScript can leverage the services layer in OSGi. And again it's this idea that several things in the OSGi framework and the OSGi specifications are actually quite generic and can be applied to other things. Just as an aside, we are looking at wrapping Erlang components and doing that sort of trick at the moment as well; we see the metadata, lifecycle and configuration specifications as being applicable much more broadly than Java.
I think the one we have just been talking about is "let's leverage the metadata and the lifecycle across the board", because that is a really clean model and we can do that for everything. We can leverage the services layer as appropriate for a subset of things. The approach of actually taking software code, breaking it into units and wiring them together in a runtime container may be applicable for some types of languages and not for others. I know that a number of groups have done C/C++ implementations of the OSGi specs, where they basically do install code bundles and replicate what's been done in Java, and there are several groups working together to try to produce a standardized spec under the Alliance umbrella in that area now. For something like Python, perhaps that wouldn't make sense, but perhaps the service layer and the metadata do make sense. With some of the functional languages, Scala works really well with OSGi; Haskell, well, that's probably a whole different kettle of fish, but the metadata and services layer might still be interesting. So I think there are probably several levels of integration that we can encourage under this umbrella, rather than thinking about OSGi as a single layer solving it all.
You are asking me, and I am never good with these dates; David, who was here this morning, would be a good guy to answer that. I think R6 is out in January 2014, so I think we are working to that timeframe at the moment; a lot of the spec activity on the enterprise side is building towards that release. In terms of the core specification, I honestly don't know; they tend to be aligned, they tend to cluster as they go through at the moment, but I'd need to get one of the guys in that area to talk to you about the spec release timetable.
I think there is definitely a desire to increase visibility, so recently we've started doing public drafts of the RFCs: after one has gone through an internal evaluation process and it sort of makes sense, we let a copy out for public comment. The RFC for cloud has just gone through that, and hopefully distributed eventing will get there too. I think the reason they get bundled together and thrown out the door at the same time at the moment is partly a resourcing thing; it's easier to manage that release process. Perhaps if we can get a little more resource into it we can start breaking them down and treating them a little more incrementally. That's just my own personal view; I think it would be a good idea, because otherwise you get hit with "gee, 500 specs at once, comment on them" and you may only be interested in a few of them. If we can feed them out at the relevant point, we can probably pull in the relevant parties at that time, in the way that you are suggesting.
There are several things you can do. We have a community wiki site which is up and is slowly building some really good content. Recently there was an effort to put up information about various build toolchains, so I think we have four or five on there so far: a couple relating to Bndtools, one relating to PDE I think, with Maven, with Ant, these sorts of things. Whatever you want to use, here is what to do and here is how we make it easy; obviously that is going to be an ongoing thing, and there are also sets of examples and best practices that we are putting there as we go, so that's a good place to start. There is the OSGi Alliance site, which has information on members and that sort of thing; obviously, if you are using OSGi we would love you to join up as a supporter, which is zero cost. If you want to get involved with the specifications there is a joining fee for the Alliance, but we changed the structure over the last couple of years, so it's fairly minimal pricing these days: you can be a contributing associate for, I think, $5,000 a year, and that gets you access to the specification process as it's going through, and you can get involved with the EEG and the core teams and that sort of thing. Finally, there are the user groups, which we are finding are expanding quite rapidly at the moment. I know the UK user group, which is chaired by Mike Francis, who works at Paremus, has over 200 people now; we need to hold another meeting in due course, it's been a while. We also have a new group in the Washington DC area, there is one attempting to form on Wall Street, and we are talking to the Brazil JUG [Java User Group] at the moment, who may be interested in a user group down there, so we are seeing a lot of activity there as well. That's a great way of meeting local people who are using OSGi and exchanging notes and experiences, and we try to get people from the Alliance along to talk as and when we are able, so that's a good way of getting involved as well.
Alex: It seems that OSGi is almost everywhere in the application server runtime environments, and it's one of the greatest secrets of how distributed systems work.
I think it's because OSGi has been around for a long time, a decade plus; it started in 1998, and we've seen lots of fashions and new technologies come along, each the great new trendy thing, while in the background OSGi has been slowly growing. As you said, all of the application servers in the enterprise space are now, I think, OSGi based; we've got all of the main middleware companies in the Alliance, adoption is growing and we're seeing acceleration. I think it's fundamental: at the end of the day most large organizations need to get hold of the maintainability issue, modularity is the way for a Java shop to do that, OSGi is the standard for it, and people are starting to move. We at Paremus have been training on and off all over the place over the last year; we did a session down in Sydney, the first time in the Southern hemisphere, on the other side of the world, that sort of intense training over the last two years, and we are seeing a significant ramp-up.
Alex: Richard Nicholson, thank you very much.
Thank you, Alex, a pleasure.