Derek Collison on Apcera Continuum


1. Hello, this is Chris Swan, Cloud editor at InfoQ, I’m sat here with Derek Collison from Apcera. So, Derek, Apcera released its Continuum product towards the end of last year. Can you tell us about it, what it does and what you are doing with it in the marketplace?

Sure, absolutely, and thanks for having me. Continuum was a progression of thoughts that I had. Prior to starting Apcera and developing Continuum I was involved in the design and architecture of a system called Cloud Foundry, which was a platform as a service for the enterprise; before that, platform as a service had mostly existed as Heroku for Ruby on Rails and Google App Engine for Python. What was interesting about Cloud Foundry and the success it’s had was that it set out to speed up the deployment of applications: application development itself had gotten easier, but the deployment piece was still hard. What became very clear after Cloud Foundry launched and became successful was that deployment is still a very small piece of taking a business requirement and driving business value from it, given the carrying cost of the whole pipeline of how you actually get something out the door. And so Continuum was set about from the ground up as a total re-architecture, a re-imagination if you will, of what an enterprise IT platform should look like. At the very highest level it combined the agility that was starting to come to fruition with platform as a service with the understanding that enterprises and groups need compliance, security and governance in the same platform.

When we put together the company and started down the path of architecting and designing a system, there were essentially four tenets that we drove everything around, from the business model of the company all the way back to the technology. The first tenet is simply that it has to be a single platform that understands delivery models spanning IaaS to PaaS, meaning the abstraction level meets what you are actually trying to do with the system. Maybe you are still trying to spin up an OS and operate with it, and you should be able to do that; but if you want to do greenfield-style applications and say “I have my app, and I understand that it talks to B and C and to services D, E and F”, you should be able to describe that and tell the system to run it. Additionally, on that first tenet, we were talking internally about the notion of software as a service, not per se in the usual sense, but in the sense that a majority of IT headcount is internally building things that look like apps but are actually turning into services. What was happening in a lot of organizations is that the process of taking the singleton app and making it scalable, highly available, fault tolerant, identity-aware, secure, addressable and discoverable was being duplicated over and over again.

So the last delivery model within that first tenet was: how do we make an easy button for that? You interact with the system as a platform as a service, use the languages and frameworks that make sense to you, Java, Scala, Play, things like that, and get the functionality and feature set right for your core service, so to speak, within the organization. Then you press a button and say “do all the other heavy lifting”. I call it undifferentiated heavy lifting, and I heard Adrian reuse that phrase in his talk earlier today; the system takes that away. So all of those delivery models would be available in a single platform; that’s tenet number one. Tenet number two was that policy had to be a core DNA component of the system. That doesn’t mean policy is in your face from day one, which slows developers and DevOps people down and frustrates them, but it couldn’t be a bolt-on. It had to be core to the system, easily extendable, programmer composable from the inside out, and ever-present, so to speak. The third tenet was an interesting one that the founding team really had to talk through to work out what it actually meant: our system is semantically aware of communications, and their patterns, within the system.

What drove that decision was that my app, your app, service A, service B and service C, on their own, are not very appealing or exciting anymore; it’s how they all connect and form an ontology together. Moreover, I said that the patterns in which they communicate as they are running are actually the most powerful thing: they let you gain insight, gain more information to make decisions, and also enact control, behavioral changes and policy. We call that transparent policy injection. This was not a simple problem to even state, let alone tackle, but we believe it’s one of the sleeper features we delivered, and it’s very, very powerful because of the way we’ve introduced it into the system: it’s not a client library, it’s transparent, you get it for free. Take a simple example that illustrates the point: your application is talking to a database, MySQL, Postgres, whatever it is. Our system knows how to connect you to the database, from the physical access layer through routing, connectivity, addressing and discovery, and we present a framework that is semantically aware that there are things like insert, drop and delete events going between you, and everyone else, and the database.

Moreover, our system allows you to independently hook into that framework, both to get information out and to inject policy, control and behavior in, without ever touching the app. The app keeps running, the database never stops, and there is no Apcera code at all within any of these individual components. So you can think of things like “hey, can we run a real-time audit of any personally identifiable information that might be escaping our system?”, which is very hard to do if you have to change the application to do it. A lot of the people we’ve been talking to who are interested in what we do and in our technology don’t have one app or ten apps; they have thousands and thousands of apps in six different languages and ten different frameworks and app servers across 30 groups, and coordinating changes like that in these organizations can take 18-24 months. So that is the third tenet: semantic awareness of those communication patterns, together with a network security model in which every component has its own perimeter, a perimeter of one. The fourth tenet is that everything in our system is just a job. A job can be an OS, it can be your greenfield app, we don’t care; the core underpinnings of Continuum just understand a job.

We wrap every single job in an isolation context, and those isolation contexts are defined very specifically for us. We don’t say it’s a hypervisor, we don’t say it’s a Linux container like Docker, and we don’t say it’s micro-task virtualization like Bromium; for us it is, very specifically, isolated, insulated and autonomous. Autonomous means that once your app has policy wrapped around it, say a policy that it can talk to the database, everything from the physical access layer all the way up to the semantic awareness piece we talked about is controlled, no matter where in a Continuum cluster you’re deployed. The other thing is that inside our system we don’t have the notion of “oh, this is something special”. Everything in our system is just a job: they are all wrapped in iso-contexts, they all have policy attached to them, they are all autonomous, and they all have policy controlling ingress and egress rules. So suppose an organization is struggling at scale with the notion that I’ve got an app and NetOps is forming what I call barnyards, and your app comes up and says “well, I actually want to talk to service B and D”, and NetOps goes “well, we don’t have one of those; we have one that talks to B and A and one that talks to D and E”. There is thrashing at scale between those models. In Continuum you get it for free: there is no manually trying to stitch together the network, it just happens on the fly, in real time, at scale. So those four tenets, the multiple delivery models, policy as a first-class citizen, semantic awareness plus the power of the network and that perimeter of one, and then the iso-context, essentially formed the basis of what exists today in Continuum.
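A per-job, default-deny connection policy, where every job is its own perimeter, can be sketched in a few lines of Go. This is an illustration under assumptions, not how Continuum enforces it: the `Rules` type and its methods are hypothetical, and a real system would enforce these checks in the network path rather than in a lookup table. A connection is allowed only if the caller has an egress rule and the callee has a matching ingress rule, so an app that declares it talks to B and D gets exactly those links and nothing else.

```go
package main

import "fmt"

// Rules holds hypothetical per-job connection policy. Every job is its own
// perimeter: a connection needs an egress rule on the caller and an ingress
// rule on the callee.
type Rules struct {
	egress  map[string]map[string]bool // job -> destinations it may call
	ingress map[string]map[string]bool // job -> callers it accepts
}

// NewRules returns an empty, deny-everything policy.
func NewRules() *Rules {
	return &Rules{
		egress:  map[string]map[string]bool{},
		ingress: map[string]map[string]bool{},
	}
}

func add(m map[string]map[string]bool, k, v string) {
	if m[k] == nil {
		m[k] = map[string]bool{}
	}
	m[k][v] = true
}

// AllowEgress lets job from open connections to job to.
func (r *Rules) AllowEgress(from, to string) { add(r.egress, from, to) }

// AllowIngress lets job to accept connections from job from.
func (r *Rules) AllowIngress(to, from string) { add(r.ingress, to, from) }

// Link grants both sides at once, e.g. when an app declares "I talk to B".
func (r *Rules) Link(from, to string) {
	r.AllowEgress(from, to)
	r.AllowIngress(to, from)
}

// Permit is default-deny: anything not explicitly allowed on both ends is
// refused. Indexing a missing inner map safely yields false.
func (r *Rules) Permit(from, to string) bool {
	return r.egress[from][to] && r.ingress[to][from]
}

func main() {
	r := NewRules()
	r.Link("app", "B")
	r.Link("app", "D")
	fmt.Println(r.Permit("app", "B"), r.Permit("app", "D"), r.Permit("app", "A"))
}
```

Declaring the app’s dependencies is enough to produce its links; no shared network segment has to be pre-built for the combination of B and D.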


2. People will be familiar with Cloud Foundry coming to market as an open source approach and some of the commercial offerings that are starting to be put together around that and then the recent news about a foundation for Cloud Foundry. Are you doing open source or are you pursuing a different model with Apcera?

We’re doing a mixed model. Certain levels of IP that we have are closed source right now; however, customers that we’ve engaged with have access to that code, so it’s not that they can’t get to it and contribute back, as with Cloud Foundry from a purely open source standpoint. But what’s interesting about some of the decisions I made, and potentially others made, which in retrospect were probably not the best, is that I really thought ecosystem engagement and expansion would be driven by open source. What happened in reality is that if you and I are trying to get from point A to point B, most people who are smart enough will figure out a way to get where they want to go, but they will take very different paths. So within the Cloud Foundry core, at least the original one (it might be different now, especially with the Pivotal engineering team that has been working on it), you might take a very different path through the code base, changing it to make it work for what you are trying to do, than VMware at the time might take.

And so what I started to do was step back, observe and try to learn from the fact that there was a massive fracturing and implicit forking happening within what was, by all accounts, a very successful open source project. So with Continuum, pieces that add a tremendous amount of value to the system are open source, but the difference is that they plug into extremely well-defined programmable interfaces, from the inside out; the whole core of our system has been programmer composable from day one. So even if you wanted to change the lowest-level, nastiest core code of the distributed scheduling, placement and runtime execution engine we put together, and add, let’s say, your company’s investments and IP to this platform, you’re not touching all of that nasty stuff. There are well-designed interfaces, driven by policy throughout the whole system, that give you, at least from what we can tell, the ability to not run out of runway and also to diverge from where Apcera, currently the parent company of Continuum, is trying to push it, if that makes sense.
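The idea of extending a core scheduler through well-defined interfaces, instead of forking its internals, can be illustrated with a Go interface. This is a hypothetical sketch, not Apcera’s plug-in API: `Placer`, `MostFree` and `TagFilter` are invented names, and real placement would weigh far more than free memory and node tags.

```go
package main

import "fmt"

// Node is a hypothetical machine in a cluster.
type Node struct {
	Name   string
	FreeMB int
	Tags   map[string]bool
}

// Job is a hypothetical workload; Require is an optional node tag it needs.
type Job struct {
	Name    string
	MemMB   int
	Require string
}

// Placer is the extension point: the core scheduler calls it and never needs
// to know which implementation, built in or plugged in, is behind it.
type Placer interface {
	Place(j Job, nodes []Node) (string, bool)
}

// MostFree is a default placer: pick the fitting node with the most free memory.
type MostFree struct{}

func (MostFree) Place(j Job, nodes []Node) (string, bool) {
	best, bestFree, found := "", -1, false
	for _, n := range nodes {
		if n.FreeMB >= j.MemMB && n.FreeMB > bestFree {
			best, bestFree, found = n.Name, n.FreeMB, true
		}
	}
	return best, found
}

// TagFilter is a plug-in that narrows the candidate nodes by tag and then
// delegates, so it composes with any other Placer without modifying it.
type TagFilter struct{ Next Placer }

func (t TagFilter) Place(j Job, nodes []Node) (string, bool) {
	if j.Require == "" {
		return t.Next.Place(j, nodes)
	}
	var eligible []Node
	for _, n := range nodes {
		if n.Tags[j.Require] { // nil Tags map reads as false
			eligible = append(eligible, n)
		}
	}
	return t.Next.Place(j, eligible)
}

func main() {
	nodes := []Node{
		{Name: "n1", FreeMB: 4096, Tags: map[string]bool{"pci": true}},
		{Name: "n2", FreeMB: 8192},
	}
	var p Placer = TagFilter{Next: MostFree{}}
	fmt.Println(p.Place(Job{Name: "web", MemMB: 512}, nodes))
	fmt.Println(p.Place(Job{Name: "payments", MemMB: 512, Require: "pci"}, nodes))
}
```

The core depends only on `Placer`, so an organization can wrap or replace placement logic (here, restricting some jobs to tagged nodes) without touching the scheduler’s own code.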

Chris: Sure does. You’ve talked already about how Continuum is polyglot in terms of the languages you can implement on it, but I know that you made quite an early commitment to Go for the development of the platform itself, and it’s curious that the move came ahead of Cloud Foundry, which has recently been translating a lot of its core components into Go. Tell us a bit about what made you take that early decision and your experience with Go along the way.

Absolutely. So, most people from Cloud Foundry know that I picked Ruby for a lot of the development of the original system. I actually gave a talk in Japan about Apcera’s selection of Go, and in part of that presentation I said that I still love Ruby, I think it’s a great language, and I have a great relationship with Matz. But what happened with Cloud Foundry was that the founding team’s ability to go very, very quickly in development was soon masked by the fact that deploying a large-scale system with lots of dependencies became very painful; even the original tooling was written in Ruby, so it had dependencies that then impacted our users. A lot of people said “oh, Ruby was too slow”, but when you design and build distributed systems, most of them aren’t bottlenecked by the actual language implementation; you can make things run fast enough, and by all accounts Cloud Foundry is extremely fast in terms of scaling things up and moving things around. But keeping that system up to date and life-cycle managing it in production, with the dependencies around Ruby and the client libraries we were linking in, like EventMachine at the time, was pretty painful. I had been watching and learning Go. We couldn’t use Java, because we wanted to be polyglot in terms of the actual workloads and embrace Java and Linux and some of the other runtimes as well as the Microsoft and .NET pieces, and we couldn’t use .NET for exactly the same reason, so there were two candidates we were looking at.

One was Node.js, which I think works very, very well for smaller pieces of code, but when they get larger you have to be very prescriptive to avoid callback spaghetti. Go was very fascinating to me because it was a garbage-collected language, but it generated static executables and it had real stacks. I was introduced to Java when it was called Oak, and one of the interesting things about Oak was that everything was on the heap. If you look at Java, the garbage collector inside it is amazing technology. There is so much PhD-level IQ in there because there has to be; there is no other option when everything is on the heap, so you have PermGen and old gen and Eden and stuff that I don’t even understand anymore. Go, and the team that brought the language to bear out of Google, Rob Pike and Ken Thompson and some others who originally drove the push, set out to solve very specific problems inside Google. Garbage collection is really nice, because there are times when you literally cannot track all the references by hand, but at the same time they did one simple thing: they made real stacks.

So where Java programmers out there are trying to influence the garbage collector algorithms through flags and changing their code to generate less garbage, in Go you just stick it on a stack. My very first Go program was a test that the stacks were real and that values on them never touched the heap or the garbage collector. That, plus static executables, which mean deployment is just SCPing the executable to the target machine, was very, very compelling for us. We knew that some of the many pieces in Continuum were going to be heavy and gnarly. I strongly suggested Go, but it was a team decision, and the team selected it very fast. The fact that people who have never touched Go can learn it very quickly is very, very compelling as well: you don’t need to have played with Go since 0.5 or 0.6 or whatever it was when I started, it doesn’t matter. You can pick it up in about two weeks and be effective, and after five or six weeks of using it on a daily basis you are very proficient. Well designed, simple; I’m a big fan, as you can tell.

Chris: Sounds like it. Your Oak reference there touches on some of your earlier career; tell us a little bit about the journey that took you to VMware and Cloud Foundry and some of the key things you’ve learnt along the way.

Sure. Prior to Apcera I was at VMware, creating and designing Cloud Foundry. Before that I was at Google for five years, where, along with Mark Lucovsky, I created a group called the AJAX APIs. The general gist was to bring developers to the Google platforms without their needing server components; you just needed a web browser. As we went through that process we learnt quite a bit, and at the same time Paul Maritz had been picked to take over as CEO of VMware. He came calling on Mark, and by extension myself, about driving VMware past just virtualization. He had a lot of ideas there, and one of them was to bring over Mark, myself and another member of the original Cloud Foundry team, Vadim Spivak, to just think up some big ideas. My idea came from watching Heroku, which had pivoted into “hey, we need to make deployment of Ruby on Rails apps easier”, and I said the enterprise probably needs something like that. We talked earlier about why that was right for the time, and how my views later changed: it was only a small piece of a larger pie.

Also, at Google, Vadim and I did Gmail photos: the fact that you can see your picture in Gmail was something Vadim and I did when we first got to Google. I came up with that because NeXTSTEP had had it for years; NeXT was the second company Steve Jobs started. Prior to that I spent what amounts to multiple lifetimes, at least in Silicon Valley years, at a company called TIBCO, which was called Teknekron when I joined. The main part of my work there was helping design, architect and build the high-speed messaging systems that, all throughout the ’90s I think, were prevalent, along with Sun SPARC boxes, on Wall Street and in financial services throughout the world.


3. On the subject of messaging, we had Tim Bray here yesterday talking about how HTTP has become the all-conquering protocol, but we had the Internet of Things track here today and there was a lot of mention of MQTT. Did we as an industry drop the ball by not having a standards-based approach to asynchronous messaging, and have we subsequently fixed that with AMQP?

I’ve been doing messaging systems for quite some time, and all the distributed systems that I architect and build are based on a messaging substrate: publish/subscribe and queuing-based algorithms. Distributed queuing in particular, where the queue is defined by the subscription, or interest, rather than by the publish operation, is extremely powerful, I think. I have spent quite a number of years building and designing what I call application enterprise messaging systems that try to boil the ocean: you want distributed transactions, we can do that; you want guaranteed delivery, we can do that; exactly-once delivery, it’s really expensive, but we can do that. Over the years I have kept believing in the power of publish/subscribe: you don’t know who cares about this message and you shouldn’t have to care; you know how to send a message and it will get to where it needs to go. It’s not necessarily point to point. It can be one to one, but it should be able to be one to n, and I can also send the exact same message and have it delivered to a distributed queue where only one member of a group actually answers it.

I believe in those two patterns, along with another one I’ve reused quite a bit over the last eight years: I have a question, I want to ask it and get one answer, but I have no clue how many people can answer it. I keep reusing those patterns. As a matter of fact, that last pattern, though not necessarily in a messaging context, is what Google does. When you do a Google search, there are a lot of reasons it’s so fast, many of them beyond my ability to comprehend even though I was there for five years, but one of the big ones is that the question goes out to a whole bunch of shards of information: this shard has this corner of the Internet, and there are thousands of machines that can answer the exact same question. With Google, the first one back wins, because that’s the fastest and gives the best user experience, but that doesn’t mean the others won’t answer.

And so I designed a messaging system that underlies Cloud Foundry, the Baidu search engine, a couple of other companies’ initiatives, and Continuum. What is radically different about it is that it is fire-and-forget. It’s almost like a nervous system; I call it a dial tone, and it protects itself at all costs. So it’s not going to do any of the heavy lifting, and it’s not going to lull you into a false sense of security, as in “well, the messaging system is doing distributed transactions and persistence and durability”. That is literally the application layer’s job: you need to understand whether a given state transition is idempotent, or whether you need a compensating transaction model, because you could send a message and no one might get it. If I really care that someone gets it, I have to figure those things out. Over the years I have gone from simple models to very complex models and feature sets of messaging systems, and I’ve come right back down to: it should just do fire-and-forget.
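Because a fire-and-forget substrate may drop messages, and application-level retries on top of it may duplicate them, the application has to make its state transitions safe. Here is a minimal sketch of both options mentioned, assuming every message carries a unique ID (an assumption, not something stated in the interview): deduplicating by ID makes redelivery harmless, and a compensating action undoes a transition if a later step fails. `Transition` and `Store` are hypothetical names.

```go
package main

import "fmt"

// Transition is a hypothetical state-change message with a unique ID.
type Transition struct {
	ID    string
	Key   string
	Delta int
}

// Store applies each transition at most once per ID, so a redelivered or
// duplicated message has no extra effect: the handler is idempotent even
// though the operation itself (adding Delta) is not.
type Store struct {
	applied map[string]bool
	undone  map[string]bool
	Values  map[string]int
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{applied: map[string]bool{}, undone: map[string]bool{}, Values: map[string]int{}}
}

// Apply performs t once; it reports whether this call changed state.
func (s *Store) Apply(t Transition) bool {
	if s.applied[t.ID] {
		return false
	}
	s.applied[t.ID] = true
	s.Values[t.Key] += t.Delta
	return true
}

// Compensate undoes an applied transition exactly once, for the case where a
// follow-up step failed and the application must roll its own state back.
func (s *Store) Compensate(t Transition) bool {
	if !s.applied[t.ID] || s.undone[t.ID] {
		return false
	}
	s.undone[t.ID] = true
	s.Values[t.Key] -= t.Delta
	return true
}

func main() {
	s := NewStore()
	tr := Transition{ID: "msg-1", Key: "stock:widget", Delta: -3}
	s.Apply(tr)
	s.Apply(tr) // duplicate delivery: ignored
	fmt.Println(s.Values["stock:widget"])
	s.Compensate(tr)
	fmt.Println(s.Values["stock:widget"])
}
```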

Now, in terms of MQTT and AMQP and standardizing a protocol, I don’t know. All of the systems I build use messaging and HTTP, but HTTP doesn’t have the notion of sending to n recipients, or sending to a group where I want just one member to process it. I don’t know if it’s a failure or not; I think people will use it. Financial services still use a lot of messaging; we had an interesting meeting earlier today talking about that. JMS was an attempt at standardizing an API, and AMQP was a standardization attempt around the protocol and the binary payloads themselves. I use what I like and what I think works, and I think it’s radically different even from what you see with RabbitMQ and with MQTT in the news, with its lightweight version and other things coming out. Messaging is something I have always used, but that doesn’t mean that you, as a customer of Apcera and Continuum, ever have to care, just as you will never have to care that we write everything in Go. It’s a tool people can use, and we use it quite a bit.


4. You went into some depth about policy and you also touched on some of the security considerations. Are you going as far as fine-grained entitlements, where you are providing those as services to the applications? And what do you see as the path for people to pull entitlement models out of being burned into their applications and make them part of the platform the application runs on?

That’s a great question. The policy piece is very fine-grained. Earlier we talked about the fact that one of our early pushes, and it still is a massive push for us, is that any value we add should first be transparent, meaning you don’t have to make any code changes to your applications, especially not adding Apcera code; that’s just a fail, in my opinion. That being said, consider certain entitlement policy pieces, especially where applications are operating on behalf of someone else. When you deploy an application there is a set of policies, within Continuum at least, that says: Chris is deploying this application; we understand the application is a job, so it’s in the job realm; we understand he is deploying to the namespace /prod; and there are rules about package resolution, which version of an OS has to underlie Chris’s app, and where he does and doesn’t get to run in a secure, trusted hybrid setup. All of those pieces can be totally transparent to you, the application writer; they are essentially enacted upon you as the person who is deploying, managing and updating the application. However, when someone else is using your application to access data, the entitlements are a lot of times ingrained within the application, and from my perspective that is really the on-behalf-of metaphor.
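Policy attached to a namespace such as /prod, with more specific namespaces overriding their parents, can be sketched as a lookup that walks up the namespace path. This is a hypothetical model for illustration only; the rule keys and values below are invented and are not Continuum’s policy syntax.

```go
package main

import (
	"fmt"
	"path"
)

// Rules maps a namespace such as "/prod" to key/value policy settings.
// Names and keys here are hypothetical, not Continuum's policy language.
type Rules map[string]map[string]string

// Resolve returns the value for key from the most specific namespace that
// defines it, walking from ns up through its parents to the root "/".
func Resolve(r Rules, ns, key string) (string, bool) {
	p := path.Clean("/" + ns) // force an absolute path so the walk ends at "/"
	for {
		if v, ok := r[p][key]; ok { // a missing namespace reads as an empty map
			return v, true
		}
		if p == "/" {
			return "", false
		}
		p = path.Dir(p)
	}
}

func main() {
	rules := Rules{
		"/":     {"os.image": "base", "network.egress": "deny"},
		"/prod": {"os.image": "hardened"},
	}
	for _, ns := range []string{"/prod/chris", "/dev/chris"} {
		img, _ := Resolve(rules, ns, "os.image")
		egress, _ := Resolve(rules, ns, "network.egress")
		fmt.Println(ns, img, egress)
	}
}
```

A deployer never has to mention these rules; they apply because of where the job is deployed, which matches the “enacted upon you” framing above.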

Making the on-behalf-of metaphor transparent is very, very hard, and lots of different companies are trying different pieces of it. What’s interesting about Continuum, at least as an opportunity to provide some value in this area, is that we are in control of everything, yet we haven’t made you change any of your code: scheduling, provisioning, quotas and quota enforcement, placement, affinity, network access both ingress and egress, packet flow and semantic understanding of the packet flow. In my opinion that gives us more ability than any point solution to actually do something about an on-behalf-of model, so that you could pull the entitlement piece out of the application itself. But today I still think it belongs in there. You send a request into the application, which then talks to five different services; I believe in what Adrian talked about earlier today with microservices, lots of small services coordinating and working together, as long as all the operations management is handled for you. Tracking the context of who you are as a user of those systems as the request moves around still feels like it needs some level of embedding in the application space, at least today. But if there is going to be a solution where it can be pulled out and made transparent, it has to be a technology that owns, or has visibility into, all layers of the stack; it can’t just be a point solution, in my opinion.


5. I guess that also relates to token services. Are you going to be providing token services, or will you be a substrate to run token services on top of?

Currently, again, the way we provide our value with Continuum as it exists today is that everything is transparent. That being said, there are a lot of services underneath that we are using ourselves. You can imagine things like anomaly detection, self-healing, health and auto-scaling that we could eventually surface and let people who are deploying workloads on top of Continuum use as well. Underneath the covers, and we talked earlier about messaging systems and state transitions, every single thing within Continuum is digitally signed and encrypted; it’s been that way since day one. We use elliptic curve cryptography, though not with the bad constants, and we have the notion of both carbon-based and non-carbon identities all throughout our system. That might become interesting to users of Continuum at some point. Right now they can’t get to it; they get the effect of it, but they can’t directly consume it. We have been talking quite a bit about which of the services we use internally we might want to surface to our users as they become more mature and we can support them, and one of those might be exactly what you talked about: the credentialing, the policy piece being directly embedded into the application if it makes sense, the token stuff. We do all our own key management: we have a single trust key that we build trust upon, and we generate ephemeral keys for everything else in our system. That might be useful for someone else; right now they just can’t see it, they just get the benefit of it. But in the future we might expose those.


6. I can see how there might be interest in that. So, you talked about reaching down, do you see yourself going into the APIs of people’s software-defined data centers or the public APIs of cloud providers?

We have looked very hard at how we can make a system that’s programmer composable from the inside out, not only so that we and our customers can plug stuff in, but so partners can plug stuff in too. We’re a startup out of San Francisco, right downtown; we can’t solve all these problems, but what we can do is enable a partner ecosystem where things like SDN can be plugged into our system. So you can imagine a scenario where you say “I want to run an A/B test: of everybody talking to this service over here, I want to take 10% of the traffic and route it through an SDN, and I want to do it transparently to that end service and to all the applications I want to run the test on”. When you look at what Continuum actually does as an enabling framework and a technology platform that you’re deploying these workloads onto, it can actually do that. So we are pulling a proverbial thread through all these foundational technologies we put in, to say “yes, you can transparently, on the fly, write a piece of code that says this is what I want to do for 10% of the traffic on any connect event to that database”.

That piece of code and behavior can be written in any language you want, has no bearing on any of the applications that are trying to talk to the database, and you can inject it into the system with what we call transparent policy injection. Then you have a test that runs for, let’s say, two hours, you click a button, it’s instantaneously torn down, and you say “let’s look at the data”. By the way, we have all the data: we know how much CPU, memory and network bandwidth you’re using, how that bandwidth is split between all the services you are talking to, and the patterns of what you are talking to, how many inserts per second, drops per second, POSTs and PUTs in terms of HTTP you’re doing. That gives us a tremendous amount of feedback from the A/B test to say “yes, this SDN provider is doing something amazingly powerful for what we are trying to do, and we didn’t have to spend three months retooling our apps”, because we could use an enabling technology like Continuum to do it. So the very long answer to your question is that, by design, from day one we have put together a system we think partners are attracted to, because they can take what they already do really well for their customers and plug it in as an enabling piece inside of us, which then makes it ubiquitous to all the workloads being deployed.

Chris: You are going to be giving a talk on the cloud track here at QCon tomorrow. Just give us a brief summary, because people will be able to see the full talk in a separate video later on.

Sure. It’s titled “What Is Beyond Virtualization”. Essentially, we see a lot of different customers and people who are interested in what we are doing, but they are stuck in this mode of: I’m going to take one server and virtualize it, then virtualize networking, then virtualize storage, and we are going to do all the same things we do today in terms of plugging things together, but maybe faster because it’s all software now instead of hardware. What I think is more powerful is asking what the enterprise platform looks like in two to five years. This is not a talk on Continuum; it’s a talk on the base primitives. What do we want dev and DevOps to actually see and interact with, and what do we want traditional ops, compliance, security and network people to interact with? And how do those pieces line up so that we truly have fluid, highly plastic infrastructure that is abstracted away and can just be consumed at will, at speed, in a very agile fashion, while also being compliant?

And so there are a lot of things I have beliefs about, but I have also been talking to quite a few people about what that really looks like, from automatic load balancing to automatically reconnecting systems when there are failures. All of those things are going to be addressed as a flow: what do we have today, what are our options, and what do we really want to see two to five years from now? If you’re in a Global 1000 company, it’s five years from now, you’ve just finished a new feature set, buying into Adrian’s talk about microservices, and you want to deploy it, what does that look like? Does it look like the Netflix model, does it look like the model you have today, just with more virtualization while you are still doing everything by hand, or is there something better we can reach for? My hope is that the talk will evoke a response and get some good questions and conversations going around how we do that. And again, it’s not necessarily about Continuum. Continuum is one manifestation of what I believe this thing should look like, but there are lots of different pieces out there today, from things like Docker, Mesos and Fleet to CoreOS and etcd, that are each trying to solve point problems on the way to where I think all of us want to be. So that’s what I am going to talk about.

Chris: Excellent, well, I am looking forward to it. Thanks for stopping by today, Derek.

Absolutely, thank you.

Apr 12, 2014