Mark Burgess on Computer Immunology and Configuration Management


1. [...] Mark, who are you?

Werner's full question: We're here at Craft Conf 2016 in Budapest. I'm sitting here with Mark Burgess. Mark, who are you?

Who am I? I'm an industry person, I suppose. I was a professor of Computer Science for many years, founder of CFEngine, inventor of the CFEngine open source project, and now an independent researcher.


2. [...] Can you tell us what the paper was and what you were talking about?

Werner's full question: So 20 years ago, you wrote an interesting paper which led to a lot of interesting results. Can you tell us what the paper was and what you were talking about?

I guess you're talking about the Computer Immunology paper, which people are sort of rediscovering now. Actually, this came out of my work on CFEngine, which is automated configuration management, the origins of configuration management. I was at a conference talking about some technical details of how to do self-healing systems, and people didn't understand what I was talking about.

Then on the plane going home, I actually got sick, and I came up with this idea that we can maybe use the immune system, the human immune system, as an analogy to think about how systems, not just servers but whole systems, could be self-healing at different scales, from the small parts to the large scales.

I spent a year actually writing down what became a sort of research manifesto, and I presented it at the USENIX LISA conference the year after as Computer Immunology. It was interesting because back then, of course, we were in the age of hundreds or maybe thousands of servers. That was a big system. And now we're in the hundreds of thousands or millions of computers.

One of the things I wrote in that paper was that self-healing of machines was a good strategy at that time. In the future though, if you look at biology, there is also another strategy, which is to have sufficient redundancy: if you scratch a few skin cells off your arm, you don't bleed, you don't die. But in that world of hundreds of servers, if you lose a server it still means a lot.

Today, we're actually approaching a sort of biological scale at which we can argue in this way, and this of course is what is happening in cloud with the immutable infrastructure pattern. Don't repair; simply cast aside, build a new one and replace it.
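
As a rough illustration of the "don't repair, replace" idea, here is a minimal Python sketch. Everything in it is hypothetical and not from the interview: the `Instance` type, the `launch` and `replace_unhealthy` helpers, and the image name `web-v1` are invented for the example, and no real cloud API is used.

```python
from dataclasses import dataclass
from itertools import count
from typing import List

# Hypothetical ID generator standing in for a cloud provider's instance IDs.
_ids = count(1)


@dataclass(frozen=True)  # frozen: an instance is never modified after launch
class Instance:
    instance_id: int
    image: str
    healthy: bool = True


def launch(image: str) -> Instance:
    """Build a fresh instance from a known-good image."""
    return Instance(instance_id=next(_ids), image=image)


def replace_unhealthy(pool: List[Instance], image: str) -> List[Instance]:
    """Return a new pool in which every unhealthy instance has been
    discarded and replaced by a freshly launched one. Healthy instances
    are kept as they are; nothing is repaired in place."""
    return [inst if inst.healthy else launch(image) for inst in pool]


# Three instances from the same image; the second one then fails.
pool = [launch("web-v1") for _ in range(3)]
pool[1] = Instance(pool[1].instance_id, pool[1].image, healthy=False)

new_pool = replace_unhealthy(pool, "web-v1")
```

The pool size stays constant, which is the redundancy argument in miniature: losing one member costs little as long as a replacement can be built cheaply from the same image.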

So people are really rediscovering some of the ideas from this paper, and it's only recently that people have been reminding me about it. But yes, it goes back to the idea of how to scale repair and resilience of systems at all kinds of scales.


3. Can you give us an idea what the concrete steps are? When you say repair, do you need antibodies, or how does it work?

There are different approaches. Antibodies are one possibility. Antibodies are agents that come to the rescue of the system, but they typically come from within. They're not things that come from outside. The surgeon comes from outside and does an invasive procedure, but invasive procedures tend to be, first of all, very risky, and they involve downtime. It anesthetizes the machine, or the patient.

The idea of an immune system is that it acts from within. So the original CFEngine model is that you have agents on every device which, from within, can repair a system to its desired state. The desired state is not a string of DNA but a string of policy, which is a bunch of settings on resources, much like DNA in a way. The analogy is quite a good one, and the way to tackle it is to make sure that you have an agent on the inside of the system which can respond to changes at least as fast as the changes themselves are happening.
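
The desired-state idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CFEngine's actual implementation or policy language: the `converge` function, the dictionary-based policy, and the resource names (`sshd_port`, `ntp`, `motd_mode`) are all made up for the example.

```python
from typing import Dict, List

# Hypothetical policy format: resource name -> desired setting.
Policy = Dict[str, str]


def converge(actual: Dict[str, str], policy: Policy) -> List[str]:
    """Repair `actual` in place so that it matches `policy`.

    Returns the names of the resources that had to be repaired.
    Resources the policy does not mention are left untouched.
    """
    repaired = []
    for resource, desired in policy.items():
        if actual.get(resource) != desired:
            actual[resource] = desired  # the repair step
            repaired.append(resource)
    return repaired


policy = {"sshd_port": "22", "ntp": "enabled", "motd_mode": "0644"}

# A system that has drifted: one wrong setting, one missing setting,
# and one resource the policy says nothing about.
system = {"sshd_port": "2222", "ntp": "enabled", "tmp_file": "junk"}

first_pass = converge(system, policy)   # repairs the drift
second_pass = converge(system, policy)  # finds nothing left to do
```

An agent in this style would simply call `converge` on a schedule. Because a second pass changes nothing, running it more often than strictly necessary is harmless, which is what lets the agent keep pace with change from inside the system.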

That was the CFEngine model, and of course that branched off into configuration management as we know it today, with Puppet and Chef and later comers. Today, even at the scale of cloud, you have things like Kubernetes coming out, which are very much in this sort of model of desired state and self-healing systems, now at a somewhat different scale to configuration management. It's a different time scale: things need to be adjusted in milliseconds, even microseconds on the virtualization scheduling front. So interesting times, but old problems reemerge.


4. It turns out that the problems which Chef and Puppet solve have been around for longer than just the last five years?

Yes, of course. And I think even before CFEngine, which put a face on configuration management, there were approaches to configuration management, different things from shell scripts to database versions. Even the Windows registry is a kind of version of this. But the ideas have always been recycled and reevaluated because it's always a question of economics: what's the cheapest way to solve this with the current technology? And the technology is changing very fast, so we have to adapt to that too.


5. I guess it's really an affirmation of declarative programming or basically the declarative model of dealing with systems?

Yes. I think the declarative model is a very powerful way of separating intent from implementation, which is always a goal in computer science, but also of clearly expressing a desired end state which converges to an actual goal, rather than diverging into a bunch of branches where some may survive by attrition and others just die off. There's a lot of business value in that proposition.

Werner: In a way, I guess, we're told that the Internet of Things is all around us and we'll have nanoparticles swarming around us, so it's going to be even more relevant to keep those running.

Originally, what got me interested in CFEngine and developing it after this Computer Immunology thing was actually this old Xerox PARC idea of pervasive computing or ubiquitous computing, which has sort of reemerged through this Internet of Things, which is almost the productization of that idea today. Certainly when devices are embedded around us at every level, we need to think a lot more about the scale of systems than we ever did.

We have a tendency to shove off the scaling problem and power through it by brute force, with faster processors or bigger servers, essentially crunching things. But eventually this won't scale, and we need to distribute the systems much more democratically around the areas and in the scenarios where they're used.

So I think this agent-based approach is certainly the way to go, where these somewhat autonomous agents, based on a guiding policy from outside, are doing much of the work to repair systems at the time scales at which things are happening, rather than sending data halfway around the world and waiting for somebody to be paged and then come back with a response. That model just isn't going to scale in the future.

The other thing, of course, is that many of these devices are personal electronic devices now. They're owned by a person, not by a company or a central organization, so we can't expect them to be managed by a central organization in the same way that we think of it today.

Werner: Yes. My nanobots can't have an admin on them basically.

You can try, but good luck with that.

Werner: So that's a good point to end on. I guess all of the people who love reading CS papers will check out your paper now and discover that problems were already solved in the '80s and the '90s. And thank you, Mark.

Jul 09, 2016