
Deepak Giridharagopal on Puppet, Immutable Deployments, Analyzing Systems with PuppetDB


1. We’re here at CodeMesh 2013 in London and I am sitting here with Deepak. So, Deepak, who are you?

Hi, my name is Deepak Giridharagopal, I am director of engineering at Puppet Labs, so I manage all of our open source technology.


2. Puppet Labs, what do you do, what’s Puppet?

Puppet is pretty much the most popular open source configuration management tool on the planet at this point, and I am reasonably comfortable saying that. What Puppet does under the hood, and in general the software we produce, is designed to make systems administrators have an easier time doing their job; we are sort of in the business of delighting systems administrators, which most people think of as surly individuals who live under a bridge somewhere and only come out to complain about the software they have to deploy, that you wrote. But in general they are oftentimes the unsung heroes of the software world, because anything you write has to be administered, has to be deployed, has to be maintained in the field for as long as it's out there, yet the tool chains they have available to them often lag pretty severely behind what developers are used to. I love being at a conference like this where there is a lot of cutting edge technology being discussed, but you go to most operations conferences and it's almost like you've stepped out of a police box from 20 years ago; people are just now putting things in version control and things like that.

So, in general the types of tools that we produce are oriented around configuration management, and by that I mean we give administrators tools to say how they want a system to look, and then our technology makes the system look that way. That's a big contrast to how most administration and most automation is done, where usually you have a procedural script that says you need to do steps 1, 2, 3, 4, 5. Those are pretty brittle, they are error-prone, and they are not really reusable. Developers have done a pretty good job lately of coming up with reusable libraries and reusable code, so why don't we have that in the operations world? It's really rare that one shop shares its bash scripts with another shop; that's pretty uncommon. So there needs to be some higher level tooling in there that helps administrators do their job, and if you have tools that make it easy to automate things, you have now amplified the power of a systems administrator. That's really what it is about: any shop that I've talked to that has ten systems today is going to have a hundred tomorrow, and two years from now they are going to have a thousand. But they probably won't be able to hire 10x or 100x as many people, so they need to be more powerful. Really, at the end of the day, my mission is to take the kind of productivity increase developers have seen due to new technology and new tools, and I would love to see that happen for my fellow systems administrators.


3. If we drop down to more concrete information about Puppet: you said it's not really imperative, it's more declarative, is that safe to say?

Puppet is a language, it's a compiler, and it's a way of applying and executing the artifacts of the compilation process. We present people with a declarative language that looks very much like a configuration-file type of syntax, which allows them to specify what they want to have happen on the system. So they say: “I need NTP for time synchronization, so I need to have the NTP package installed, I need to have a configuration file for it dropped onto the system with the right permissions and the right content, and then here is a service definition for NTP that ensures it's actually running; oh, and by the way, if the config file ever changes, I need to make sure that service is automatically restarted”.

So, just that example, in Puppet code, is maybe 15 lines or something like that, but it's really high level abstractions: package ntp, ensure installed; file ntp.conf, owner is root, group is root, etc. But imagine doing something like that in Bash or a lower level language: you will have a ton of code if you want error handling, which you do. Something as simple as installing the NTP package needs to check whether the package is already installed, and what command do you use to do that? It's going to be a different command on different Linux distros, and Lord help you if you want to do that on Windows as well, or Solaris, or anything like that. So what we do is come in with these abstractions, and by making it declarative, it frees people from having to think about all this obtuse error checking they would otherwise need to do, and it also helps us make better decisions as a tool chain. For example, let's say you have some Puppet code that manages NTP, and some other Puppet code that manages security settings: SELinux, firewall rules, etc. If that was all in a single Bash script and there was an error somewhere synchronizing your time, what would happen with the security stuff at the bottom? You may never get to it; it's entirely dependent on how great of a Bash scripter you are. I sadly consider myself a pretty good one, and even I am doubtful I would get that right every time. But in a declarative language you can specify the dependencies between things.
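In Puppet's own language, that example looks roughly like this minimal sketch (the config file's source path is a hypothetical module path):

    package { 'ntp':
      ensure => installed,
    }

    file { '/etc/ntp.conf':
      ensure  => file,
      owner   => 'root',
      group   => 'root',
      mode    => '0644',
      source  => 'puppet:///modules/ntp/ntp.conf', # hypothetical module path
      require => Package['ntp'],                   # no point managing the file before the package exists
    }

    service { 'ntp':
      ensure    => running,
      enable    => true,
      subscribe => File['/etc/ntp.conf'],          # restart automatically whenever the config changes
    }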

So what Puppet the system can do is figure out in advance: oh, you had a problem installing the NTP package, what depends on that? The NTP config file, obviously; I can't configure that if the package didn't install, and the service, I can't configure that either. However, all your security stuff is totally unrelated, so we can actually proceed. Under the hood, Puppet builds a directed acyclic graph that connects all the individual resources you want to manage on a system, and the edges represent the dependencies. So if there is an error applying part of that graph, we can do a connectedness search, find the part of the graph that's affected, quarantine it, effectively, and still configure the rest of the system. That's something people often don't think about as being useful, because most scripts people write are targeted at one particular piece of software they want to configure; but imagine you had a team of dozens of operations staff all collaborating on a code base: you need ways of composing these things together. And that's where Puppet really comes in. The real strength of the tool is that we produce this model, this graph of what everything is supposed to look like, and we can do maths on that graph to the administrator's benefit.


4. You mentioned these abstractions, what kind of abstractions do you have? You said packages, what else?

Sure. There is a core set that's shipped as part of the distribution, and all this stuff is free; you can basically get it on any Linux distro you want, or on Mac from Homebrew. The core set I would consider to be: execution of arbitrary commands (I can talk about idempotency in a little bit), management of users, groups, packages, scheduled jobs via cron, mounts, and file content itself; there is probably more that I am missing. You can argue that the minimum possible set that is useful is really execution of arbitrary commands and perhaps files, because everything else you can build on top of those, but we've optimized the other types to work with native hooks on all the platforms we care about. For example, managing user accounts on Windows would actually use Windows API calls, whereas on a Debian system it would use useradd, or adduser; actually, I don't even know which one it is, because I tend to use Puppet to configure users. It's kind of crazy in 2013 that people have to know the difference between useradd and adduser; that's what I am talking about in terms of low level tools vs high level tools.
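As an illustration, here is a hedged sketch of that abstraction (the account name and group are made up): one declarative user resource, with the platform-specific useradd/adduser/Windows logic handled underneath by the type's providers:

    user { 'deepak':
      ensure     => present,
      groups     => ['admin'],   # made-up group
      managehome => true,        # create the home directory if it is missing
    }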

The only thing that really unifies all these types, all these abstractions, is that they are designed to be idempotent. For example, if you write a Bash script to automate setting up time, what are the chances you could rerun that script if there was a problem? Even if there wasn't a problem, you'd want to rerun that script again to verify that the system is actually configured the way you want. What we do is bake into the core of all our abstractions the idea that they are idempotent: you can apply them over and over and over again and they will be completely fine. Even for things like execution of an arbitrary command, we don't force the user, but we strongly suggest, that they supply a set of guards that help us determine whether that arbitrary command needs to run or not, because that way it becomes idempotent. What that gives the administrator, if the way you configure your entire stack is like that, is that in addition to having something that helps you initially set up a machine, you can rerun that model every ten minutes, if you want, to continually and automatically repair any problems that happen. If you've ever had software running where someone accidentally logs into one of your systems to troubleshoot it, and they install a debug version of a package, turn off a service or change permissions on a file, and then they leave it that way, and then you roll out an upgrade and something breaks: this would solve that problem, we make that problem go away.
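A sketch of what such a guard can look like (the command and paths are hypothetical); the creates attribute tells Puppet to skip the command once that file exists, which is what makes an otherwise arbitrary command idempotent:

    exec { 'initialize-app-data':
      command => '/usr/local/bin/app-init --data-dir /var/lib/app', # hypothetical command
      creates => '/var/lib/app/.initialized',                       # guard: already done, do not rerun
    }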


5. You have a custom language; if I want to extend the system, what do I use? Do I have to write in this language, or is this language just for configuration? How do I build new back-ends for features?

You can actually do both. There are facilities, there is syntax within the language itself, in a very simplified way, because again we are not targeting really advanced developers, we are targeting system administrators, we are targeting every person. So there is simplified syntax to do things like: if I have my resources for managing NTP, and I have a package and a file and a service that I am managing, I can wrap them all together in a bundle; we call those classes. Then you can refer to that as a single group, so you can say, for this new machine I want to add the NTP class to it, which effectively means time is now taken care of on that system. So we give administrators the ability to formulate these higher level abstractions, and you can bundle other bundles recursively to make increasingly elaborate combinations of things; there are also ways of doing parameterized versions of those.
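As a sketch, wrapping the earlier NTP resources into a class and then adding that class to a machine looks something like this (the node name and file source are hypothetical):

    class ntp {
      package { 'ntp':
        ensure => installed,
      }
      file { '/etc/ntp.conf':
        owner   => 'root',
        group   => 'root',
        source  => 'puppet:///modules/ntp/ntp.conf', # hypothetical module path
        require => Package['ntp'],
      }
      service { 'ntp':
        ensure    => running,
        subscribe => File['/etc/ntp.conf'],
      }
    }

    node 'web01.example.com' {
      include ntp   # time is now taken care of on this machine
    }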

For example, NTP is probably not a great example, because you only ever have one NTP service running on a system, but let's say you had a firewall rule abstraction: you would potentially have multiples of those, so there are ways in the Puppet language itself to make these parameterized, so you can stamp out as many as you want. But the idea is that we want to give people within the language an easy way of composing these resources together, to basically form really advanced Lego bricks. You can hand those to some other administrator or another developer, and really all they are concerned about is snapping the Lego bricks together; they don't care about running a particular script or installing a specific package, they just need to know: “Ah, time needs to be synched, include the NTP class; ah, firewall rules need to be set, include the security class”. It's these high level pieces they can start composing in more interesting ways, and that's when you can start focusing on higher level problems. For more interesting extensions, though, Puppet itself is actually implemented in Ruby, so you can drop down a level and write lower level extensions in Ruby, if you want to do that. By and large, I'd say about 95% of the people that write Puppet code write it within the language itself and then upload it for other people to use for free.
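A hedged sketch of that firewall idea using a parameterized defined type; myfw::allow is a made-up name, and the firewall resource assumes the puppetlabs-firewall module from the Forge is installed:

    define myfw::allow($port, $proto = 'tcp') {
      firewall { "100 allow ${title} on ${proto} port ${port}":
        proto  => $proto,
        dport  => $port,
        action => 'accept',
      }
    }

    # Stamp out as many instances as you want:
    myfw::allow { 'http':  port => 80 }
    myfw::allow { 'https': port => 443 }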


6. What’s the ecosystem of Puppet like, do Puppet Labs supply a big block of things and other people can plug in stuff or what is the situation like?

It’s both. It’s open source technology, so I think what we really want to do is give people a good platform upon which to build things and give them a place to share the stuff that they’ve built, so included in the core distribution are these basic types. But people have written their own, tons of them, other random open source users, commercial customers have written them, big companies have written custom things, so there are custom types that will manage hardware load balancers, software load balancers, network devices, storage devices, people have written all kinds of stuff, and it’s used in a pretty wide number of problem domains because a lot of these things are reusable.

We have our own clearing house, our own repository of Puppet modules and content that people can upload their stuff to; it's called the Forge, at forge.puppetlabs.com. People can go there and upload modules that they create, everything from setting up Apache, setting up time, replicated databases, Oracle (have mercy on the poor soul that had to write the automation for that, but he did it, so other people can reuse it, and fix it if necessary). So it's a strongly collaborative culture, I would say, because again part of the idea is you write the stuff once, you want to share it, get more people working on it, fixing bugs, updating it, that kind of thing. I think right now there are over a couple of thousand different modules out there on the Forge, and there have been millions and millions of downloads of them, so people are definitely using it, which is great; it makes me optimistic about the universe. In addition to that, there are a number of integrations that people do with Puppet: there are a lot of orchestration tools that will integrate with it, commercial tools that integrate with it; Dell just announced that as part of their hardware offering they are going to be bundling Puppet for initial provisioning and management of a lot of systems within a rack. So there is just a huge set of people.

It turns out to be a generic thing: if you sell hardware, if you sell network stuff, if you sell storage stuff, at the end of the day you want to make it easy for people to configure your items in a unified way. And that's always been really interesting to me; several years ago, when we started, I would have been surprised if someone had told me that network vendors would be interested in using Puppet to configure their stuff. But it turns out that if you have administrators who are used to using a high level tool like Puppet to manage everything else, and then you tell them to manage a switch and they have to run a bunch of low level commands, they really sour on it. High level tools for more portions of the stack: I think it's a big deal. Also, there is a lot of integration with cloud vendors and things like that; people have written abstractions to integrate with CloudFront, and other abstractions for setting up instances in public clouds.


7. Since you mentioned the C word, you get one mention for free, then you have to put a dollar in the cloud jar. I'm curious: there is a tool out there called Docker which basically builds software appliances, or appliances essentially. How does it relate to Puppet? Do you work with it, is it a competitor, an alternative solution, what is the situation there?

I think they are complementary technologies. In my grand unified theory of how software is deployed, you've got some kind of hardware, whether it's physical or virtual; you need to have some kind of operating system put on top of that; you need to have some configuration put on top of that, which configures the base OS; and then lastly you need to have some configuration for the actual application that you want to run. There are a variety of tools that attack different parts of that chain of events. My conjecture is that, for the most part, if you map that onto a timeline, the part of the timeline that is dedicated to provisioning a system is probably this much, and the actual amount of time the virtual machine or the physical machine is in service tends to be much longer.

That's where Docker, and predecessors of Docker such as VMware, fit in. Docker under the hood is using LXC, so those are lightweight containers that do isolation; this is obviously a leaky abstraction, but one could think of it as an extremely lightweight version of a virtual machine, and a lot of the concepts that apply to virtual machines also apply here. Historically, over the past 15 or 20 years, however long configuration management as a concept has really been around, these trends keep coming up where people say: ah, now instead of having to put a CD or a disk in a drive to get the operating system installed, I can use PXE boot and make that automatic, so why do I need config management anymore? Turns out you do, because once you get faster at provisioning more systems, your next biggest problem is actually configuring the operating system and the app, so it actually magnifies the problem. It's good that that's your new biggest problem; if your biggest problem is putting disks into drives, that's probably not a great state to be in. Then virtual machines come around and you've moved it again: well, if I have vSphere, why do I need configuration management? And again, instead of having to do PXE boot, it now takes 10 seconds to build a new virtual machine and boot it up, but your biggest problem is still how to make sure your application is configured correctly. Docker does the same thing, only it takes that 10 seconds and shrinks it down to hundreds of milliseconds. So now I can spin up as many of these instances as I want in a really quick and lightweight way; however, in my opinion, that again magnifies the need for managing all this stuff. At the end of the day, the easier we make it for people to have more machines running, whether those machines are virtual, physical, containers, whatever, the more important automation becomes, and the more valuable it becomes to have a good model of what you want your systems to look like, because it all becomes more confusing. It's easy to fit two machines' worth of stuff in your brain; once you get to ten, once you get to a hundred (I don't even know how many Docker containers you can have on an Amazon m1.large, probably a lot), that stuff needs some higher level tooling to make sense of it all. Also, a lot of people are using Puppet in conjunction with these; there are tons of VMware users, and VMware is actually a big investor in Puppet Labs, so we have a pretty good relationship with them, and we do a lot of VMware provisioning as well. But the last part I want to talk about is that timeline again, because the biggest chunk of that timeline is the service lifetime of the instance that's up. Even if you're highly agile and you are spinning up Docker instances all the time: any time you want to deploy a new version of your software, you spin up new Docker containers and shut down all the old ones.

Those containers are still running and still doing stuff for a pretty long period of time, which means I think they are still susceptible to configuration drift, to having anomalies: someone logs in and messes with certain settings; maybe there is a disk failure; maybe there is a network connectivity problem and it couldn't install some package you needed to have installed. So there is always going to be this need to continually validate that your machines look the way you want, because at the end of the day, the only way you can automate anything across a set of systems is fundamentally based on the idea that you can reason about them and that they look the same; otherwise your automation may fail on 10% of them, and that's not that useful. So, I think they are highly complementary. I am actually really excited about Docker, I think it's pretty awesome, and I am glad to see containers becoming more broadly useful. I think it solves one set of problems while amplifying another. Nothing in this world comes for free; there are tradeoffs everywhere.


8. Here is another buzzword that's come up in recent months: immutable deployments. I think you have an opinion about that, or an idea about it. What is it?

I do. I started talking about immutable infrastructure; I think the first talk I gave on it was at Clojure/West, in March of this year, and the way I present it to people is by framing the problem in ways that I think resonate better with a developer audience. As software developers, we understand the value of immutability, of constraints and invariants (well, most developers probably don't, and maybe that's a sad reflection on the state of the industry, but the beautiful, wonderful people at this conference I hope understand it). If you have a function that you are writing, and you know the types of the arguments that will come in, and you know they are never going to change, that helps you reason about your software; that's why immutability is such a powerful concept. Even in languages that don't feature it, like Java, they talk about the benefits of immutability: in Effective Java, one of the first things in that book is that you don't have to worry about thread safety if all your stuff is final.

So, why is immutability good? It's good because it helps you with your reasoning, and it prevents what I like to call spooky action at a distance. Any developer has looked at a particular piece of code and thought they knew what it did, but some other thread running somewhere modified one of the variables in the middle of it, and you get a race condition or a deadlock or something terrible. Spooky action is spooky because you don't know what is doing it; it's really difficult to debug. If it's immutable, you don't have that problem. And lastly, immutability makes it really easy to build higher level abstractions, because you have a solid foundation: a solid set of data structures, a solid set of functions that you know have predictable performance and predictable behavior, and then you can build more interesting things on top of them. 100% of those benefits apply to the lower parts of the stack, which is stuff we as developers don't normally think about. But if I am managing a fleet of 100 systems, or even 10 systems, everybody has experienced this idea of spooky action at a distance: any time an administrator logs into one of your systems and tweaks something to debug it, someone has messed with the state of something you didn't know about, and that can cause problems when you do an upgrade.

I gave an example in my talk earlier about Knight Capital. They were a big trading company, and they went bankrupt because they were losing about 173,000 dollars every second. The reason that happened was that they messed up a deployment, an upgrade of their software, and that was due to a lack of automation. There were a number of factors involved, and I don't want to reduce it to just one thing, but a big component was that they had a fleet of servers they needed to upgrade their software on, and they upgraded all of them except for a very small handful; they thought that they did, but they didn't. Some administrator a while back had gone in and done something, and for some reason the old version was still running, so when trades were executed, any trades that got bounced to the systems running the older stuff did the exact wrong thing, and they lost a ton of money and completely tanked the entire business. That's really spooky action, expensive action, but something like that has happened to almost everyone that I've talked to; everyone has had a situation where there's software in the field and someone is messing with it. So, what do we do about it? In software you can make things immutable by legitimately making them impossible to change; the semantics of the language won't let you reach in and fiddle with the bits. In the system administration world you don't really have that: computers are fundamentally open systems. The only way to make a computer really immutable is to unplug it, and then it's not really that helpful to anybody; it is a known good state, though, so maybe that's better. So what we have to do is give people a façade of immutability; we have to give them tools that make it seem as though we can converge on the desired state and stay there, even if it's only eventually consistent. If we can do that, it lets people get all the same benefits: I can reason about a system if I know predictably what the config of every box of every type is going to look like. If I know exactly what an application server looks like, soup to nuts, then why does it matter whether I have 10 of them or 10,000?

That becomes a much more tractable problem. If I had no idea what each one looked like, I would be terrified of having 10,000 of them. So that right there helps people reason about their infrastructure, and they can build higher level abstractions: if I know what every system looks like, then I can start solving higher level problems that require that level of coordination. If I know time is synchronized, I can start offloading things like cryptographic operations to a cluster of systems; clustering itself requires a high degree of coordination in the infrastructure, so you can do all these things. And lastly, honestly, I think it keeps people from getting woken up at night when pagers go off. Puppet the tool is going to work on the administrator's behalf, automatically repairing things and restoring that known good state without your intervention, which means you can get to the pub sooner, you can go to bed and sleep through the night; at the end of the day that's the most important thing. So when I say immutable infrastructure, I am talking about giving people the appearance of immutability, and the benefits of immutability that developers see, and extending them to a domain that is inherently mutable.


9. You created and talked about a tool called PuppetDB. How does it fit into Puppet, and what does it actually do? Is it an actual database, and what does it store?

I am a system administrator, I am using Puppet to configure my systems, and Puppet builds up this directed acyclic graph of the config of all my boxes; so that's great, I have all these models that describe how every single system I have is supposed to look, that's fantastic. But now that I have taken care of that problem, I can work on higher level problems. For example, maybe a new vulnerability has been disclosed in a certain version of Nginx, and I need to know which of my systems are running this exact version. I could go out and actually query every single system, run a command on all of them, but if I already have all these models, wouldn't it be great if I could just interrogate those somehow? Furthermore, let's say Puppet is deployed everywhere, automatically repairing problems as they come up in the field; now administrators want to say: “Well, you know what, I just need effectively a tail -f where I can see all the problems that have been remediated; it should be an empty list, but if it's not, I want to know about it”. So there is a certain amount of information floating around the Puppet ecosystem that needs to be centralized, persisted and made queryable, so that you can start answering these questions, and that's what PuppetDB does.

To the end user it basically abstracts away all the storage of this stuff. We are actually backed by a traditional relational database, but users never interact with it directly; they interact through a more abstracted query API we give them, and that lets them answer questions like: what are the vulnerable versions of Nginx that we are running; give me the list of systems where user Deepak has sudo access configured (maybe an employee was terminated or someone lost a password, and you can find out really easily what systems they have access to). In fact, you can even do fancier things like: find me all the systems that I have sudo access to, then for each of those find me any firewall rules or any VPN tunnels that are open between that and another system I have configured, and I can come up with the total attack surface area if my account ever got compromised.
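As a hedged illustration of what such a question can look like against PuppetDB's query API (the v3 endpoint and default host/port here are assumptions), this asks for every node whose catalogue manages the nginx package:

    curl -G 'http://localhost:8080/v3/resources' \
      --data-urlencode 'query=["and", ["=", "type", "Package"],
                                      ["=", "title", "nginx"]]'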

That’s the kind of thing that is really difficult to do unless you have a model of how you want everything to look. We had a lot of requests from users for that and that’s where PuppetDB came in and that’s basically a way of persisting all of these pretty complicated object graphs, as well as inventory and factual information about systems themselves, their IP address, how much memory they have, those kinds of things and it also stores the results of when Puppet actually runs in the field, so we have stored all the models which represent the utopian, unicorn populated vision of how your infrastructure should look in an ideal universe and we have what reality looks like, the bitter crushing disappointment of reality. But the idea is over time they will converge and you should be able to measure that they converge and that’s where central storage comes in, that’s what Puppet DB does, it’s fast, scalable, central storage for everything floating around the Puppet ecosystem.


10. What’s the data model? You store information from the declarative setup and also other dynamic information or what is it?

There are three different types of information. First, there are the resource graphs themselves; in Puppet parlance we call those catalogues. You have nodes, with edges that connect the nodes, and things like that, so that's persisted. The nodes themselves are complicated data structures; a node would be “I am a file, I have a file name, I have an owner, a group, content, mode, all kinds of other stuff”, and other types are more complicated still. It's all highly structured data, but it's a graph, so that's persisted. In addition to that, machines report facts about themselves. For example, let's say I want to configure a web server on all of my systems, but one of the key components of that is making sure I am listening on a particular IP address, and that IP address is going to be different for each system. One of the ways you can do that is that when Puppet goes to configure a box, the first thing the box does is collect a bunch of facts about itself: here is my IP, here is my hostname, etc., and you can use those to effectively templatize the config for a system. So all those facts are really useful.
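A sketch of that fact-driven templating in Puppet code (the file path and port are made up; $::ipaddress is a standard Facter fact):

    file { '/etc/myapp/listen.conf':
      ensure  => file,
      content => "listen ${::ipaddress}:8080\n", # each box fills in its own reported IP
    }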

For example, people can say: I need to find, really quickly, all the systems that are running Ubuntu 12.04 or something like that, because they want to upgrade those. That factual information is very key/value, so it's stored in the same ultimate backing store, but with a very different schema that's optimized for those queries. And lastly we have the reports, which are more event or log based: “at this timestamp we tried to configure this resource, the NTP package, on this system; here is what happened; here is the desired state; we checked what the current state was; they diverged, so here is what we did, and here is what the output was, in order to fix the problem, and then everything is great”. That output is stored as well. So there are three different kinds of information that are all stored in the same place, and we give people a unified query API that allows them to effectively join information from any of these three places. You can say some pretty complicated things; you can do some complicated math on your infrastructure just by talking to PuppetDB.
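For instance, the Ubuntu example could look something like this against the same assumed v3 query API, matching on the facts machines have reported:

    curl -G 'http://localhost:8080/v3/nodes' \
      --data-urlencode 'query=["and", ["=", ["fact", "operatingsystem"], "Ubuntu"],
                                      ["=", ["fact", "operatingsystemrelease"], "12.04"]]'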


11. How do you gather that information? Previously, did that information just disappear into thin air, or are you hooking into Puppet?

Originally it would just disappear into thin air: we would compile these catalogues, these resource graphs for a system, send them to the system, and Puppet would run on the box and configure it, but we never stored that graph anywhere; it would just disappear into the ether, which is unfortunate, because that is really powerful information. There was an implementation of central storage for a pretty restricted subset of this stuff several years ago, also written in Ruby, that wasn't heavily used, mostly because of performance and scale and reliability issues. Actually, before I even joined the company I was a big user of that feature, and I hated how bad the implementation was, so much that as soon as I joined the company I said I am going to fix this. Of course, I can't use it now because I left the place where I was using it, but we basically rebuilt it in a way that was designed from the ground up to store that kind of information and to answer those types of queries. So whenever we compile a new config for a system, we siphon off a copy and send it to PuppetDB; whenever a system reports facts about itself, we send off a copy asynchronously; and whenever a report is generated, we store a copy of that asynchronously as well. So you have your Puppet infrastructure, PuppetDB slots in, and the system opportunistically routes data to it to be stored asynchronously, so it doesn't slow down the rest of the operations you are doing.


12. Where can our audience get PuppetDB? Is it open source, is it available?

Yes, it’s all open source, you have to know Clojure if you want to hack on it because it’s written in Clojure, but you can just go to puppetlabs.com, you can go to our GitHub repo, it’s github.com/puppetlabs, you probably don’t want to go to that landing page because there is literally a thousand open source repos out there, various things that we’ve actually built. But, yes, all our foundational technology is open source and free. Chances are you already have a version of it accessible on whatever system you are using, if you are on a Mac and you have Homebrew or MacPorts, you can actually get it through there, if you are on a Red Hat system you can do yum install it’s going to be there, Ubuntu already has it there, pretty much everything is kind of built in. You’ll probably get an older version, so I would recommend actually using packages that we provide, we have our own repos for all the OSs, you can get it straight form there, it will be more modern that way.

Werner: You said another C word, Clojure.

Yes. That’s different I don’t have to pay for that one.


13. That’s fine, it’s CodeMesh, you get a dollar back for that. So, Puppet is implemented in Ruby, but you chose Clojure [for PuppetDB], why? It’s obviously because it’s the best language out there, why did you choose it?

Honestly, I think it came down to a couple of main criteria. Number one, I had actually used it at a previous company, and I really enjoyed the functional programming aspects of it, the actual immutability and all of that stuff; I had previously built pretty reliable systems using that philosophy, so in an ideal universe I knew I wanted to do something in a functional way. But from a practical standpoint there is also speed: Ruby can only use one core, so that severely limits what you can do, and even the things it can do on one core it's pretty slow at doing, in exchange for a high degree of dynamism that personally I don't enjoy that much. I'm never really monkey patching; I hate that stuff, because it makes it tougher to reason about my code. I think more functionally: I like to look at a piece of code that is referentially transparent, inputs, outputs, I know what it's going to do, and I don't have to worry about someone having added some method somewhere 20 files away. So speed-wise, Ruby really wasn't cutting it.

Think about it this way: if you have a deployment of 1,800 systems and you run Puppet once every 30 minutes, that means we are generating one set of facts, one new object graph and one report roughly every second, and these can be pretty big and complicated data structures. To put things in context, at least in the Ruby 1.8 time period, we had code on the Puppet side that would take these graphs and objects, serialize them and send them over the wire to PuppetDB. On the PuppetDB side, we take it off the wire and put it into a message queue for processing; another thread wakes up, opens a transaction on the queue, pulls the item off, does some data transformation, deserialization and validation, opens a transaction to the database, serializes it that way, commits, does another commit to the queue, stores some metrics, sends a message to the user, and then we are done. That entire process on the PuppetDB side, including the I/O, takes less time than it takes the Ruby side, at least in Ruby 1.8, just to serialize those objects to the network. So speed is king; the faster we can do that, the better. And if we want it to be fast, we want to use multiple cores, so I thought it was highly valuable to use a functional language that had a lot of good state management facilities inside of it. To date, PuppetDB has been around for a couple of years, and it's deployed all over the place; there are tens of thousands of deployments at this point across the planet, and we've had zero bugs reported due to any kind of deadlock, zero bugs reported due to any state management problem internal to the daemon itself, variables not getting set right; we just don't have any of those. The defect rate is shockingly low relative to a lot of the other software we have. So: experiment hugely successful. I am biased, obviously, because I am heavily involved in writing it, but I think it's the most reliable piece of software the company has produced in a really long time, and I think the language has a lot of influence on that.


14. You also mentioned it is the most widely deployed Clojure application; are you going on record with that?

Yes, I am going on record with it, because I have asked on the mailing list and on IRC in the Clojure channel a couple of times. I'm excluding developer tools, someone running Leiningen or something like that; I mean actual end user applications. And I am sure someone has Clojure code running on some AWS cluster that's 27,000 instances or something like that, but I am considering that one deployment. At this point, based on our numbers (and these aren't just downloads, these are based on actual daemons that are legitimately running), we've been talking for about half an hour, so there have been at least two or three new installations, on average, in the time that we've been talking, which is somewhat terrifying to me; but again, we don't get that many bug reports, so I think that's pretty good.

So it’s in the tens of thousands now everywhere and no one has been able to point me to a more widely deployed Clojure application, I’d actually love it, if anyone could, anyone watching this get in contact with me I would love to swap war stories about how this stuff actually works in the field. That would be great, I somehow accidentally slipped into having some of the most deployed Clojure application out there, which is a good feeling. At the end of the day it’s weird because none of our users, 99.99% of our users don’t care at all, they don’t even know, they don’t know how it’s built, they don’t care because they are solving a different problem, which I also find kind of nice, so I figure if they’d have to know that it’s written in Clojure then I’ve done something wrong.

Werner: Right. That's excellent; we should all look into PuppetDB. Thank you, Deepak.

Thank you, I appreciate the time.

Feb 06, 2014
