Kubernetes is an open source project to manage a cluster of Linux containers as a single system, running and managing Docker containers across multiple hosts and offering co-location of containers, service discovery and replication control. It was started by Google and is now supported by Microsoft, Red Hat, IBM and Docker, amongst others.
Google has been using container technology for over ten years, starting over two billion containers per week. With Kubernetes it shares its container expertise, creating an open platform to run containers at scale.
The project serves two purposes. Once you are using Docker containers, the next question is how to scale and start containers across multiple Docker hosts, balancing the containers across them. Kubernetes also adds a higher level API to define how containers are logically grouped, allowing you to define pools of containers, load balancing and affinity.
Kubernetes is still at a very early stage, which translates into lots of changes going into the project, some fragile examples, and some new features that still need to be fleshed out, but the pace of development and the support of other big companies in the space are highly promising.
Kubernetes concepts
The Kubernetes architecture is defined by a master server and multiple minions. The command line tools connect to the API endpoint in the master, which manages and orchestrates all the minions: Docker hosts that receive instructions from the master and run the containers.
- Master: Server with the Kubernetes API service. Multi master configuration is on the roadmap.
- Minion: Each of the multiple Docker hosts running the Kubelet service, which receives orders from the master and manages the containers running on the host.
- Pod: A collection of containers tied together and deployed on the same minion, for example a database and a web server container.
- Replication controller: Defines how many pods or containers need to be running. The containers are scheduled across multiple minions.
- Service: A definition that allows discovery of services/ports published by containers, and external proxy communications. A service maps the ports of the containers running on pods across multiple minions to externally accessible ports.
- kubecfg: The command line client that connects to the master to administer Kubernetes.
Kubernetes is driven by state, not by processes. When you define a pod, Kubernetes tries to ensure that it is always running. If a container is killed, it will try to start a new one. If a replication controller is defined with 3 replicas, Kubernetes will try to always run that number, starting and stopping containers as necessary.
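This desired-state model can be illustrated with a toy reconciliation loop, plain shell and not actual Kubernetes code: the controller compares the observed number of containers with the desired count and starts or stops containers until they match.

```shell
# Toy sketch of the desired-state reconciliation idea.
# The start/stop actions are stand-ins, not real Kubernetes commands.
desired=3
actual=1

while [ "$actual" -ne "$desired" ]; do
  if [ "$actual" -lt "$desired" ]; then
    actual=$((actual + 1))   # would start a container here
  else
    actual=$((actual - 1))   # would stop a container here
  fi
done

echo "running containers: $actual"   # prints: running containers: 3
```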
The example app used in this article is the Jenkins CI server, in a typical master-slaves setup to distribute the jobs. Jenkins is configured with the Jenkins swarm plugin to run a Jenkins master and multiple Jenkins slaves, all of them running as Docker containers across multiple hosts. The swarm slaves connect to the Jenkins master on startup and become available to run Jenkins jobs. The configuration files used in the example are available on GitHub, and the Docker images are available as csanchez/jenkins-swarm, for the Jenkins master, extending the official Jenkins image with the swarm plugin, and csanchez/jenkins-swarm-slave, for each of the slaves, just running the slave service on a JVM container.
Creating a Kubernetes cluster
Kubernetes provides scripts to create a cluster with several operating systems and cloud/virtual providers: Vagrant (useful for local testing), Google Compute Engine, Azure, Rackspace, etc.
The examples will use a local cluster running on Vagrant, using Fedora as OS, as detailed in the getting started instructions, and have been tested on Kubernetes 0.5.4. Instead of the default three minions (Docker hosts) we are going to run just two, which is enough to show the Kubernetes capabilities without requiring a more powerful machine.
Once you have downloaded Kubernetes and extracted it, the examples can be run from that directory. In order to create the cluster from scratch the only command needed is ./cluster/kube-up.sh.
$ export KUBERNETES_PROVIDER=vagrant
$ export KUBERNETES_NUM_MINIONS=2
$ ./cluster/kube-up.sh
Get the example configuration files:
$ git clone https://github.com/carlossg/kubernetes-jenkins.git
The cluster creation will take a while, depending on machine power and internet bandwidth, but should eventually finish without errors; it only needs to be run once.
Command line tool
The command line tool to interact with Kubernetes is called kubecfg, with a convenience script in cluster/kubecfg.sh.
In order to check that our cluster is up and running with two minions, just run the kubecfg list minions command and it should display the two virtual machines in the Vagrant configuration.
$ ./cluster/kubecfg.sh list minions
Minion identifier
----------
10.245.2.2
10.245.2.3
Pods
The Jenkins master server is defined as a pod in Kubernetes terminology. Multiple containers can be specified in a pod, and they will be deployed in the same Docker host, with the advantage that containers in a pod can share resources, such as storage volumes, and use the same network namespace and IP. Volumes are by default empty directories, of type emptyDir, that live for the lifespan of the pod, not of a specific container, so if a container fails the persistent storage will live on. Another volume type is hostDir, which mounts a directory from the host server into the container.
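As a sketch, a hostDir volume in the v1beta1 API used by this Kubernetes version would look roughly like the following; the /var/jenkins_backup path is just an illustrative example, not part of the article's configuration:

```json
{
  "volumes": [
    {
      "name": "jenkins-data",
      "source": {
        "hostDir": {
          "path": "/var/jenkins_backup"
        }
      }
    }
  ]
}
```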
In this Jenkins specific example we could have a pod with two containers, the Jenkins server and, for instance, a MySQL container to use as database, although we will only focus on a standalone Jenkins master container.
In order to create a Jenkins pod we run kubecfg with the Jenkins container pod definition, using Docker image csanchez/jenkins-swarm, ports 8080 and 50000 mapped to the container in order to have access to the Jenkins web UI and the slave API, and a volume mounted in /var/jenkins_home. You can find the example code in GitHub as well.
The Jenkins web UI pod (pod.json) is defined as follows:
{
  "id": "jenkins",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "jenkins",
      "containers": [
        {
          "name": "jenkins",
          "image": "csanchez/jenkins-swarm:1.565.3.3",
          "ports": [
            {
              "containerPort": 8080,
              "hostPort": 8080
            },
            {
              "containerPort": 50000,
              "hostPort": 50000
            }
          ],
          "volumeMounts": [
            {
              "name": "jenkins-data",
              "mountPath": "/var/jenkins_home"
            }
          ]
        }
      ],
      "volumes": [
        {
          "name": "jenkins-data",
          "source": {
            "emptyDir": {}
          }
        }
      ]
    }
  },
  "labels": {
    "name": "jenkins"
  }
}
And create it with:
$ ./cluster/kubecfg.sh -c kubernetes-jenkins/pod.json create pods
Name                Image(s)                            Host                Labels              Status
----------          ----------                          ----------          ----------          ----------
jenkins             csanchez/jenkins-swarm:1.565.3.3    <unassigned>        name=jenkins        Pending
After some time, depending on your internet connection, as it has to download the Docker image to the minion, we can check its status and on which minion it was started.
$ ./cluster/kubecfg.sh list pods
Name                Image(s)                            Host                      Labels              Status
----------          ----------                          ----------                ----------          ----------
jenkins             csanchez/jenkins-swarm:1.565.3.3    10.0.29.247/10.0.29.247   name=jenkins        Running
If we ssh into the minion that the pod was assigned to, minion-1 or minion-2, we can see how Docker started the container defined, amongst other containers used by Kubernetes for internal management (kubernetes/pause and google/cadvisor).
$ vagrant ssh minion-2 -c "docker ps"
CONTAINER ID        IMAGE                               COMMAND                CREATED             STATUS              PORTS                                              NAMES
7f6825a80c8a        google/cadvisor:0.6.2               "/usr/bin/cadvisor"    3 minutes ago       Up 3 minutes                                                           k8s_cadvisor.b0dae998_cadvisormanifes12uqn2ohido76855gdecd9roadm7l0.default.file_cadvisormanifes12uqn2ohido76855gdecd9roadm7l0_28df406a
5c02249c0b3c        csanchez/jenkins-swarm:1.565.3.3    "/usr/local/bin/jenk   3 minutes ago       Up 3 minutes                                                           k8s_jenkins.f87be3b0_jenkins.default.etcd_901e8027-759b-11e4-bfd0-0800279696e1_bf8db75a
ce51fda15f55        kubernetes/pause:go                 "/pause"               10 minutes ago      Up 10 minutes                                                          k8s_net.dbcb7509_0d38f5b2-759c-11e4-bfd0-0800279696e1.default.etcd_0d38fa52-759c-11e4-bfd0-0800279696e1_e4e3a40f
e6f00165d7d3        kubernetes/pause:go                 "/pause"               13 minutes ago      Up 13 minutes       0.0.0.0:8080->8080/tcp, 0.0.0.0:50000->50000/tcp   k8s_net.9eb4a781_jenkins.default.etcd_901e8027-759b-11e4-bfd0-0800279696e1_7bd4d24e
7129fa5dccab        kubernetes/pause:go                 "/pause"               13 minutes ago      Up 13 minutes       0.0.0.0:4194->8080/tcp                             k8s_net.a0f18f6e_cadvisormanifes12uqn2ohido76855gdecd9roadm7l0.default.file_cadvisormanifes12uqn2ohido76855gdecd9roadm7l0_659a7a52
And, once we know the container id, we can check the container logs with vagrant ssh minion-1 -c "docker logs cec3eab3f4d3"
We should also see the Jenkins web UI at http://10.245.2.2:8080/ or http://10.0.29.247:8080/, depending on what minion it was started in.
Service discovery
Kubernetes allows defining services, a way for containers to discover each other and have requests proxied to the appropriate minion. With this definition in service-http.json we are creating a service with id jenkins pointing to the pod with the label name=jenkins, as declared in the pod definition, and forwarding port 8888 to the container's 8080.
{
  "id": "jenkins",
  "kind": "Service",
  "apiVersion": "v1beta1",
  "port": 8888,
  "containerPort": 8080,
  "selector": {
    "name": "jenkins"
  }
}
Creating the service with kubecfg:
$ ./cluster/kubecfg.sh -c kubernetes-jenkins/service-http.json create services
Name                Labels              Selector            IP                  Port
----------          ----------          ----------          ----------          ----------
jenkins                                 name=jenkins        10.0.29.247         8888
Each service is assigned a unique IP address tied to the lifespan of the Service. If we had multiple pods matching the service definition the service would load balance the traffic across all of them.
Another feature of services is that a number of environment variables are available in any subsequent containers run by Kubernetes, providing the ability to connect to the service container, similarly to linked Docker containers. This will prove useful for finding the master Jenkins server from any of the slaves.
JENKINS_PORT='tcp://10.0.29.247:8888'
JENKINS_PORT_8080_TCP='tcp://10.0.29.247:8888'
JENKINS_PORT_8080_TCP_ADDR='10.0.29.247'
JENKINS_PORT_8080_TCP_PORT='8888'
JENKINS_PORT_8080_TCP_PROTO='tcp'
JENKINS_SERVICE_PORT='8888'
SERVICE_HOST='10.0.29.247'
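A container can build the master URL from these injected variables; a minimal sketch follows, with the service values hardcoded as defaults purely for illustration, since outside the cluster the variables are not set:

```shell
# Sketch: derive the Jenkins master URL from the service environment
# variables that Kubernetes injects into containers.
# The default values below are illustrative, taken from the output above.
JENKINS_SERVICE_HOST=${JENKINS_SERVICE_HOST:-10.0.29.247}
JENKINS_SERVICE_PORT=${JENKINS_SERVICE_PORT:-8888}

MASTER_URL="http://${JENKINS_SERVICE_HOST}:${JENKINS_SERVICE_PORT}"
echo "$MASTER_URL"
```

This is the same pattern the slave startup command uses later in the replication controller definition.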
Another tweak we need to make is to open port 50000, needed by the Jenkins swarm plugin. This can be achieved by creating another service, service-slave.json, so that Kubernetes forwards traffic on that port to the Jenkins server container.
{
  "id": "jenkins-slave",
  "kind": "Service",
  "apiVersion": "v1beta1",
  "port": 50000,
  "containerPort": 50000,
  "selector": {
    "name": "jenkins"
  }
}
The service is created with kubecfg again.
$ ./cluster/kubecfg.sh -c kubernetes-jenkins/service-slave.json create services
Name                Labels              Selector            IP                  Port
----------          ----------          ----------          ----------          ----------
jenkins-slave                           name=jenkins        10.0.86.28          50000
And all the defined services are now available, including some Kubernetes internal ones:
$ ./cluster/kubecfg.sh list services
Name                Labels              Selector                                  IP                  Port
----------          ----------          ----------                                ----------          ----------
kubernetes-ro                           component=apiserver,provider=kubernetes   10.0.22.155         80
kubernetes                              component=apiserver,provider=kubernetes   10.0.72.49          443
jenkins                                 name=jenkins                              10.0.29.247         8888
jenkins-slave                           name=jenkins                              10.0.86.28          50000
Replication controllers
Replication controllers allow running multiple pods in multiple minions. Jenkins slaves can be run this way to ensure there is always a pool of slaves ready to run Jenkins jobs.
In a replication.json definition:
{
  "id": "jenkins-slave",
  "apiVersion": "v1beta1",
  "kind": "ReplicationController",
  "desiredState": {
    "replicas": 1,
    "replicaSelector": {
      "name": "jenkins-slave"
    },
    "podTemplate": {
      "desiredState": {
        "manifest": {
          "version": "v1beta1",
          "id": "jenkins-slave",
          "containers": [
            {
              "name": "jenkins-slave",
              "image": "csanchez/jenkins-swarm-slave:1.21",
              "command": [
                "sh", "-c",
                "/usr/local/bin/jenkins-slave.sh -master http://$JENKINS_SERVICE_HOST:$JENKINS_SERVICE_PORT -tunnel $JENKINS_SLAVE_SERVICE_HOST:$JENKINS_SLAVE_SERVICE_PORT -username jenkins -password jenkins -executors 1"
              ]
            }
          ]
        }
      },
      "labels": {
        "name": "jenkins-slave"
      }
    }
  },
  "labels": {
    "name": "jenkins-slave"
  }
}
The podTemplate section allows the same configuration options as a pod definition. In this case we want the Jenkins slave to connect automatically to our Jenkins master, instead of relying on Jenkins multicast discovery. To do so we execute the jenkins-slave.sh command with the -master parameter to point the slave to the Jenkins master running in Kubernetes. Note that we use the Kubernetes-provided environment variables for the Jenkins service definition (JENKINS_SERVICE_HOST and JENKINS_SERVICE_PORT). The image command is overridden to configure the container this way, which is useful for reusing existing images while taking advantage of the service environment variables. It can be done in pod definitions too.
Create the replicas with kubecfg:
$ ./cluster/kubecfg.sh -c kubernetes-jenkins/replication.json create replicationControllers
Name                Image(s)                             Selector              Replicas
----------          ----------                           ----------            ----------
jenkins-slave       csanchez/jenkins-swarm-slave:1.21    name=jenkins-slave    1
Listing the pods now would show new ones being created, up to the number of replicas defined in the replication controller.
$ ./cluster/kubecfg.sh list pods
Name                                   Image(s)                             Host                    Labels                Status
----------                             ----------                           ----------              ----------            ----------
jenkins                                csanchez/jenkins-swarm:1.565.3.3     10.245.2.3/10.245.2.3   name=jenkins          Running
07651754-4f88-11e4-b01e-0800279696e1   csanchez/jenkins-swarm-slave:1.21    10.245.2.2/10.245.2.2   name=jenkins-slave    Pending
The first time the jenkins-swarm-slave image runs, the minion has to download it from the Docker repository, but after a while, depending on your internet connection, the slaves should automatically connect to the Jenkins server. On the server where the slave is started, docker ps shows the container running and docker logs is useful to debug any problems on container startup.
$ vagrant ssh minion-1 -c "docker ps"
CONTAINER ID        IMAGE                                COMMAND                CREATED              STATUS              PORTS                    NAMES
870665d50f68        csanchez/jenkins-swarm-slave:1.21    "/usr/local/bin/jenk   About a minute ago   Up About a minute                            k8s_jenkins-slave.74f1dda1_07651754-4f88-11e4-b01e-0800279696e1.default.etcd_11cac207-759f-11e4-bfd0-0800279696e1_9495d10e
cc44aa8743f0        kubernetes/pause:go                  "/pause"               About a minute ago   Up About a minute                            k8s_net.dbcb7509_07651754-4f88-11e4-b01e-0800279696e1.default.etcd_11cac207-759f-11e4-bfd0-0800279696e1_4bf086ee
edff0e535a84        google/cadvisor:0.6.2                "/usr/bin/cadvisor"    27 minutes ago       Up 27 minutes                                k8s_cadvisor.b0dae998_cadvisormanifes12uqn2ohido76855gdecd9roadm7l0.default.file_cadvisormanifes12uqn2ohido76855gdecd9roadm7l0_588941b0
b7e23a7b68d0        kubernetes/pause:go                  "/pause"               27 minutes ago       Up 27 minutes       0.0.0.0:4194->8080/tcp   k8s_net.a0f18f6e_cadvisormanifes12uqn2ohido76855gdecd9roadm7l0.default.file_cadvisormanifes12uqn2ohido76855gdecd9roadm7l0_57a2b4de
The replication controller can automatically be resized to any number of desired replicas:
$ ./cluster/kubecfg.sh resize jenkins-slave 2
And again the pods are updated to show where each replica is running.
$ ./cluster/kubecfg.sh list pods
Name                                   Image(s)                             Host                    Labels                Status
----------                             ----------                           ----------              ----------            ----------
07651754-4f88-11e4-b01e-0800279696e1   csanchez/jenkins-swarm-slave:1.21    10.245.2.2/10.245.2.2   name=jenkins-slave    Running
a22e0d59-4f88-11e4-b01e-0800279696e1   csanchez/jenkins-swarm-slave:1.21    10.245.2.3/10.245.2.3   name=jenkins-slave    Pending
jenkins                                csanchez/jenkins-swarm:1.565.3.3     10.245.2.3/10.245.2.3   name=jenkins          Running
Scheduling
Right now the default scheduler is random, but resource based scheduling will be implemented soon. At the time of writing there are several open issues to add scheduling based on memory and CPU usage. There is also work in progress on an Apache Mesos based scheduler. Apache Mesos is a framework for distributed systems providing APIs for resource management and scheduling across entire datacenter and cloud environments.
Self healing
One of the benefits of using Kubernetes is the automated management and recovery of containers.
If the container running the Jenkins server dies for any reason, for instance because the running process crashes, Kubernetes will notice and create a new container after a few seconds.
$ vagrant ssh minion-2 -c 'docker kill `docker ps | grep csanchez/jenkins-swarm: | sed -e "s/ .*//"`'
51ba3687f4ee

$ ./cluster/kubecfg.sh list pods
Name                                   Image(s)                             Host                    Labels                Status
----------                             ----------                           ----------              ----------            ----------
jenkins                                csanchez/jenkins-swarm:1.565.3.3     10.245.2.3/10.245.2.3   name=jenkins          Failed
07651754-4f88-11e4-b01e-0800279696e1   csanchez/jenkins-swarm-slave:1.21    10.245.2.2/10.245.2.2   name=jenkins-slave    Running
a22e0d59-4f88-11e4-b01e-0800279696e1   csanchez/jenkins-swarm-slave:1.21    10.245.2.3/10.245.2.3   name=jenkins-slave    Running
And some time later, typically no more than a minute...
Name                                   Image(s)                             Host                    Labels                Status
----------                             ----------                           ----------              ----------            ----------
jenkins                                csanchez/jenkins-swarm:1.565.3.3     10.245.2.3/10.245.2.3   name=jenkins          Running
07651754-4f88-11e4-b01e-0800279696e1   csanchez/jenkins-swarm-slave:1.21    10.245.2.2/10.245.2.2   name=jenkins-slave    Running
a22e0d59-4f88-11e4-b01e-0800279696e1   csanchez/jenkins-swarm-slave:1.21    10.245.2.3/10.245.2.3   name=jenkins-slave    Running
By running the Jenkins data directory in a volume we guarantee that the data is kept even after the container dies, so we do not lose any Jenkins jobs or data already created. And because Kubernetes proxies the services in each minion, the slaves will reconnect to the new Jenkins server automagically no matter where they run! Exactly the same happens if any of the slave containers dies: the system automatically creates a new container and, thanks to the service discovery, it automatically joins the Jenkins server pool.
If something more drastic happens, such as a minion dying, Kubernetes does not yet offer the ability to reschedule the containers in the other existing minions; it will just show the pods as Failed.
$ vagrant halt minion-2
==> minion-2: Attempting graceful shutdown of VM...

$ ./cluster/kubecfg.sh list pods
Name                                   Image(s)                             Host                    Labels                Status
----------                             ----------                           ----------              ----------            ----------
jenkins                                csanchez/jenkins-swarm:1.565.3.3     10.245.2.3/10.245.2.3   name=jenkins          Failed
07651754-4f88-11e4-b01e-0800279696e1   csanchez/jenkins-swarm-slave:1.21    10.245.2.2/10.245.2.2   name=jenkins-slave    Running
a22e0d59-4f88-11e4-b01e-0800279696e1   csanchez/jenkins-swarm-slave:1.21    10.245.2.3/10.245.2.3   name=jenkins-slave    Failed
Tearing down
kubecfg offers several commands to stop and delete the replication controllers, pods and services definitions.
To stop the replication controller, setting the number of replicas to 0 and causing the termination of all the Jenkins slave containers:
$ ./cluster/kubecfg.sh stop jenkins-slave
To delete it:
$ ./cluster/kubecfg.sh rm jenkins-slave
To delete the jenkins server pod, causing the termination of the Jenkins master container:
$ ./cluster/kubecfg.sh delete pods/jenkins
To delete the services:
$ ./cluster/kubecfg.sh delete services/jenkins
$ ./cluster/kubecfg.sh delete services/jenkins-slave
Conclusion
Kubernetes is still a very young project, but highly promising to manage Docker deployments across multiple servers and simplify the execution of long running and distributed Docker containers. By abstracting infrastructure concepts and working on states instead of processes, it provides easy definition of clusters, including self healing capabilities out of the box. In short, Kubernetes makes management of Docker fleets easier.
About the Author
Carlos Sanchez has been working on automation and quality of software development, QA and operations processes for over 10 years, from build tools and continuous integration to deployment automation, DevOps best practices and continuous delivery. He has delivered solutions to Fortune 500 companies, working at several US based startups, most recently MaestroDev, a company he cofounded. Carlos has been a speaker at several conferences around the world, including JavaOne, EclipseCON, ApacheCON, JavaZone, FOSDEM and PuppetConf. Very involved in open source, he is a member of the Apache Software Foundation, amongst other open source groups, contributing to several projects such as Apache Maven, Fog or Puppet.