Every developer knows the pain of incompatible software. Problems arise quickly when you work on multiple projects that need different versions of a Java runtime, especially on OS X. Ruby has its own version manager for a reason. Two colleagues of mine spent hours debugging incompatibilities between their OpenSSL and Python versions and a Homebrew package. Can we use containers to solve these problems? The answer is: "Yes, we can!"
The main goal of containers is to deliver software. The newly founded Open Containers project gives the following definition:
The goal of a Standard Container is to encapsulate a software component and all its dependencies in a format that is self-describing and portable, so that any compliant runtime can run it without extra dependencies, regardless of the underlying machine and the contents of the container.
This definition does not state anything about the kind of software being distributed. That is on purpose: containers are content-agnostic by design. What you want to deliver, and how it should be used, is entirely up to you. In this article I propose a distinction between service images and executable images, and I will advocate the use of executable images.
Executable images are less common than service images, but a very useful addition to the mix. They will solve the problems of incompatible software, and more. Using the official Docker Maven image as an example, we will investigate what executable images are, how they work and how you can create them yourself. The ENTRYPOINT directive of Dockerfiles plays a central role in executable images.
Service images vs. executable images
Traditionally, container images are used for long-running processes: services that run on a server without influencing the host, because they are contained. We call these service images. Web servers, load balancers and databases are good examples. These kinds of containers are easily compared to virtual machines.
Container images can also be used for short-lived processes: a containerized executable meant to be run on your computer. These containers execute a single task, are short-lived and can generally be removed after use. We call these executable images. Examples are compilers (Golang), build tools (Maven), presentation software (I love to hack a simple presentation in Markdown format and let a RevealJS Docker image serve it) and browsers (a fresh, contained browser to follow that fishy link). A real evangelist for executable images is Docker's own Jessie Frazelle. To get some great inspiration, be sure to read her blog about them or check out her presentation at DockerCon 2015.
The distinction between service images and executable images is not set in stone. All images are by definition executable, since their job is to run a process. Running a presentation or a browser in a container is a perfect example of a local tool, so I would call these executable images, even though they are long-running processes. That being said, I hope you agree the categorization makes sense. It is about the intent of the image, not about whether the process is long-lived or short-lived.
Advantages of executable images
So what are the advantages of executable images? How do they solve the problems I described earlier?
One reason to experiment with executable images is that they are a great way to get started with Docker. These experiments can be very useful and do not affect your production environments. It's also a lot of fun!
Another reason is ease of installation. I know, we have package managers like apt-get, yum, MacPorts and Homebrew. And those work perfectly... at least most of the time... usually... unless you really need them. The thing is, these tools are great at one thing: managing dependencies. They are not so great at managing two versions of a single package, including its dependency tree. A container has no external dependencies by design: all the dependencies are baked into the image. The installation itself is implied by the docker run command: if the image is not found on your system, Docker automatically downloads (pulls) it. By encapsulating the dependencies with the software, a container image is a reliable way to distribute software. Testing a container image also tests whether the dependencies work with the main functionality. For everyone.
Containerized executables are just that: containerized. In other words: sandboxed. This lowers the risk of running software you do not completely trust and limits the impact of many vulnerabilities. One example is following a fishy link in a browser: starting a fresh browser in a clean file system makes that safer. Another is a bug from a couple of months ago where Valve's Steam deleted all user files, including those on attached drives! Docker's sandboxing is not perfect, but it would definitely have avoided clearing out the family photo library.
Because the process and its dependencies are contained, running different versions of the same software is easy! Usually, when starting out on a Java/Maven project, you need to install the correct versions of the Java Development Kit (JDK) and Maven. With Docker, we can skip that. One of the team members can install the JDK and Maven in an executable image. Anyone is then able to check out the source code and compile and test it directly. For another project, you can use another image based on a different JDK version. You can even compile both projects at the same time! And you don't need to worry about a $JAVA_HOME environment variable.
The Maven image
A service image is built to run a service in a specific way. It may need information about its environment, such as the address of a database, but not much more. Executable images are tools specifically built to interact with your system. There are multiple techniques to accomplish this. We will look at the Maven compiler image to examine these techniques. Note that the techniques are general, so bear with me if you do not like Java.
Passing files as a volume for configuration
Suppose we have a Maven project containing our Java sources. It contains at least a pom.xml file and a src/main/java directory in the project root. For the purposes of this article, you can take any Maven project you like. If you don't have one, you can download a starter project at Spring Boot (select Maven Project as Type). Using the command line to cd into the project directory (the one containing the pom.xml file), we can execute the following:
user:project$ docker run --rm \
-v $(pwd):/project \
-w /project \
maven:3.3.3-jdk-8 mvn install
This command does a couple of things:
- docker run creates a container instance of the maven:3.3.3-jdk-8 image and executes the command mvn install inside of it. In principle, this cannot influence your system;
- -v $(pwd):/project mounts our current directory into the container as /project. This allows the container to read and write in the current directory on your system;
- -w /project sets /project as the working directory, which in turn points to the project directory. This means the mvn command will effectively be executed in the project directory;
- --rm removes the container after execution. Good riddance!
The result is the same as if we were to run mvn install from our project directory directly on the host, only without actually having to install Java or Maven. We end up with a target directory in the project containing the compiled Java application.
We can clean the project by running Maven's clean command:
user:project$ docker run --rm \
-v $(pwd):/project \
-w /project \
maven:3.3.3-jdk-8 mvn clean
Using an entry point to pass arguments
The function of the Maven image is to run mvn [args]. So you could argue that it is redundant to specify mvn in the docker command. To address this, Docker provides an entry point. The entry point is strongly related to the command; both can be specified in a Dockerfile, with the ENTRYPOINT and CMD instructions respectively. These instructions are applied to the container image as metadata, which can be overridden in the docker run command. We can then execute mvn clean install as follows:
user:project$ docker run --rm \
-v $(pwd):/project \
-w /project \
--entrypoint mvn \
maven:3.3.3-jdk-8 clean install
The entry point and the command are concatenated and executed as one. The advantage is separation of concerns: in the case of an executable image, use the entry point for the constant part and the command for the variable part.
The separation becomes more elegant if we incorporate the entry point into the container image. To do so, create a Dockerfile in a different directory with the following contents:
FROM maven:3.3.3-jdk-8
WORKDIR /project
ENTRYPOINT ["mvn"]
CMD ["-h"]
where we also added the working directory, so our new image expects a Maven project to be mounted as /project. This Dockerfile defines the entry point and command in the exec form, which can be recognized by the bracket notation and is preferred over the shell form. Building an image with "docker build -t my_mvn ." in the directory containing the Dockerfile lets us simplify our earlier command to:
user:project$ docker run --rm \
-v $(pwd):/project \
my_mvn clean install
where clean install could of course be any other mvn arguments. If you forget to include a command, the Maven help will be printed because of the default -h command in the Dockerfile.
Another good use for an entry point is to define a helper script. For instance, if you need to run a few commands before your actual service can start correctly, a helper script can take care of that. Or, such a script could check whether all necessary runtime configuration, such as links or environment variables, is supplied. The command itself becomes an argument to the startup script, but is transparently executed by it. For more information and a simple example, see the Dockerfile best practices in the Docker documentation.
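As a minimal sketch of that pattern (the file name docker-entrypoint.sh and the MAVEN_OPTS default are my own illustrative choices, not taken from any official image): the script does its preparation and then hands control to the passed command with exec, so that command replaces the script as the container's main process and receives signals directly.

```shell
#!/bin/sh
# docker-entrypoint.sh: prepare the environment, then run the real command.
set -e

# Preparation step: provide a default for an environment variable
# if the user did not supply one at docker run time.
: "${MAVEN_OPTS:=-Xmx512m}"
export MAVEN_OPTS

# Replace this script with the command given as arguments (the CMD, or
# whatever was typed after the image name on the docker run command line).
exec "$@"
```

In the Dockerfile you would then COPY this script into the image and set ENTRYPOINT ["/docker-entrypoint.sh"], keeping the variable part in CMD as before.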
Creating aliases for executable images
You can also create an alias for your executable images. This way, you can type a short command, just as you would for regular programs. In your ~/.profile, include
mvn() {
docker run --rm \
-v "$(pwd)":/project \
my_mvn "$@"
}
We need to pass parameters, so we use a function instead of an alias. After running source ~/.profile to load the changes, we can simply use
user:project$ mvn clean install
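The same wrapper trick also delivers on the multiple-versions promise from earlier. Below is a sketch with one wrapper per JDK (the mvn7/mvn8 names are my own, and I am assuming the official Maven repository also publishes a 3.3.3-jdk-7 tag); each pins its builds to a different image:

```shell
# One wrapper function per JDK version. Each call mounts the current
# directory as /project and runs mvn inside a throwaway container.
mvn7() {
docker run --rm -v "$(pwd)":/project -w /project maven:3.3.3-jdk-7 mvn "$@"
}

mvn8() {
docker run --rm -v "$(pwd)":/project -w /project maven:3.3.3-jdk-8 mvn "$@"
}
```

Now you can run mvn7 install in a legacy project while mvn8 install runs in another terminal, without a single JDK installed on the host.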
Caching the maven local repository using a volume
A disadvantage of the current approach is that Maven artifacts are downloaded anew during each run. A local Maven installation always includes a repository directory where all downloaded artifacts are stored. The current approach is very clean, but not practical. Let's add the Maven repository as a volume. Create a directory such as /usr/tmp/.m2 and run
user:project$ docker run --rm \
-v $(pwd):/project \
-v /usr/tmp/.m2:/root/.m2 \
my_mvn install
The directory /usr/tmp/.m2 on the host is now populated with the artifacts downloaded by Maven. Each time we start a Maven container this way, we refer to that directory, so Maven can reuse those artifacts. Run mvn install twice to see the difference.
We just made the Maven builds faster. However, we pay for this by now having to manage a directory on our Docker host. The last step in this article is to have Docker manage this volume. First, create a named data container:
user:project$ docker run --name maven_data \
-v /root/.m2 \
maven:3.3.3-jdk-8 echo 'data for maven'
This container exits after printing "data for maven", but creates a volume. It doesn't really matter which image we use: in this case, maven:3.3.3-jdk-8 is handy because it is already downloaded, while my_mvn is less handy, due to the entry point that would be prepended to the echo statement. Note the lack of a colon in -v /root/.m2: we no longer refer to a directory on the host. Instead, Docker creates a directory in its own data directory on the host. Using "data" in the name and command is not required, but makes explicit that this is a data container, which you will see reflected when you run docker ps -a (the container is stopped, so plain docker ps does not list it). We can refer to this container's volume with --volumes-from, without caring about where Docker keeps the actual directory. Doing so mounts the volume into the referring container under /root/.m2. This technique is also useful to share data between containers. Let's do this in our ~/.profile:
mvn() {
docker run --rm \
-v "$(pwd)":/project \
--volumes-from maven_data \
my_mvn "$@"
}
Now, when we run mvn, the Maven home directory will be mapped to the volume. The Maven container itself will be deleted, but the volume remains, with the local repository of cached downloads. If you want to clean your system, remove the data container with
docker rm -v maven_data
The -v flag removes the volumes associated with the container, in case:
a) the volume is managed by Docker, and
b) no other containers refer to the volume.
A word of caution: if you forget to use the -v option, you will end up with orphan volume directories.
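Newer Docker versions (1.9 and up, which introduced the docker volume subcommand) can find such orphans for you. Here is a sketch, wrapped in a small helper function of my own naming:

```shell
# Find and remove volumes that no longer belong to any container.
cleanup_volumes() {
  # -q prints only volume names; -f dangling=true filters for orphans.
  dangling=$(docker volume ls -qf dangling=true)
  if [ -n "$dangling" ]; then
    # $dangling is deliberately unquoted so each name becomes an argument.
    # Review the list first if you are unsure what is in it.
    docker volume rm $dangling
  else
    echo "no dangling volumes"
  fi
}
```

Running cleanup_volumes periodically keeps Docker's data directory from silently filling up with forgotten volumes.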
Conclusion
Executable container images are a powerful application of Docker. They are useful to distribute software or to run it on your machine in a contained and deterministic manner. Also, they are a fun way to start out experimenting with Docker. I hope you feel inspired to start experimenting and use the techniques described in this article.
About the Author
Quinten Krijger started his career in IT at Finalist, after studying Physics and a year of classical singing. Later, he moved to Trifork Amsterdam, mainly continuing back-end work on projects using open-source technology such as Java, Spring, Elasticsearch and MongoDB, with a healthy interest in up-to-date front-end programming. His passion is shortening the feedback cycle to enable agile development, with testing, CI and DevOps being key. He took an interest in Docker shortly after its inception, impressed by the wide range of possibilities efficient containers provide. He focused his efforts on that for half a year at the startup Container Solutions and is currently doing DevOps at ING.