At QCon San Francisco, Justin Cormack explored "The Operating System in 2018". The biggest changes in this space include: performance-driven improvements, such as eBPF and userspace networking; the changing role of operations, and how operators use and deploy operating systems; and emulation and portability. There are also areas that have seen little change so far but show signs that change is on the horizon, for example: operating systems are effectively the "last monolith"; there is a lack of diversity in operating systems, OS programming languages, and contributors; and security has not yet received the full attention it requires.
Cormack, software engineer at Docker, began the talk by quoting Ken Batcher: "a supercomputer is a device for turning compute-bound problems into I/O bound problems", and noted that over the past several years "everything has changed" in relation to this quip. Modern computer storage and networking have become much faster; moving from 1Gb Ethernet to 100Gb over the past decade has resulted in a communication speed increase of two orders of magnitude, and SSDs can now commit data at network wire speed. Accordingly, operating systems have had to be modified in order to keep up with these changes.
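To make the two-orders-of-magnitude figure concrete, the following minimal sketch estimates how long a single frame occupies the wire at each line rate, which is roughly the time budget software has to process it. The 1500-byte frame size and the function name `frame_time_ns` are illustrative assumptions, not figures from the talk, and framing overhead is ignored.

```python
# Rough per-frame time budget at different Ethernet line rates.
# Assumption: 1500-byte frames (a common MTU), ignoring preamble and inter-frame gap.
FRAME_BYTES = 1500


def frame_time_ns(link_gbps: float, frame_bytes: int = FRAME_BYTES) -> float:
    """Return the nanoseconds needed to put one frame on the wire."""
    bits = frame_bytes * 8
    # bits / (Gbit/s) gives nanoseconds, because 1 Gbit/s is 1 bit per nanosecond.
    return bits / link_gbps


if __name__ == "__main__":
    for gbps in (1, 100):
        print(f"{gbps} Gb/s: {frame_time_ns(gbps):.0f} ns per frame")
    # prints:
    # 1 Gb/s: 12000 ns per frame
    # 100 Gb/s: 120 ns per frame
```

Under these assumptions the per-frame budget shrinks from about 12 microseconds to about 120 nanoseconds, which is why fixed per-operation costs in the OS become significant.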
The first approach to mitigate the issue of increasingly fast I/O is to avoid the kernel/userspace switch latency. System calls are relatively slow, and so these can be avoided by writing hardware device drivers in userspace. For example, in networking DPDK is the most widely used framework, and for non-volatile memory express (NVMe) there is SPDK.
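The cost of crossing the kernel/userspace boundary can be observed even without a userspace driver. The sketch below is not DPDK or SPDK; it simply reads the same file with many small `read` system calls and then with a few large ones, counting the calls and timing each run. The helper names `read_with_chunk` and `make_sample_file` are hypothetical, and absolute timings will vary by machine.

```python
import os
import tempfile
import time
from typing import Tuple


def read_with_chunk(path: str, chunk_size: int) -> Tuple[bytes, int]:
    """Read a file with unbuffered os.read calls; return (data, number of calls)."""
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    parts = []
    try:
        while True:
            chunk = os.read(fd, chunk_size)  # each call is one read system call
            calls += 1
            if not chunk:  # an empty result signals end of file
                break
            parts.append(chunk)
    finally:
        os.close(fd)
    return b"".join(parts), calls


def make_sample_file(size: int) -> str:
    """Create a temporary file of `size` bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size)
    return path


if __name__ == "__main__":
    path = make_sample_file(64 * 1024)
    try:
        for chunk_size in (1, 4096):
            start = time.perf_counter()
            data, calls = read_with_chunk(path, chunk_size)
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"chunk={chunk_size}: {calls} calls, {len(data)} bytes, {elapsed_ms:.2f} ms")
    finally:
        os.remove(path)
```

Both runs return identical data, but the one-byte run issues thousands of times more system calls. Userspace frameworks take this idea further by removing the per-operation kernel transition altogether.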
Another approach is to never leave the kernel. However, it is challenging to code for the kernel, and so eBPF has emerged as a new safe in-kernel programming language. Cormack suggested that eBPF is effectively "AWS Lambda for the Linux kernel", as it provides a way to attach functions to kernel events. eBPF has a limited safe language that uses an LLVM toolchain, and runs as a universal in-kernel virtual machine. The Cilium service mesh project makes extensive use of eBPF for performing many networking functions within the kernel at very high speed.
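The "functions attached to events" idea can be illustrated with a purely userspace analogy. The sketch below is not eBPF and uses no kernel API: `ToyKernel`, `attach`, `emit`, and the `packet_received` event are all hypothetical names chosen for illustration. Real eBPF programs are compiled to bytecode, checked by an in-kernel verifier before loading, and typically share results with userspace through maps.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class ToyKernel:
    """Hypothetical stand-in for a kernel that exposes named events."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Handler]] = defaultdict(list)

    def attach(self, event: str, handler: Handler) -> None:
        """Register a handler to run whenever `event` fires."""
        self._hooks[event].append(handler)

    def emit(self, event: str, context: dict) -> None:
        """Fire an event, running every attached handler with its context."""
        for handler in self._hooks[event]:
            handler(context)


# Shared state, loosely analogous to a map that a handler updates.
packet_counts: Dict[str, int] = defaultdict(int)


def count_packets(ctx: dict) -> None:
    """Small, short-lived handler: count packets per source address."""
    packet_counts[ctx["src"]] += 1


kernel = ToyKernel()
kernel.attach("packet_received", count_packets)
for src in ("10.0.0.1", "10.0.0.2", "10.0.0.1"):
    kernel.emit("packet_received", {"src": src})

print(dict(packet_counts))  # prints {'10.0.0.1': 2, '10.0.0.2': 1}
```

As with a function-as-a-service platform, the handler author writes only the small function and declares which event triggers it; the host decides when it runs.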
In summarising this section of the talk, Cormack remarked that choosing between userspace and kernel space for implementing device I/O functionality involves several tradeoffs.
Moving on to the topic of operations, he noted that although the development of Unix started in 1969, modern (Linux-based) operating systems look little different from the original OSs, with the obvious exception of containing a lot more packages. However, over the past ten years the role of operations has changed radically. The vast majority of (cloud-based) operating systems never have a person log into them, most are created via APIs and automation, and the notion of "immutable infrastructure" is now the norm.
As an alternative to existing operating system distributions, Cormack discussed LinuxKit. This is an open source toolkit for building "secure, lean and portable" Linux subsystems, which began as a Docker project. LinuxKit is designed to be built and tested within a continuous integration pipeline, and images are configured using a YAML file. The LinuxKit design philosophy encourages engineers to build composable operating systems that boot very fast, require few resources, and need less security patching (due to the small number of modules included within a typical LinuxKit-built OS).
Next, the topic of emulation was explored, which began with a discussion of how the Linux creator, Linus Torvalds, decided upfront that Linux would have a stable ABI. This introduced challenges in regard to backwards compatibility, but at the same time provided a very stable emulation target. Recent work bears this out: the Windows Subsystem for Linux was launched in 2016, and Google released the userspace emulator gVisor in 2018.
This stable emulation target also means that non-performance-critical software can be emulated elsewhere, an approach that is increasingly being used for security isolation (for example, Google runs all of the Google App Engine workloads safely on its multi-tenant cloud platform using gVisor). It also allows existing applications to run on non-Linux operating systems, such as Windows. The longer-term implication is that Linux code no longer needs to be run on Linux, which will allow migration to new operating systems in the future.
Moving on to what has not changed within operating systems, Cormack highlighted the declining number of operating systems. Currently only three OS variants have significant market share: Linux (and Android), Windows, and macOS (and iOS). Accordingly, this is "leading to a monoculture": it is convenient that everything runs on Linux, but this also limits new ideas.
With a subtle nod to Torvalds' recent apology email, it was also highlighted that OS contributor diversity is poor too. Arguably, the operating system is also "the last monolith", with an average Linux distribution weighing in at over 500 million lines of code and Windows at around 50 million, the latter being the largest Git repository on the planet. The majority of OS code is also written in C, and other languages have not yet made much impact.
The final topic to be discussed was security. Linux has historically preferred agility over security, and "security as a driver for operating system change [has been] slow". The recent Meltdown and Spectre side-channel attacks may have changed this somewhat, but the influence of security requirements will take time to trickle down through the OS design process.
Drawing on his experience at Unikernel Systems (acquired by Docker in 2016), Cormack suggested that unikernels could be the radical answer to these challenges. Unikernels allow an engineer to build an OS as a library that is linked to their application and specialised to run only that application. A unikernel can be booted directly on hardware or a VM, and due to the specialisation and limited number of components the performance profile is typically very good, and the security attack surface can be minimised. There are a number of successful unikernel projects, including Microsoft's SQL Server for Linux, and there is a growing community in the space, for example, around the OCaml and IncludeOS projects.
Closing the talk, Cormack stated that upon reflection, perhaps operating systems have changed after all. Performance requirements and I/O improvements have meant that engineers have created two new ways to run code in Linux: in userspace, as self-contained systems that use little of the OS; and in-kernel eBPF, which can be thought of as "AWS Lambda for Linux". The changing role of operations over the past decade is forcing operating systems to become more composable and API-driven (e.g. LinuxKit), and emulation is making code more portable. Security will drive the next changes, and unikernels are emerging as a strong area of interest. Attendees and readers must fight the monoculture, and strive to make diversity and inclusion across technologies and communities the new norm.
The slides for Justin Cormack's QCon SF talk "The Operating System in 2018" (PDF) can be found on the event website, and the video recording will be posted on InfoQ over the coming months.