
Security Insights into the LXD Container Hypervisor

by Hrishikesh Barua on Oct 15, 2016. Estimated reading time: 2 minutes

A presentation at the Linux Security Summit last month covered various issues and edge cases in container security for LXD, a container hypervisor from Canonical that uses Linux Containers (LXC) under the hood. In the session, Stéphane Graber and Tycho Andersen went into the internal details of some of these problems.

LXD is not a different virtualization technology but a tool built on LXC features. LXC is implemented using the namespaces and control groups (cgroups) features of the Linux kernel. As a result, LXD relies on the isolation that the kernel's namespaces API provides.
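To make the namespace point concrete, here is a minimal sketch that lists the namespaces a process belongs to by reading the symbolic links under /proc/<pid>/ns. It is not taken from the talk or from LXD; the function names are illustrative, and it assumes a Linux host with procfs mounted. Two processes share a namespace when the inode numbers in their link targets match.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (kind, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unrecognized namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def read_namespaces(pid="self", proc_root="/proc"):
    """Return {entry name: inode} for one process, or {} if unreadable."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    try:
        entries = os.listdir(ns_dir)
    except OSError:
        return result  # not Linux, or no permission to inspect this pid
    for entry in entries:
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        except (OSError, ValueError):
            continue
        result[entry] = inode
    return result


def shared_namespaces(a, b):
    """List the namespace entries in which two processes share an instance."""
    return sorted(k for k in a if k in b and a[k] == b[k])


if __name__ == "__main__":
    for kind, inode in sorted(read_namespaces().items()):
        print(f"{kind:20s} {inode}")
```

Comparing the result for a host process with the result for a process inside a container shows which kinds of namespace actually separate the two.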

LXD makes extensive use of cgroups to enforce resource quotas on containers, such as limits on CPU, swap, disk, and network traffic. However, containers still share one kernel, so any problem with a shared kernel resource affects every running container. One example is inotify handles, which are used to track filesystem changes. The global limit per user is 512, which means 512 handles are shared by all running containers on a host. This is far too low for something like systemd. The situation is made worse by the fact that systemd fails when it runs out of inotify handles rather than falling back to a mechanism such as polling the filesystem. Other examples include networking tables (e.g., those used to store routing entries) and ulimit.
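The inotify problem can be sketched as follows. The snippet reads the inotify limits that the kernel exposes under /proc/sys/fs/inotify and then simulates containers drawing from one shared pool on a first-come basis. The per-container demands are made-up numbers for illustration, and `first_starved` is a hypothetical helper, not something from LXD or the kernel.

```python
import os

INOTIFY_DIR = "/proc/sys/fs/inotify"
LIMIT_FILES = ("max_user_instances", "max_user_watches", "max_queued_events")


def read_inotify_limits(base=INOTIFY_DIR):
    """Return the inotify limits found under `base` as {name: int}."""
    limits = {}
    for name in LIMIT_FILES:
        try:
            with open(os.path.join(base, name)) as handle:
                limits[name] = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # file missing (non-Linux host) or unreadable
    return limits


def first_starved(limit, demands):
    """Index of the first container whose demand no longer fits, else None.

    Models a single pool shared by every container: each container takes
    what it asks for until the pool cannot satisfy the next request.
    """
    used = 0
    for index, demand in enumerate(demands):
        if used + demand > limit:
            return index
        used += demand
    return None


if __name__ == "__main__":
    print(read_inotify_limits() or "no inotify limits readable on this host")
    # Hypothetical demands: three containers wanting 200 handles each.
    print(first_starved(512, [200, 200, 200]))
```

With a shared pool of 512 and three containers asking for 200 handles each, the third container is the one that fails, even though none of them is misbehaving on its own.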

For some of these problems, the suggested solution is to virtualize the environment in which the limits exist, i.e., tie the limit to a namespace so that it becomes local to the container. However, for things like ulimit, it’s not entirely clear which namespace would fit.

LXD runs as a root-privileged daemon, which means that it has more privileges than LXC. LXD does remove a few capabilities from its containers, such as loading and unloading kernel modules, but keeps most of them, since it cannot know beforehand which capabilities the applications running in the containers will require.
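One way to see which capabilities a process may still acquire is to decode the CapBnd (bounding set) field of /proc/<pid>/status. The sketch below checks for CAP_SYS_MODULE, the capability that governs loading and unloading kernel modules (bit 16 in linux/capability.h). The helper names are illustrative, and the snippet makes no claim about the exact set LXD drops.

```python
CAP_SYS_MODULE = 16  # bit number defined in linux/capability.h


def capability_mask(status_text, field="CapBnd"):
    """Return the hex capability mask stored in a /proc/<pid>/status field."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == field:
            return int(value.strip(), 16)
    raise KeyError(field)


def has_capability(mask, bit):
    """True if capability number `bit` is set in `mask`."""
    return bool((mask >> bit) & 1)


def read_status(pid="self"):
    """Return the status text for a process, or '' if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    text = read_status()
    try:
        allowed = has_capability(capability_mask(text), CAP_SYS_MODULE)
        print("CAP_SYS_MODULE in bounding set:", allowed)
    except KeyError:
        print("CapBnd not available on this host")
```

Run inside a container, this gives a quick check of whether module loading has been removed from the bounding set for that process.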

Linux Security Modules (LSM) is a Linux framework that allows a security model implementation to be plugged in to enforce access control, without tying the kernel to any specific model. Examples of LSMs are AppArmor and SELinux. While LXC supports both AppArmor and SELinux, LXD currently supports only AppArmor. The primary mechanism for isolating LXD containers is namespaces, but an AppArmor profile is also installed to prevent cross-container access to resources such as files.
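As a rough illustration, the LSM label confining a process can be read from /proc/<pid>/attr/current. With AppArmor this is typically either "unconfined" or a profile name followed by its mode in parentheses. The parser below is a sketch under that assumption; the profile name in the example is hypothetical, and on a host that uses SELinux the same file holds a security context in a different format.

```python
def parse_apparmor_label(raw):
    """Split an AppArmor label into (profile, mode); mode is None if absent."""
    label = raw.strip("\x00\n ")
    if label.endswith(")") and " (" in label:
        name, _, mode = label[:-1].rpartition(" (")
        return name, mode
    return label, None


def read_label(pid="self"):
    """Return the raw LSM label for a process, or '' if it cannot be read."""
    try:
        with open(f"/proc/{pid}/attr/current") as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    raw = read_label()
    print(parse_apparmor_label(raw) if raw else "no LSM label readable")
```

For example, a label of "lxd-demo (enforce)" (a made-up profile name) parses to the profile "lxd-demo" in "enforce" mode.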

The second part of the presentation covered checkpoint/restore of containers. Checkpointing saves the memory state of a running container so that it can be restored later. The techniques used for checkpoint/restore involve looking deep into a process's state via system calls like ptrace. However, security measures like seccomp can block such calls, so checkpointing a confined container requires special handling.
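Before inspecting a process, a tool can at least find out whether seccomp is active for it. The Seccomp field in /proc/<pid>/status reports 0 (disabled), 1 (strict), or 2 (filter). The sketch below decodes that field; `needs_special_handling` is a hypothetical simplification and does not describe how any particular checkpoint/restore tool decides.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the seccomp mode named in /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Seccomp":
            try:
                return SECCOMP_MODES.get(int(value.strip()), "unknown")
            except ValueError:
                return "unknown"
    return None  # field absent: kernel without seccomp, or not Linux


def needs_special_handling(mode):
    """Hypothetical rule: any active seccomp mode may block inspection calls."""
    return mode in ("strict", "filter")


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as handle:
            mode = seccomp_mode(handle.read())
    except OSError:
        mode = None
    print("seccomp mode:", mode, "| special handling:", needs_special_handling(mode))
```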
