Q&A with Greg Kurtzer from the GPU Technology Conference

Jun 06, 2017 7 min read

NVIDIA made a number of announcements at the recently concluded GPU Technology Conference. In his keynote address, Jensen Huang, the CEO of NVIDIA, talked about the rebirth of AI, deep learning, and of course the next-generation GPU, codenamed Volta, and how it would address some of the pain points in training machine learning algorithms.

The conference had a main keynote, breakout sessions, posters, and hands-on labs that were of interest to GPU novices and deep learning experts alike. The topics ranged widely, from virtual reality to containers, which have recently become a trend in the high performance computing (HPC) arena.

InfoQ caught up with Gregory M. Kurtzer, a leader of and contributor to many open source projects including CentOS, Warewulf and Singularity, about the evolution of the GPU Technology Conference and about containers and HPC.

InfoQ: GPUs have been around for a while. Same with Containers. What’s new about using GPU capabilities inside containers?

Gregory M. Kurtzer: There is nothing specifically "new" about using GPU capabilities inside containers, as technologies like "nvidia-docker" have enabled that for some time now. This has worked fantastically for scientists who operate on local private resources and have used technologies like Docker to enable collaboration and portability, but these technologies are significantly limited in their applicability to general purpose HPC and scientific computing resources.

Singularity brings a new usage paradigm to the table which enables container support on HPC resources in such a manner that supports a much larger breadth of science use-cases as well as the focus on reproducibility and mobility of compute. All this and native support for GPUs!

InfoQ: You can see Docker and containers making inroads into all types of workloads. Is HPC suitable for containerized applications? Is there a particular category of HPC workloads that it’s better suited for?

Kurtzer: To answer accurately, I would have to clarify the question by separating Docker and containers. While Docker is a container solution, there are multiple container solutions that are not Docker.

Starting with the Docker side of the question, Docker is a fantastic solution for micro-service virtualization for enterprise. It is designed specifically for this use-case, and thus that is what it excels at. It can be used for some forms of local, private scientific computing as can be seen by researchers specifically utilizing their laptops and workstations for building these workflows and then pushing them into DockerHub. But, if these users want to scale their research up to traditional HPC they may face a dead end.

There are some very vocal advocates for Docker in scientific computing, specifically HPC, which has caused much confusion and many problems. It is unfortunate because every HPC center has had to deal with the unsatisfied expectations this has led to. The Docker architecture and workflow are not compatible with traditional HPC infrastructures.

Now, if you consider HPC applications as I do (tightly coupled, highly scalable, MPI based applications), the case for containers may not be very compelling, because the user space library stack must be highly tuned and optimized specifically for the underlying hardware, which becomes a tradeoff against portability.

But, if one considers the rest of the vast ecosystem of scientific computing applications which are not of this kind (commonly referred to as the "long tail of science"), then the use-cases are massive in quantity and easily tenable.

There is an additional use-case that is very quickly gaining momentum. This is computing agility, also known as mobility of compute, and it allows users to package up a particular scientific workflow and run it reproducibly on a variety of hosts and resources.

InfoQ: Can you give a brief history of Singularity and how it brings together containers and HPC?

Kurtzer: Singularity is a container system focused on the use cases of reproducibility, mobility, agility, and HPC compatibility for scientific computing.

To enable mobility, reproducibility and agility, Singularity goes a step further than other container systems by utilizing a single file as the container image. Encapsulated within this image is the entire operating system, environment, applications, and workflows necessary to replicate the entire software stack for a given workflow. This single image makes it simple to move a contained environment from one system to another, simply by copying the single image file. If you want to branch the image, it is as easy as copying the file; if you want to share it, you can open up the POSIX permissions on that file, or email it. Conversely, if you have controlled data, libraries or applications within your container, you can limit access to that container in exactly the same manner that you currently limit access; the container is just a file, like all of the rest of your data! Additionally, using a single file makes it highly optimized for concurrent parallel usage, especially over high performance parallel file systems. We have been seeing some very significant speedups thanks to running certain types of jobs from within Singularity images.
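Because the image is described here as an ordinary file, the branching and sharing steps can be illustrated with nothing more than standard file operations. The following is a minimal sketch, not Singularity's own tooling: the file names (`analysis.img`, `analysis-branch.img`) and the helper functions are hypothetical, and a small placeholder file stands in for a real container image.

```python
import os
import shutil
import stat
import tempfile


def branch_image(image_path, branch_path):
    """Branch an image by copying the single file (contents and mode bits)."""
    shutil.copy2(image_path, branch_path)
    return branch_path


def share_image(image_path):
    """Open the image up for reading by group and others."""
    mode = stat.S_IMODE(os.stat(image_path).st_mode)
    os.chmod(image_path, mode | stat.S_IRGRP | stat.S_IROTH)


def restrict_image(image_path):
    """Limit access to the owner only (mode 0600), as for controlled data."""
    os.chmod(image_path, stat.S_IRUSR | stat.S_IWUSR)


def demo():
    """Create a stand-in image, branch it, share the branch, restrict the original."""
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file name; the bytes are a placeholder, not a real image.
        image = os.path.join(workdir, "analysis.img")
        with open(image, "wb") as fh:
            fh.write(b"stand-in for a container image")

        branch = branch_image(image, os.path.join(workdir, "analysis-branch.img"))
        share_image(branch)
        restrict_image(image)

        image_mode = stat.S_IMODE(os.stat(image).st_mode)
        branch_mode = stat.S_IMODE(os.stat(branch).st_mode)
        return image_mode, branch_mode


if __name__ == "__main__":
    image_mode, branch_mode = demo()
    print(oct(image_mode), oct(branch_mode))
```

The point of the sketch is that no container-specific access control is involved: whatever policy a site already applies to data files applies to the image as well.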

Obtaining support on traditional shared HPC resources is where Singularity further diverges from enterprise focused container systems. HPC systems typically already have both trusted and untrusted users logged on to the system via command line shells, and it is these same users to whom we want not only to provide container access, but also to allow to bring their own untrusted containers. Singularity allows untrusted users to run untrusted containers in a trusted manner by maintaining continuity between the parent process and the contained child processes, which carries any limitations imposed on the user’s shell into the container. Singularity also ensures that the user inside the container is always the same user as the calling user. Once that has been done, we block any user’s ability to increase their privileges from within the container.
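The continuity between parent and child processes mentioned above rests on ordinary POSIX behavior: a child process inherits its parent's user ID and resource limits. The sketch below illustrates only that general inheritance, using a plain Python subprocess as the "child"; it is not Singularity's implementation, and the helper names (`parent_view`, `child_view`) are hypothetical. It assumes a POSIX system with Python 3.7 or later.

```python
import json
import os
import resource
import subprocess
import sys

# Code run in the child process: report its user ID and open-file limit.
CHILD_SNIPPET = (
    "import json, os, resource;"
    "print(json.dumps({'uid': os.getuid(),"
    " 'nofile': list(resource.getrlimit(resource.RLIMIT_NOFILE))}))"
)


def parent_view():
    """User ID and open-file limit as seen by the calling process."""
    return {
        "uid": os.getuid(),
        "nofile": list(resource.getrlimit(resource.RLIMIT_NOFILE)),
    }


def child_view():
    """The same values as reported by a freshly spawned child process."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SNIPPET],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print("parent:", parent_view())
    print("child: ", child_view())
```

A limit set on the user's shell (for example by the resource manager) would show up in both views, which is the property the answer above relies on.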

This is a very different usage model from enterprise, and it also makes it possible to share resources directly from the host into the container and thus blur the line between what is contained and what is on the host. It also allows the applications within the containers to integrate easily with the resource manager and existing HPC resources (file systems, GPUs, high performance interconnects, etc.).

The massive uptake of Singularity speaks directly to the necessity of the features that we have brought to the table. At this point, I am looking for people (both volunteers and hires) to help with the project!

InfoQ: NVIDIA provides a Docker plugin and an alternate CLI based around the plugin that takes advantage of GPU capabilities. What is different between that approach and the Singularity approach?

Kurtzer: NVIDIA’s Docker solution counters Docker’s design premise of isolation by un-isolating the devices, remapping drivers, libraries and the necessary bits into the running Docker instance. This method works fine for interacting with the host’s GPUs from within a container on a standalone system, but it does not fix the multitude of other issues that render Docker incompatible with traditional HPC systems.

Singularity is designed specifically around the traditional HPC use cases as well as supporting controlled software, libraries and data (e.g. export controls, HIPAA, government or trade secrets). Additionally, Singularity’s ability to easily blur the line between host and container allows the contained applications to interact with the host’s GPUs natively.

InfoQ: Can you talk about other platforms which compete with Singularity for containerized HPC workloads? What’s the current community involvement and roadmap for Singularity, especially with respect to support for Docker ecosystem, like Kubernetes, Docker Swarm, etc.?

Kurtzer: Competitors to Singularity are very few as there are no other container systems that focus on computational reproducibility and mobility for non-root users by utilization of a single image file. Singularity sits in this niche alone.

There is a lot of interest currently in getting Kubernetes and Mesos to support Singularity. Unfortunately, this is all coming from the research side, which is of no interest to the primary maintainers of these projects (at least as far as I’ve been able to tell thus far). So, I have some funding with which I am looking to hire developers to help with this.

As far as our roadmap, Singularity is uniquely positioned to support things like "trusted computing" and other paradigms that require highly trusted, controlled workflows. This is a big endeavour on our roadmap as well as support for containerizing background daemon processes and a new concept of data containers which Singularity is pioneering.

InfoQ: What was your primary takeaway from the GPU tech conference? What do you see in the future for GPUs?

Kurtzer: I can’t believe how big the conference was! The lunch service made my head spin because it was so large and overwhelming (imagine a building the size of an aircraft hangar, completely full of people, everywhere, practically piled up!). Machine learning has grown some metal legs, and it has taken GPUs to the next level! Social anxieties beware, GTC has grown up!

Keynote sessions and other recordings are available from NVIDIA's on demand GTC web site.
