HashiCorp Release Nomad Scheduler and Otto Application Delivery Tool

At the inaugural HashiConf conference, held in Portland, USA, HashiCorp announced the release of a new distributed scheduler platform named ‘Nomad’ that is capable of scheduling containers, VMs and standalone applications within a datacenter; and a new application delivery tool named ‘Otto’ that builds upon the existing Vagrant tool by enabling the management of remote application deployments.

Mitchell Hashimoto, CEO of HashiCorp, and Armon Dadgar, CTO of HashiCorp, stated on stage at HashiConf that four qualities distinguish Nomad in the emerging domain of schedulers, which includes Mesosphere's Marathon framework for Apache Mesos, Amazon's ECS, and Kubernetes. These qualities were cited as ease of use, scalability, flexibility, and incorporation into the HashiCorp ecosystem.

Nomad is deployed as a single binary that handles resource pooling, job management, and task scheduling, and no additional coordination or storage services are required (such as ZooKeeper or etcd). A Nomad agent is installed on each host to collect information on available resources (CPU, memory, disk). This information is sent to the Nomad servers, which are then responsible for accepting jobs, determining which hosts have available resources, and finally scheduling tasks on those hosts.
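
The flow described above (agents report free resources, servers accept jobs, find hosts with capacity, and place tasks) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Nomad's actual implementation: the names `NodeResources`, `Server`, `register`, `feasible_nodes`, and `schedule` are hypothetical, and the placement rule (pick the first feasible host by name) is deliberately naive.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeResources:
    """Free capacity reported by a (hypothetical) agent running on one host."""
    cpu_mhz: int
    memory_mb: int
    disk_mb: int


class Server:
    """Toy stand-in for a server that tracks what each host has available."""

    def __init__(self) -> None:
        self.nodes: Dict[str, NodeResources] = {}

    def register(self, node_id: str, free: NodeResources) -> None:
        # An agent calls this to report (or refresh) its available resources.
        self.nodes[node_id] = free

    def feasible_nodes(self, cpu_mhz: int, memory_mb: int, disk_mb: int) -> List[str]:
        # Hosts that currently have enough free CPU, memory, and disk.
        return sorted(
            node_id
            for node_id, free in self.nodes.items()
            if free.cpu_mhz >= cpu_mhz
            and free.memory_mb >= memory_mb
            and free.disk_mb >= disk_mb
        )

    def schedule(self, cpu_mhz: int, memory_mb: int, disk_mb: int) -> Optional[str]:
        # Naive placement: take the first feasible host and reserve capacity on it.
        candidates = self.feasible_nodes(cpu_mhz, memory_mb, disk_mb)
        if not candidates:
            return None  # no host can run the task right now
        chosen = candidates[0]
        free = self.nodes[chosen]
        free.cpu_mhz -= cpu_mhz
        free.memory_mb -= memory_mb
        free.disk_mb -= disk_mb
        return chosen
```

A real scheduler would rank the feasible hosts rather than take the first one; the sketch only shows the division of labour between the reporting agents and the deciding servers.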

Nomad is multi-datacenter and multi-region aware, allowing jobs to be scheduled across datacenters and regions. Nomad is designed to be resistant to network or infrastructure failure, and Nomad servers perform leader election and state replication to provide high availability and fault tolerance. Each server also participates in scheduling decisions, which HashiCorp claim increases the total throughput of task placements.

Job or task creation within Nomad utilises a single high-level specification that is agnostic to the deployment artefact, which HashiCorp claim will allow workloads to be ‘virtualized, containerized or deployed as standalone applications’. Nomad can also be integrated into the existing HashiCorp ecosystem, including Atlas, the paid-for application delivery platform.

Nomad is designed on the principles that guide all HashiCorp products and focuses on the user workflow, remaining agnostic to the underlying technologies.

The new Otto application delivery tool builds on the success of Vagrant, the company's first open source tool, which Mitchell Hashimoto created in 2010 to solve the problem of managing local development environments. Otto extends the 'vagrant up' style of workflow from local development to deployments via 'otto deploy' (similar in concept to Heroku's model of 'git push heroku master' or Docker Compose's 'docker-compose up').

The design of applications and infrastructure is codified using Otto's Appfile. An Appfile is a simple high-level specification that declares a complex, multi-tier application, which can be deployed to multiple infrastructure providers as VMs or containers. Otto supports collaboration on configuration files, securely storing credentials, saving configurations, and enforcing access control policies, but integration with HashiCorp's commercial product Atlas is required in order to enable this functionality.

InfoQ recently sat down with Armon Dadgar, co-founder and CTO of HashiCorp, and asked questions about the new Nomad scheduler.

InfoQ: Hi Armon, thanks for chatting to InfoQ again. Today we're talking about your new project named 'Nomad'. Could you briefly explain to us what this is, and what goals you had when creating the project?

Dadgar: Hey Daniel, thanks for having me. Nomad is a cluster manager which is used to pool together the resources of a cluster and to simplify the workflow for running applications across the cluster. The goal is to let users focus on their application and to abstract the underlying machines away.

To use Nomad, developers submit a declarative job specification and Nomad ensures constraints are satisfied and resource utilization is maximized by efficient task packing. Nomad is designed to support multiple workloads which includes supporting Docker along with other virtualized, containerized, or standalone applications across all major operating systems.
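
As an illustration of the idea Dadgar describes here (a declarative specification, constraint checking, and tight packing), the following is a minimal sketch and not Nomad's job format or its scheduling algorithm. The `Node` and `Task` types, the attribute-equality constraints, and the best-fit scoring rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_cpu: int  # MHz
    free_mem: int  # MB
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Task:
    """Declarative description: what the task needs, not where it runs."""
    name: str
    cpu: int
    mem: int
    constraints: Dict[str, str] = field(default_factory=dict)


def satisfies(node: Node, task: Task) -> bool:
    # A node qualifies if it has room and matches every required attribute.
    fits = node.free_cpu >= task.cpu and node.free_mem >= task.mem
    matches = all(node.attributes.get(k) == v for k, v in task.constraints.items())
    return fits and matches


def place(task: Task, nodes: List[Node]) -> Optional[Node]:
    candidates = [n for n in nodes if satisfies(n, task)]
    if not candidates:
        return None
    # Best fit: prefer the node left with the least spare capacity, so that
    # large nodes stay free for large tasks. Summing MHz and MB is a crude
    # score chosen only to keep the example short.
    best = min(
        candidates,
        key=lambda n: (n.free_cpu - task.cpu) + (n.free_mem - task.mem),
    )
    best.free_cpu -= task.cpu
    best.free_mem -= task.mem
    return best
```

Packing small tasks onto the fullest node that still fits is one simple way to raise utilization; other scoring rules would slot into `place` without changing the task description.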

Nomad is motivated by our larger goals of promoting microservices architectures and immutable infrastructure. As organizations move to microservices, they go from a handful of monolithic projects to dozens, hundreds, or thousands of microservices. With Nomad, each of these services can be managed as a job very easily. Without tools like Nomad the operational challenges involved become a barrier to adopting microservice architectures.

With immutable infrastructure today, we typically see deployments done at machine granularity, especially when using tools like Packer and Terraform. While this works, it is unnecessarily slow due to the time it takes to provision new machines. Instead, you'd rather have an immutable base operating system with containerized applications that are layered on top. This takes deployment times from minutes to seconds. To solve the challenge of deploying those applications you need a tool like Nomad.

InfoQ: Nomad appears very similar to existing cluster resource managers and schedulers, such as Google's Borg and the Apache Mesos project. Was Nomad or any of the algorithms utilised within Nomad based upon any of this existing work?

Dadgar: The cluster management space is filled with very exciting research and we've certainly built on the prior art there. The design of Nomad was heavily inspired by both Google Borg and Omega. Omega is lesser known but it introduces some novel approaches for optimistic concurrency in the scheduler which we have adopted in Nomad. This allows us to make scheduling decisions in parallel to handle very demanding workloads.
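
The optimistic concurrency idea mentioned here can be illustrated with a small sketch: schedulers plan against a snapshot of cluster state without taking locks, and a conflicting plan is detected and rejected only at commit time. This is a simplified model under stated assumptions, not the Omega or Nomad implementation; `ClusterState`, `Plan`, `make_plan`, and the per-node version counter are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Plan:
    node: str
    cpu: int
    seen_version: int  # version of the node state the scheduler planned against


class ClusterState:
    """Shared state; commits are assumed to be applied one at a time."""

    def __init__(self, free_cpu: Dict[str, int]) -> None:
        self.free_cpu = dict(free_cpu)
        self.version = {node: 0 for node in free_cpu}

    def snapshot(self, node: str) -> Tuple[int, int]:
        # Schedulers read state freely, without holding any lock.
        return self.free_cpu[node], self.version[node]

    def try_commit(self, plan: Plan) -> bool:
        # Reject the plan if the node changed after the scheduler looked at it.
        if self.version[plan.node] != plan.seen_version:
            return False
        self.free_cpu[plan.node] -= plan.cpu
        self.version[plan.node] += 1
        return True


def make_plan(state: ClusterState, node: str, cpu: int) -> Optional[Plan]:
    # Planning can run in parallel because it only reads a snapshot.
    free, version = state.snapshot(node)
    if free < cpu:
        return None
    return Plan(node, cpu, version)
```

If two schedulers plan against the same snapshot, the first commit succeeds and the second is rejected; the losing scheduler takes a fresh snapshot and tries again, which is cheap when conflicts are rare.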

While Nomad is a new project, it builds on the core technologies of both Consul and Serf. Consul makes use of the Raft consensus protocol while Serf is based on the SWIM gossip algorithm. Both of those projects are widely deployed, operate at very large scale, and are production-hardened. Leveraging them in Nomad has made the project much more robust than a greenfield implementation would have been.

InfoQ: The announcement mentions that both long-lived services and short-lived batch jobs will be supported by Nomad. Do you think it is important for organisations to be able to run both types of workloads side-by-side?

Dadgar: Speaking generally, most servers are at 5-20% utilization and this inefficiency directly translates to excess cost for organizations. Nomad enables these two types of workloads to be run on the same hardware which increases efficiency and reduces cost. Nomad increases the agility of development and operations teams which is of primary importance, but the increases in efficiency and cost savings are also tremendous, so I do think it is very important.

InfoQ: We also notice that support for running Windows applications is mentioned (presumably utilising VMs?). Has this been implemented to support the emerging market of 'Enterprise' organisations migrating to cluster management solutions?

Dadgar: Windows is traditionally underserved in the DevOps space, but we consider it a first-class platform within the HashiCorp ecosystem. While unpopular among startups, Windows is deployed very broadly in the enterprise. The advantages of a cluster manager like Nomad are magnified in those environments because it solves operational challenges and reduces costs more dramatically since enterprises operate at massive scale.

The Nomad agent runs directly on Windows and uses extensible task drivers, with the aim of supporting almost any Windows based application. Virtualized workloads will be supported with Hyper-V, containerized applications with the upcoming Windows Server Containers, and standalone C# or Java applications using the CLR and JVM.

InfoQ: Nomad offers 'Multi-Datacenter and Multi-Region' awareness, which appears to be a unique offering in this space. Can you explain how this works, and discuss any limitations with the current implementation?

Dadgar: Nomad models a global cluster as groups of datacenters that together form a larger region. For example the "us-west-1", "us-west-2", and "us-east-1" AWS zones could form a larger "us-aws" region. The "us-aws" region could federate with an "eu-aws" region to form a global cluster.

Nomad clients are configured with a datacenter and region, and only interact with their regional servers. The servers operate at the region level and can make scheduling decisions that span the datacenters in the region. Requests can be made about any region and get transparently forwarded by the servers.

The strength of this model is that it allows jobs to span multiple datacenters and provides failure isolation at the region level. The limitation is that a single job cannot span multiple regions, but a user can submit the same job to multiple regions.
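
The region and datacenter model described above can be sketched as follows. This is an illustrative model only, not Nomad's API: the `Job` and `RegionServers` types and the `federate` and `submit` methods are hypothetical. It shows a job spanning several datacenters within one region, a request for another region being forwarded, and a job that names a datacenter outside its region being rejected.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Job:
    name: str
    region: str             # a job belongs to exactly one region
    datacenters: List[str]  # but may span several datacenters within it


class RegionServers:
    """Toy stand-in for the servers responsible for one region."""

    def __init__(self, region: str, datacenters: Set[str]) -> None:
        self.region = region
        self.datacenters = set(datacenters)
        self.peers: Dict[str, "RegionServers"] = {}
        self.jobs: Dict[str, Job] = {}

    def federate(self, other: "RegionServers") -> None:
        # Make the two regions aware of each other.
        self.peers[other.region] = other
        other.peers[self.region] = self

    def submit(self, job: Job) -> str:
        """Accept the job, or forward it; returns the region that accepted it."""
        if job.region != self.region:
            peer = self.peers.get(job.region)
            if peer is None:
                raise ValueError(f"unknown region: {job.region}")
            return peer.submit(job)  # forwarded transparently to the caller
        outside = set(job.datacenters) - self.datacenters
        if outside:
            raise ValueError(f"datacenters not in {self.region}: {sorted(outside)}")
        self.jobs[job.name] = job
        return self.region
```

Running the same workload in two regions means submitting two jobs, one per region, which mirrors the limitation noted above.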

The granularity of a region is decided by an organization, so they can pick the appropriate configuration for their needs. Most organizations can use a single global region to manage all their datacenters but it is just as easy to have one or more datacenters per region as well.

InfoQ: The announcement highlights the 'operational simplicity' of running Nomad. Was this an essential requirement, given the fun and games many of us within the industry have had with provisioning the underlying cluster management applications, and the inherent reliance (and management overhead) when working with the likes of ZooKeeper and etcd for coordination?

Dadgar: Operational simplicity was our top priority in the design and implementation of Nomad. There are many design considerations when building any distributed system, and the biggest question is providing coordination and storage. Nomad ships as a single binary with no external dependencies and operates in a highly available manner by default.

The biggest blocker to adoption for cluster managers has been their operational complexity and with Nomad I think we've solved this problem. The more moving pieces a system has, the more complex the failure cases become. It is difficult to understand and operationalize any distributed system, but requiring users to manage multiple of them was untenable.

For us, operational simplicity extends beyond just initial setup. The mental model of Nomad is understandable so users can reason about it. Developers that need to work with Nomad on a daily basis can get running in minutes. The APIs are easily consumable so that higher level tooling can be built. Simplicity and understandability lead to trust, and that is absolutely critical for a system at the heart of your infrastructure.

InfoQ: Thanks for your time today Armon. Is there anything else you would like to share with the InfoQ readers?

Dadgar: I'm extremely proud of what we have built with Nomad, but we are just getting started. Nomad is part of a larger ecosystem of HashiCorp tooling that we are excited to bring together. The next release of Nomad will integrate with Consul which brings lots of new functionality. Nomad jobs will be able to register services in Consul and leverage the service discovery features as well. The rich health checking features of Consul will be exposed without expanding the scope of Nomad.

There are integrations planned for Atlas and the entire OSS suite which we are very excited about. Helping organizations auto-scale to meet demand and reduce costs is a priority for us.

Additional details on the Nomad and Otto projects can be found on their respective project websites, or via the primary HashiCorp website. More information on the HashiConf conference, including a live stream of the keynote sessions, can be found on the HashiConf 2015 website.
