Immutable infrastructure is an effective application delivery workflow, but the deployment process is slow and resources are often underutilized. Building images and creating new hosts on every application update leads to a deploy process that can take 10 minutes or more. If there is only one service per host, it usually does not use all of the available CPU, memory, and disk resources on the host. If only 20% of resources are used, then the remaining 80% is wasted expense. Schedulers like Nomad, Mesos, and Kubernetes allow organizations to split applications and infrastructure into separate immutable layers, which speeds up deployment times and increases resource density while still maintaining the benefits of immutable infrastructure.
A scheduler takes a packaged application, such as a Docker container, JAR file, or binary, and places it on a host. Multiple applications can be placed on the same host to increase resource density and save money. For example, instead of running 100 hosts at 20% capacity, a company can run 25 hosts at 80% capacity and reduce infrastructure spend by 75%. The deployment process is also significantly faster, since the packaged application is placed on existing hosts. Instead of building a machine image, which could take 10+ minutes, building an application package usually takes less than 5 minutes, and scheduling that package takes seconds, compared to the several minutes needed to provision a new machine. The full deploy process drops from 10+ minutes to potentially 2 or 3 minutes, and once the application is packaged, deploying it takes only seconds, so development teams can respond and iterate on changes more quickly. Importantly, the application package is still an immutable object, which is versioned and auditable.
Infrastructure and application immutability
A scheduler workflow still promotes immutable infrastructure, just at two levels. The first level is the machine level, which is configured with immutable machine images. The second level is the application level, which is configured with immutable application packages. The HashiCorp ecosystem of tools is an example of how to manage an application delivery workflow that promotes immutability at both the application and machine levels.
First, the base environment is provisioned with Packer and Terraform. Packer builds images for Consul and Nomad servers, which are then provisioned by Terraform to create Consul and Nomad clusters. This prepares the environment so applications can be scheduled by Nomad and discovered with Consul. Without a service discovery tool like Consul or another method of discovery, thousands of applications could be scheduled, but the location of those applications would be unknown.
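As an illustration, the server image build might be described with a Packer template along the following lines. This is a minimal sketch using Packer's HCL2 template format, assuming AWS as the target cloud and hypothetical install scripts:

```hcl
# Hypothetical Packer template: builds an AMI with the Consul and Nomad
# server binaries pre-installed, ready to be provisioned by Terraform.
locals {
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "consul_nomad_server" {
  region        = "us-east-1"
  instance_type = "t3.small"
  ami_name      = "consul-nomad-server-${local.timestamp}"
  ssh_username  = "ubuntu"

  source_ami_filter {
    filters = {
      name                = "ubuntu/images/hvm-ssd/ubuntu-*-amd64-server-*"
      virtualization-type = "hvm"
      root-device-type    = "ebs"
    }
    owners      = ["099720109477"] # Canonical
    most_recent = true
  }
}

build {
  sources = ["source.amazon-ebs.consul_nomad_server"]

  # Assumed scripts that install and configure Consul and Nomad in server mode
  provisioner "shell" {
    scripts = [
      "scripts/install-consul.sh",
      "scripts/install-nomad.sh",
    ]
  }
}
```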
Next, a pool of servers is provisioned so there can be resources for the applications to consume. Packer configures base images with common packages, a Nomad agent, and a Consul agent. Terraform provisions a pool of servers with this base image and sets up networking rules as well. The workflow is less defined for stateful services, but many companies manage storage at the machine level, rather than application level. This means Terraform provisions “storage” machines as well.
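The Terraform side of this step might look roughly like the sketch below, assuming AWS, an AMI ID produced by the Packer client-image build, and illustrative instance counts, sizes, and port ranges:

```hcl
# Hypothetical Terraform configuration: provisions a pool of client hosts
# from the Packer-built base image and opens the ports the agents need.
variable "client_ami" {
  description = "AMI ID produced by the Packer build of the client base image"
  type        = string
}

resource "aws_security_group" "nomad_client" {
  name        = "nomad-client"
  description = "Nomad/Consul agent traffic and dynamically assigned task ports"

  # Nomad HTTP, RPC, and serf ports
  ingress {
    from_port   = 4646
    to_port     = 4648
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]
  }

  # Default range Nomad uses for dynamically assigned task ports
  ingress {
    from_port   = 20000
    to_port     = 32000
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]
  }
}

resource "aws_instance" "nomad_client" {
  count                  = 10
  ami                    = var.client_ami
  instance_type          = "m5.large"
  vpc_security_group_ids = [aws_security_group.nomad_client.id]

  tags = {
    Name = "nomad-client-${count.index}"
  }
}
```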
Once the hosts are provisioned, the Nomad agent on each host reports the total resources and capabilities of the host to the central Nomad servers. The Nomad servers save the global state of the cluster – available resources and capabilities of each host – to then use for scheduling decisions. The environment and Nomad servers are now ready to receive jobs and schedule applications.
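The client agent configuration baked into the base image might look roughly like the following sketch; the datacenter name, server address, and paths are assumptions:

```hcl
# Hypothetical Nomad client agent configuration (/etc/nomad.d/client.hcl).
# The agent fingerprints the host's CPU, memory, disk, and installed drivers,
# and reports them to the Nomad servers for scheduling decisions.
datacenter = "us-west-1"
data_dir   = "/var/lib/nomad"

client {
  enabled = true
  # Join the cluster via the Nomad servers' RPC address
  servers = ["nomad-server-1.example.com:4647"]
}

# Register scheduled tasks with the local Consul agent
consul {
  address = "127.0.0.1:8500"
}
```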
For the full application development and deployment workflow, a developer first works on changes locally and once the changes are merged into the master branch, Packer creates the application package, which is then deployed by Nomad. When Nomad places the Packer-built application on the host (see next section for details on the scheduling strategy), it registers the application with the local Consul agent. Since the applications are dynamically placed, it’s essential that Consul handles service discovery. Otherwise thousands of applications could be deployed with no knowledge of their IP or port. Vault securely stores the keys and secrets used by Packer, Terraform, and Nomad to provision environments and deploy applications. Vault also stores the credentials that applications need to authenticate with each other, such as database usernames and passwords.
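As a sketch of the Vault piece, access to those credentials is controlled by policies. The following hypothetical policy would allow the web frontend to read its database credentials from Vault's key/value secrets engine (the path and policy scope are illustrative):

```hcl
# Hypothetical Vault policy "web-frontend": grants read-only access to the
# database credentials used by the web frontend (KV version 2 secrets engine).
path "secret/data/web-frontend/db" {
  capabilities = ["read"]
}
```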
From a developer perspective, the main focus is getting tests to pass locally and in a CI tool. From there the deployment process can be completely automated. Operators are normally responsible for setting up that automated process — build, provision, deploy, maintain.
Declaring application requirements and deployment strategy with a scheduler
```hcl
# Define a job called my-service
job "my-service" {
  # Job should run in the US region
  region = "us"

  # Spread tasks between us-west-1 and us-east-1
  datacenters = ["us-west-1", "us-east-1"]

  # Rolling updates should be sequential
  update {
    stagger      = "30s"
    max_parallel = 1
  }

  group "webs" {
    # Create 5 web groups
    count = 5

    # Create a web frontend using a docker image
    task "frontend" {
      driver = "docker"
      config {
        image = "hashicorp/web-frontend"
      }

      restart {
        interval = "1m"
        attempts = 2
        delay    = "15s"
      }

      # Register the task with Consul for service discovery
      service {
        tags = ["prod"]
        port = "http"
        check {
          type     = "http"
          path     = "/health"
          interval = "10s"
          timeout  = "2s"
        }
      }

      env {
        DB_HOST     = "db01.example.com"
        DB_USER     = "web"
        DB_PASSWORD = "loremipsum"
      }

      # Define the resources needed to run this task
      resources {
        cpu    = 500
        memory = 128
        network {
          mbits = 100
          dynamic_ports = [
            "http",
            "https",
          ]
        }
      }
    }
  }
}
```
This job file declares that five instances of the web-frontend Docker container should be run across the "us-west-1" and "us-east-1" datacenters. Each task requires 500 MHz of CPU, 128 MB of memory, and 100 Mbit/s of network bandwidth. Additionally, Nomad dynamically assigns each task a port, which enables multiple instances of the same task to run on the same host. If the “frontend” task above had a statically assigned port of 80, then only one instance of the task could run on a host, since there is only one port 80. By dynamically assigning ports, one instance of the “frontend” task can run on port 20100 and another on port 20101.
Since Nomad places workloads of various types across a cluster of generic hosts, the location and port of placed workloads are unknown without service discovery. Using a service discovery tool such as Consul, etcd, or ZooKeeper allows workloads to be dynamically placed across a cluster and properly discovered by dependent services. The “service” block accomplishes this by registering the task as a service in Consul with the name `$(job-name)-$(task-group)-$(task-name)`. It additionally registers the dynamic port that Nomad assigns the task during placement, as well as any health checks for the service. In the above example the task would be registered as a service in Consul with the name “my-service-webs-frontend”. Of course, the service discovery integration requires that a Consul agent is running on the host that Nomad schedules tasks on. Nomad today only has first-party integration with Consul, but a future version of Nomad will expose an API for integrating custom service discovery solutions.
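Conceptually, the registration Nomad performs is equivalent to a Consul service definition along these lines; the values shown, including the dynamically assigned port, are illustrative:

```hcl
# Roughly what Nomad registers with the local Consul agent for one
# "frontend" instance (hypothetical port assigned at placement time).
service {
  name = "my-service-webs-frontend"
  tags = ["prod"]
  port = 20100

  check {
    http     = "http://127.0.0.1:20100/health"
    interval = "10s"
    timeout  = "2s"
  }
}
```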
Deploying applications with a scheduler
The `my-service` job can be submitted to the Nomad servers from the command-line to deploy:
`nomad run my-service.hcl`
When a job is submitted to the Nomad servers, they begin the scheduling decision-making process: first applying constraints to filter the pool of hosts and find available resources, then determining optimal placements using a bin-packing algorithm.
Let's see how this would work in the above example. The driver type of the “frontend” task is a Docker container, which means the host must be running Docker Engine. If there are 100 nodes in the Nomad cluster and only 40 have Docker Engine installed, the other 60 are immediately removed as placement options. The next filtering step determines which of the remaining hosts have the available resources (CPU, memory, network) to run the task. Finally, Nomad generates a list of possible allocations and chooses the optimal one based on maximizing resource density. If one host is at 20% utilization and another is at 60% utilization, Nomad will place the task on the latter to maximize density. Because all of this information is stored locally on each Nomad server, the scheduling process is extremely fast; no network round-trips are required, which keeps scheduling times very low.
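Beyond implicit constraints like the task driver, a job can also declare explicit constraints that are applied during this filtering step. A minimal sketch, with illustrative attributes, that could be added to the `my-service` job:

```hcl
# Hypothetical constraints: only consider 64-bit Linux hosts for placement
constraint {
  attribute = "${attr.kernel.name}"
  value     = "linux"
}

constraint {
  attribute = "${attr.cpu.arch}"
  value     = "amd64"
}
```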
Additionally, Nomad servers can be run in parallel to increase scheduling throughput and provide high availability. When running Nomad in high-availability mode, the Nomad servers replicate state, participate in scheduling decisions, and perform leader election. The leader is responsible for processing all queries and transactions. Nomad is optimistically concurrent, meaning all servers participate in making scheduling decisions in parallel, while the leader provides the additional coordination necessary to do this safely and to ensure clients are not oversubscribed. Because of this optimistic concurrency, running three Nomad servers increases scheduling throughput by roughly three times. Nomad servers find resources on a host and optimistically assume those resources are still available; if two servers try to place conflicting workloads on the same host, the placement that arrives second fails and is returned to the scheduling queue. Since Nomad makes scheduling decisions in the sub-second range, being placed back into the queue does not significantly delay the task's placement.
If the task is placed successfully but then fails for a different reason (such as the Docker container exiting unexpectedly), the local Nomad client will restart it. In the above example, the “frontend” task will be restarted up to two times within a one-minute interval, with a 15-second delay before each restart. The Nomad client reports the health of each task it is running to the central Nomad servers, so the servers are aware of the health status of all tasks across the fleet. If the local Nomad client can no longer restart the task, the central Nomad servers will place the task on a new host.
The road to modern ops and application delivery done right
Application delivery done right is a complex process. Splitting the workflow into two immutable layers, application and infrastructure, reduces complexity while increasing deployment speeds and resource density. The infrastructure layer is defined through building machine images and provisioning servers with those images. Both of these components can be defined as code, with Packer templates and Terraform configurations for example. The application layer is defined through application packages such as Docker containers or static binaries, which are then deployed with a scheduler or a tool for static assignment. These components can also be defined as code, with Packer templates and Nomad Job files for example.
Service discovery unites the application and infrastructure layers, with Consul for example. A Consul agent lives on each host in the infrastructure layer, and scheduled applications are registered with the local agent so they can be discovered. For example, if 200 `api` tasks are dynamically placed, it’s essential to be able to discover each instance of the task. The IP address of the host where the task is placed and the dynamically assigned port are registered with the local Consul agent. To load balance traffic across all 200 task instances, an HAProxy configuration can be populated with data from the Consul registry.
Here is an example Consul Template that renders an HAProxy backend from the Consul registry data:
```
backend webs
  balance roundrobin
  mode http{{range service "api"}}
  server {{.Node}} {{.Address}}:{{.Port}}{{end}}
```
Below is an example of a rendered HAProxy configuration:
```
backend webs
  balance roundrobin
  mode http
  server 104.131.109.224:6379
  server 104.131.109.224:7812
  server 104.131.59.59:6379
  …
```
As shown above, multiple instances of the same task can be placed on one host (same IP) since Nomad detects port collisions and dynamically assigns ports to avoid collisions. Service discovery enables schedulers to truly maximize resource utilization, while maintaining connectivity between services.
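To keep the load balancer in sync as tasks are rescheduled, Consul Template can watch the Consul registry, re-render the configuration, and reload HAProxy whenever the set of `api` instances changes. A minimal sketch of a Consul Template configuration, with hypothetical file paths and reload command:

```hcl
# Hypothetical Consul Template configuration: re-renders the HAProxy config
# and reloads HAProxy whenever the "api" service membership changes in Consul.
consul {
  address = "127.0.0.1:8500"
}

template {
  source      = "/etc/consul-template/haproxy.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command     = "systemctl reload haproxy"
}
```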
Breaking the problem space down into two layers, application and infrastructure, and further separating those layers into specific components for provisioning, configuration, service discovery, security, and scheduling allows organizations to gradually ramp up in complexity on the way to the desired application delivery workflow.
About the Author
Kevin Fishner is the Director of Customer Success at HashiCorp. He has extensive experience working with customers across HashiCorp's open source and commercial products. Philosopher by education (Duke), engineer by trade. @KFishner