Introduction to EC2 Container Service
EC2 Container Service (ECS) is a new service from Amazon Web Services (AWS).
ECS aims to make it easier to work with Docker containers by providing a clustering and orchestration layer that controls the deployment of your containers onto hosts and the subsequent management of the containers' lifecycle within a cluster.
ECS is an alternative to tools such as Docker Swarm, Kubernetes or Mesos. It operates at the same layer, but whereas you need to set up and administer those tools yourself, ECS is provided for you ‘as a service’.
ECS is based on a proprietary clustering technology rather than leveraging another engine such as Docker Swarm, Kubernetes or Mesos. This is in contrast to Google's Container Engine, which is equivalent to ECS but is based on Kubernetes behind the scenes.
Why Do We Need Container Orchestration?
The orchestration layer provided by ECS, Swarm or Kubernetes is an important piece in the puzzle of deploying and running container-based applications.
Firstly, we need to cluster our containers for scalability. As our workloads grow, we need to add more containers and scale them out horizontally across servers to process more of the workload in parallel.
Secondly, we need to cluster containers for robustness and resilience. When a host or container fails, we want the container to be re-created, perhaps on another healthy host, so the system is not impacted.
Finally, tools in the orchestration layer provide an important function of abstracting developers away from underlying machines. In a containerised world, we shouldn't need to care about individual hosts, only that our desired numbers of containers are up and running ‘somewhere appropriate’. Orchestration and clustering tools do this for us, allowing us to simply deploy the container to the cluster, and let the supporting software work out the optimal scheduling of containers onto hosts.
Designing robust and performant distributed clustering systems is notoriously difficult, so tools such as Kubernetes and Swarm give us that capability without having to build it ourselves. ECS takes this one step further by taking away the need to set up, run and administer the orchestration layer. For this reason, ECS is definitely something that developers building container-based applications in the cloud should be looking at closely.
ECS isn't a black box service. It runs on your own EC2 server instances which you can SSH into and manage as you would any other EC2 server.
The EC2 servers in your cluster run an ECS agent, a simple process that connects from the host to the centralised ECS service. The ECS agent is responsible for registering the host with the ECS service, and for handling incoming requests for container deployments or lifecycle events, such as requests to start or stop a container. Incidentally, the Go code for the ECS agent is available as open source.
When creating new servers, we can either configure the ECS agent instance manually, or use a pre-built AMI which already has it configured.
According to a blog post by Amazon's CTO, Werner Vogels, the centralised service has a logical separation between the cluster manager and the scheduling engine that controls the deployment of containers onto hosts. The motivation was to make the scheduling of containers pluggable, so that we could eventually use other schedulers such as Mesos, or even custom-developed container schedulers. Custom schedulers are under-documented at the time of writing, but this blog post and its accompanying source code are the best reference point right now.
The diagram also demonstrates the logical layout of the ECS cluster: container instances contain multiple tasks; tasks contain multiple containers; the cluster of EC2 container instances can span multiple availability zones; and Elastic Load Balancers can be used to dynamically distribute load across the tasks. This should become clearer as you read through the rest of this document.
Services and Tasks
Within ECS, your Docker workloads are described as tasks.
A task is essentially one or more container definitions, including the names of the images you want to run (as named on the Docker Hub), and the relevant port and volume mapping information to be applied to the containers when they start.
When the task runs, the underlying containers start. When all of the container processes finish, the task finishes. Tasks can be either short-lived or long-lived in nature, for example a short, event-driven data processing job or a web server process.
One thing to note architecturally is that all of the containers for a given task have to run on the same machine. If we want to co-locate containers then grouping them under the same task is the way to achieve that. If we potentially want to put services on different machines, we simply have to define multiple tasks to achieve this level of control. Initially this feels like a constraint, but it ultimately gives us a degree of control over container co-location in the same way as Kubernetes pods.
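As a rough illustration of what a task looks like, the sketch below builds a single-container task definition as JSON. The field names follow the ECS task definition format, but the family name, container name, memory, CPU units and port values are assumptions chosen to match the nginx example in this article, not values taken from a real cluster.

```python
import json

# Hypothetical task definition for the nginx example.
# All values below are illustrative assumptions.
task_definition = {
    "family": "nginx-task",
    "containerDefinitions": [
        {
            "name": "nginx-container",
            "image": "nginx",      # public image name on the Docker Hub
            "memory": 128,         # megabytes
            "cpu": 256,            # CPU units (1024 per core)
            "essential": True,
            # Map port 80 in the container to port 80 on the host.
            "portMappings": [{"containerPort": 80, "hostPort": 80}],
        }
    ],
}

task_definition_json = json.dumps(task_definition, indent=2)
print(task_definition_json)
```

Containers that must be co-located would simply be added as further entries in the containerDefinitions list of the same task.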
To bring this to life, in the screenshot below, we can see the definition of one particular task which has a single container which hosts the nginx web server.
Beyond tasks, services are the second most important ECS concept. A service is a request to run a specified number of instances of a given task. For instance, if we have defined a task to run the nginx web server container as above, we might define a service which requests that 3 or more instances of the web server task are run in the cluster.
Services are how ECS provides resilience. When the service is started, the service will monitor that the underlying task is alive and running with the correct number of instances and the correct number of underlying containers. If the task stops or becomes unresponsive or unhealthy, the service will request that more tasks are started in their place, cleaning up as necessary.
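The behaviour described above amounts to a reconciliation between a desired count and the observed state. The following is a minimal sketch of that idea only; it is not ECS's actual implementation, and the reconcile function and the task dictionaries are hypothetical names introduced for illustration.

```python
def reconcile(desired_count, running_tasks):
    """Compare the desired task count against the observed tasks.

    running_tasks is a list of dicts such as {"id": "t1", "healthy": True}.
    Returns a tuple: (number of new tasks to start, ids of tasks to clean up).
    """
    healthy = [t for t in running_tasks if t["healthy"]]
    to_clean_up = [t["id"] for t in running_tasks if not t["healthy"]]
    # Never request a negative number of tasks.
    to_start = max(desired_count - len(healthy), 0)
    return to_start, to_clean_up


# A service that wants 3 tasks, with one healthy and one unhealthy task observed.
observed = [{"id": "t1", "healthy": True}, {"id": "t2", "healthy": False}]
print(reconcile(3, observed))
```

Running such a check repeatedly is what keeps the number of healthy tasks converging on the requested count.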
The screenshot below shows an nginx service that has been defined with 3 running tasks in the cluster. Each of those tasks is in the running state.
Deploying Your First Container
On your first access to the ECS service in the console you are presented with a simple wizard. Though it is not too onerous to configure ECS manually, it is worth running this wizard the first time as it will configure everything for you: your EC2 servers, an appropriate security group, an auto scaling group, and the right AMI with an appropriately configured ECS agent. It is the fastest way to get up and running and gain experience with ECS.
Step 1 - Define The Task
The first thing we need to do as part of the wizard is to define the task. For the purposes of this demonstration, we will use the freely available NGINX Docker image. (NGINX is an open source web server. This has been Dockerised by the community and uploaded to the hub.)
Start by giving the task a name such as nginx-task.
Next, click Add Container Definition, and define the nginx container. The main thing to note here is the image name which refers to the name of a public image on the Docker Hub (nginx). It is also possible to refer to private images.
The memory field is the maximum memory, in megabytes, that will be allocated to the running container. The CPU units field is an abstract figure that allocates CPU capacity from a pool of 1024 units per CPU core.
This information is incredibly useful as it adds a degree of dynamism and intelligence to the container scheduling. ECS will observe which boxes have free capacity and allocate containers intelligently in order to get the most efficient use from the underlying servers.
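To make the role of the memory and CPU unit figures concrete, here is a small placement sketch. It assumes a simple "pick the host with the most free capacity" rule, which is an assumption made for illustration; this article does not describe the algorithm ECS actually uses. The Host class and place function are hypothetical.

```python
from dataclasses import dataclass

CPU_UNITS_PER_CORE = 1024  # each CPU core provides 1024 units


@dataclass
class Host:
    name: str
    cores: int
    memory_mb: int
    used_cpu: int = 0
    used_memory_mb: int = 0

    def free_cpu(self):
        return self.cores * CPU_UNITS_PER_CORE - self.used_cpu

    def free_memory(self):
        return self.memory_mb - self.used_memory_mb


def place(hosts, cpu, memory_mb):
    """Reserve capacity on the host with the most free resources.

    Returns the chosen host name, or None if no host can fit the container.
    """
    candidates = [
        h for h in hosts
        if h.free_cpu() >= cpu and h.free_memory() >= memory_mb
    ]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda h: (h.free_cpu(), h.free_memory()))
    chosen.used_cpu += cpu
    chosen.used_memory_mb += memory_mb
    return chosen.name


hosts = [Host("host-a", 1, 1024, used_cpu=900), Host("host-b", 1, 1024)]
print(place(hosts, cpu=256, memory_mb=128))
```

In this example host-a has only 124 CPU units free, so a container asking for 256 units can only be placed on host-b.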
Step 2 - Define The Service
Next up, we need to define the service which describes how many instances of this task we want to run in the cluster.
Select the radio button to create a service, name the service nginx-service for example and set the desired number of tasks to 3. This means that when running, this service will create 3 tasks, each with 1 distinct instance of the nginx container underneath.
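The service described above can be sketched as a small JSON document. The field names match the parameters accepted by the ECS create-service operation; the cluster name "default" and the one-container-per-task figure are assumptions made for this example.

```python
import json

# Hypothetical service definition for the nginx example.
service_definition = {
    "cluster": "default",           # assumed cluster name
    "serviceName": "nginx-service",
    "taskDefinition": "nginx-task",
    "desiredCount": 3,
}

# Assumption: the task contains a single nginx container.
containers_per_task = 1
total_containers = service_definition["desiredCount"] * containers_per_task

print(json.dumps(service_definition, indent=2))
print("containers expected in the cluster:", total_containers)
```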
In a more complex setup, you can select an Elastic Load Balancer (ELB) and dynamically register your services with the ELB when they are instantiated and move around the cluster. This is described in more detail below.
Step 3 – Create ECS Cluster
Next, we need to create the cluster of EC2 machines which will run the containers. Three t2.micro instances are fine for the purpose of this demonstration. This means that 1 task and 1 container will be distributed onto each of the 3 machines. We can of course have more instances than tasks in the cluster, and use those servers for running different tasks. It is not currently possible to run multiple instances of a given task on the same server.
Select your preferred key-pair, and also follow the button to create the IAM role. The IAM role is important so your cluster hosts can access the centralised ECS service.
Step 4 – Create The Stack
The final screen of the wizard is a summary of the task, service and cluster configuration.
The same JSON code displayed could have been generated and pushed through the CLI for those who prefer to work at the command line or who wish to automate the creation of their clusters.
When you proceed, you will see the requested stack being built using CloudFormation. Building the stack takes 2 or 3 minutes.
Step 5 – Review The Stack and the NGINX Service
If you visit the EC2 console, you will be able to see that the underlying machines have been created and have hopefully entered running state. The wizard has created machines across availability zones to demonstrate resilience benefits.
Moving back into ECS, you will be able to review the service and hopefully see that it has reached steady state, with 3 running tasks.
Note that it can take a few minutes for all of the instances to be created, for containers to be pulled from the hub and to start, and for the service to reach steady state, so don’t worry if this process takes a while.
Drill down through the service into one of the tasks, and you will see that the task is in RUNNING state.
Expand the nginx-container. Under the External Link, you will see an HTTP link directly to the container within the task.
Clicking that link should serve the welcome page out of the NGINX container.
At this point, we have now deployed the NGINX container into ECS and accessed it via a web browser. You can consider the pipes cleaned and the concept proven.
After standing up a simple container, you will now want to leverage some more advanced setups in order to put an application into production.
ELB Load Balancing
In the example above, we connected directly into one of the three containers. This is not very robust as the container could theoretically die and be re-spawned on a different server, meaning the container specific IP address becomes invalid.
Instead, we can register our services dynamically with EC2 Elastic Load Balancers (ELBs). As the underlying tasks start and stop and move around the pool of EC2 instances, the ELB is kept up to date via the service so traffic is routed accordingly.
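The bookkeeping involved in keeping the ELB up to date can be pictured as a set difference between the instances currently registered and the instances that currently host a task. This is a sketch of the idea only, not how ECS performs it internally; the elb_changes function and the instance and task identifiers are hypothetical.

```python
def elb_changes(registered_instances, task_hosts):
    """Work out which instances to register with or remove from the ELB.

    registered_instances: set of instance ids currently behind the ELB.
    task_hosts: dict mapping task id to the instance id it is running on.
    Returns a tuple of sorted lists: (to_register, to_deregister).
    """
    wanted = set(task_hosts.values())
    to_register = sorted(wanted - registered_instances)
    to_deregister = sorted(registered_instances - wanted)
    return to_register, to_deregister


# A task has moved from instance i-1 to instance i-3.
print(elb_changes({"i-1", "i-2"}, {"task-1": "i-2", "task-2": "i-3"}))
```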
To configure load balancing, we first need to create an ELB through the EC2 console. Then recreate the service, wiring up the ELB at service creation time, as shown in the screenshot below.
ECS also integrates with EC2 Auto Scaling, which is currently the preferred way of growing the cluster as it comes under load.
Autoscaling works by monitoring metrics such as CPU, memory and IO, and adding nodes into or removing nodes from the pool as certain conditions are breached.
New nodes that are instantiated will automatically register with the ECS cluster and will then be eligible for future container deployments.
This is useful, but ECS does not yet have hooks to scale the number of tasks or containers as the cluster grows. We only begin to benefit from the larger cluster when new containers are started, which we can do via the GUI or the API, so that load is distributed across the bigger cluster.
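The threshold-based behaviour of Auto Scaling described above can be sketched as a simple decision function. The thresholds, node limits and the scaling_decision name are all assumptions chosen for illustration; real Auto Scaling policies are configured in AWS rather than written as code like this.

```python
def scaling_decision(avg_cpu_percent, node_count,
                     scale_out_at=75.0, scale_in_at=25.0,
                     min_nodes=1, max_nodes=10):
    """Return the new node count for the observed average CPU load."""
    if avg_cpu_percent > scale_out_at and node_count < max_nodes:
        return node_count + 1   # add a node when the cluster is under load
    if avg_cpu_percent < scale_in_at and node_count > min_nodes:
        return node_count - 1   # remove a node when the cluster is idle
    return node_count           # otherwise leave the cluster unchanged


for cpu in (90.0, 50.0, 10.0):
    print(cpu, "->", scaling_decision(cpu, node_count=3))
```

Note that this only changes the number of nodes. As the text explains, the number of tasks has to be raised separately before the extra capacity is used.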
When defining containers within your task, it is possible to link them together using Docker's native container links.
This removes the need for static port mappings or service discovery in a multi-container environment, making the deployment of distributed microservices much easier.
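As an illustration of linking, the sketch below defines a task with two containers in which a web container links to a database container, then shows how code inside the web container could read the address that Docker's legacy links expose through environment variables. The container names, the example/web-app image, port 5432 and the linked_address helper are hypothetical; the links field and the ALIAS_PORT_port_TCP_ADDR variable naming follow the task definition format and Docker's link conventions.

```python
# Hypothetical two-container task: "web" links to "db" under the alias "db".
linked_task = {
    "family": "web-with-db",
    "containerDefinitions": [
        {"name": "db", "image": "postgres", "memory": 256, "cpu": 256},
        {
            "name": "web",
            "image": "example/web-app",   # hypothetical image name
            "memory": 128,
            "cpu": 128,
            "links": ["db:db"],           # format is "container-name:alias"
        },
    ],
}


def linked_address(env, alias, port):
    """Read the address of a linked container from Docker link variables."""
    prefix = "{}_PORT_{}_TCP".format(alias.upper(), port)
    return env[prefix + "_ADDR"], int(env[prefix + "_PORT"])


# Inside the web container, env would normally be os.environ.
example_env = {"DB_PORT_5432_TCP_ADDR": "172.17.0.2", "DB_PORT_5432_TCP_PORT": "5432"}
print(linked_address(example_env, "db", 5432))
```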
Though the walk-through above was UI console based, ECS is fully integrated into the AWS CLI.
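For those who want to script the walk-through, the sketch below assembles the equivalent AWS CLI commands as argument lists and prints them without executing anything. The file name nginx-task.json and the cluster name "default" are assumptions; the subcommands and flags shown are part of the aws ecs command set.

```python
import shlex

cluster = "default"  # assumed cluster name

commands = [
    # Register the task definition from a local JSON file (hypothetical file name).
    ["aws", "ecs", "register-task-definition",
     "--cli-input-json", "file://nginx-task.json"],
    # Create a service that runs 3 instances of the task.
    ["aws", "ecs", "create-service",
     "--cluster", cluster,
     "--service-name", "nginx-service",
     "--task-definition", "nginx-task",
     "--desired-count", "3"],
    # List the tasks running in the cluster.
    ["aws", "ecs", "list-tasks", "--cluster", cluster],
]

for argv in commands:
    # shlex.join (Python 3.8+) renders a safely quoted shell command line.
    print(shlex.join(argv))
```

Each list could be passed to subprocess.run on a machine where the AWS CLI is installed and configured with credentials.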
In case of problems, you can access the cluster nodes directly via SSH for debugging purposes.
In order to access the nodes via SSH, you may need to open up port 22 on the security group. This isn't opened by default on the nodes created by the wizard.
Once on a cluster node, you can find the ECS agent log files under /var/log/ecs.
You can also run the standard Docker commands, e.g. docker images or docker ps, to interrogate the state of images and containers on the server.
This article aimed to introduce ECS and provide a walk-through example for deploying your first container cluster.
ECS is a new product. It is not fully featured yet, but stability seems good. We have created 100+ node clusters in our test environments, experimented with fail-over of containers and nodes, and tested auto-scaling and load balancing, and the service stands up very well. We now hope to take ECS into production for a number of clients.
ECS and its equivalent, Google Container Engine, are very important components of the container ecosystem. Developing code and deploying within containers is easy, whereas running an orchestration layer such as Kubernetes or Mesos is a step up in sophistication for the average shop. ECS gives a simple, accessible, stable, PaaS-like platform for containers, and we find this hugely exciting, even at this relatively early stage in its evolution.
About The Author
Benjamin Wootton is the co-founder and Principal Consultant at Contino, a UK-based consultancy that helps organisations adopt DevOps and Continuous Delivery tools, practices and approaches.