The Three Generations of AWS

Key takeaways

  • AWS offers 3 infrastructure primitives: EC2 Instances, ECS Tasks and Lambda Functions
  • Lambda Functions are the newest technique and offer attractive costs and low operational overhead, but right now are not suitable for all types of apps
  • EC2 Instances support any workload but require lots of do-it-yourself expertise to support applications
  • ECS Tasks also support any workload and offload the majority of the instance setup and operations back onto AWS
  • Focus most on building Docker Images for apps. ECS runs these best for all apps right now, and the "serverless" properties of both ECS and Lambda are an implementation detail

When building a new system on AWS we are faced with three architectural choices around application packaging, runtime service and load balancing service.

We can build Amazon Machine Images (AMIs) and run them as virtual machines in the Elastic Compute Cloud (EC2) behind an Elastic Load Balancer (ELB).

We can build Docker Images and run them as containers on the EC2 Container Service (ECS) behind an Application Load Balancer (ALB).

We can build Node, Python or Java project zip files and run them as Lambda Functions behind an API Gateway.

EC2, ECS and Lambda represent three generations of AWS services rolled out over the past decade. Instances, Tasks and Functions are the primary building blocks in these three generations respectively.

While each architecture can lead to success, and real-world systems will use some of each, ECS Tasks are the best architecture to target right now. ECS Tasks offer cost savings, security and speed over manually configured EC2 Instances, with none of the tough constraints of Lambda Functions.

Why Lambda Functions?

Lambda Functions are the newest AWS technology, and get a lot of buzz right now as “the future of cloud computing.” The properties of real-time function invocation, per-request billing and zero server management are impossible to beat.

The simplest way to use Lambda Functions is to add custom logic around events that happen inside the AWS platform. The canonical killer Lambda “app” is a single JavaScript function that is configured to be automatically called on every new S3 object, say to resize images.
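As a rough sketch of that canonical app, the handler below pulls the bucket and key out of an S3 event notification and hands each new object to a resize step. It is written in Python rather than JavaScript to keep one language for the examples in this article. The `resize_image` function and the `thumbnails/` prefix are hypothetical placeholders, not part of any AWS API.

```python
from urllib.parse import unquote_plus


def new_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def resize_image(bucket, key):
    """Hypothetical placeholder: real code would download, resize and upload.

    Here it only computes the key the thumbnail would be written to.
    """
    return "thumbnails/" + key


def handler(event, context):
    """Lambda entry point, invoked once per S3 notification."""
    resized = [resize_image(bucket, key) for bucket, key in new_objects(event)]
    return {"resized": resized}
```

In a real deployment the thumbnails would need to go to a different bucket or an excluded prefix, otherwise each write would trigger the function again.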

This pattern of evented programming with Lambda enables extremely sophisticated applications. When coupled with the AWS Simple Queue Service (SQS) and/or DynamoDB, you can build robust, dynamic and cost-effective producer and consumer systems without any servers.

In 2015, AWS launched an API Gateway service which turns incoming HTTPS requests into events that trigger Lambda functions. This enables Lambda to power Internet-facing APIs with zero servers.
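A minimal sketch of such an API endpoint might look like the following. It assumes API Gateway is configured with the Lambda proxy integration, where the function receives the request as an event and returns a dictionary with `statusCode`, `headers` and `body`. The `name` query parameter and the greeting are made up for illustration.

```python
import json


def api_handler(event, context):
    """Answer an HTTPS request that API Gateway turned into a Lambda event."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The body must be a string, so the payload is serialized here.
        "body": json.dumps({"message": "hello " + name}),
    }
```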

There is now an explosion of tooling and app development patterns for building systems on Lambda Functions.

Why Not Lambda Functions?

However, the Function building block comes with strict constraints that pose tough challenges.

Right now, every Lambda Function call must complete within 5 minutes and cannot use more than 1.5 GB of memory. Calls that violate these constraints will simply be terminated.
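One way to live with the time limit is to stop early and hand back unfinished work instead of being terminated mid-call. The sketch below relies on the `get_remaining_time_in_millis()` method of the Lambda context object. The `process_batch` helper, its `work` callback and the 10 second reserve are assumptions chosen for illustration.

```python
def process_batch(items, context, work, reserve_ms=10_000):
    """Process items until the invocation is close to its time limit.

    Returns (results, leftover) so the caller can re-queue the leftover
    items for a later invocation.
    """
    done = []
    for index, item in enumerate(items):
        # Stop while there is still time to return cleanly.
        if context.get_remaining_time_in_millis() < reserve_ms:
            return done, items[index:]
        done.append(work(item))
    return done, []
```

This only works for jobs that can be split into small units. A single long-running request still cannot outlive the limit.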

Similarly, the API Gateway service only supports HTTPS connections, and not HTTP or HTTP/2.

These constraints make Lambda a non-starter for the traditional business Java, PHP and Rails systems out there.

Lambda does represent the future of the cloud. AWS will continue working over the next decade to remove server management and to drastically lower our bills.

But right now, Lambda can’t fully replace the need for Instances due to its constraints. And ECS offers similar reductions in server management and cost without the tough tradeoffs.

Why Not EC2 Instances?

EC2 Instances are the oldest technology, now 10 years old, and are considered the tried-and-true form of cloud computing. API-based provisioning and management, together with hourly pricing, are what killed the traditional hosting model.

But managing apps on EC2 Instances leaves a lot to be desired. You need to pick a server operating system, use a configuration management tool like Chef to install dependencies for your app, and use an image tool like Packer to build an AMI.

A deploy that needs to build AMIs and boot VMs takes at least a few minutes. In the age of continuous delivery we want deploys that take a few seconds.

Why EC2 Instances?

That said, we can’t use Lambda Functions for everything, so we still need to utilize Instances somehow…

EC2 Instances are extremely flexible, so numerous strategies for faster deployment times exist. A common technique is to use a tool like Ansible to SSH into every instance, pull a new version of the code, then restart the app server. But now we’re using bespoke scripts to mutate instances, which adds to the failure scenarios.

Another strategy is “blue-green deploys”. We can boot an entire new set of EC2 Instances, with the new version of the software (call this “blue”), migrate traffic over to it, then terminate the old set (“green”). This reduces failure scenarios, but doesn’t necessarily increase speed. It also requires windows of double capacity, which adds cost and may not be available during service outages.
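The sequence can be sketched as a short function. The `boot`, `healthy`, `switch_traffic` and `terminate` callables are hypothetical stand-ins for whatever provisioning and load balancer calls a real system would make. Note that both fleets exist at once until the final step, which is where the double capacity comes from.

```python
def blue_green_deploy(old_fleet, new_version, boot, healthy, switch_traffic, terminate):
    """Replace old_fleet with an equally sized fleet running new_version.

    Returns the fleet that is serving traffic when the deploy ends.
    """
    # Boot a complete second fleet; capacity is doubled from here on.
    new_fleet = [boot(new_version) for _ in old_fleet]

    # Roll back by discarding the new fleet if any instance is unhealthy.
    if not all(healthy(instance) for instance in new_fleet):
        for instance in new_fleet:
            terminate(instance)
        return old_fleet

    # Cut traffic over, then release the old capacity.
    switch_traffic(new_fleet)
    for instance in old_fleet:
        terminate(instance)
    return new_fleet
```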

A cutting-edge technique is to install an agent on every instance that will coordinate starting and stopping processes to do a rolling deploy. Big companies like Google and Twitter have proven this model of scheduling work across a cluster of generic instances. There are now a few open-source projects, like Docker Swarm and Apache Mesos, that “orchestrate” these fast deployments across a cluster.

Because of the speed, fixed capacity needs, success at Google, and mature open-source solutions that make it viable in any datacenter (on prem or in the cloud), container orchestration is a modern best practice.

Therefore, running EC2 Instances with container orchestration is the modern AWS best practice.

This explains why there is heated competition in the orchestration software space, and why AWS launched their fully-managed EC2 Container Service (ECS).

If we install Swarm or Mesos we’re responsible for operating the software. We need to keep an orchestration service up 100% of the time, so Instances can always check in and see if they need to start or stop processes. If we use ECS, we delegate that responsibility to Amazon.
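The "check in" step boils down to comparing what should be running with what is running. The sketch below shows that comparison only. It is an illustration of the idea, not the actual protocol used by Swarm, Mesos or the ECS agent, and the process names are invented.

```python
def reconcile(desired, running):
    """Return which processes an instance should start and stop.

    desired: process names the orchestration service wants on this instance
    running: process names currently running on this instance
    """
    desired, running = set(desired), set(running)
    return {
        "start": sorted(desired - running),
        "stop": sorted(running - desired),
    }
```

For example, `reconcile({"web:v2", "worker:v1"}, {"web:v1", "worker:v1"})` asks the instance to start `web:v2` and stop `web:v1`, leaving the worker alone. If the orchestration service is down, the agent has no desired state to compare against, which is why its uptime matters so much.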

Why ECS Tasks?

ECS Tasks are a young technology, but container orchestration is a modern best practice of cloud computing. Packaging our apps, including the operating system, into a standard Image format, then scheduling containers across a cluster of “dumb” instances, is a huge efficiency improvement over older EC2 Instance strategies.

Containers are faster to build and faster to boot than Instances. A single Instance can run multiple container workloads, which means less operating system overhead and fewer instances to maintain and pay for. Orchestration coordinates fast deploys and failure recovery alike.
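To make the packing idea concrete, here is a small first-fit placement sketch. It is a simplification for illustration and not a description of how the ECS scheduler actually places tasks. The `Instance` class and the CPU and memory figures are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    cpu: int      # free CPU units
    memory: int   # free memory in MiB
    tasks: list = field(default_factory=list)


def place(tasks, instances):
    """Place (name, cpu, memory) tasks on the first instance with room.

    Mutates the instances and returns the names of tasks that did not fit.
    """
    unplaced = []
    for name, cpu, memory in tasks:
        for instance in instances:
            if instance.cpu >= cpu and instance.memory >= memory:
                instance.cpu -= cpu
                instance.memory -= memory
                instance.tasks.append(name)
                break
        else:
            # No instance had enough free CPU and memory.
            unplaced.append(name)
    return unplaced
```

Three small tasks that would otherwise need three instances can share two, and a task that fits nowhere signals that the cluster needs more capacity.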

The most important distinction about ECS is that it’s a managed service.

Amazon provides an “ECS Optimized AMI” that has the right server operating system and Docker server pre-configured. AWS is responsible for keeping the ECS APIs up and running so instances can always connect to them to ask for more work. AWS is responsible for writing the open-source ecs-agent that runs on every instance.

Because AWS built the entire stack, we can trust its quality, and trust that AWS will support it through tickets if things don’t work right.

It’s also important to understand that AWS considers Tasks a first-class primitive of the entire AWS platform.

Every individual Task, that is, every single command we run in the cluster, has configuration options for:

  • CPU and memory limits
  • Security policies through an IAM role
  • Logging to any external syslog, fluentd system or CloudWatch Logs group
  • Registering into a load balancer

This year we saw two new services, Elastic File System (EFS) and Application Load Balancer (ALB), which are clearly designed around containerized workloads and fix major constraints of the Elastic Block Store (EBS) and Elastic Load Balancer (ELB) from the EC2 generation.

With all this, Tasks have more platform features out of the box than EC2 Instances ever did.

I expect continual platform improvements around Tasks, like improved auditing and billing, over the next few years. I also expect reduced effort for cluster management.

So with ECS Tasks, we’re responsible for providing an arbitrary Docker Image and Amazon is responsible for everything else to keep it running forever.

In Conclusion

EC2 Instances are too raw, requiring lots of customization to support an app. Lambda Functions are too constrained, disallowing traditional apps. ECS Tasks are just right, offering a simple way to package and run any application, while still relying on AWS to operate everything for us.

If you’re building a modern real-world system on AWS, an architecture based around ECS Tasks is the best choice.

About the Author

Noah Zoschke is the CTO of Convox, an open-source infrastructure framework. Convox implements best practices around Docker and AWS and offers simple commands to build, develop, deploy and manage apps with true development and production parity. Previously he was platform architect at Heroku. There he designed, built and managed one of the earliest and largest consumer-facing cloud container services. Follow @nzoschke and @goconvox on Twitter and check out The Convox Blog to keep in touch.
