The AWS Well-Architected Framework Adds Operational Excellence

by Abel Avram on Nov 25, 2016

Amazon has updated its AWS Well-Architected Framework (PDF) based on customer feedback, adding a new pillar: Operational Excellence.

The AWS Well-Architected Framework contains a set of best practices for building and operating secure, efficient, and cost-effective systems in the cloud. Amazon put the architectural guidelines together for AWS customers, but they are generally useful for any cloud platform.

The framework was first published a year ago, and this update incorporates customer feedback and lessons learned from applying it. For readers not familiar with the framework, we recommend the initial InfoQ article, since this post covers only some of the notable changes in this year’s version.

Besides the four original pillars (Security, Reliability, Performance Efficiency, and Cost Optimization), the AWS team of architects has introduced a fifth one: Operational Excellence, which represents the “ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.” The best practices recommended to ensure operational excellence of production workloads are:

  • Perform operations with code: automate operations as much as possible.
  • Align operations processes to business objectives: collect only the metrics that support business needs, and respond appropriately to operational events.
  • Make regular, small, incremental changes: workloads should consist of components that are updated regularly in small steps without taking down services, and operations should be able to roll back those changes if necessary.
  • Test for responses to unexpected events: inject failures in the system to see how it reacts to unexpected operational events. Develop clear procedures to react to such events.
  • Learn from operational events and failures: monitor and analyze how a system behaves during various operational events in order to improve it.
  • Keep operations procedures current: update procedures and guidelines to accurately reflect the current system as it evolves over time.
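
The practices of performing operations with code and making small, reversible changes can be illustrated with a minimal Python sketch. It is not AWS tooling: the fleet is modeled as a plain list of version strings, and `incremental_rollout` and the `is_healthy` callback are hypothetical names standing in for a real deployment pipeline and its health checks. The fleet is updated one batch at a time, so the remaining instances keep serving, and the whole change is rolled back if a health check fails.

```python
from typing import Callable, List


def incremental_rollout(versions: List[str], new_version: str, batch_size: int,
                        is_healthy: Callable[[List[str]], bool]) -> bool:
    """Update `versions` in place one batch at a time.

    Hypothetical sketch: `versions` models a fleet (one entry per instance) and
    `is_healthy` stands in for a real health check run after each batch.
    Returns True if the rollout completes, False if it was rolled back.
    """
    previous = list(versions)  # snapshot so the change can be undone
    for start in range(0, len(versions), batch_size):
        # Only this batch changes; the other instances keep serving traffic.
        for i in range(start, min(start + batch_size, len(versions))):
            versions[i] = new_version
        if not is_healthy(versions):
            versions[:] = previous  # roll back every instance to the snapshot
            return False
    return True


fleet = ["v1"] * 4
print(incremental_rollout(fleet, "v2", batch_size=2, is_healthy=lambda vs: True))  # True
print(fleet)  # ['v2', 'v2', 'v2', 'v2']
```

Smaller batches limit how many instances a bad change can reach before the health check catches it, at the cost of a slower rollout.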

The Well-Architected Framework also comes with a number of design principles meant to help build good systems in the cloud:

  • Stop guessing your capacity needs: use the cloud’s scalability rather than guessing capacity needs and risking inadequate capacity.
  • Test systems at production scale: scale up the system to what it would be in production and test it to see how it works in the real environment. Decommission the extra resources once the test is over.
  • Automate to make architectural experimentation easier: automate the entire process of creating a system so that it can be replicated easily; automation also makes returning to a previous setup simple.
  • Allow for evolutionary architectures: automation enables architects to evolve systems as needed, easily testing and setting up new configurations.
  • Data-driven architectures: collect needed operational data that can be used to evaluate how architectural changes impact the workloads. The data can also be used to tune up the automation code.
  • Improve through game days: inject failures to simulate operational events in production, observe how the system behaves when they occur, and correct it if necessary.
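
Game days and failure injection can likewise be sketched in a few lines of Python. This is a toy model under stated assumptions, not an AWS service or API: replicas are plain functions, a failure is simulated by marking a replica as down, and `make_replica`, `call_with_failover`, and `game_day` are hypothetical names. The exercise takes each replica down in turn and counts how many requests still succeed.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], str]


def make_replica(name: str, down: Dict[str, bool]) -> Handler:
    """Hypothetical replica that raises ConnectionError while marked down."""
    def handle(request: str) -> str:
        if down[name]:
            raise ConnectionError(f"{name} is down")
        return f"{name} handled {request}"
    return handle


def call_with_failover(replicas: List[Handler], request: str) -> str:
    """Try replicas in order; raise only if every one of them fails."""
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError:
            continue  # this replica is down, fall through to the next one
    raise RuntimeError("all replicas failed")


def game_day(names: List[str], requests: int = 5) -> Dict[str, int]:
    """Take each replica down in turn and count the requests that still succeed."""
    down = {name: False for name in names}
    replicas = [make_replica(name, down) for name in names]
    successes: Dict[str, int] = {}
    for victim in names:
        down[victim] = True  # inject the failure
        ok = 0
        for i in range(requests):
            try:
                call_with_failover(replicas, f"req-{i}")
                ok += 1
            except RuntimeError:
                pass  # the failure reached the caller
        successes[victim] = ok
        down[victim] = False  # end of the exercise: restore the replica
    return successes
```

With three replicas, `game_day(["a", "b", "c"])` reports all five requests succeeding for each injected failure, while a single-replica setup such as `game_day(["solo"])` reports zero successes, exposing the kind of single point of failure a game day is meant to surface.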

The framework also includes questions and answers for all five pillars on which it is built, providing guidance on how to address practical issues such as protecting against unauthorized use of the AWS root account, planning network topology, responding to unplanned operational events, and many others. We recommend reading the paper (AWS Well-Architected Framework) for an in-depth view of what it takes to create a successful system in the cloud.
