Werner Vogels on “21st Century [Cloud] Architectures”: Availability, Reliability and Resilience

At the AWS re:Invent 2017 conference, Werner Vogels, CTO of Amazon, presented a keynote that discussed the core concepts required for building "21st Century Architectures" on the cloud. Highlights of the talk included discussion of the emerging practices of evolutionary and "cloud native" architectures, security becoming everyone's responsibility, and the benefits of chaos engineering.

Vogels began the keynote by explaining that the key technology drivers of today are data, the Internet of Things (IoT), Graphics Processing Unit (GPU)-based computing for machine learning (as typified by AWS's EC2 P3 instances), and deep learning. The harnessing of these trends has led to a series of innovations within society -- e.g. the adoption of "big data" within business for analysis, and the creation of "smart environments" -- but the corresponding digital input and output devices we use to interface with these systems have not changed much over the last twenty years. Vogels believes that the next evolution of technology will focus on making digital access human-centric, and in particular that "voice unlocks digital systems for everyone".

voice is the next disruption

Next, Vogels shifted gears and began discussing the need for effective architecture designs that power the technology and data processing systems behind the interfaces. A series of example cloud architectures was discussed, and the key topics -- or pillars -- from the AWS "Well-Architected Framework" white paper were presented:

  • Operational excellence
  • Security
  • Reliability
  • Performance efficiency
  • Cost optimisation

When designing systems to run on the cloud, a series of principles should be followed: stop guessing capacity needs; test systems at production scale; automate to make architectural experimentation easier; allow for evolutionary architectures; drive your architecture using data; and improve through game days.

For every two orders of magnitude increase in a system's users, you will most likely need to change the architecture fundamentally.

Vogels stated strongly that now everyone is responsible for building secure systems. This includes developers, operators, and application security and compliance teams. A series of security principles were presented, including: identity - implement a strong identity foundation; detective controls - enable traceability; infrastructure protection - apply security at all layers and automate security best practices; data protection - protect data in transit and at rest; and incident response - prepare for security events through game days.

There is no excuse not to encrypt data any more. At the minimum, encrypt [personally identifiable information] PII and create threat models…

Good security practices should be enforced within a continuous delivery build pipeline. "Control and validation" should be applied pre- and post-event within the pipeline and the system. Pre-event, Infrastructure as Code (IaC) should be stored within a version control system (VCS), system code should be validated as early as possible, infrastructure changes should be forced through templates, and an event should be blocked if needed or when in doubt. Post-event, engineers should always follow up on access to sensitive APIs, use a single source of truth for configuration, validate the source, and decide on remediation. A series of AWS services were presented to assist with automating these processes, such as AWS CloudTrail, AWS Config Rules, and the newly released Amazon GuardDuty.
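As a rough illustration of a pre-event "control and validation" step, the sketch below checks an infrastructure template before it is allowed to deploy. The template layout (`resources`, `type`, `encrypted`, `allowed_cidrs`) and both rules are hypothetical and invented for this example; they are not the schema of AWS CloudFormation, AWS Config Rules, or any other AWS service.

```python
from typing import Any


def find_violations(template: dict[str, Any]) -> list[str]:
    """Return human-readable rule violations for a (hypothetical) template."""
    violations = []
    for name, resource in template.get("resources", {}).items():
        # Hypothetical rule 1: storage resources must be encrypted at rest.
        if resource.get("type") == "storage" and not resource.get("encrypted", False):
            violations.append(f"{name}: storage must be encrypted at rest")
        # Hypothetical rule 2: no resource may accept traffic from anywhere.
        if "0.0.0.0/0" in resource.get("allowed_cidrs", []):
            violations.append(f"{name}: ingress is open to 0.0.0.0/0")
    return violations


def gate(template: dict[str, Any]) -> bool:
    """Pre-event control: allow the change only if no rule is violated."""
    return not find_violations(template)
```

A pipeline stage could call `gate` on the template stored in version control and fail the build when it returns `False`, which is one way of "blocking an event" before it reaches production.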


Vogels continued by stating that the core tenets of a 21st century architecture are availability, reliability and resilience. Availability is implemented by deploying systems to multiple (geographical) availability zones, deploying redundant components, implementing systems using the microservices architectural style, focusing on recovery-oriented computing, and by following distributed systems best practices. For reliability, engineers must think about designing for the appropriate number of nines of availability, and understand hard dependencies and redundant dependencies. Resilience can be implemented by failing fast, traffic throttling, retries with exponential backoff, circuit breaking, and use of idempotency tokens and filters.
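Two of the resilience techniques listed above, retries with exponential backoff and idempotency tokens, can be sketched in a few lines. This is a minimal illustration rather than anything shown in the keynote: the function and class names, the default delays, the use of random jitter, and the in-memory token store are all assumptions made for the example.

```python
import random
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on any exception with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # The delay ceiling doubles each attempt: base, 2*base, 4*base, ...
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads out retries from many clients.
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")


class IdempotentHandler:
    """Remember results by token so a retried request is applied only once."""

    def __init__(self, handler: Callable[[Any], Any]) -> None:
        self._handler = handler
        self._results: dict[str, Any] = {}  # assumption: in-memory store

    def handle(self, token: str, payload: Any) -> Any:
        if token not in self._results:
            self._results[token] = self._handler(payload)
        return self._results[token]
```

The two pieces complement each other: a client that retries may deliver the same request twice, and the idempotency token lets the receiving service return the stored result instead of repeating the side effect. A production version would need a shared, durable token store rather than a dictionary.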

Implementing a specific availability target is a business decision. AWS provides the tools, and you determine design and cost.
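The cost side of that decision becomes clearer with some simple arithmetic on "nines", hard dependencies and redundant dependencies. The sketch below uses the standard probability formulas and assumes that component failures are independent, which real systems only approximate; the function names are chosen for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime for an availability such as 0.999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(*availabilities: float) -> float:
    """Hard dependencies: every component must be up, so multiply."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def redundant_availability(availability: float, copies: int) -> float:
    """Redundant dependencies: the system is down only if all copies are down."""
    return 1.0 - (1.0 - availability) ** copies
```

Under these assumptions, "three nines" (99.9%) allows roughly 526 minutes of downtime per year, while "four nines" (99.99%) allows about 53. A service with two hard dependencies at 99.9% each can offer at most about 99.8%, whereas two redundant copies of a 99% component reach about 99.99%.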

Next, Vogels introduced Nora Jones, senior software engineer at Netflix, to talk about resilience and chaos engineering. When building complex distributed systems -- which is what the majority of organisations working with cloud technology and microservices architectures are doing -- unit and integration testing, although vital, are insufficient for guaranteeing resilience. Jones argued that the emerging discipline of chaos engineering is essential for surfacing inherent issues within complex systems.

At its core, chaos engineering is the practice of experimenting with causing failure within systems. Engineers formulate a hypothesis about how a system may fail, design an experiment to cause or simulate this, and execute the experiment in a controlled fashion. The results are then analysed, and the cycle of experimentation continues. Jones discussed the "forces of chaos" -- the potential evolution of resilience testing within a system -- which included graceful restarts and degradation, targeted chaos, cascading failure, and failure injection.
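The hypothesis, experiment and analysis cycle described above can be illustrated with a toy example. This sketch is not Netflix's FIT or ChAP; the `home_page` system, its `recommendations` dependency, and the steady-state check are all hypothetical and exist only to show the shape of an experiment.

```python
from dataclasses import dataclass
from typing import Callable


class DependencyDown(Exception):
    """Raised by the hypothetical dependency when failure is injected."""


def recommendations(fail: bool) -> list[str]:
    # Hypothetical downstream service; `fail` stands in for failure injection.
    if fail:
        raise DependencyDown("recommendations unavailable")
    return ["personalised-1", "personalised-2"]


def home_page(inject_failure: bool) -> list[str]:
    # System under test: degrades gracefully to a static list.
    try:
        return recommendations(fail=inject_failure)
    except DependencyDown:
        return ["popular-1", "popular-2"]


@dataclass
class ExperimentResult:
    hypothesis: str
    control_ok: bool      # steady state without injected failure
    experiment_ok: bool   # steady state with injected failure

    @property
    def hypothesis_held(self) -> bool:
        return self.control_ok and self.experiment_ok


def run_experiment(
    hypothesis: str,
    system: Callable[[bool], list[str]],
    steady_state: Callable[[list[str]], bool],
) -> ExperimentResult:
    """Compare a control run with a run that has failure injected."""
    control_ok = steady_state(system(False))
    try:
        experiment_ok = steady_state(system(True))
    except Exception:
        experiment_ok = False  # an unhandled error counts as a broken steady state
    return ExperimentResult(hypothesis, control_ok, experiment_ok)
```

Calling `run_experiment("home page still renders items when recommendations fail", home_page, lambda items: len(items) > 0)` returns a result whose `hypothesis_held` is `True`; passing `recommendations` itself as the system, which has no fallback, yields `False`. In practice the experiment would run against a small, controlled slice of real traffic rather than in-process functions.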

Netflix created a framework for Failure Injection Testing (FIT) in 2014, which has now evolved into the Chaos Automation Platform (ChAP). This platform allows engineers to run automated chaos experiments -- more details of which can be found in the InfoQ coverage of Jones' QCon SF talk on the topic. Jones concluded the talk by stating that "chaos [engineering] doesn't cause problems [in a system] - it reveals them", and encouraged the audience to explore the concepts further within a Chaos Engineering mini-book she co-authored alongside Casey Rosenthal, Lorin Hochstein and Aaron Blohowiak, and also via the website.

Chaos doesn't cause problems - it reveals them

Vogels next introduced Abby Fuller, senior technical evangelist at AWS, to talk about the role container technology has within 21st century architectures. Fuller presented a series of customer case studies in which packaging and deploying applications within containers had played a key role, including Segment and Capital One. Next, an overview was provided of the newly released AWS managed Kubernetes service -- Amazon Elastic Container Service for Kubernetes (Amazon EKS) -- and AWS Fargate -- a technology for Amazon ECS and EKS that allows you to run containers without having to manage servers or clusters. A key message from Fuller's talk was that the managed services provided by AWS allow customers to "focus only on workloads", and not the "undifferentiated heavy lifting" of managing the underlying infrastructure. Provided these systems are architected correctly, this will allow the execution of applications in a secure, scalable and reliable fashion.

Vogels concluded the talk on 21st century architectures by asking the audience to imagine what the future of software application development will look like, and proposed that soon all of the code that is being written will be exclusively business logic. Vogels mused that the increasing adoption of "serverless" -- Function-as-a-Service and managed services -- may mean that this future will arrive sooner than many have predicted.

Further details of product releases and announcements made at AWS re:Invent 2017 can be found in the accompanying InfoQ news items.

Additional information on the AWS re:Invent conference can be found on the event website.