Amazon EC2 Introduces Automatic Recovery of Instances by Default

Amazon recently announced that EC2 instances will now automatically recover if they become unreachable due to underlying hardware issues. Automatic recovery migrates the instance to different hardware while retaining its instance ID, private IP addresses, Elastic IP address, and metadata.

Several scenarios can trigger instance recovery: loss of network connectivity, loss of system power, and software or hardware issues on the physical host. During an instance recovery, the instance is migrated as part of an instance reboot, and any in-memory data is lost. If the impaired instance has a public IPv4 address, the instance retains it. Similarly, if it is in a placement group, the recovered instance will run in the same one.

Mani Chandrasekaran, principal solutions architect at AWS, writes:

This may be a small announcement but will have a significant impact on our customers in a positive way. This was available earlier too, but "by default" is the keyword here.

A first iteration of the feature was released years ago, and it was already possible to recover an instance automatically by setting up a CloudWatch alarm. In a popular thread on Reddit, user bruin116 writes:

Finally! Really happy to see this. It's something Azure has done automatically since 2015 and I always thought it was a strange omission that AWS did not.
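
For readers who want to see what the earlier alarm-based approach looks like, the following is a minimal sketch, assuming Python with boto3 and a CloudWatch alarm on the `StatusCheckFailed_System` metric that triggers the EC2 recover action. The helper names, the alarm name, and the period and threshold values are illustrative assumptions, not values taken from the announcement.

```python
from typing import Any, Dict


def build_recover_alarm(instance_id: str, region: str) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm (hypothetical helper)."""
    return {
        "AlarmName": f"recover-{instance_id}",  # illustrative naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",  # system status check of the host
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,  # seconds; assumed value
        "EvaluationPeriods": 2,  # assumed value
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in alarm action that asks EC2 to recover the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recover_alarm(instance_id: str, region: str) -> None:
    """Create the alarm; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without the SDK

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_recover_alarm(instance_id, region))
```

Keeping the parameter builder separate from the API call makes it possible to inspect the alarm definition without touching an AWS account.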

The new behavior is designed to speed up the recovery of an instance impacted by a hardware failure, but customers can choose to disable it. Auto recovery is not initiated if an instance is part of an Auto Scaling group with health checks enabled; in that case, the instance is replaced by Auto Scaling itself.
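
Opting out might look like the following minimal sketch, assuming Python with boto3 and the EC2 `modify_instance_maintenance_options` call with its `AutoRecovery` setting. The helper names are hypothetical, and the exact behavior should be checked against the current EC2 documentation.

```python
from typing import Any, Dict


def maintenance_options(instance_id: str, enabled: bool) -> Dict[str, Any]:
    """Build the request parameters (hypothetical helper).

    "default" keeps simplified automatic recovery on; "disabled" opts out.
    """
    return {
        "InstanceId": instance_id,
        "AutoRecovery": "default" if enabled else "disabled",
    }


def set_auto_recovery(instance_id: str, enabled: bool, region: str) -> None:
    """Apply the setting; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without the SDK

    ec2 = boto3.client("ec2", region_name=region)
    ec2.modify_instance_maintenance_options(
        **maintenance_options(instance_id, enabled)
    )
```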

The cloud provider highlights that automatic recovery is not intended as a high availability solution: it might fail if there is insufficient capacity in the Availability Zone, if the instance has already reached three recovery attempts for the day, or if there are issues that impact the underlying recovery process.

Amazon updated the documentation to cover some initial questions, but doubts remain. User MowglisDad comments:

Per the updated documentation, a new Cloudwatch event has been added that can be used to provide custom handling of recovery. The open question is whether subscribing to it for informational purposes will override default behavior.

Not every EC2 instance type currently supports automatic recovery: instances with an EFA network interface, instances with instance store volumes, and metal instance types are excluded. The instance types that support simplified automatic recovery can be listed using the CLI describe-instance-types command:

    aws ec2 describe-instance-types --filters Name=auto-recovery-supported,Values=true \
        --query "InstanceTypes[*].[InstanceType]" --output text | sort
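
The same query can be scripted. Below is a minimal sketch, assuming Python with boto3 and the same `auto-recovery-supported` filter used in the CLI command above. The function names are hypothetical, and the name extraction is kept separate so it can be checked without AWS access.

```python
from typing import Any, Dict, Iterable, List

# Same filter as in the CLI command above.
AUTO_RECOVERY_FILTER = [{"Name": "auto-recovery-supported", "Values": ["true"]}]


def instance_type_names(pages: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect and sort instance type names from describe_instance_types pages."""
    names = {
        item["InstanceType"]
        for page in pages
        for item in page.get("InstanceTypes", [])
    }
    return sorted(names)


def list_auto_recovery_types(region: str) -> List[str]:
    """Query EC2 for supported types; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helper above works without the SDK

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_instance_types")
    return instance_type_names(paginator.paginate(Filters=AUTO_RECOVERY_FILTER))
```

Using a paginator matters here because the list of instance types is long and is returned across several pages.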

The announcement of the new feature does not guarantee or even suggest the time required for an auto-recovered server to resume operations.

About the Author

Renato Losio
