After a virtual-only event in 2020 and a reduced-size 2021 edition, re:Invent was back last week in Las Vegas with over 50,000 attendees for the 11th edition. During multiple sessions and keynotes at the largest AWS conference, the cloud provider announced new services and features, with the focus more on business solutions and data options than new building blocks.
Below is a review of the main announcements impacting computing, database, storage, networking, machine learning, and development.
Compute
Last year compute was a central topic of the conference with the preview of the Graviton 3 instances based on energy-efficient Arm-based chips. No breakthrough this year, but new instance classes C7gn and Hpc7g using the Graviton 3E processor were announced. With up to 200 Gbps of network bandwidth, the C7gn instances are designed for network-intensive workloads and are now available in preview. Expected in early 2023, the Hpc7g instances target intensive HPC and distributed computing workloads.
On the Intel side, the new general purpose (M6in/M6idn), compute-optimized (C6in), and memory-optimized (R6in/R6idn) instances are all powered by Intel Xeon Scalable processors (Ice Lake). Jeff Barr, vice president and chief evangelist at AWS, explains:
Prior to today's launch, you could choose a c5n, m5n, or r5n instance to get the highest network bandwidth on an EC2 instance, or an r5b instance to have access to the highest EBS IOPS performance and high EBS bandwidth. Now, customers who need high networking or EBS performance can choose from a full portfolio of instances with different memory to vCPU ratio and instance storage options available, by selecting one of c6in, m6in, m6idn, r6in, or r6idn instances.
Powered by Inferentia2 accelerators, the Inf2 instances (now in preview) are designed for deep learning inference applications. High performance computing saw the introduction of Hpc6id instances, EC2 instances built for tightly coupled HPC workloads.
Elastic Network Adapter (ENA) Express improves network latency and per-flow performance on EC2, targeting workloads that require large flows and are sensitive to variance in latency, such as distributed storage systems and live media encoding.
As reported separately on InfoQ, Lambda SnapStart is now available for Java. It has been one of the announcements best received by developers, as it addresses one of the limitations of implementing serverless Java applications.
The new SimSpace Weaver service targets a niche market: running large-scale spatial simulations without the developer being limited by the compute and memory of the underlying hardware. Marcia Villalba, principal developer advocate at AWS, explains:
Use SimSpace Weaver when you need to increase the scale or complexity of your simulations. SimSpace Weaver is great at simulating crowds. This is very useful, for example, when you are planning large events or planning to build infrastructure like a new stadium. It is also ideal for simulating smart cities, complete with vehicles, inhabitants, and other objects.
Source: https://aws.amazon.com/blogs/aws/new-aws-simspace-weaver-build-large-scale-spatial-simulations-in-the-cloud/
Storage
There were no cheaper S3 object classes or new FS services this year. To simplify running workloads and applications on Elastic File System and handle spiky throughput, AWS introduced EFS Elastic Throughput. Failover controls are now available for S3 Multi-Region Access Points, letting developers shift S3 data access request traffic to an alternate region to test and build highly available applications.
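As a rough illustration of what such a failover looks like, the sketch below builds the routing update that sends all traffic to one bucket behind a Multi-Region Access Point and none to the other. The bucket names and the helper `build_failover_routes` are hypothetical, and the payload shape is an assumption based on the `RouteUpdates` argument of the S3 Control `submit_multi_region_access_point_routes` call in boto3, so it should be checked against the current API reference.

```python
from typing import Dict, List


def build_failover_routes(active_bucket: str, passive_bucket: str) -> List[Dict[str, object]]:
    """Return route updates that make active_bucket serve all traffic.

    Hypothetical helper: each bucket gets a traffic dial of 100 (active)
    or 0 (passive).
    """
    return [
        {"Bucket": active_bucket, "TrafficDialPercentage": 100},
        {"Bucket": passive_bucket, "TrafficDialPercentage": 0},
    ]


# Placeholder bucket names, for illustration only.
routes = build_failover_routes("example-bucket-eu-west-1", "example-bucket-us-east-1")

# Assumption: with boto3 installed and credentials configured, the update
# could be submitted roughly like this (account ID and ARN are placeholders):
#   s3control = boto3.client("s3control")
#   s3control.submit_multi_region_access_point_routes(
#       AccountId="111122223333",
#       Mrap="<multi-region-access-point-arn>",
#       RouteUpdates=routes,
#   )
print(routes)
```

Swapping the two arguments produces the update that fails back to the original bucket.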
Support for CloudFormation stacks and the Redshift data warehouse in AWS Backup was also introduced during the conference.
Data
Data was one of the core themes of the conference, with different announcements about databases, analytics, and data engineering.
At the beginning of the conference, AWS announced the general availability of RDS Blue/Green Deployments, a new feature for Aurora with MySQL compatibility, RDS for MySQL, and RDS for MariaDB to perform blue/green database updates. The cloud provider also introduced RDS Optimized Writes to provide higher write throughput on memory-optimized R6i and R5b instances and RDS Optimized Reads to improve query performance on M5d, R5d, M6gd, and R6gd instances. Not every announcement was MySQL-related: Trusted Language Extensions for PostgreSQL provides database administrators control over who can install extensions and a permissions model for running them.
The Redshift team announced several features to simplify data ingestion and retrieve insights quickly. With streaming ingestion for Kinesis Data Streams and Managed Streaming for Apache Kafka (MSK), Redshift can now ingest hundreds of megabytes of data per second into a materialized view and query it in seconds. It is possible to build and run Spark applications on Redshift and Redshift Serverless. Enabling near real-time analytics and machine learning, Aurora's "zero-ETL" integration with Redshift is now available in preview.
Athena for Apache Spark is a new option for running Apache Spark workloads, using Jupyter Notebook as the interface for data processing and the Athena APIs for interacting programmatically with Spark applications.
Adam Selipsky, CEO of AWS, announced the general availability of Omics, a managed service for the storage, analysis, and processing of genomic, transcriptomic, and other omics data. The service is designed for healthcare and life science organizations to enhance patient care and advance scientific research.
Source: https://aws.amazon.com/blogs/aws/introducing-amazon-omics-a-purpose-built-service-to-store-query-and-analyze-genomic-and-biological-data-at-scale/
One of the main announcements of Sivasubramanian's keynote, DocumentDB Elastic Clusters is a service that manages the underlying infrastructure and elasticity for MongoDB workloads.
Controversially priced, OpenSearch Serverless manages the provisioning and scaling of the resources to deliver data ingestion and query responses using Elasticsearch-compatible APIs. Not everybody is convinced that naming every autoscaling service "serverless" is a good idea. In a recent "not so serverless Neptune" article, Jeremy Daly, CEO at Ampt, writes:
So, have we strayed so far from the purest definition of serverless that there is no going back? Or is this just what "serverless" is now? I hate to be the bearer of bad news, but somewhere along the way our compass broke, and we strayed quite a ways off the path to the promised land.
Networking
AWS announced the preview of VPC Lattice, an application layer networking service to enable cross-account, cross-VPC connectivity, and application layer load balancing. VPC Lattice handles workloads regardless of the underlying compute type: instances, containers, and serverless.
Designed to accelerate application development by decoupling authorization from business logic, Verified Permissions is now in preview and controls fine-grained permissions and authorization within custom applications. In preview, Glue Data Quality analyzes tables and automatically recommends a set of rules based on what it finds.
Machine Learning
Swami Sivasubramanian's keynote focused on machine learning, with new services and features for the managed service SageMaker. To build, train, and deploy models using geospatial data, there is the preview release of SageMaker's geospatial capabilities. The new geospatial image can be used to transform and visualize data inside geospatial notebooks using open-source libraries such as NumPy, GDAL, GeoPandas, and Rasterio.
New machine learning governance tools were announced, improving access control and transparency for ML projects. Antje Barth, principal developer advocate at AWS, explains:
As companies increasingly adopt ML for their business applications, they are looking for ways to improve governance of their ML projects with simplified access control and enhanced visibility across the ML lifecycle. A common challenge in that effort is managing the right set of user permissions across different groups and ML activities.
It is now possible to share artifacts, such as models and notebooks, with other users using SageMaker JumpStart. To increase efficiency across the development workflow, a newer version of SageMaker Notebooks was introduced together with support for shadow testing. Barth writes:
Shadow testing helps you build further confidence in your model and catch potential configuration errors and performance issues before they impact end users. Once you complete a shadow test, you can use the deployment guardrails for SageMaker inference endpoints to safely update your model in production.
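The idea behind shadow testing can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the SageMaker API: the two toy models and the helper `serve_with_shadow` are hypothetical. Every request is answered by the production model, while a copy goes to the candidate model, whose output is recorded for comparison but never returned to the caller.

```python
from typing import Callable, List, Tuple

Model = Callable[[float], float]


def serve_with_shadow(
    request: float,
    production: Model,
    shadow: Model,
    log: List[Tuple[float, float, float]],
) -> float:
    """Answer with the production model while recording the shadow's output."""
    prod_out = production(request)
    try:
        shadow_out = shadow(request)
        log.append((request, prod_out, shadow_out))
    except Exception:
        # A failing shadow variant must never affect the caller.
        pass
    return prod_out


def production_model(x: float) -> float:
    # Toy stand-in for the model currently in production.
    return 2.0 * x


def candidate_model(x: float) -> float:
    # Toy stand-in for the new model under evaluation.
    return 2.0 * x + 0.5


comparisons: List[Tuple[float, float, float]] = []
responses = [
    serve_with_shadow(x, production_model, candidate_model, comparisons)
    for x in (1.0, 2.0, 3.0)
]

# Largest disagreement between production and shadow outputs.
max_diff = max(abs(p - s) for _, p, s in comparisons)
print(responses, max_diff)
```

In a real deployment the comparison would typically also cover latency and error rates, which is where configuration problems tend to surface before a model is promoted.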
Glue for Ray helps data engineers process large datasets with Python and Python libraries. Olalekan Elesin, director of data platform at HRS Group and AWS Machine Learning Hero, wrote an article to demonstrate how to use the new service to train a Facebook Prophet forecast model in a distributed environment.
Covered separately on InfoQ, DataZone is a management service to share, search, and discover data at scale across organizational boundaries. All the data in DataZone is governed by access and use policies that the organization can define.
The cloud provider introduced new ML-powered features for the cloud contact center service Amazon Connect, including forecasting, capacity planning, and scheduling.
Business Applications
Now in preview, AWS Supply Chain is an ML-powered supply chain application that increases visibility and helps make decisions with actionable insights, for example mitigating overstock and stock-out risks. Corey Quinn, chief cloud economist at The Duckbill Group, tweets:
A giant supply chain concern for many companies is "Amazon Themselves." I firmly believe with full faith that AWS is a good steward of customer data and won't misuse it. But I also think that anyone who thinks that "I'll give my supply chain data to Amazon" might be a risk is absolutely raising a terrific question. I am really struggling to imagine a retailer, any retailer, to whom I could suggest this service (..)
Announced in preview a few months ago, Wickr is now generally available and provides an end-to-end encrypted communication service for enterprises with auditing and regulatory requirements.
Monitoring and Security
It is now possible to work cross-account in CloudWatch for metrics, logs, and traces, but no option is yet available to work cross-region. Monitoring data about internet traffic before it reaches the application layer, Internet Monitor uses the connectivity data that AWS captures from its global network to determine a baseline of performance and availability. Sébastien Stormacq, principal developer advocate at AWS, explains:
How many times have you had monitoring dashboards show you a normal situation, and at the same time, you have received customer tickets reporting your app is "slow" or unavailable to them? How much time did it take to diagnose these customer reports?
CloudWatch Logs data protection is a new set of capabilities that leverage pattern matching to protect sensitive log data. Quinn comments:
At 0.12 USD per GB scanned, this is a) clearly an ML-powered service and b) not for casual users but rather for large enterprises focused on mitigating potential risk factors. This is fine! Just, y'know. Remember that just like the (not kidding) 14 USD bottle of water in the minibar, not every AWS service is necessarily priced for you.
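To make the pattern-matching idea behind log data protection concrete, here is a minimal sketch that masks email addresses in a log line with a regular expression. It is not how the managed service is implemented or configured: the single simplified pattern and the helper `mask_emails` are assumptions chosen for illustration.

```python
import re

# Simplified email pattern, an assumption for this sketch; real data
# identifiers cover many more formats and data types.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(line: str, mask_char: str = "*") -> str:
    """Replace every email address in the line with mask characters."""
    return EMAIL_PATTERN.sub(lambda m: mask_char * len(m.group(0)), line)


original = "login failed for jane.doe@example.com from 10.0.0.1"
masked = mask_emails(original)
print(masked)
```

The rest of the line is left untouched, so masked logs remain usable for troubleshooting.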
Companies that have a regulatory need to store their encryption keys outside of the AWS data centers can now use KMS External Key Store (XKS) to manage keys on an HSM that operates on-premises.
At the end of his keynote, Selipsky announced Clean Rooms. With a questionable name for a company that recently acquired iRobot, the new service (in preview) helps collaborate with other companies on AWS without sharing or revealing underlying data.
Source: https://aws.amazon.com/clean-rooms/
Amazon Inspector now supports Lambda functions to identify software vulnerabilities in package dependencies.
Verified Access provides an enterprise connectivity service to enable local or remote secure access for corporate applications without depending on a VPN. A new option to automatically centralize security data from cloud and on-premises sources into a purpose-built data lake, Security Lake is now in preview. The service helps identify potential security threats and vulnerabilities, centralizing logs for easy access and use within analytics tools. For more details, see a separate article on InfoQ.
A concern about the new security options and features was the already large number of services the cloud provider offers.
Architecture, Coding, and Simulations
Under the title "the world is asynchronous", Werner Vogels' keynote focused on asynchronous and event-driven architectures, with multiple announcements related to serverless technologies and development. Talking about asynchronous operations, Amazon's CTO referred to the Distributed Computing Manifesto, a document from the early days of Amazon.
Vogels announced a distributed map for Step Functions, a solution for large-scale parallel data processing, coordinating workloads within serverless applications. Stormacq highlights:
Step Functions distributed map supports a maximum concurrency of up to 10,000 executions in parallel, which is well above the concurrency supported by many other AWS services. You can use the maximum concurrency feature of the distributed map to ensure that you do not exceed the concurrency of a downstream service.
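As a sketch of how that concurrency cap is expressed, the snippet below assembles a state machine definition with a distributed map and serializes it to JSON. The bucket, prefix, and function names are placeholders, the cap of 500 is an arbitrary value, and the field layout reflects the Amazon States Language as understood at the time of writing, so it should be validated against the current documentation.

```python
import json

# Hypothetical state machine: list objects in a bucket and process each one
# with a Lambda function, at most 500 child executions at a time.
state_machine = {
    "Comment": "Sketch: process S3 objects in parallel with a distributed map",
    "StartAt": "ProcessObjects",
    "States": {
        "ProcessObjects": {
            "Type": "Map",
            # Keep parallelism below what the downstream service can absorb.
            "MaxConcurrency": 500,
            "ItemReader": {
                "Resource": "arn:aws:states:::s3:listObjectsV2",
                "Parameters": {
                    "Bucket": "example-input-bucket",  # placeholder
                    "Prefix": "incoming/",             # placeholder
                },
            },
            "ItemProcessor": {
                "ProcessorConfig": {
                    "Mode": "DISTRIBUTED",
                    "ExecutionType": "STANDARD",
                },
                "StartAt": "HandleObject",
                "States": {
                    "HandleObject": {
                        "Type": "Task",
                        "Resource": "arn:aws:states:::lambda:invoke",
                        "Parameters": {
                            "FunctionName": "example-handler",  # placeholder
                            "Payload.$": "$",
                        },
                        "End": True,
                    }
                },
            },
            "End": True,
        }
    },
}

definition = json.dumps(state_machine, indent=2)
print(definition)
```

The resulting JSON string is what would be supplied as the state machine definition when creating it through the console, an SDK, or an infrastructure-as-code tool.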
Application Composer, a visual designer to build serverless applications from multiple AWS services, is now in preview. Goran Opacic, AWS Data Hero, tweets:
The educational effects of Application Composer will be enormous.
Source: https://aws.amazon.com/blogs/compute/visualize-and-create-your-serverless-workloads-with-aws-application-composer
One of the major announcements of the conference was CodeCatalyst, a unified software development and delivery service. Currently available in preview, CodeCatalyst provides blueprints that set up the project's resources, issue management, cloud-based development environments, and automated build and release (CI/CD) pipelines.
James Governor, RedMonk analyst and co-founder, describes 2023 as the year of Cloud Development Environments and comments:
So CodeCatalyst looks like a packaging exercise, that taps into modern developer tools, platforms and workflows. packaging wins. convenience is the killer app. it is going to be really interesting to learn more.
Matthew Bonig, chief cloud architect at Defiance Digital, questions:
Can anyone save me some time? What existing services does CodeCatalyst replace?
EventBridge Pipes is a new feature of the serverless event router EventBridge that allows the creation of point-to-point integrations between event producers and consumers. Still in preview, the ML-powered "coding companion" CodeWhisperer received some updates too.
Feedback and Recaps
During the event, AWS updated an article with the main announcements of re:Invent 2022.
Sustainability was one of the main topics of the conference: AWS plans to power operations with 100% renewable energy by 2025 and be water positive by 2030. Aerin Booth, cloud sustainability advocate, tweets:
I am in my hotel room recovering from re:Invent and I am still struck by the irony of Adam Selipsky holding a tech conference in Las Vegas, a city known for its excess and waste. Strap yourself in to hear about the challenges of trying to be sustainable at a conference in Sin City.
There were no significant price changes revealed during the conference, in a week that saw Google Cloud announcing upcoming pricing increases for Cloud Storage.
What were your favorite announcements? Let us know in the comments below.