AWS recently announced two new capabilities for S3 Tables: a new intelligent tiering storage class that automatically optimizes costs based on access patterns, and replication support that automatically maintains consistent Apache Iceberg table replicas across AWS Regions and accounts without manual sync.
With the new Intelligent-Tiering storage class, data is automatically tiered to the most cost-effective of three low-latency access tiers: Frequent Access, Infrequent Access, or Archive Instant Access. The latter is the lowest-cost tier; according to the company, it is 68% cheaper than Infrequent Access. Sébastien Stormacq, a principal developer advocate at AWS, writes:
After 30 days without access, data moves to Infrequent Access, and after 90 days, it moves to Archive Instant Access. This happens without changes to your applications or impact on performance.
By default, tables use the Standard storage class; however, when creating a table, users can specify Intelligent-Tiering as its storage class or rely on the default storage class configured at the table bucket level. Setting Intelligent-Tiering as a table bucket's default means tables are automatically stored in Intelligent-Tiering whenever no storage class is specified during table creation.
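As a sketch of what specifying the storage class at table creation could look like, assuming the create-table command accepts the same --storage-class-configuration flag as the bucket-level command (that flag, along with the ARN, namespace, and table name below, is a placeholder assumption, not confirmed syntax):

```shell
# Hypothetical sketch: create an Iceberg table stored in Intelligent-Tiering.
# The --storage-class-configuration flag on create-table is assumed to mirror
# put-table-bucket-storage-class; ARN, namespace, and table name are placeholders.
aws s3tables create-table \
    --table-bucket-arn $TABLE_BUCKET_ARN \
    --namespace my_namespace \
    --name my_table \
    --format ICEBERG \
    --storage-class-configuration storageClass=INTELLIGENT_TIERING
```

If the flag is omitted, the table would fall back to the bucket's default storage class described above.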
Users can leverage the AWS Command Line Interface (AWS CLI) with the put-table-bucket-storage-class and get-table-bucket-storage-class commands to change or verify the storage class of their S3 table bucket. The commands could look like:
aws s3tables put-table-bucket-storage-class \
    --table-bucket-arn $TABLE_BUCKET_ARN \
    --storage-class-configuration storageClass=INTELLIGENT_TIERING

# Verify the storage class
aws s3tables get-table-bucket-storage-class \
    --table-bucket-arn $TABLE_BUCKET_ARN
{
    "storageClassConfiguration": {
        "storageClass": "INTELLIGENT_TIERING"
    }
}
Adefemi Adeyemi, an AWS Architect at Imperious Enterprise, noted in a LinkedIn post:
Most analytics datasets are "hot" for a while, then slowly cool off. With Intelligent Tiering on S3 Tables, you no longer need to tweak lifecycle policies for your Iceberg data constantly. The service automatically moves objects to cheaper tiers based on access patterns, which can be a big win for long-lived data lakes.
S3 Tables replication support helps users maintain consistent read replicas of their tables across AWS Regions and accounts. When users specify a destination table bucket, the service creates read-only replica tables, replicating all updates in chronological order while preserving parent-child snapshot relationships. Replica tables are updated within minutes of source table updates and support encryption and retention policies independent of their source tables.
Stormacq states:
Replica tables can be queried using Amazon SageMaker Unified Studio or any Iceberg-compatible engine, including DuckDB, PyIceberg, Apache Spark, and Trino.
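As a minimal sketch of querying a replica from one of those engines, DuckDB's iceberg extension supports attaching an S3 table bucket by ARN; the bucket ARN, namespace, and table name below are placeholders, and the command assumes AWS credentials are available in the environment:

```shell
# Query a read-only replica table with DuckDB (placeholder ARN and names);
# requires the DuckDB CLI, its iceberg extension, and AWS credentials.
duckdb -c "
INSTALL iceberg; LOAD iceberg;
CREATE SECRET (TYPE s3, PROVIDER credential_chain);
ATTACH 'arn:aws:s3tables:us-west-2:111122223333:bucket/replica-bucket'
    AS replica (TYPE iceberg, ENDPOINT_TYPE s3_tables);
SELECT * FROM replica.my_namespace.my_table LIMIT 10;
"
```

Because replicas are regular Iceberg tables, the same pattern applies to PyIceberg, Apache Spark, or Trino with their respective catalog configurations.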
Users can create and maintain table replicas using the AWS Management Console, APIs, or AWS SDKs, specifying the destination table buckets into which source tables are replicated. When replication is enabled, S3 Tables creates read-only replicas in these buckets, backfills them with the latest state of the source, and continuously monitors for updates to keep them in sync.
In the same LinkedIn post, Adeyemi noted:
Native replication support lets you spin up read-only replicas that stay in sync within minutes and are still queryable as Iceberg tables. Less custom plumbing, more time actually using the data.
Lastly, users can track their storage usage by access tier through AWS Cost and Usage Reports and Amazon CloudWatch metrics. There are no additional charges to configure Intelligent-Tiering; users only pay for the storage costs of each tier. As for S3 Tables replication, users pay only S3 Tables storage charges for the destination table, replication PUT requests, table updates (commits), and object monitoring of the replicated data. More details are available on the pricing page.