Amazon S3 Supports New Checksum Algorithms for Integrity Checking

Amazon S3 recently introduced support for four checksum algorithms for data integrity checking on upload and download requests. Amazon claims that the enhancements to the AWS SDK and S3 API accelerate integrity checking of S3 requests by up to 90%.

Depending on application needs, developers can choose the SHA-1, SHA-256, CRC32, or CRC32C checksum algorithm and verify the integrity of the data either by providing a precalculated checksum or by having the AWS SDK automatically calculate one as it streams data into S3.
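As a rough illustration of what a precalculated checksum might look like, the following Python sketch computes three of the four algorithms with the standard library only. The function name `s3_style_checksums` is hypothetical, and the base64 encoding and big-endian CRC layout are assumptions based on how the S3 API documents its checksum fields. CRC32C is left out because the Python standard library does not provide it.

```python
import base64
import hashlib
import zlib


def s3_style_checksums(data: bytes) -> dict:
    """Return base64-encoded checksums of `data`, keyed by algorithm name.

    Hypothetical helper: the encoding choices below are assumptions,
    not something stated in the article.
    """
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    crc = zlib.crc32(data) & 0xFFFFFFFF
    return {
        "SHA1": base64.b64encode(hashlib.sha1(data).digest()).decode("ascii"),
        "SHA256": base64.b64encode(hashlib.sha256(data).digest()).decode("ascii"),
        # Assumption: the 32-bit CRC is serialized as 4 big-endian bytes
        # before being base64-encoded.
        "CRC32": base64.b64encode(crc.to_bytes(4, "big")).decode("ascii"),
    }


if __name__ == "__main__":
    for name, value in s3_style_checksums(b"hello world").items():
        print(name, value)
```

With boto3, a value computed this way would typically be passed to `put_object` through a parameter such as `ChecksumSHA256`, while setting `ChecksumAlgorithm="SHA256"` instead asks the SDK to compute the checksum itself.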

The checksum and the specified algorithm are stored as part of the object’s metadata, persist even if the object changes storage classes, and are copied as part of S3 replication. Jeff Barr, vice president and chief evangelist at AWS, explains the benefits of the new feature for uploads:

Computing checksums for large (multi-GB or even multi-TB) objects can be computationally intensive, and can lead to bottlenecks. The newest versions of the AWS SDKs compute the specified checksum as part of the upload, and include it in an HTTP trailer at the conclusion of the upload. (...) S3 will verify the checksum and accept the operation if the value in the request matches the one computed by S3. In combination with the use of HTTP trailers, this feature can greatly accelerate client-side integrity checking.

Source: https://aws.amazon.com/blogs/aws/new-additional-checksum-algorithms-for-amazon-s3/
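The single-pass idea described in the quote above can be sketched in Python as follows. This is not the AWS SDK's implementation: `ChecksummingReader` and `send_in_one_pass` are hypothetical names, and the "network write" is a stand-in that only counts bytes. The point is that the checksum is updated as each chunk is read, so the value is ready to send as a trailer once the last chunk has gone out, without reading the data a second time.

```python
import base64
import hashlib
import io


class ChecksummingReader:
    """Wrap a binary stream and hash every byte returned by read()."""

    def __init__(self, stream, algorithm="sha256"):
        self._stream = stream
        self._hash = hashlib.new(algorithm)

    def read(self, size=-1):
        chunk = self._stream.read(size)
        self._hash.update(chunk)  # updating with b"" at EOF is harmless
        return chunk

    def trailer_value(self):
        # Assumption: the trailer carries the base64-encoded digest.
        return base64.b64encode(self._hash.digest()).decode("ascii")


def send_in_one_pass(stream, chunk_size=8192):
    """Read the stream once, 'sending' chunks and hashing them on the way."""
    reader = ChecksummingReader(stream)
    sent = 0
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        sent += len(chunk)  # stand-in for writing the chunk to the network
    return sent, reader.trailer_value()


if __name__ == "__main__":
    total, trailer = send_in_one_pass(io.BytesIO(b"example payload" * 1000))
    print(total, trailer)
```

Because the digest is only finalized after the body has been streamed, it cannot go in a regular request header, which is why a trailer is the natural place for it.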

The S3 API can also calculate and store part-level checksums for objects uploaded through S3 multipart upload. Previously, AWS suggested using the Content-MD5 header to check the integrity of an object. Peter Mescalchin, software engineer at Flip, tweets:

Pretty excited to see S3 can now provide alternative checksum algorithms for uploaded objects. I'm thinking pieces are close to being in place to use SHA-256 with Terraform S3 -> Lambda function deployments and source code hashes.
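To make the part-level checksums for multipart uploads more concrete, here is a minimal Python sketch that hashes each part of a payload separately. The helper names are hypothetical, and the `composite_checksum` function rests on an assumption not stated in the article: that the object-level value for a multipart upload is a "checksum of checksums" over the raw part digests, suffixed with the number of parts. Treat it as an illustration to check against the S3 documentation rather than a definitive description.

```python
import base64
import hashlib


def part_checksums(data: bytes, part_size: int) -> list:
    """Split `data` into fixed-size parts and return each part's raw SHA-256 digest."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    return [hashlib.sha256(part).digest() for part in parts]


def composite_checksum(digests: list) -> str:
    """Combine part digests into one value.

    Assumption: hash the concatenated raw part digests, base64-encode
    the result, and append "-<number of parts>".
    """
    combined = hashlib.sha256(b"".join(digests)).digest()
    return base64.b64encode(combined).decode("ascii") + "-" + str(len(digests))


if __name__ == "__main__":
    payload = b"a" * 10 + b"b" * 5
    digests = part_checksums(payload, part_size=10)
    for number, digest in enumerate(digests, start=1):
        print(number, base64.b64encode(digest).decode("ascii"))
    print(composite_checksum(digests))
```

Under this scheme a composite value generally differs from the checksum of the whole object computed in one go, so a client comparing values needs to know which kind it is looking at.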

Aaron Booth, freelance consultant, asks:

What’s the use case of giving the user four different options then a choice of two such as the most performant or most accurate?

Kevin Miller, vice president & GM for S3 at AWS, explains:

Compatibility with other applications that use one algorithm or another. In our experience, it’s best when you can have an end-to-end checksum from the data origin to the final point of consumption, so we built support for the most popular (and will add more by customer demand!)

The additional checksums are available in all AWS regions and there is no extra cost associated with the feature.
