AWS Introduces Amazon Glacier for Low Cost Data Archives

Amazon Glacier is a new service from Amazon Web Services (AWS) that provides extremely low-cost, durable storage for archive-ready data. The service targets organizations that want to retain large, infrequently used data sets but don’t want to maintain a local storage infrastructure.

Amazon highlights a core set of use cases for the new service, including enterprise information archiving, storage of historical media assets, archival of scientific research data, and replacement of traditional magnetic tape libraries. The service places no limit on the amount of data that can be stored, and boasts eleven 9s (99.999999999%) of durability. An “archive” can store up to 40 TB of data, and each archive is associated with a region-specific “vault.” Archives are encrypted at rest and can be accessed through a REST web services interface. Storage costs vary by region, but prices begin at $0.01 per GB per month. Users can retrieve up to 5% of their data each month for free; beyond that, a more complex billing calculation comes into play. Whereas storage services like Amazon S3 measure retrieval time in milliseconds, Amazon Glacier measures its retrieval time in hours. In a blog post announcing the service, Amazon CTO Werner Vogels explained that “data is retrieved by scheduling a job, which typically completes within 3 to 5 hours.”
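The job-based retrieval that Vogels describes can be sketched roughly as follows. This is an illustration, not code from the announcement: it assumes a client object exposing the `initiate_job`, `describe_job`, and `get_job_output` operations as they appear in the present-day boto3 Glacier client, and the function name `retrieve_archive`, the polling interval, and the injectable `sleep` parameter are all hypothetical choices made for the example.

```python
import time


def retrieve_archive(client, vault_name, archive_id, poll_seconds=900, sleep=time.sleep):
    """Schedule a retrieval job, wait for it to finish, and return the archive bytes.

    `client` is assumed to behave like a boto3 Glacier client; it is passed in
    so the sketch does not depend on credentials or a particular SDK version.
    """
    # Step 1: ask Glacier to stage the archive. Nothing is downloadable yet.
    job = client.initiate_job(
        vaultName=vault_name,
        accountId="-",  # "-" means "the account that owns the credentials"
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    job_id = job["jobId"]

    # Step 2: poll until the job completes (typically hours, per the announcement).
    while True:
        status = client.describe_job(vaultName=vault_name, accountId="-", jobId=job_id)
        if status["Completed"]:
            break
        sleep(poll_seconds)

    if status.get("StatusCode") != "Succeeded":
        raise RuntimeError(f"retrieval job {job_id} ended with status {status.get('StatusCode')}")

    # Step 3: download the staged data.
    output = client.get_job_output(vaultName=vault_name, accountId="-", jobId=job_id)
    return output["body"].read()
```

With real credentials, a client created by `boto3.client("glacier")` could presumably be passed in as `client`. In practice, a notification on job completion would be a better fit than polling for a wait measured in hours.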

Will this service fundamentally change how storage is approached? That’s what Jack Clark of ZDNet wonders in an article that analyzes the technology behind Amazon Glacier. Clark notes that “Glacier is tapeless and runs on inexpensive commodity hardware components” and he uses a post from Hacker News to glean more information about the technology underpinnings.

Glacier's hardware is based on low-RPM custom hard drives built by "a major hardware manufacturer" for Amazon.

These drives are put in custom racks with custom logic boards, and only a percentage of a rack's drives can be spun at full speed at any one time due to some type of constraint within the system.

The reason for the 3-5 hour lag in accessing stored data in Glacier is that it must be taken out of these systems and moved to staging storage before it can be downloaded by the client.

Clark then infers that Amazon is able to save costs by only powering on the hardware during the data retrieval process. This limited power consumption results in lower per-GB costs that can be passed on to the consumer.

However, Wired wonders if there are some nasty surprises in store for those retrieving their data at a later time.

But the retrieval fees are confusing. According to Amazon’s pricing chart, you can request up to 5 percent of the data stored in Glacier for free each month, but it’s prorated by the day. The FAQ explains: “If on a given day you have 12 terabytes of data stored in Glacier, you can retrieve up to 20.5 gigabytes of data for free that day (12 terabytes x 5% / 30 days = 20.5 gigabytes, assuming it is a 30 day month).” Elsewhere in the FAQ it explains that this is about 0.17 percent a day (“5% / 30 days = 0.17% per day”).

It gets more convoluted if you go over that limit. “You are charged a retrieval fee when your retrievals exceed your daily allowance,” says Amazon’s FAQ. “If, during a given month, you do exceed your daily allowance, we calculate your fee based upon the peak hourly usage from the days in which you exceeded your allowance.” And it gets worse from there.

The “peak hourly usage” calculation is not publicly available, so Wired warns consumers to be cautious when retrieving large blocks of data.
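The daily free allowance quoted from the FAQ is simple enough to reproduce. The sketch below covers only that prorated allowance, not the overage fee. The function name and parameters are illustrative, and it assumes 1 TB = 1,024 GB, which is the conversion that makes the FAQ's 20.5 GB figure work out.

```python
def daily_free_allowance_gb(stored_tb, monthly_free_fraction=0.05,
                            days_in_month=30, gb_per_tb=1024):
    """Return the prorated free retrieval allowance for one day, in GB."""
    stored_gb = stored_tb * gb_per_tb
    return stored_gb * monthly_free_fraction / days_in_month


# The FAQ's example: 12 TB stored in a 30-day month.
allowance = daily_free_allowance_gb(12)
print(f"{allowance:.2f} GB per day")  # 20.48 GB per day, which the FAQ rounds to 20.5

# The equivalent daily rate quoted in the FAQ: 5% / 30 days.
daily_rate_percent = 0.05 / 30 * 100
print(f"{daily_rate_percent:.2f}% per day")  # 0.17% per day
```

Retrievals that stay under this figure on each day of the month should incur no retrieval fee. Anything above it falls under the peak-hourly calculation that Wired cautions about.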

The API for the service is published and both the AWS Management Console and framework-specific SDKs have been updated to include Amazon Glacier capabilities.
