InfoQ

Using Amazon Web Services to Implement a Video File Conversion app

Posted by Gavin Terrill on Jul 24, 2007 02:30 PM

As covered on InfoQ in the past, Amazon's infrastructure services platform enables new levels of cost savings, as well as new capabilities, for classes of applications that map well onto its scalable compute and storage services. One recent sample application demonstrates building a video file conversion service using three key Amazon web services: Simple Storage Service (S3), Simple Queue Service (SQS), and Elastic Compute Cloud (EC2).

S3 is used to store the files for conversion:

"Amazon S3 is the perfect place to store the video files to be converted as well as any output files generated by our conversion service. In addition to being fast and reliable, we will never have to worry about our service running out of disk space."

To make the service scalable and highly available, the design is message-driven and relies on SQS's reliable message delivery. Each client request waits in the queue until a service instance reads and processes it, so requests are handled roughly in the order they are received; SQS itself does not guarantee strict ordering.
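The message-driven pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `InMemoryQueue` and `run_worker` are hypothetical names, the queue is an in-memory stand-in rather than SQS or the boto API, and `release` merely imitates the way an unacknowledged message becomes available to other readers again.

```python
from collections import deque


class InMemoryQueue:
    """Hypothetical in-memory stand-in for a message queue (not the boto API)."""

    def __init__(self):
        self._visible = deque()   # messages waiting to be read
        self._in_flight = {}      # messages read but not yet acknowledged
        self._next_id = 0

    def write(self, body):
        self._visible.append((self._next_id, body))
        self._next_id += 1

    def read(self):
        """Hand out one message and keep it in flight until it is deleted."""
        if not self._visible:
            return None
        msg_id, body = self._visible.popleft()
        self._in_flight[msg_id] = body
        return msg_id, body

    def delete(self, msg_id):
        """Acknowledge a message so it is never delivered again."""
        self._in_flight.pop(msg_id, None)

    def release(self, msg_id):
        """Make an unacknowledged message readable again."""
        body = self._in_flight.pop(msg_id)
        self._visible.append((msg_id, body))


def run_worker(queue, convert):
    """Process messages until the queue is empty and return the results."""
    results = []
    while True:
        msg = queue.read()
        if msg is None:
            break
        msg_id, body = msg
        try:
            results.append(convert(body))
        except Exception:
            queue.release(msg_id)  # leave the request for another attempt
            raise
        queue.delete(msg_id)       # acknowledge only after success
    return results
```

Because a message is deleted only after its conversion succeeds, a worker that fails midway leaves the request in the queue for another instance to pick up.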

The ConvertVideo service is written in Python and uses the boto library, which provides a set of classes for integrating with Amazon Web Services. To deploy the service on EC2, an AMI (Amazon Machine Image) needs to be created and registered so that instances can be launched on demand.

On the client side, the boto library provides a command line interface that can upload a directory of files to an S3 bucket, posting a message to an SQS queue for each file. Once the files have been uploaded, a service instance can be started to process the messages in the queue.
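The client-side step might look roughly like the sketch below. It does not use boto's command line tool or its API: the `bucket` dictionary and `queue` list are in-memory stand-ins for S3 and SQS, and the function name, the JSON message layout, and the `video-input` bucket name are all assumptions made for illustration.

```python
import json
import os


def submit_directory(path, bucket, queue, bucket_name="video-input"):
    """Store every regular file in `path` and enqueue one request per file.

    `bucket` (a dict) and `queue` (a list) are hypothetical stand-ins for an
    S3 bucket and an SQS queue; the message format is an assumption.
    """
    submitted = []
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories
        with open(full_path, "rb") as fh:
            bucket[name] = fh.read()  # stands in for an S3 upload
        # One message per file tells a worker where to find its input.
        queue.append(json.dumps({"bucket": bucket_name, "key": name}))
        submitted.append(name)
    return submitted
```

Each message carries only a reference to the stored file, so the queue stays small no matter how large the videos are.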

To test scalability, an initial conversion run is performed on 50 videos by a single instance:

  • Average Processing Time: 17.820000 seconds
  • Elapsed Time: 896 seconds
  • Throughput: 3.348214 transactions / minute

The next conversion run uses 500 videos and 10 instances:

  • Average Processing Time: 17.794000 seconds
  • Elapsed Time: 928 seconds
  • Throughput: 32.327586 transactions / minute

The additional service instances have increased throughput in a linear and predictable manner:

"Sure enough, the average processing time and elapsed time are almost exactly the same but our overall throughput is roughly 10 times higher than in our previous example which is exactly the sort of behavior we would expect and hope for."
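The reported throughput figures can be reproduced with a short calculation. The sketch assumes the elapsed times are in seconds, which is the unit that yields the published numbers; the function name is ours, not the tutorial's.

```python
def throughput_per_minute(videos, elapsed_seconds):
    """Return completed conversions per minute for a whole run."""
    return videos / elapsed_seconds * 60.0


single = throughput_per_minute(50, 896)   # 1 instance, 50 videos
ten = throughput_per_minute(500, 928)     # 10 instances, 500 videos

print(f"{single:.6f}")        # 3.348214
print(f"{ten:.6f}")           # 32.327586
print(f"{ten / single:.2f}")  # 9.66
```

The speedup comes out slightly under 10 because the second run took 928 seconds rather than 896.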

The tutorial breaks down the cost of converting the 500 videos:

Item               Quantity                      Cost
Storage            2.5 GBytes                    $0.38/Month
Transfer           2.5 GBytes                    $0.50
Messages           1000                          $0.10
Compute Resources  8 Instances for ~ 20 minutes  $0.80
Total                                            $1.78

A total of about $1.78 for converting 500 videos means a per-video cost of less than $0.004.
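As a quick check of the arithmetic, the line items can be summed and divided by the number of videos. The dictionary keys are labels chosen for this sketch; the amounts are the ones quoted in the breakdown.

```python
from decimal import Decimal

# Line items from the tutorial's cost breakdown, in US dollars.
costs = {
    "storage": Decimal("0.38"),
    "transfer": Decimal("0.50"),
    "messages": Decimal("0.10"),
    "compute": Decimal("0.80"),
}

total = sum(costs.values(), Decimal("0"))
per_video = total / 500

print(total)      # 1.78
print(per_video)  # 0.00356
```

`Decimal` is used instead of floats so that the cents add up exactly.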

Compute services such as file conversion seem a good fit for the AWS infrastructure. However, questions have been raised about the utility of the platform without a database. Dare Obasanjo, in his blog post "Amazon EC2 + S3 doesn't cut it", laments the lack of a database while experimenting with a Facebook application:

"it seems supporting this fairly straightforward application is beyond the current capabilities of EC2 + S3. S3 is primarily geared towards file storage so although it makes a good choice for cheaply hosting images and CSS stylesheets, it's a not a good choice for storing relational or structured data."

Of course, Amazon has deep experience in scaling out services. In his summary of the Google Seattle Scalability Conference, Robin Harris remarks on Amazon CTO Werner Vogels's memorable line: "Databases are Dinosaurs". Perhaps Dynamo, Amazon's scalable data store, which is due to be presented at SOSP 2007, is the missing piece of the AWS puzzle.
