InfoQ

InfoQ

News

My Bookmarks

Login or Register to enable bookmarks for unlimited time.

The content has been bookmarked!

There was an error bookmarking this content! Please retry.

Amazon Rolls Out Hadoop Based MapReduce to EC2

Posted by Scott Delap on Apr 02, 2009

Sections
Operations & Infrastructure,
Enterprise Architecture,
Development,
Architecture & Design
Topics
Grid Computing ,
Cloud Computing ,
Java
Tags
EC2 ,
Hadoop ,
MapReduce

There have been tutorials available for quite a while detailing how to run the popular Apache Hadoop MapReduce framework on Amazon EC2.  Today Amazon raised the bar providing official support via Amazon Elastic MapReduce.  From the product page:

Amazon Elastic MapReduce automatically spins up a Hadoop implementation of the MapReduce framework on Amazon EC2 instances, sub-dividing the data in a job flow into smaller chunks so that they can be processed (the “map” function) in parallel, and eventually recombining the processed data into the final solution (the “reduce” function). Amazon S3 serves as the source for the data being analyzed, and as the output destination for the end results.

Amazon Elastic MapReduce pricing is on top of existing EC2 charges at the rate of 15%.  The FAQ has a full list of details on pricing and usage.  The official AWS blog also provides coverage:

...Processing in Elastic MapReduce is centered around the concept of a Job Flow. Each Job Flow can contain one or more Steps. Each step inhales a bunch of data from Amazon S3, distributes it to a specified number of EC2 instances running Hadoop (spinning up the instances if necessary), does all of the work, and then writes the results back to S3. Each step must reference application- specific "mapper" and/or "reducer" code (Java JARs or scripting code for use via the Streaming model). We've also included the Aggregate Package with built-in support for a number of common operations such as Sum, Min, Max, Histogram, and Count. You can get a lot done before you even start to write code!

We're providing three distinct access routes to Elastic MapReduce. You have complete control via the Elastic MapReduce API, you can use the Elastic MapReduce command-line tools, or you can go all point-and-click with the Elastic MapReduce tab within the AWS Management Console! Let's take a look at each one...

ZDNet's Dana Gardner speculates on the implications of of Amazon's new offering for the business intelligence market.

No comments

Watch Thread Reply

Educational Content

Jesper Boeg on Priming Kanban

In this interview, Jesper Boeg, author of the new InfoQ book – Priming Kanban, discusses the keys to using Kanban effectively, and how to get started if you are currently using other approaches.

New-age Transactional Systems - Not Your Grandpa's OLTP

John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.

Cool Code

Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.

Collaboration: At the Extremities of Extreme

Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.

Yesod Web Framework

Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).

Transactions without Transactions

Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.

Attila Szegedi on JVM and GC Performance Tuning at Twitter

Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.

10 tips on how to prevent business value risk

One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.