InfoQ

Run Your Own Google Style Computing Cluster with Hadoop and Amazon EC2

Posted by Scott Delap on Nov 10, 2006

Sections: Operations & Infrastructure, Enterprise Architecture, Development, Architecture & Design
Topics: Clustering & Caching, Grid Computing, Java
Tags: Amazon, MapReduce, EC2, Hadoop

Clustered grid computing software does not simply happen; efficient architectures must be designed. One of the core technologies used by Google is the MapReduce programming model, which supports the processing and generation of large data sets. By defining a scalable program structure up front, MapReduce allows algorithms to scale easily across machines:

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.

Doug Cutting, the creator of Lucene and now an employee of Yahoo, has been working on an open source implementation of MapReduce called Hadoop. It is written in Java and also includes a distributed file system. Hadoop has already been tested on clusters of up to 600 nodes.

Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework.
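The map/reduce paradigm described above can be illustrated with a word count. The following is a minimal single-process sketch in plain Python, not the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real framework would distribute the map and reduce calls across nodes and rerun them on failure.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input fragment."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document is an independent fragment of work. On a cluster these
    # map calls could run, or be rerun after a failure, on any node.
    intermediate = []
    for document in documents:
        intermediate.extend(map_phase(document))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

Because each map call depends only on its own fragment and each reduce call only on the values for its own key, the work can be split across machines without changing the program's structure.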

Amazon recently released its Elastic Compute Cloud (EC2), which allows developers to acquire computing power at the rate of $0.10 per hour consumed. Recently, work has been done to allow Hadoop to run on EC2. This combination will allow developers to write scalable algorithms, bring up large numbers of servers for computing power, and then shut them down when they are no longer needed.
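As a rough illustration of the pay-per-hour model, the sketch below estimates what a short-lived cluster would cost. It assumes the quoted $0.10 rate applies to each server for each hour it runs; the function name `cluster_cost` and the example cluster size are hypothetical.

```python
from decimal import Decimal

# Rate quoted in the article; assumed here to apply per server, per hour.
HOURLY_RATE = Decimal("0.10")


def cluster_cost(servers, hours):
    """Estimated cost in dollars of running `servers` machines for `hours` hours."""
    return HOURLY_RATE * servers * hours


if __name__ == "__main__":
    # Hypothetical job: 100 servers for 3 hours.
    print(cluster_cost(100, 3))
```

Under that assumption, a job that keeps 100 servers busy for 3 hours would cost about $30, and nothing once the servers are shut down.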

    typo

    by anjan bacchu

    "Recently work as been done to "

    you mean : "Recently work has been done to " ?

    BR,
    ~A
