InfoQ

Presentation

Large Scale Map-Reduce Data Processing at Quantcast

Presented by Ron Bodkin on Dec 21, 2010. Length: 00:58:49
Sections: Operations & Infrastructure, Architecture & Design
Topics: Big Data, QCon San Francisco 2010, Database Design, QCon, Architecture, Conferences, Database, MapReduce, Hadoop

Summary
Ron Bodkin presents the architecture Quantcast uses to process hundreds of terabytes of data daily with Hadoop on dedicated systems, covering the applications, the types of data processed, and the supporting infrastructure.

Bio
Ron Bodkin is the founder of Think Big Analytics and works with Quantcast, an open ratings service for websites. He also founded New Aspects of Software and leads the Glassbox project. Before that, he led the first AspectJ projects at Xerox PARC, and earlier still he was a founder and the CTO of C-bridge, a consultancy that delivered enterprise applications using Java frameworks.

About the conference
QCon is a conference organized by the community, for the community. The result is a high-quality conference experience in which a great deal of attention and investment goes into presenting the best content on the most important topics, delivered by leaders in our community. QCon is designed with the technical depth and enterprise focus that interest technical team leads, architects, and project managers.

    Data corruption of Hadoop / Distributed file system

    by Tormod Varhaugvik

    Slide 27 mentions that data corruption is a major risk. Is this a historical problem, or is it still the case?

    Re: Data corruption of Hadoop / Distributed file system

    by Ron Bodkin

    It's a good idea to have a backup of data in any file system. Corruption can happen because of bugs in your application or hardware issues, as well as bugs in the underlying system software. I don't think HDFS is very likely to corrupt data, but the impact can be catastrophic if you don't have a good backup strategy in place. HDFS's replication helps a lot, of course, but you can further reduce the risk.
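    One way to act on the backup advice above is to check periodically that a backup copy still matches the primary data. The Python sketch below is illustrative only and is not taken from the talk. It assumes both copies are reachable as local directories, and the helper names `file_digest` and `find_mismatches` are hypothetical. It does not use any HDFS API; it simply compares SHA-256 digests file by file.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks
    so that very large files never have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(primary: Path, backup: Path) -> list[str]:
    """Compare every file under `primary` with its counterpart under `backup`.

    Hypothetical helper: assumes both trees are local directories.
    Returns a sorted list of problems, each tagged "missing" (no copy in
    the backup) or "differs" (the contents do not match).
    """
    problems = []
    for src in sorted(p for p in primary.rglob("*") if p.is_file()):
        rel = src.relative_to(primary)
        dst = backup / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

    An empty result means every primary file has an identical copy in the backup. Note that a check like this only confirms that the two copies agree; if an application bug wrote bad data before the backup was taken, both copies will carry it, which is one reason to keep older backup generations as well.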