
Large Scale Map-Reduce Data Processing at Quantcast

Duration: 58:49

Summary

Ron Bodkin presents the architecture Quantcast uses to process hundreds of terabytes of data daily with Hadoop on dedicated systems, covering the applications, the types of data processed, and the infrastructure involved.
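As a point of reference for the programming model the talk is built around, below is a minimal sketch of a Hadoop MapReduce job in Java. It is not taken from the presentation; the class names, the token-counting logic, and the command-line input/output paths are illustrative assumptions, chosen only to show the mapper/shuffle/reducer structure that large-scale jobs like Quantcast's build on.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative only: counts occurrences of each token in the input text.
public class TokenCount {

  public static class TokenMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);   // emit (token, 1) for each token
      }
    }
  }

  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();             // sum the counts shuffled to this key
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "token count");
    job.setJarByClass(TokenCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);   // pre-aggregate on the map side
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Setting the combiner to the same class as the reducer pre-aggregates counts on each mapper node, which cuts the volume of data shuffled across the cluster and matters at the data sizes discussed in the talk.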

Bio

Ron Bodkin is the founder of Think Big Analytics and works with Quantcast, an open ratings service for Web sites. He is also the founder of New Aspects of Software and the leader of the Glassbox project. Before that, he led the first AspectJ projects at Xerox PARC, and prior to that he was a founder and the CTO of C-bridge, a consultancy that delivered enterprise applications using Java frameworks.

About the conference

QCon is a conference that is organized by the community, for the community. The result is a high-quality conference experience where a tremendous amount of attention and investment has gone into having the best content on the most important topics, presented by the leaders in our community. QCon is designed with the technical depth and enterprise focus of interest to technical team leads, architects, and project managers.

Recorded at:

Dec 21, 2010

Community comments

  • Data corruption of Hadoop / Distributed file system

    by Tormod Varhaugvik,

    It is mentioned on slide 27 that data corruption is a major risk. Is this a historical issue, or is it still the case?

  • Re: Data corruption of Hadoop / Distributed file system

    by Ron Bodkin,

    It's a good idea to have a backup of data in any file system. Corruption can happen because of bugs in your application or hardware issues, as well as bugs in the underlying system software. I don't think HDFS is very likely to corrupt data, but the impact can be catastrophic if you don't have a good backup strategy in place. HDFS's replication helps a lot, of course, but you can further reduce the risk.
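As background to the replication point in the reply above, here is a minimal sketch (not from the talk) of raising the replication factor on a critical HDFS file with the Hadoop FileSystem API. The path taken from the command line and the target factor of 5 are illustrative assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: bump replication on a file whose loss would be costly.
public class ReplicationBump {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up cluster settings on the classpath
    FileSystem fs = FileSystem.get(conf);

    Path critical = new Path(args[0]);          // e.g. a file of critical logs

    // HDFS defaults to 3 replicas (dfs.replication); ask for 5 on this file.
    // setReplication applies to files, and its return value indicates
    // whether the change was accepted.
    boolean changed = fs.setReplication(critical, (short) 5);
    System.out.println("Replication updated: " + changed);

    fs.close();
  }
}

Replication only protects against the loss of individual nodes or disks; periodic copies to a separate cluster or storage system (for example with the distcp tool) provide the independent backup the reply recommends.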
