Hadoop Content on InfoQ
-
Grid Gain vs. Hadoop. Why Elephants Can't Fly
Dmitriy Setrakyan introduces GridGain, compares it with Hadoop, and outlines the cases where it is a better fit, then gives a live demo of setting up a GridGain job.
-
Distributed Data Analysis with Hadoop and R
Jonathan Seidman and Ramesh Venkataramaiah present how they run R on Hadoop in order to perform distributed analysis on large data sets, including some alternatives to their solution.
-
Panel: Hadoop for the Enterprise Architect
Peter Sirota, Amr Awadallah, Eric Baldeschwieler, Ted Dunning, Guy Bayes, and moderator Ron Bodkin discuss existing Hadoop use cases, the Hadoop ecosystem, and disaster recovery.
-
NoSQL at Twitter
Ryan King presents how Twitter uses NoSQL technologies (Gizzard, Cassandra, Hadoop, Redis) to cope with growing data volumes that forced the company to scale out beyond what traditional SQL databases offer.
-
NoSQL at Twitter
Kevin Weil presents how Twitter does data analysis using Scribe for logging, base analysis with Pig/Hadoop, and specialized data analysis with HBase, Cassandra, and FlockDB.
-
Large Scale Map-Reduce Data Processing at Quantcast
Ron Bodkin presents the architecture Quantcast uses to process hundreds of terabytes of data daily with Hadoop on dedicated systems, along with its applications, the types of data processed, and the supporting infrastructure.
-
Social Networks: Getting Distributed Web Services Done with NoSQL
Lars George and Fabrizio Schmidt present the architecture behind Germany’s largest social networks, Schuelervz, Studivz and Meinvz: the initial design, why it didn’t work, and how they solved the problem with a NoSQL solution.
-
Horizontal Scalability via Transient, Shardable, and Share-Nothing Resources
Adam Wiggins details how memcached, CouchDB, Hadoop, Redis, Varnish, RabbitMQ, and Erlang apply the transient, shardable, and share-nothing principles to achieve horizontal scalability.
-
Facebook’s Petabyte Scale Data Warehouse using Hive and Hadoop
Ashish Thusoo and Namit Jain explain how Facebook analyzes 12 TB of new compressed data every day with the help of Hive, an open source data warehousing framework built on Hadoop.
-
Hypertable - An Open Source, High Performance, Scalable Database
This presentation discusses Hypertable, an open source, high-performance, distributed database modeled after Google's Bigtable. Doug covers the system in depth.