Ashish Thusoo presents the data scalability issues at Facebook and the data architecture evolution from EDW to Hadoop to Puma.
Kumar Palaniapan and Scott Fleming present how NetApp, working with Think Big Analytics, deals with big data by using Hadoop, HBase, Flume, and Solr to collect and analyze TBs of log data.
Jake Luciani introduces Brisk, a Hadoop and Hive distribution using Cassandra for core services and storage, presenting the benefits of running Hadoop in a peer-to-peer masterless architecture.
Dmitriy Setrakyan introduces GridGain, compares it with Hadoop, and outlines the cases where it is a better fit, accompanied by a live demo showing how to set up a GridGain job.
Jonathan Seidman and Ramesh Venkataramaiah present how they run R on Hadoop in order to perform distributed analysis on large data sets, including some alternatives to their solution.
Peter Sirota, Amr Awadallah, Eric Baldeschwieler, Ted Dunning, Guy Bayes, and moderator Ron Bodkin discuss existing Hadoop use cases, the Hadoop ecosystem, and disaster recovery.
Ryan King presents how Twitter uses NoSQL technologies (Gizzard, Cassandra, Hadoop, Redis) to deal with growing data volumes that forced it to scale out beyond what traditional SQL databases offer.
Kevin Weil presents how Twitter does data analysis using Scribe for logging, base analysis with Pig/Hadoop, and specialized data analysis with HBase, Cassandra, and FlockDB.
Ron Bodkin presents the architecture used by Quantcast to process 100s of TB of data daily using Hadoop on dedicated systems, the applications, the type of data processed, and the infrastructure used.
Lars George and Fabrizio Schmidt present Germany’s largest social networks, Schuelervz, Studivz and Meinvz: the problems they face daily, the architecture they used in the past, and the need to move to a NoSQL solution. The presentation concludes with lessons learned and plans for the future.
Adam Wiggins believes that now is the time of horizontal scalability, achieved by using resources that are transient, shardable, and share nothing with other resources. As examples he gives several applications and a language (memcached, CouchDB, Hadoop, Redis, Varnish, RabbitMQ, Erlang), detailing how each one applies those principles.
Ashish Thusoo and Namit Jain explain how Facebook manages to deal with 12 TB of new compressed data every day with Hive’s help. Hive is an open source data warehousing framework built on Hadoop that lets developers analyze large datasets using SQL.