Jayesh Thakrar shows what can be done with irb and how to exploit JRuby-Java integration, and demonstrates how the Shell can be used in Hadoop streaming to perform complex, large-volume batch jobs.
In this solutions track talk, sponsored by DataStax, Johnny Miller introduces the Cassandra native protocol, native drivers and CQL, explaining how to query Cassandra without Thrift or RPC.
Details on Pinterest's architecture, its systems (Pinball, Frontdoor), and its stack: MongoDB, Cassandra, Memcache, Redis, Flume, Kafka, EMR, Qubole, Redshift, Python, Java, Go, Nutcracker, Puppet, etc.
Matthias Broecheler discusses graph computing, introducing the Aurelius graph cluster enabling graph computing at scale by building on distributed systems like Cassandra, HBase, and Hadoop.
Sebastian Kanthak details how Spanner relies on GPS and atomic clocks to provide two of its innovative features: Lock-free strong reads and global snapshots consistent with external events.
Nicolas Spiegelberg discusses Facebook Messages built on top of HBase, the systems involved and the scaling challenges for handling 500TB of new data per month.
Randy Shoup details some of the pieces forming Google’s technology stack (BigTable, Megastore, Dremel, virtualization, etc.) and the design principles of their cloud-based applications.
Matthew Dennis covers the most common mistakes he has seen made with Cassandra, in both deployment and code.
Peter Bell introduces 4 NoSQL categories (Key-Value, Document, Column, Graph) and explains how one can use Spring Data to work with such data stores.
Kumar Palaniapan and Scott Fleming present how NetApp deals with big data using Hadoop, HBase, Flume, and Solr, collecting and analyzing TBs of log data with Think Big Analytics.
Jake Luciani introduces Brisk, a Hadoop and Hive distribution using Cassandra for core services and storage, presenting the benefits of running Hadoop in a peer-to-peer masterless architecture.
Siddharth Anand presents how Netflix’s architecture evolved from a traditional 3-tier configuration to a cloud-based one, detailing the scalability and fault-tolerance issues encountered.