Emily Green looks at how SoundCloud uses Cassandra, describing a couple of Cassandra instances from the point of view of the products and functionality they support.
Julien Le Dem discusses the advantages of a columnar data layout, specifically the features and design choices Apache Parquet uses to achieve its goals of interoperability, space efficiency, and query efficiency.
Eric Redmond explains the differences and commonalities amongst many kinds of databases and takes a stab at the marketing term “NoSQL.”
John Leach explains using HBase co-processors to support a full ANSI SQL RDBMS without modifying the core HBase source, showing how Hadoop/HBase can replace traditional RDBMS solutions.
The authors focus on POJO persistence over Cassandra, including automatic Cassandra schema generation and Spring context configuration using both XML and Java.
This talk goes over the design motivation for Zen and describes its internals, including the API, type system, and HBase backend.
Jayesh Thakrar shows what can be done with irb and how to exploit JRuby-Java integration, and demonstrates how the Shell can be used in Hadoop streaming to perform complex, large-volume batch jobs.
In this solutions track talk, sponsored by DataStax, Johnny Miller introduces the Cassandra native protocol, native drivers, and CQL, explaining how to query Cassandra without Thrift or RPC.
Details on Pinterest's architecture, its systems (Pinball, Frontdoor), and its stack: MongoDB, Cassandra, Memcache, Redis, Flume, Kafka, EMR, Qubole, Redshift, Python, Java, Go, Nutcracker, Puppet, etc.
Matthias Broecheler discusses graph computing, introducing the Aurelius graph cluster enabling graph computing at scale by building on distributed systems like Cassandra, HBase, and Hadoop.
Sebastian Kanthak details how Spanner relies on GPS and atomic clocks to provide two of its innovative features: lock-free strong reads and global snapshots consistent with external events.
Nicolas Spiegelberg discusses Facebook Messages built on top of HBase, the systems involved and the scaling challenges for handling 500TB of new data per month.