Ken Kousen discusses combining various technologies: Groovy, Ratpack, MongoDB, Grails, REST.
Sean Owen introduces Spark, Scala and random decision forests, and demonstrates the process of analyzing a real-world data set with them.
Evelina Gabasova explains how to run a social network analysis on Twitter and how to use data science tools to find out more about followers.
Dave McCrory talks about what Data Gravity is, how it affects performance and portability, and why these effects are amplified as data volumes grow.
Emily Green takes a look at how SoundCloud uses Cassandra. She describes a couple of Cassandra instances from the point of view of the products and functionality they support.
Thore Thomassen shares from experience how to combine structured data in a DWH with unstructured data in NoSQL, and how to use parallel data warehouse appliances to boost analytical capabilities.
Julien Le Dem discusses the advantages of a columnar data layout, specifically the features and design choices Apache Parquet uses to achieve its goals of interoperability, space efficiency, and query efficiency.
Jeff Scott Brown introduces GORM, a super powerful ORM tool that makes ORM simple by leveraging the flexibility and expressiveness of a dynamic language like Groovy.
Reid Draper shows how real-world distributed databases work, communicate, and are tested, trading RPC for messaging, unit tests for QuickCheck, and micro-benchmarks for multi-week stress tests.
James Richardson and Nat Pryce discuss some of the challenges they faced using Neo4j for interactive analysis of large data imports (80K nodes, 150K relationships) and how they overcame them.
John Davies shows a Spring workflow that consumes 7.4kB XML messages and binds them to 25kB of Java, yet stores them in just 450 bytes each, keeping 10 million derivative contracts in memory on a laptop.
Lin Qiao discusses the architecture of Gobblin, LinkedIn’s framework for addressing the need for high-quality, high-velocity data ingestion.