
Pulsar: Real-time Analytics at Scale

Posted by Sharad Murthy, Tony Ng  on  Sep 13, 2015

Sharad Murthy & Tony Ng present Pulsar, a real-time streaming system which can scale to millions of events per second with high availability and 4GL language support.


Exploratory Data Analysis with R

Posted by Matthew Renze  on  Sep 13, 2015

Matthew Renze introduces the R programming language and demonstrates how R can be used for exploratory data analysis.


Spreadsheets for Developers

Posted by Felienne Hermans  on  Sep 11, 2015

Felienne Hermans presents various algorithms that outline the power of Excel, showing that spreadsheets are fit for TDD and rapid prototyping.


The Many Faces of Apache Kafka: How is Kafka Used in Practice

Posted by Neha Narkhede  on  Aug 27, 2015

Neha Narkhede discusses how companies are using Apache Kafka and where it fits in the Big Data ecosystem.


Financial Modeling with Apache Spark: Calculating Value at Risk

Posted by Sandy Ryza  on  Jul 12, 2015

Sandy Ryza aims to give a feel for what it is like to approach financial modeling with modern big data tools, using the Monte Carlo method for a basic VaR calculation with Spark.


Lightning Fast Cluster Computing with Spark and Cassandra

Posted by Piotr Kołaczkowski  on  Jun 17, 2015

Piotr Kołaczkowski discusses how Spark was integrated with Cassandra, how the integration works in practice, and why it is better than using a Hadoop intermediate layer.


Translating Imperative Code to MapReduce

Posted by Cosmin Radoi, Rodric Rabbah, Stephen J Fink, Manu Sridharan  on  Jun 10, 2015

The authors present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework using Mold, translating Java code to run on Apache Spark.


Understanding Cloud, Big Data, Mobile and Security – Do They Play Nicely Together?

Posted by Colin Mower  on  May 12, 2015

Colin Mower discusses the challenges of using Cloud, Big Data, Mobile and Security together, and how they can be combined to achieve business value.


A Taste of Random Decision Forests on Apache Spark

Posted by Sean Owen  on  Apr 28, 2015

Sean Owen introduces Spark, Scala and random decision forests, and demonstrates the process of analyzing a real-world data set with them.


Big Data in Memory

Posted by John Davies  on  Mar 14, 2015

John Davies shows a Spring work-flow consuming 7.4kB XML messages, binding them to 25kB of Java but storing them in just 450 bytes each, which fits 10 million derivative contracts in memory on a laptop.


Gobblin: A Framework for Solving Big Data Ingestion Problem

Posted by Lin Qiao  on  Mar 12, 2015

Lin Qiao discusses the architecture of Gobblin, LinkedIn’s framework for addressing the need for high-quality, high-velocity data ingestion.


Better Together - Using Spark and Redshift to Combine Your Data with Public Datasets

Posted by Eugene Mandel  on  Mar 12, 2015

Eugene Mandel discusses challenges of conforming data sources and compares processing stacks: Hadoop+Redshift vs Spark, showing how the technology drives the way the problem is modeled.

All content copyright © 2006-2015 C4Media Inc.