Big Data Content on InfoQ
-
Spreadsheets for Developers
Felienne Hermans presents various algorithms that outline the power of Excel, showing that spreadsheets are fit for TDD and rapid prototyping.
-
The Many Faces of Apache Kafka: How is Kafka Used in Practice
Neha Narkhede discusses how companies are using Apache Kafka and where it fits in the Big Data ecosystem.
-
Financial Modeling with Apache Spark: Calculating Value at Risk
Sandy Ryza aims to give a feel for what it is like to approach financial modeling with modern big data tools, using the Monte Carlo method for a basic VaR calculation with Spark.
-
Lightning Fast Cluster Computing with Spark and Cassandra
Piotr Kołaczkowski discusses how Spark was integrated with Cassandra, how the integration works in practice, and why it is better than using a Hadoop intermediate layer.
-
Translating Imperative Code to MapReduce
The authors present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework using Mold, translating Java code to run on Apache Spark.
-
Understanding Cloud, Big Data, Mobile and Security – Do They Play Nicely Together?
Colin Mower discusses the challenges of using Cloud, Big Data, Mobile and Security together, and how they can be combined to achieve business value.
-
A Taste of Random Decision Forests on Apache Spark
Sean Owen introduces Spark, Scala and random decision forests, and demonstrates the process of analyzing a real-world data set with them.
-
Big Data in Memory
John Davies shows a Spring workflow consuming 7.4kB XML messages, binding them to 25kB Java but storing them in just 450 bytes each, holding 10 million derivative contracts in memory on a laptop.
-
Gobblin: A Framework for Solving Big Data Ingestion Problem
Lin Qiao discusses the architecture of Gobblin, LinkedIn’s framework for addressing the need for high-quality, high-velocity data ingestion.
-
Better Together - Using Spark and Redshift to Combine Your Data with Public Datasets
Eugene Mandel discusses challenges of conforming data sources and compares processing stacks: Hadoop+Redshift vs Spark, showing how the technology drives the way the problem is modeled.
-
High Performance Computing Contributions to the World of Big Data
Sharan Kalwani presents the history of HPC and the technologies and trends which have contributed to creating the world of big data, covering applications of HPC resulting in big data technologies.
-
A Distributed Transactional Database on Hadoop
John Leach explains using HBase co-processors to support a full ANSI SQL RDBMS without modifying the core HBase source, showing how Hadoop/HBase can replace traditional RDBMS solutions.