AI, ML & Data Engineering Content on InfoQ
-
Financial Modeling with Apache Spark: Calculating Value at Risk
Sandy Ryza aims to give a feel for what it is like to approach financial modeling with modern big data tools, using the Monte Carlo method for a basic VaR calculation with Spark.
-
LDAP at Lightning Speed
Howard Chu covers highlights of the LMDB design and discusses some of the internal improvements in slapd due to LMDB, as well as the impact of LMDB on other projects.
-
Translating Imperative Code to MapReduce
The authors present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework using Mold, translating Java code to run on Apache Spark.
-
The Deep Learning Revolution: Rethinking Machine Learning Pipelines
Soumith Chintala introduces deep learning: what it is, why it has become popular, and how it can be fitted into existing machine learning solutions.
-
Understanding Cloud, Big Data, Mobile and Security – Do They Play Nicely Together?
Colin Mower discusses the challenges of using Cloud, Big Data, Mobile and Security together, and how they can be combined to achieve business value.
-
Beating the Traffic Jam Using Embedded Devices, OPC-UA, Akka and NoSQL
Kristoffer Dyrkorn presents the experiences gained by the Norwegian Public Roads Administration in building a new infrastructure for road traffic measurements.
-
Analyzing Social Networks with F#
Evelina Gabasova explains how to run a social network analysis on Twitter and how to use data science tools to find out more about followers.
-
Customer Insight, from Data to Information
Thore Thomassen shares from experience how to combine structured data in a DWH with unstructured data in NoSQL, and how to use parallel data warehouse appliances to boost analytical capabilities.
-
Big Data in Memory
John Davies shows a Spring workflow that consumes 7.4 kB XML messages and binds them to 25 kB Java objects, yet stores them in just 450 bytes each, keeping 10 million derivative contracts in memory on a laptop.
-
Gobblin: A Framework for Solving Big Data Ingestion Problem
Lin Qiao discusses the architecture of Gobblin, LinkedIn’s framework for addressing the need for high-quality, high-velocity data ingestion.
-
Better Together - Using Spark and Redshift to Combine Your Data with Public Datasets
Eugene Mandel discusses challenges of conforming data sources and compares processing stacks: Hadoop+Redshift vs Spark, showing how the technology drives the way the problem is modeled.
-
Become a Data-driven Organization with Machine Learning
Peter Harrington explains what you can do with machine learning and what the building blocks are for an application that uses machine learning, from collecting data to creating predictions for customers.