Big Data Content on InfoQ
-
Putting the Spark in Functional Fashion Tech Analytics
Gareth Rogers shows how his team used Clojure as a solid platform for connecting and managing an AWS-hosted analytics pipeline, and describes the pitfalls they encountered along the way.
-
Apache Metron in the Real World – Big Data and Cybersecurity, a Perfect Match
Dave Russell looks at several organizations that are on their big data cybersecurity journeys with Apache Metron.
-
Petastorm: A Light-Weight Approach to Building ML Pipelines
Yevgeni Litvin describes how Petastorm enables tighter integration between the Big Data and Deep Learning worlds, simplifies data management and data pipelines, and speeds up model experimentation.
-
People You May Know: Fast Recommendations over Massive Data
Sumit Rangwala and Felix GV present the evolution of PYMK’s architecture, focusing on Gaia, a real-time graph computing capability, and Venice, an online feature store with scoring capability.
-
Productionizing H2O Models with Apache Spark
Jakub Hava demonstrates how to create pipelines that integrate H2O machine learning models and how to deploy them using Scala or Python.
-
Winning Ways for Your Visualization Plays
Mark Grundland explores practical techniques for information visualization design to take better account of the fundamental limitations of visual perception.
-
Migrating from Big Data Architecture to Spring Cloud
Lenny Jaramillo discusses how Northern Trust migrated to Pivotal Cloud Foundry (PCF) and how the move helped them deliver functionality to their customers faster.
-
Using Data Effectively: Beyond Art and Science
Hilary Parker discusses approaches and techniques for collecting the most useful data, analyzing it scientifically, and using it effectively to drive actions and decisions.
-
Big Data and Deep Learning: A Tale of Two Systems
Zhenxiao Luo explains how Uber tackles data caching in large-scale deep learning, details Uber’s ML architecture, discusses how Uber uses Big Data, and concludes with a selection of AI use cases.
-
Accelerated Spark on Azure: Seamless and Scalable Hardware Offloads in the Cloud
Yuval Degani shows how hardware acceleration in Azure can be used to speed up Spark jobs with the aid of RDMA (Remote Direct Memory Access) support in the VM.
-
Implementing AutoML Techniques at Salesforce Scale
Matthew Tovbin shows how Salesforce builds ML models using AutoML, including techniques for automatic data processing, feature generation, model selection, hyperparameter tuning, and evaluation.
-
Privacy Ethics – A Big Data Problem
Raghu Gollamudi gives a broad overview of data management best practices, from mapping enterprise data to applying data protection rules such as GDPR at petabyte scale.