Big Data Content on InfoQ
-
What is a Data Citizen?
Caitlin McDonald discusses how big data affects people online and the ethics to be considered when dealing with data.
-
When Data Kills
Cori Crider shares insights from her investigations of US drone strikes in Yemen and Pakistan, and explores how misuse of mass surveillance data has claimed innocent lives.
-
Streaming SQL Foundations: Why I ❤ Streams+Tables
Tyler Akidau explores the relationship between the Beam Model and stream & table theory, and stream processing in SQL with Apache Beam, Calcite, Flink, Kafka KSQL, and Apache Spark’s Structured Streaming.
-
Bias in BigData/AI and ML
Leslie Miley discusses how inherent bias in data sets has affected outcomes ranging from the 2016 US presidential race to criminal sentencing in the United States.
-
Scaling with Apache Spark
Holden Karau looks at Apache Spark from a performance/scaling point of view and what’s needed to handle large datasets.
-
Serverless Design Patterns with AWS Lambda: Big Data with Little Effort
Tim Wagner discusses Big Data on serverless, showing working examples and how to set up a CI/CD pipeline, demonstrating AWS Lambda with the Serverless Application Model (SAM).
-
Scio: Moving Big Data to Google Cloud, a Spotify Story
Neville Li tells the story of how Spotify migrated its big data infrastructure to Google Cloud, replacing Hive and Scalding with BigQuery and Scio, which helped the company iterate faster.
-
Data Preparation for Data Science: A Field Guide
Casey Stella presents a utility written with Apache Spark to automate data preparation by discovering missing values, values with skewed distributions, and likely errors within the data.
-
AI from an Investment Perspective
The panelists discuss AI from an investment perspective, the challenges, the risks, trends, the role of Deep Learning, successful AI use cases, and more.
-
Big Data Infrastructure @ LinkedIn
Shirshanka Das describes LinkedIn’s Big Data Infrastructure and its evolution through the years, including details on the motivation and architecture of Gobblin, Pinot and WhereHows.
-
Real-Time Recommendations Using Spark Streaming
Elliot Chow discusses the data pipeline that Netflix built with Kafka, Spark Streaming, and Cassandra to process user activities in real time for the Trending Now row.
-
Building a Data Science Capability from Scratch
Victor Hu covers the challenges, both technical and cultural, of building a data science team and capability in a large, global company.