Database Content on InfoQ
-
Applied Distributed Research in Apache Cassandra
Jonathan Ellis explains the challenges and successes Cassandra has had in creating transactions, materialized views, and strongly consistent cluster membership within its peer-to-peer paradigm.
-
Scio: Moving Big Data to Google Cloud, a Spotify Story
Neville Li tells Spotify’s story of migrating its big data infrastructure to Google Cloud, replacing Hive and Scalding with BigQuery and Scio, which helped the company iterate faster.
-
In-Memory Caching: Curb Tail Latency with Pelikan
Yao Yue introduces Pelikan, a framework for implementing distributed caches such as Memcached and Redis. She discusses the system aspects that are important to the performance of such services.
-
Data Preparation for Data Science: A Field Guide
Casey Stella presents a utility written with Apache Spark to automate data preparation by discovering missing values, values with skewed distributions, and likely errors within data.
-
Building Reliability in an Unreliable World
Greg Murphy describes how GameSparks has designed their platform to be tolerant of many things: unreliable and slow internet connectivity, cloud resources that can fail without warning, and more.
-
AI from an Investment Perspective
The panelists discuss AI from an investment perspective, the challenges, the risks, trends, the role of Deep Learning, successful AI use cases, and more.
-
Causal Consistency for Large Neo4j Clusters
Jim Webber explores the new Causal clustering architecture for Neo4j and how it allows users to read their own writes straightforwardly, and explains why this is difficult to achieve in distributed systems.
-
Big Data Infrastructure @ LinkedIn
Shirshanka Das describes LinkedIn’s Big Data Infrastructure and its evolution through the years, including details on the motivation and architecture of Gobblin, Pinot and WhereHows.
-
Performance and Search
Dan Luu discusses how to estimate performance using back-of-the-envelope calculations that can be done in minutes or hours, even for applications that take months or years to implement.
-
Scaling up Near Real-Time Analytics @Uber & LinkedIn
Chinmay Soman and Yi Pan discuss how Uber and LinkedIn use Apache Samza, Calcite and Pinot along with the analytics platform AthenaX to transform data to make it available for querying in minutes.
-
Real-Time Recommendations Using Spark Streaming
Elliot Chow discusses the data pipeline that Netflix built with Kafka, Spark Streaming, and Cassandra to process user activities in real time for the Trending Now row.
-
Stream Processing & Analytics with Flink @Uber
Danny Yuan discusses how Uber is building its next-generation stream processing system to support real-time analytics as well as complex event processing.