Data Analytics Content on InfoQ
-
Netflix Serves 84% of Query Results from Cache with Interval-Aware Caching in Apache Druid
Netflix improves Apache Druid performance with interval-aware caching, serving 84% of analytics results from cache and reducing query load by 33%. The system decomposes rolling-window queries into reusable time segments, enabling partial cache reuse and recomputation only for recent data. At scale, it reduces scan volume, improves P90 latency, and optimizes real-time analytics workloads.
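The decomposition idea can be illustrated with a minimal Python sketch. It is not Netflix's or Druid's implementation: the names `split_into_buckets` and `IntervalCache`, the fixed bucket size, and the `settle` cutoff that decides which data counts as "recent" are all assumptions made for illustration, and the approach as written only works for additive aggregates such as counts or sums.

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in epoch seconds


def split_into_buckets(start: int, end: int, bucket: int) -> List[Interval]:
    """Split [start, end) at bucket-aligned boundaries."""
    intervals: List[Interval] = []
    cursor = start
    while cursor < end:
        boundary = min((cursor // bucket + 1) * bucket, end)
        intervals.append((cursor, boundary))
        cursor = boundary
    return intervals


class IntervalCache:
    """Hypothetical cache keyed by time bucket rather than by whole query."""

    def __init__(self, compute: Callable[[int, int], int], bucket: int, settle: int) -> None:
        self.compute = compute   # runs the real (expensive) aggregation
        self.bucket = bucket     # bucket width in seconds
        self.settle = settle     # data newer than now - settle may still change
        self.cache: Dict[Interval, int] = {}
        self.hits = 0

    def query(self, start: int, end: int, now: int) -> int:
        total = 0
        for lo, hi in split_into_buckets(start, end, self.bucket):
            # Only full buckets that are old enough are safe to reuse.
            cacheable = (hi - lo == self.bucket) and (hi <= now - self.settle)
            if not cacheable:
                total += self.compute(lo, hi)      # always recompute recent data
            elif (lo, hi) in self.cache:
                self.hits += 1
                total += self.cache[(lo, hi)]      # partial reuse
            else:
                value = self.compute(lo, hi)
                self.cache[(lo, hi)] = value
                total += value
        return total
```

When a rolling window slides forward by one bucket, most of its buckets are already cached, so only the newly settled bucket and the still-changing tail need to be computed.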
-
LinkedIn Consolidates Hiring Data Pipelines to Power AI-Driven Talent Systems
LinkedIn introduced a unified integrations platform to standardize and reconcile hiring data across systems. The platform reduces onboarding time by 72%, improves data consistency and completeness, and enables scalable AI-driven hiring features through standardized schemas, orchestration workflows, and centralized data processing.
-
Confluent Moves Schema IDs to Kafka Headers to Simplify Schema Governance
Confluent introduces a new approach in Apache Kafka that moves schema IDs from message payloads to record headers, aiming to simplify schema governance and evolution. The update integrates with Schema Registry, improves compatibility across serialization formats, and reduces coupling between data and metadata in event-driven architectures.
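The difference between the two placements can be sketched in Python. The payload variant follows the long-standing Schema Registry framing (a zero magic byte followed by a 4-byte big-endian schema ID). The header variant is only an assumption for illustration: the header name `schema-id` and its 4-byte integer encoding are hypothetical and may not match what Confluent actually ships.

```python
import struct
from typing import List, Tuple

MAGIC_BYTE = 0
HEADER_KEY = "schema-id"  # hypothetical header name, not taken from Confluent docs

Headers = List[Tuple[str, bytes]]


def encode_in_payload(schema_id: int, body: bytes) -> bytes:
    """Prefix the serialized body with a magic byte and the schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + body


def decode_from_payload(value: bytes) -> Tuple[int, bytes]:
    """Strip the 5-byte prefix; consumers must know about this framing."""
    magic, schema_id = struct.unpack(">bI", value[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown magic byte")
    return schema_id, value[5:]


def encode_in_header(schema_id: int, body: bytes) -> Tuple[Headers, bytes]:
    """Carry the schema ID as record metadata; the body stays untouched."""
    return [(HEADER_KEY, struct.pack(">I", schema_id))], body


def decode_from_header(headers: Headers, value: bytes) -> Tuple[int, bytes]:
    """Look the schema ID up in the headers instead of parsing the payload."""
    for key, raw in headers:
        if key == HEADER_KEY:
            return struct.unpack(">I", raw)[0], value
    raise KeyError(HEADER_KEY)
```

With the header variant, the record value is exactly the serialized body, so a consumer that does not use Schema Registry can still read it without stripping a prefix.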
-
Uber’s Hive Federation Decentralizes 16K Datasets and 10+ PB for Zero-Downtime Analytics at Scale
Uber has decentralized its Hive data warehouse, migrating 16,000 datasets totaling over 10 petabytes using pointer-based federation. The migration ensures zero downtime, strict ACL enforcement, improved governance, and scalable, domain-specific datasets for analytics and machine learning workloads.
-
Uber Launches IngestionNext: Streaming-First Data Lake Cuts Latency and Compute by 25%
Uber launches IngestionNext, a streaming-first data lake ingestion platform that reduces data latency from hours to minutes and cuts compute usage by 25%. Built on Kafka, Flink, and Apache Hudi, it supports thousands of datasets, enabling faster analytics, experimentation, and machine learning workloads globally.
-
Google BigQuery Previews Cross-Region SQL Queries for Distributed Data
Google Cloud has recently announced the preview of a global queries feature for BigQuery. The new option lets developers run SQL queries across data stored in different geographic regions without first moving or copying the data to aggregate the results.
-
Databricks Introduces Lakebase, a PostgreSQL Database for AI Workloads
Databricks has recently announced the general availability of Lakebase, a serverless, PostgreSQL-based OLTP database that scales compute and storage independently. Lakebase is designed to integrate with the Databricks platform, providing a hybrid solution that combines transactional and analytical capabilities.
-
Solving Fragmented Mobile Analytics: Uber’s Platform-Led Approach
Uber Engineering outlines its platform-led mobile analytics redesign, standardizing event instrumentation across iOS and Android to improve cross-platform consistency, reduce engineering effort, and provide reliable insights for product and data teams.
-
DuckDB's WebAssembly Client Allows Querying Iceberg Datasets in the Browser
DuckDB has recently introduced end-to-end interaction with Iceberg REST Catalogs directly within a browser tab, requiring no infrastructure setup. The new feature leverages DuckDB-Wasm, a WebAssembly port of DuckDB that runs in the browser, allowing users to query, read, and write Iceberg tables in a serverless manner.
-
Inside Uber’s Query Architecture: Simplifying Layers and Improving Observability
Uber rebuilt its Apache Pinot query architecture, replacing the Presto-based Neutrino system with a lightweight proxy called Cellar and Pinot’s Multi-Stage Engine Lite Mode. The redesign simplifies SQL execution, improves resource management, and ensures predictable performance for large-scale analytics workloads.
-
Cloudflare Introduces Data Platform with Zero Egress Fees
Cloudflare has recently announced the open beta of Cloudflare Data Platform, a managed solution for ingesting, storing, and querying analytical data tables using open standards such as Apache Iceberg.
-
Cloudflare Chooses PostgreSQL Extension over Specialized OLAP for 100K Row/Second Analytics
In a recent article from the engineering team behind the Zero Trust product suite, Cloudflare explains why it chose TimescaleDB over ClickHouse to add analytics and reporting capabilities to its internal platform. The author highlights the “phenomenal balance” between the simplicity of storing analytical data alongside configuration data and the performance of a specialized OLAP system.
-
Amazon S3 Adds Sort and Z-Order Compaction to Improve Apache Iceberg Query Performance
AWS has recently announced that Amazon S3 now supports sort and z-order compaction for Apache Iceberg tables. The new features reduce scan times and engine costs, and are available for both S3 Tables and traditional S3 buckets using AWS Glue Data Catalog optimization.
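The general idea behind z-ordering can be shown with a small Python sketch. This is the textbook Morton (bit-interleaving) encoding for two non-negative integer columns, not AWS's compaction code; the function names and the 16-bit column width are assumptions for illustration.

```python
from typing import List, Tuple


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Build a Morton code: bit i of x goes to position 2i, bit i of y to 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_order_sort(rows: List[Tuple[int, int]], bits: int = 16) -> List[Tuple[int, int]]:
    """Sort rows by Morton code so rows close in both columns end up near each other."""
    return sorted(rows, key=lambda row: interleave_bits(row[0], row[1], bits))
```

A plain sort clusters data well only for its leading column, while a z-order layout spreads the locality across both columns, which is why it can help queries that filter on either one.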
-
HTAP: the Rise and Fall of Unified Database Systems?
A recent article by Zhou Sun sparked a debate in the data community about the future of HTAP systems. Hybrid transaction/analytical processing was meant to help integrate historical and online data at scale, supporting more flexible query methods and reducing business complexity.
-
The Open-Source Version of InfluxDB 3 Reaches GA
Two years after InfluxData released the GA version of its enterprise edition, the open-source version has reached the same level of maturity. Designed for real-time workloads and ease of operation, the core version leaves out features such as long-term storage optimisations, compaction, high availability (HA), read replicas, and fine-grained access controls.