Data Analytics Content on InfoQ

News

Architecture & Design

Cloudflare Details Unified Data Platform Where Billing Workloads Account for 53% of Queries

Cloudflare details Town Lake, an internal unified data platform, and Skipper, an AI analytics agent unifying access to operational, billing, security, and business data. The platform processed about 91K billing queries, with billing workloads accounting for the majority of usage (53% of queries). Built on a lakehouse architecture using Trino, Iceberg, R2, and DataHub, it enables governed cross-system analytics and natural language access.

Leela Kumili
on Jul 03, 2026
Architecture & Design

Inside Target’s LLM-Based System for Semantic Matching in Marketing Forecast Pipelines

Target built a generative AI system to improve marketing campaign forecasting by retrieving and ranking similar historical campaigns. Using embeddings, vector search, and LLM ranking, it replaces rule-based workflows. Evaluation shows 75% top-1 and 100% top-3 coverage. The system reduces manual effort, improves consistency, and uses feedback loops based on campaign outcomes to refine retrieval.

Leela Kumili
on Jun 29, 2026
AI, ML & Data Engineering

Anthropic Reports Claude Now Handles 95% of Internal Analytics Queries

Anthropic recently reported that Claude now handles around 95% of its internal analytics requests, letting employees query business data independently instead of relying on data teams. The company attributes this result less to advances in models and more to data governance, semantic definitions, and operational discipline.

Renato Losio
on Jun 21, 2026
AI, ML & Data Engineering

DuckDB Quack: Client/Server Protocol over HTTP for Multi-User Analytics

DuckDB has recently announced Quack, a new remote protocol over HTTP that lets multiple DuckDB instances connect to and work with the same database over a network. The protocol introduces client-server capabilities to a database that was previously mostly local and embedded.

Renato Losio
on May 31, 2026
AI, ML & Data Engineering

Neobank Monzo Builds Governed Data Mesh across 100 Teams and 12000 dbt Models

Monzo recently redesigned its data warehouse to support more than 100 teams working on over 12000 dbt models. Introducing a so-called "meshy" approach, Monzo cut warehouse costs by about 40% and improved data delivery speed by 25%.

Renato Losio
on May 17, 2026
Architecture & Design

Netflix Serves 84% of Query Results from Cache with Interval-Aware Caching in Apache Druid

Netflix improves Apache Druid performance with interval-aware caching, serving 84% of analytics results from cache and reducing query load by 33%. The system decomposes rolling-window queries into reusable time segments, enabling partial cache reuse and recomputation only for recent data. At scale, it reduces scan volume, improves P90 latency, and optimizes real-time analytics workloads.

Leela Kumili
on May 11, 2026
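
To illustrate the idea summarized above, here is a minimal sketch of interval-aware caching for a rolling-window query. It is not Netflix's or Druid's implementation: the `IntervalCache` class, the one-hour bucket size, the "fresh window" threshold, and the `count_events` backend are all hypothetical, and the sketch assumes an additive aggregate (a count) so that per-segment results can simply be summed.

```python
from typing import Callable, Dict, Tuple

HOUR = 3600  # bucket size in seconds (an assumption for this sketch)


class IntervalCache:
    """Caches per-bucket results so overlapping rolling windows reuse them."""

    def __init__(self, compute: Callable[[int, int], int],
                 bucket: int = HOUR, fresh_window: int = HOUR):
        self.compute = compute            # hypothetical backend query over [start, end)
        self.bucket = bucket
        self.fresh_window = fresh_window  # data newer than this is never cached
        self.cache: Dict[Tuple[int, int], int] = {}
        self.hits = 0
        self.misses = 0

    def query(self, start: int, end: int, now: int) -> int:
        total = 0
        t = start
        while t < end:
            # Cut the window at the next bucket boundary.
            seg_end = min(end, (t // self.bucket + 1) * self.bucket)
            key = (t, seg_end)
            full_bucket = seg_end - t == self.bucket
            settled = seg_end <= now - self.fresh_window
            if full_bucket and settled:
                # Older, complete segments are reusable across queries.
                if key in self.cache:
                    self.hits += 1
                else:
                    self.misses += 1
                    self.cache[key] = self.compute(t, seg_end)
                total += self.cache[key]
            else:
                # Recent or partial segments are always recomputed.
                self.misses += 1
                total += self.compute(t, seg_end)
            t = seg_end
        return total


# Toy backend: count event timestamps that fall inside [start, end).
EVENTS = [100, 4000, 7500, 11000, 14500, 17000]


def count_events(start: int, end: int) -> int:
    return sum(1 for e in EVENTS if start <= e < end)


cache = IntervalCache(count_events)
first = cache.query(0, 4 * HOUR, now=4 * HOUR)      # cold cache
second = cache.query(HOUR, 5 * HOUR, now=5 * HOUR)  # window slid by one hour
print(first, second, cache.hits, cache.misses)
```

In this toy run, the second query reuses the two settled buckets it shares with the first and only goes back to the backend for the newly settled hour and the most recent hour, which is the partial-reuse behavior the summary describes.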
Architecture & Design

LinkedIn Consolidates Hiring Data Pipelines to Power AI Driven Talent Systems

LinkedIn introduced a unified integrations platform to standardize and reconcile hiring data across systems. The platform reduces onboarding time by 72%, improves data consistency and completeness, and enables scalable AI-driven hiring features through standardized schemas, orchestration workflows, and centralized data processing.

Leela Kumili
on May 06, 2026
Architecture & Design

Confluent Moves Schema IDs to Kafka Headers to Simplify Schema Governance

Confluent introduces a new approach in Apache Kafka that moves schema IDs from message payloads to record headers, aiming to simplify schema governance and evolution. The update integrates with Schema Registry, improves compatibility across serialization formats, and reduces coupling between data and metadata in event-driven architectures.

Leela Kumili
on May 01, 2026
Architecture & Design

Uber’s Hive Federation Decentralizes 16K Datasets and 10+ PB for Zero-Downtime Analytics at Scale

Uber has decentralized its Hive data warehouse, migrating 16,000 datasets totaling over 10 petabytes using pointer-based federation. The migration ensures zero downtime, strict ACL enforcement, improved governance, and scalable, domain-specific datasets for analytics and machine learning workloads.

Leela Kumili
on Apr 09, 2026
Architecture & Design

Uber Launches IngestionNext: Streaming-First Data Lake Cuts Latency and Compute by 25%

Uber launches IngestionNext, a streaming-first data lake ingestion platform that reduces data latency from hours to minutes and cuts compute usage by 25%. Built on Kafka, Flink, and Apache Hudi, it supports thousands of datasets, enabling faster analytics, experimentation, and machine learning workloads globally.

Leela Kumili
on Mar 25, 2026
Cloud

Google BigQuery Previews Cross-Region SQL Queries for Distributed Data

Google Cloud has recently announced the preview of a global queries feature for BigQuery. The new option lets developers run SQL queries across data stored in different geographic regions without first moving or copying the data to aggregate the results.

Renato Losio
on Mar 08, 2026
AI, ML & Data Engineering

Databricks Introduces Lakebase, a PostgreSQL Database for AI Workloads

Databricks has recently announced the general availability of Lakebase, a serverless, PostgreSQL-based OLTP database that scales compute and storage independently. Lakebase is designed to integrate with the Databricks platform, providing a hybrid solution that combines transactional and analytical capabilities.

Renato Losio
on Feb 22, 2026
Architecture & Design

Solving Fragmented Mobile Analytics: Uber’s Platform-Led Approach

Uber Engineering outlines its platform-led mobile analytics redesign, standardizing event instrumentation across iOS and Android to improve cross-platform consistency, reduce engineering effort, and provide reliable insights for product and data teams.

Leela Kumili
on Jan 13, 2026
AI, ML & Data Engineering

DuckDB's WebAssembly Client Allows Querying Iceberg Datasets in the Browser

DuckDB has recently introduced end-to-end interaction with Iceberg REST Catalogs directly within a browser tab, requiring no infrastructure setup. The new feature leverages DuckDB-Wasm, a WebAssembly port of DuckDB that runs in the browser, allowing users to query, read, and write Iceberg tables in a serverless manner.

Renato Losio
on Jan 04, 2026
Architecture & Design

Inside Uber’s Query Architecture: Simplifying Layers and Improving Observability

Uber rebuilt its Apache Pinot query architecture, replacing the Presto-based Neutrino system with a lightweight proxy called Cellar and Pinot’s Multi-Stage Engine Lite Mode. The redesign simplifies SQL execution, improves resource management, and ensures predictable performance for large-scale analytics workloads.

Leela Kumili
on Nov 06, 2025