Data Analysis Content on InfoQ

News

Architecture & Design

Uber’s Hive Federation Decentralizes 16K Datasets and 10+ PB for Zero-Downtime Analytics at Scale

Uber has decentralized its Hive data warehouse, migrating 16,000 datasets totaling over 10 petabytes using pointer-based federation. The migration ensures zero downtime, strict ACL enforcement, improved governance, and scalable, domain-specific datasets for analytics and machine learning workloads.

Leela Kumili
on Apr 09, 2026
AI, ML & Data Engineering

Cloudflare Introduces Aggregations in R2 SQL for Data Analytics

Cloudflare recently announced support for aggregations in R2 SQL, the service that lets developers run SQL queries on data stored in R2. This enhancement expands R2 SQL beyond basic filtering and makes it more useful for analytical workloads without requiring separate data warehouse tools.

Renato Losio
on Jan 17, 2026
DevOps

Sauce Labs Launches AI Tool for Faster Test Analysis

Sauce Labs has launched Sauce AI for Insights, an AI-driven tool that accelerates test analysis by providing natural-language explanations, visual summaries and faster root cause detection. The company claims that it reduces debugging time, improves release readiness, and addresses the growing complexity of test data.

Mark Silvester
on Nov 25, 2025
Development

Meta Open Sources OpenZL: a Universal Compression Framework for Structured Data

Meta’s OpenZL changes the way data is compressed by maximizing efficiency for structured datasets, outperforming traditional methods like Zstandard. With a universal decompressor and custom compression plans, it simplifies operational deployment while achieving superior compression ratios and speeds, making it an essential tool for modern data infrastructures.

Steef-Jan Wiggers
on Oct 28, 2025
AI, ML & Data Engineering

Hugging Face Introduces AI Sheets, a No-Code Tool for Dataset Transformation

Hugging Face has released AI Sheets, an open-source application designed to let users build, transform, and enrich datasets using AI models through a spreadsheet-like interface. The tool, available both on the Hub and for local deployment, allows users to experiment with thousands of open models, including OpenAI’s gpt-oss, without requiring code.

Robert Krzaczyński
on Sep 08, 2025
AI, ML & Data Engineering

Google Releases MedGemma: Open AI Models for Medical Text and Image Analysis

Google has released MedGemma, a pair of open-source generative AI models designed to support medical text and image understanding in healthcare applications. Based on the Gemma 3 architecture, the models are available in two configurations: MedGemma 4B, a multimodal model capable of processing both images and text, and MedGemma 27B, a larger model focused solely on medical text.

Robert Krzaczyński
on May 30, 2025
AI, ML & Data Engineering

Perplexity Unveils Deep Research: AI-Powered Tool for Advanced Analysis

Perplexity has introduced Deep Research, an AI-powered tool designed for conducting in-depth analysis across various fields, including finance, marketing, and technology. The system automates the research process by performing multiple searches, analyzing extensive sources, and synthesizing findings into structured reports within minutes.

Robert Krzaczyński
on Feb 24, 2025
Cloud

Google Cloud Launches C4 Machine Series: High-Performance Computing and Data Analytics

Google Cloud recently announced the general availability of its new C4 machine series, powered by 4th Gen Intel Xeon Scalable Processors (Sapphire Rapids). The series offers a range of configurations tailored to meet the needs of demanding applications such as high-performance computing (HPC), large-scale simulations, and data analytics.

Steef-Jan Wiggers
on Aug 27, 2024
Cloud

Confluent Announces Apache Flink on Confluent Cloud in Open Preview

Confluent recently announced the open preview of Apache Flink on Confluent Cloud as a fully managed service for stream processing. The company claims that the managed service will make it easier for companies to filter, join, and enrich data streams with Flink.

Steef-Jan Wiggers
on Sep 29, 2023
Architecture & Design

Pfizer Uses Serverless Architecture on AWS to Scale Processing of Digital Biomarkers

Pfizer upgraded its serverless architecture for processing digital biomarker data at scale to make it more flexible and configurable. The team created a framework that uses a file processing pipeline built with AWS Step Functions and other serverless services, as well as a custom Python package for data ingestion and processing.

Rafal Gancarz
on Jul 26, 2023
Culture & Methods

Using Data to Predict Future Usage and Increase User Insights

By identifying usage trends, you can proactively adjust load, scaling, and routing to better handle demand in particular parts of the globe when you know it will peak there. Data about how users interact with your application can be used to design future features that better match these patterns, giving new features a better chance of solving real user problems and getting adopted.

Ben Linders
on Sep 21, 2022
Cloud

A New Microsoft Platform in Town: the Microsoft Intelligent Data Platform

Microsoft recently introduced a new platform called the Microsoft Intelligent Data Platform that fully integrates its database, analytics, and governance offerings. The new platform encompasses everything from the existing Azure Data services (Azure Data Factory, Azure Data Explorer, etc.) to the Synapse Analytics products, Power BI, and the newly rebranded Purview data governance service.

Steef-Jan Wiggers
on Jun 08, 2022
Culture & Methods

Using Machine Learning in Testing and Maintenance

With machine learning, we can reduce maintenance efforts and improve the quality of products. It can be used in various stages of the software testing life cycle, including bug management, which is an important part of the chain. Machine learning algorithms let us analyze large amounts of data to classify, triage, and prioritize bugs more efficiently.

Ben Linders
on Mar 18, 2021
Cloud

Google Brings Databricks to Its Cloud Platform

Google recently announced a partnership with Databricks to bring Databricks' fully managed Apache Spark offering and data lake capabilities to Google Cloud. The offering will become available as Databricks on Google Cloud.

Steef-Jan Wiggers
on Feb 23, 2021
Cloud

Amazon Announces the General Availability of AWS Glue 2.0

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. With AWS Glue, customers don’t have to provision or manage any resources, and they pay only for resources while the service is running.

Steef-Jan Wiggers
on Aug 19, 2020