Data Analysis Content on InfoQ
-
Sauce Labs Launches AI Tool for Faster Test Analysis
Sauce Labs has launched Sauce AI for Insights, an AI-driven tool that accelerates test analysis by providing natural-language explanations, visual summaries and faster root cause detection. The company claims that it reduces debugging time, improves release readiness, and addresses the growing complexity of test data.
-
Meta Open Sources OpenZL: a Universal Compression Framework for Structured Data
Meta’s OpenZL takes a format-aware approach to compression that targets structured datasets, where Meta reports it outperforms general-purpose methods like Zstandard. Because a single universal decompressor can read data produced by any custom compression plan, it simplifies operational deployment while achieving higher compression ratios and speeds, making it a strong fit for modern data infrastructure.
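OpenZL's own API is not shown here. As a rough illustration of why knowing the structure of data can help a compressor, the following sketch assumes a hypothetical fixed-size record layout (id, sensor, value). It compares compressing records back to back against splitting each field into its own stream and delta-encoding the ids before applying zlib. This mirrors the general idea of format-aware compression, not OpenZL's actual compression plans.

```python
import struct
import zlib

# Hypothetical record layout: uint32 id, uint16 sensor, float32 value (10 bytes, no padding).
FMT = "<IHf"


def compress_rows(records):
    """Baseline: serialize records back to back and compress the whole buffer."""
    raw = b"".join(struct.pack(FMT, *r) for r in records)
    return zlib.compress(raw, 9)


def compress_columns(records):
    """Format-aware: one stream per field, ids delta-encoded, each stream compressed separately."""
    ids, sensors, values = zip(*records)
    # Store the first id, then differences; assumes values fit in a signed 32-bit int.
    deltas = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
    n = len(records)
    streams = [
        struct.pack(f"<{n}i", *deltas),
        struct.pack(f"<{n}H", *sensors),
        struct.pack(f"<{n}f", *values),
    ]
    return n, [zlib.compress(s, 9) for s in streams]


def decompress_columns(n, streams):
    """Invert compress_columns: decompress each stream, undo the delta encoding, re-zip rows."""
    deltas = struct.unpack(f"<{n}i", zlib.decompress(streams[0]))
    sensors = struct.unpack(f"<{n}H", zlib.decompress(streams[1]))
    values = struct.unpack(f"<{n}f", zlib.decompress(streams[2]))
    ids, running = [], 0
    for d in deltas:
        running += d
        ids.append(running)
    return list(zip(ids, sensors, values))


records = [(1000 + i, i % 4, float(i % 8) * 0.5) for i in range(500)]
n, streams = compress_columns(records)
print(len(compress_rows(records)), sum(len(s) for s in streams))
```

On regular data like this, the per-field streams tend to compress well because each holds values of a single type and similar range; measure on your own data before drawing conclusions.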
-
Hugging Face Introduces AI Sheets, a No-Code Tool for Dataset Transformation
Hugging Face has released AI Sheets, an open-source application designed to let users build, transform, and enrich datasets using AI models through a spreadsheet-like interface. The tool, available both on the Hub and for local deployment, allows users to experiment with thousands of open models, including OpenAI’s gpt-oss, without requiring code.
-
Google Releases MedGemma: Open AI Models for Medical Text and Image Analysis
Google has released MedGemma, a pair of open-source generative AI models designed to support medical text and image understanding in healthcare applications. Based on the Gemma 3 architecture, the models are available in two configurations: MedGemma 4B, a multimodal model capable of processing both images and text, and MedGemma 27B, a larger model focused solely on medical text.
-
Perplexity Unveils Deep Research: AI-Powered Tool for Advanced Analysis
Perplexity has introduced Deep Research, an AI-powered tool designed for conducting in-depth analysis across various fields, including finance, marketing, and technology. The system automates the research process by performing multiple searches, analyzing extensive sources, and synthesizing findings into structured reports within minutes.
-
Google Cloud Launches C4 Machine Series: High-Performance Computing and Data Analytics
Google Cloud recently announced the general availability of its new C4 machine series, powered by 4th Gen Intel Xeon Scalable Processors (Sapphire Rapids). The series offers a range of configurations tailored to meet the needs of demanding applications such as high-performance computing (HPC), large-scale simulations, and data analytics.
-
Confluent Announces Apache Flink on Confluent Cloud in Open Preview
Confluent recently announced the open preview of Apache Flink on Confluent Cloud as a fully managed service for stream processing. The company claims that the managed service will make it easier for companies to filter, join, and enrich data streams with Flink.
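Flink itself is not used here. The following is a plain-Python sketch of what "filter, join, and enrich" means for a stream, assuming hypothetical order events and an in-memory customer lookup table. In a managed Flink service, the same steps would typically be written as a SQL query and executed in a distributed, stateful way.

```python
from typing import Dict, Iterable, Iterator


def enrich_orders(
    orders: Iterable[dict], customers: Dict[str, dict], min_amount: float = 0.0
) -> Iterator[dict]:
    """Filter, join, and enrich a (possibly unbounded) stream of order events one at a time."""
    for order in orders:
        # Filter: drop events below the threshold.
        if order["amount"] < min_amount:
            continue
        # Join: look up the matching customer in a reference table keyed by id.
        customer = customers.get(order["customer_id"])
        if customer is None:
            continue  # inner-join semantics: skip orders with no known customer
        # Enrich: emit a new event that carries the customer's region.
        yield {**order, "region": customer["region"]}


customers = {"c1": {"region": "EU"}}
orders = [
    {"id": 1, "customer_id": "c1", "amount": 50.0},
    {"id": 2, "customer_id": "c1", "amount": 5.0},
    {"id": 3, "customer_id": "c9", "amount": 80.0},
]
print(list(enrich_orders(orders, customers, min_amount=10.0)))
```

Because the function is a generator, it processes events as they arrive rather than waiting for a complete dataset. A join between two live streams, as opposed to a stream and a static table, would additionally need windowing and retained state, which is the kind of work a stream processor like Flink manages.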
-
Pfizer Uses Serverless Architecture on AWS to Scale Processing of Digital Biomarkers
Pfizer upgraded its serverless architecture for processing digital biomarker data at scale to make it more flexible and configurable. Its engineers created a framework that uses a file processing pipeline built with AWS Step Functions and other serverless services, as well as a custom Python package for data ingestion and processing.
-
Using Data to Predict Future Usage and Increase User Insights
By identifying usage trends, you can proactively adjust capacity, scaling, and routing to handle load in particular parts of the globe when you know demand will peak there. Data about how users interact with your application can also inform the design of future features, so that they better match these patterns and have a better chance of solving real user problems and being adopted.
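As a minimal sketch of turning usage trends into proactive capacity decisions, the following assumes historical samples of (region, hour of day, request count). It builds an average hourly profile per region, finds each region's peak hour, and pre-provisions instances with some headroom. The per-instance capacity of 1,000 requests and the 20% headroom are hypothetical values chosen for illustration.

```python
from collections import defaultdict
from math import ceil


def hourly_profile(samples):
    """Average request count per (region, hour-of-day) across historical samples."""
    totals, counts = defaultdict(float), defaultdict(int)
    for region, hour, requests in samples:
        totals[(region, hour)] += requests
        counts[(region, hour)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def peak_hour(profile, region):
    """Hour of day with the highest average load for a region."""
    hours = {hour: avg for (r, hour), avg in profile.items() if r == region}
    return max(hours, key=hours.get)


def plan_capacity(profile, per_instance=1000, headroom=1.2):
    """Instances to pre-provision per (region, hour); per_instance and headroom are assumptions."""
    return {
        key: max(1, ceil(avg * headroom / per_instance))
        for key, avg in profile.items()
    }


samples = [
    ("eu", 9, 1800), ("eu", 9, 2200), ("eu", 21, 400),
    ("us", 9, 300), ("us", 9, 500), ("us", 21, 2600),
]
profile = hourly_profile(samples)
print(peak_hour(profile, "eu"), peak_hour(profile, "us"))
print(plan_capacity(profile))
```

A simple average is the most basic trend model; seasonality by weekday or a proper forecasting method would be natural refinements once enough history is available.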
-
A New Microsoft Platform in Town: the Microsoft Intelligent Data Platform
Microsoft recently introduced a new platform called the Microsoft Intelligent Data Platform that fully integrates its database, analytics, and governance offerings. The new platform encompasses everything from the existing Azure Data services (Azure Data Factory, Azure Data Explorer, etc.) to the Synapse Analytics products, Power BI, and the newly rebranded Purview data governance service.
-
Using Machine Learning in Testing and Maintenance
With machine learning, we can reduce maintenance efforts and improve the quality of products. It can be used in various stages of the software testing lifecycle, including bug management, which is an important part of the chain. Machine learning algorithms can analyze large amounts of data to classify, triage, and prioritize bugs more efficiently.
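To make the bug-triage idea concrete, here is a minimal sketch of one classic technique, a multinomial naive Bayes text classifier, that routes bug reports to a component based on the words in their titles. The training reports and the "ui"/"database" labels are hypothetical; a real setup would use far more data and would more likely rely on a library such as scikit-learn.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class BugTriageNB:
    """Multinomial naive Bayes over words in bug report titles."""

    def fit(self, reports, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(reports, labels):
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)  # log prior for this label
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in tokenize(text):
                # Laplace (add-one) smoothing so unseen words do not zero out the score.
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


reports = [
    "button misaligned on login page",
    "crash when rendering dashboard chart",
    "query timeout on orders table",
    "deadlock in database migration",
]
labels = ["ui", "ui", "database", "database"]
model = BugTriageNB().fit(reports, labels)
print(model.predict("slow query on users table"))
```

The same structure works for priority labels instead of components; only the training labels change.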
-
Google Brings Databricks to Its Cloud Platform
Google recently announced a partnership with Databricks to bring Databricks' fully managed Apache Spark offering and data lake capabilities to Google Cloud. The offering will become available as Databricks on Google Cloud.
-
Amazon Announces the General Availability of AWS Glue 2.0
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. With AWS Glue, customers don’t have to provision or manage any resources, and they pay for resources only while the service is running.
-
Amazon Introduces the New Streaming ETL Feature on AWS Glue
Amazon recently announced that AWS Glue now supports streaming ETL. With this new feature, customers can easily set up continuous ingestion pipelines that prepare streaming data on the fly and make it available for analysis within seconds.
-
Data Science at the Intersection of Emerging Technologies
Kirk Borne, principal data scientist at Booz Allen Hamilton, gave a keynote presentation at this year’s Oracle Code One Conference on how emerging technologies, data, and machine learning together are transforming data into value. Emerging technological innovations such as AI, robotics, and computer vision are enabled by data and, in turn, create value from it.