AI, ML & Data Engineering Content on InfoQ
-
Anthropic Announces Claude Cowork
Anthropic has introduced Claude Cowork, an AI agent for macOS that works directly with files on the user's machine. It can process documents, organize files, and execute multi-step workflows, which makes it a candidate for automating routine office tasks. Given recently reported issues, users are advised to back up their files before letting the agent work on them.
-
Tracking and Controlling Data Flows at Scale in GenAI: Meta’s Privacy-Aware Infrastructure
Meta has revealed how it scales its Privacy-Aware Infrastructure (PAI) to support generative AI development while enforcing privacy across complex data flows. Using large-scale lineage tracking, PrivacyLib instrumentation, and runtime policy controls, the system enables consistent privacy enforcement for AI workloads like Meta AI glasses without introducing manual bottlenecks.
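The core idea behind lineage-based enforcement is that labels follow data through every derivation and are checked wherever data is consumed. It can be sketched in a few lines. This is a minimal illustration under assumptions, not Meta's PrivacyLib or PAI API: the `Asset` type, the label names, the `ALLOWED` policy table, and the `derive`/`write_to` helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    labels: frozenset  # privacy labels carried by this data, e.g. {"user_messages"}


class PolicyViolation(Exception):
    pass


# Hypothetical policy: the labels each destination (purpose) may receive.
# A destination missing from this table may receive no labeled data at all.
ALLOWED = {
    "ads_training": frozenset({"public_posts"}),
    "assistant_training": frozenset({"public_posts", "user_messages"}),
}

# Lineage graph: derived asset name -> names of its direct inputs.
lineage: dict = {}


def derive(name: str, *inputs: Asset) -> Asset:
    """Create a derived asset that inherits the union of its inputs' labels."""
    labels = frozenset().union(*(a.labels for a in inputs))
    lineage[name] = {a.name for a in inputs}
    return Asset(name, labels)


def write_to(sink: str, asset: Asset) -> None:
    """Runtime check: block the flow if the asset carries a label the sink may not receive."""
    disallowed = asset.labels - ALLOWED.get(sink, frozenset())
    if disallowed:
        raise PolicyViolation(f"{asset.name} -> {sink} blocked: {sorted(disallowed)}")


posts = Asset("posts", frozenset({"public_posts"}))
messages = Asset("messages", frozenset({"user_messages"}))
mix = derive("training_mix", posts, messages)

write_to("assistant_training", mix)  # allowed: both labels are permitted here
try:
    write_to("ads_training", mix)
except PolicyViolation as err:
    print(err)  # training_mix -> ads_training blocked: ['user_messages']
```

Because labels propagate automatically through `derive`, the check at the sink does not depend on anyone remembering where the data originally came from.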
-
MIT's Recursive Language Models Improve Performance on Long-Context Tasks
Researchers at MIT's CSAIL published a design for Recursive Language Models (RLM), a technique for improving LLM performance on long-context tasks. RLMs use a programming environment to recursively decompose and process inputs, and can handle prompts up to 100x longer than the base LLM's context window.
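A much-simplified illustration of recursive decomposition follows. It is plain divide-and-conquer, not the RLM design itself (in which, per the article, the model works through a programming environment to decompose its input). The `recursive_answer` function, the `max_chars` budget, and the stand-in `fake_llm` are hypothetical.

```python
from typing import Callable


def recursive_answer(question: str, context: str, llm: Callable[[str], str],
                     max_chars: int = 2000) -> str:
    """Answer `question` over `context` by recursive decomposition.

    If the context fits the per-call budget (`max_chars`, assumed >= 1), ask the
    model directly. Otherwise split it near the middle, answer each part
    recursively, and ask the model to combine the partial answers.
    """
    if len(context) <= max_chars:
        return llm(f"Context:\n{context}\n\nQuestion: {question}")
    half = len(context) // 2
    cut = context.rfind(" ", 0, half + 1)  # split at a space so words stay intact
    if cut <= 0:
        cut = half  # no usable space: fall back to a hard split
    left = recursive_answer(question, context[:cut], llm, max_chars)
    right = recursive_answer(question, context[cut:], llm, max_chars)
    return llm(f"Partial answers:\n{left}\n{right}\n\nQuestion: {question}")


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model: reports FOUND if the code word, or a FOUND partial answer, is visible.
    return "FOUND" if ("XYZZY" in prompt or "FOUND" in prompt) else "none"


document = "filler " * 5000 + "XYZZY" + " filler" * 5000  # about 70,000 characters
print(recursive_answer("Does the text contain the code word?", document, fake_llm))  # FOUND
```

No single call ever sees more than roughly `max_chars` characters of the original input, yet the answer still reflects the whole document.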
-
Google and Retail Leaders Launch Universal Commerce Protocol to Power Next‑Generation AI Shopping
Google launched the Universal Commerce Protocol (UCP), an open standard co-developed with Shopify, Target, and others that enables AI-driven shopping agents to complete tasks end-to-end, from product discovery to checkout and post-purchase management. UCP aims to standardize commerce capabilities, support multiple payment providers, and expand globally, with the goal of shaping the next generation of agentic commerce.
-
GitLab 18.8 Marks General Availability of the Duo Agent Platform
GitLab 18.8 brings a number of new features, including the GitLab Duo Planner Agent, the GitLab Duo Security Analyst Agent, automatic dismissal of irrelevant vulnerabilities, and more. With this release, the GitLab Duo Agent Platform, which lets organizations orchestrate AI agents, reaches general availability.
-
Pinterest's Moka: How Kubernetes Is Rewriting the Rules of Big Data Processing
Digital pinboard provider Pinterest has published an article explaining its blueprint for the future of large-scale data processing with its new platform Moka. The company is moving core workloads from ageing Hadoop infrastructure to a Kubernetes-based system on Amazon EKS, with Apache Spark as the main engine and support for other frameworks on the way.
-
Docker’s Cagent Brings Deterministic Testing to AI Agents
Docker is positioning its Cagent runtime as a way to bring deterministic testing back to AI agents, whose reliance on non-deterministic LLM output makes conventional repeatable tests difficult, a growing problem for teams building production agentic systems.
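One common way to make agent tests repeatable is record-and-replay: capture model responses once, then serve the saved responses in tests so the same prompts always yield the same outputs. The sketch below shows that general pattern; it is not Cagent's implementation, and the `ReplayLLM` class and its JSON cassette format are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


class ReplayLLM:
    """Wrap a model call so agent tests can replay recorded responses deterministically.

    mode="record": call the real model and save each response under a hash of its prompt.
    mode="replay": never call the model; return the saved response or fail loudly.
    """

    def __init__(self, cassette: Path, mode: str,
                 real_llm: Optional[Callable[[str], str]] = None):
        if mode not in ("record", "replay"):
            raise ValueError(f"unknown mode: {mode!r}")
        if mode == "record" and real_llm is None:
            raise ValueError("record mode needs a real model callable")
        self.cassette = cassette
        self.mode = mode
        self.real_llm = real_llm
        # Load previously recorded responses, if any.
        self.store = json.loads(cassette.read_text()) if cassette.exists() else {}

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def __call__(self, prompt: str) -> str:
        key = self._key(prompt)
        if self.mode == "replay":
            if key not in self.store:
                raise KeyError(f"no recorded response for prompt {key[:12]}")
            return self.store[key]
        response = self.real_llm(prompt)
        self.store[key] = response
        self.cassette.write_text(json.dumps(self.store, indent=2))
        return response


# Usage: record once with a stand-in model, then replay without calling it.
with tempfile.TemporaryDirectory() as tmp:
    cassette = Path(tmp) / "cassette.json"
    ReplayLLM(cassette, "record", real_llm=lambda p: f"plan for: {p}")("book a flight")
    replay = ReplayLLM(cassette, "replay")
    print(replay("book a flight"))  # plan for: book a flight
```

Failing on an unrecorded prompt, rather than silently calling the model, is what keeps a replayed test run deterministic: any change in the agent's prompts surfaces as an explicit test failure.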
-
Hugging Face Releases FineTranslations, a Trillion-Token Multilingual Parallel Text Dataset
Hugging Face has released FineTranslations, a large-scale multilingual dataset containing more than 1 trillion tokens of parallel text pairing English with more than 500 languages. The dataset was created by translating non-English content from the FineWeb2 corpus into English using Gemma 3 27B, with the full data generation pipeline designed to be reproducible and publicly documented.
-
Android Studio Otter Boosts Agent Workflows and Adds LLM Flexibility
The latest Android Studio Otter feature drop makes it easier for developers to integrate AI-powered tools into their workflows. New features include the ability to choose which LLM to use, an agent mode enhanced with device interaction, support for natural-language testing, and more.
-
Cloudflare Introduces Aggregations in R2 SQL for Data Analytics
Cloudflare recently announced support for aggregations in R2 SQL, its query engine for running SQL over data stored in R2. The enhancement takes R2 SQL beyond basic filtering and makes it more useful for analytical workloads without requiring a separate data warehouse.
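To show what aggregations add over filtering, the sketch below runs a standard SQL `GROUP BY` query, using Python's built-in `sqlite3` as a stand-in for R2 SQL. The table and data are hypothetical, and the exact aggregate functions and clauses R2 SQL supports should be checked against Cloudflare's documentation.

```python
import sqlite3

# Illustrative only: an in-memory SQLite database stands in for R2 SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (region TEXT, status INTEGER, bytes INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [("eu", 200, 512), ("eu", 500, 128), ("us", 200, 1024), ("us", 200, 256)],
)

# Filtering alone (WHERE) returns raw rows; GROUP BY with aggregate functions
# summarizes them into one row per region.
rows = conn.execute(
    """
    SELECT region, COUNT(*) AS requests, SUM(bytes) AS total_bytes
    FROM requests
    WHERE status = 200
    GROUP BY region
    ORDER BY region
    """
).fetchall()
print(rows)  # [('eu', 1, 512), ('us', 2, 1280)]
```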
-
AWS Hikes EC2 Capacity Block Rates by 15% in Uniform ML Pricing Adjustment
AWS has raised prices for EC2 Capacity Blocks for ML by 15% across all regions, affecting GPU-based workloads. The uniform increase applies to top-tier instances powered by NVIDIA GPUs, underscoring supply chain pressures and inflation. With limited alternatives, organizations face higher costs, making workload optimization and cost management strategies more important.
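The effect of a uniform 15% increase is easy to estimate. The sketch below uses a hypothetical hourly rate and a hypothetical one-week block (not actual AWS prices), with Python's `decimal` module to avoid floating-point rounding in currency math; real Capacity Block billing may differ from this flat-rate model.

```python
from decimal import Decimal, ROUND_HALF_UP


def block_cost(hourly_rate: Decimal, hours: int) -> Decimal:
    """Total cost of a reservation billed at a flat hourly rate (simplified model), rounded to cents."""
    return (hourly_rate * hours).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


old_rate = Decimal("40.00")  # hypothetical USD per hour, NOT a real AWS price
increase = Decimal("0.15")   # the 15% rise reported in the article
new_rate = old_rate * (1 + increase)

hours = 7 * 24  # a hypothetical one-week block
old_cost = block_cost(old_rate, hours)
new_cost = block_cost(new_rate, hours)
print(old_cost, new_cost, new_cost - old_cost)  # 6720.00 7728.00 1008.00
```

Under this flat-rate model the extra cost scales linearly with reserved hours, so trimming reserved capacity offsets the increase proportionally.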
-
Mistral Releases OCR 3 with Improved Accuracy on Handwritten and Structured Documents
Mistral has released Mistral OCR 3, the latest version of its optical character recognition model, focused on higher accuracy across a wide range of document types, including handwritten notes, forms, low-quality scans, and complex tables.
-
How Agoda Unified Multiple Data Pipelines into a Single Source of Truth
Agoda recently described how it consolidated multiple independent data pipelines into a centralized Apache Spark-based platform to eliminate inconsistencies in financial data. The company implemented a multi-layered quality framework that combines automated validations, machine-learning-based anomaly detection, and data contracts, while processing millions of daily booking transactions.
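Two of the layers described, data contracts and anomaly detection, can be illustrated with a minimal sketch. The contract fields are hypothetical, and a simple z-score check stands in for Agoda's machine-learning-based anomaly detection, which the summary does not detail.

```python
from statistics import mean, stdev

# Hypothetical data contract for a booking record: required fields and their types.
CONTRACT = {"booking_id": str, "amount": float, "currency": str}


def validate(record: dict) -> list:
    """Return the contract violations for one record (an empty list means it passes)."""
    errors = []
    for field, expected_type in CONTRACT.items():
        if field not in record:
            errors.append(f"missing {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    if isinstance(record.get("amount"), float) and record["amount"] < 0:
        errors.append("amount must be non-negative")
    return errors


def is_anomalous(history: list, today: float, threshold: float = 3.0) -> bool:
    """Flag `today` if it lies more than `threshold` sample standard deviations from the historical mean."""
    mu, sigma = mean(history), stdev(history)  # stdev needs at least two data points
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


print(validate({"booking_id": "B1", "amount": 120.0, "currency": "THB"}))  # []
print(validate({"booking_id": "B2", "amount": "120"}))  # ['amount should be float', 'missing currency']

daily_totals = [1000.0, 1020.0, 990.0, 1010.0, 1005.0]  # hypothetical daily booking totals
print(is_anomalous(daily_totals, 1500.0), is_anomalous(daily_totals, 1012.0))  # True False
```

The contract catches malformed individual records, while the statistical check catches batches whose records are individually valid but whose totals look wrong; the two layers complement each other.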
-
AI-Powered Code Editor Cursor Introduces Dynamic Context Discovery to Improve Token-Efficiency
Cursor introduced a new approach to minimize the context size of requests sent to large language models. Called dynamic context discovery, this method moves away from including large amounts of static context upfront, allowing the agent to dynamically retrieve only the information it needs. This reduces token usage and limits the inclusion of potentially confusing or irrelevant details.
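The difference between static and on-demand context can be illustrated with a toy workspace: the static approach pastes every file into the prompt, while the on-demand approach exposes a search tool and includes only matching lines. This is a hypothetical sketch, not Cursor's implementation; the workspace contents, `search_tool`, and the word-count token estimate are assumptions, and a real agent would choose its own search queries.

```python
import re

# Hypothetical workspace; in practice these would be files on disk.
WORKSPACE = {
    "billing.py": "def charge(user, amount):\n    validate(amount)\n    return gateway.send(user, amount)\n",
    "auth.py": "def login(user, password):\n    return check(user, password)\n",
    "README.md": "Project overview.\n" + "Lots of unrelated documentation.\n" * 200,
}


def approx_tokens(text: str) -> int:
    """Crude token estimate: count whitespace-separated words."""
    return len(text.split())


def static_context() -> str:
    """Upfront approach: paste every file into the prompt."""
    return "\n".join(f"# {name}\n{body}" for name, body in WORKSPACE.items())


def search_tool(pattern: str) -> str:
    """On-demand approach: return only lines matching `pattern`, prefixed with file and line number."""
    hits = []
    for name, body in WORKSPACE.items():
        for lineno, line in enumerate(body.splitlines(), start=1):
            if re.search(pattern, line):
                hits.append(f"{name}:{lineno}: {line.strip()}")
    return "\n".join(hits)


dynamic = search_tool(r"def charge")
print(dynamic)  # billing.py:1: def charge(user, amount):
print(approx_tokens(static_context()), approx_tokens(dynamic))
```

The on-demand result is both smaller and free of the unrelated README text, which is the kind of potentially confusing detail the approach aims to keep out of the prompt.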
-
Vercel Open-Sources Bash Tool for Context Retrieval Using Local Filesystems
Vercel has open-sourced bash-tool, which provides a Bash execution engine for AI agents, enabling them to run filesystem-based commands to retrieve context for model prompts.
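A filesystem-based context tool boils down to running a constrained set of commands and feeding their output back to the model. The Python sketch below illustrates that idea; it is not Vercel's bash-tool API, and the allowlist and `run_context_command` helper are hypothetical. It runs real processes, and restricting the command name does not stop arguments from pointing at files outside `workdir`, so a production tool would need proper sandboxing.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path

# Hypothetical allowlist of read-only commands an agent may run to gather context.
ALLOWED_COMMANDS = {"ls", "cat", "grep", "head", "wc"}


def run_context_command(command: str, workdir: Path, timeout: float = 5.0) -> str:
    """Run one allowlisted command inside `workdir` and return its output for the prompt.

    The command is tokenized with shlex and executed without a shell, so pipes,
    redirection, and command chaining are not interpreted.
    """
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {command!r}")
    result = subprocess.run(
        argv, cwd=workdir, capture_output=True, text=True, timeout=timeout
    )
    # Include stderr so the agent can see why a command failed.
    return result.stdout + result.stderr


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.txt").write_text("alpha\nneedle in a haystack\nomega\n")
    print(run_context_command("grep -n needle notes.txt", Path(tmp)))  # 2:needle in a haystack
```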