AI, ML & Data Engineering Content on InfoQ
-
Google’s Aletheia Advances the State of the Art of Fully Autonomous Agentic Math Research
Google announced Aletheia, an AI system using Gemini 3 Deep Think that solved 6 of 10 novel math problems in the FirstProof challenge. Aletheia also scored ~91.9% on IMO-ProofBench, signaling a significant shift in automated research-level proof discovery without human intervention.
-
AWS Announces General Availability of DevOps Agent for Automated Incident Investigation
AWS has announced the general availability of DevOps Agent, a generative AI–powered assistant designed to help developers and operators troubleshoot issues, analyze deployments, and automate operational tasks across AWS environments.
-
Meta Reports 4x Higher Bug Detection with Just-in-Time Testing
Meta has introduced Just-in-Time (JiT) testing, a dynamic approach that generates tests during code review instead of relying on static test suites. Using LLMs, mutation testing, and intent-aware workflows like Dodgy Diff, the system improves bug detection by ~4x in AI-assisted development. It reflects a shift toward change-aware, AI-driven software testing in agentic development environments.
-
CNCF Warns Kubernetes Alone Is Not Enough to Secure LLM Workloads
A new blog post from the Cloud Native Computing Foundation highlights a critical gap in how organizations are deploying large language models (LLMs) on Kubernetes. While Kubernetes excels at orchestrating and isolating workloads, it does not inherently understand or control the behavior of AI systems, which creates a fundamentally different and more complex threat model.
-
Anthropic Introduces Agent-Based Code Review for Claude Code
Anthropic has introduced a new Code Review feature for Claude Code, adding an agent-based pull request review system that analyzes code changes using multiple AI reviewers.
-
AWS Launches Agent Registry in Preview to Govern AI Agent Sprawl across Enterprises
AWS released Agent Registry in preview as part of Amazon Bedrock AgentCore, providing a centralized catalog for discovering, governing, and reusing AI agents, tools, and MCP servers across organizations. The registry indexes agents regardless of where they run and supports both MCP and A2A protocols natively. Microsoft, Google Cloud, and the ACP Registry offer competing solutions.
-
Google Opens Gemma 4 Under Apache 2.0 with Multimodal and Agentic Capabilities
Google has announced the release of Gemma 4, a series of open-weight AI models, including variants with 2B, 4B, 26B, and 31B parameters, under the Apache 2.0 license. Key features include enhanced video and image processing, audio input on smaller models, and extended context windows up to 256K tokens.
-
Cloudflare Launches Code Mode MCP Server to Optimize Token Usage for AI Agents
Cloudflare has launched a new Model Context Protocol (MCP) server powered by Code Mode, enabling AI agents to interact with large APIs with minimal token usage. The server reduces context footprint across 2,500+ endpoints, improves multi-API orchestration, and provides a secure, code-centric execution environment for LLM agents.
-
Cursor 3 Introduces Agent-First Interface, Moving beyond the IDE Model
Anysphere released Cursor 3, a redesigned interface built from scratch that shifts the primary model from file editing to managing parallel coding agents. The new workspace supports local-to-cloud agent handoff, multi-repo parallel execution, and a plugin marketplace. Community reaction has been divided, with developers questioning cost overhead and the move away from Cursor's IDE-first identity.
-
Google’s TurboQuant Compression May Support Faster Inference, Same Accuracy on Less Capable Hardware
Google Research unveiled TurboQuant, a novel quantization algorithm that compresses large language models’ Key-Value caches by up to 6x. With 3.5-bit compression, near-zero accuracy loss, and no retraining needed, it allows developers to run massive context windows on significantly more modest hardware than previously required. Early community benchmarks confirm significant efficiency gains.
-
Claude Code Used to Find Remotely Exploitable Linux Kernel Vulnerability Hidden for 23 Years
Anthropic researcher Nicholas Carlini used Claude Code to find a remotely exploitable heap buffer overflow in the Linux kernel's NFS driver, undiscovered for 23 years. Five kernel vulnerabilities have been confirmed so far. Linux kernel maintainers report that AI bug reports have recently shifted from slop to legitimate findings, with security lists now receiving 5-10 valid reports daily.
-
Anthropic Paper Examines Behavioral Impact of Emotion-Like Mechanisms in LLMs
A recent paper from Anthropic examines how large language models internally represent concepts related to emotions and how these representations influence behavior. The work is part of the company’s interpretability research and focuses on analyzing internal activations in Claude Sonnet 4.5 to better understand the mechanisms behind model responses.
-
Google Released Gemma 4 with a Focus on Local-First, On-Device AI Inference
With the release of Gemma 4, Google aims to enable local, agentic AI for Android development through a family of models designed to support the entire software lifecycle, from coding to production.
-
Lyft Scales Global Localization Using AI and Human-in-the-Loop Review
Lyft has implemented an AI-driven localization system to accelerate translations of its app and web content. Using a dual-path pipeline with large language models and human review, the system processes most content in minutes, improves international release speed, ensures brand consistency, and handles complex cases like regional idioms and legal messaging efficiently.
-
Anthropic Releases Claude Mythos Preview with Cybersecurity Capabilities but Withholds Public Access
Anthropic has introduced Claude Mythos Preview, its most advanced AI model, with significant improvements in reasoning, coding, and cybersecurity. Unlike previous releases, it will not be publicly available. Access is limited to a consortium of tech companies through Project Glasswing. Internal tests revealed the model's ability to effectively discover critical security flaws.