Large Language Models: Content on InfoQ
-
Intel DeepMath Introduces a Smart Architecture to Make LLMs Better at Math
Intel has announced DeepMath, a lightweight agent built on Qwen3-Thinking that specializes in solving mathematical problems. To address common limitations of LLMs in math reasoning, DeepMath generates small Python scripts that support and enhance its problem-solving process.
-
Google’s Eight Essential Multi-Agent Design Patterns
Google recently published a guide outlining eight essential design patterns for multi-agent systems, ranging from sequential pipelines to human-in-the-loop architectures. The guide provides concrete explanations of each pattern along with sample code for Google's Agent Development Kit.
-
Microsoft Research Develops Novel Approaches to Enforce Privacy in AI Models
A team of AI researchers at Microsoft introduces two novel approaches for enforcing contextual integrity in large language models: PrivacyChecker, an open-source lightweight module that acts as a privacy shield during inference, and CI-CoT + CI-RL, an advanced training method designed to teach models to reason about privacy.
-
Swiggy Rolls out Hermes V3: from Text-to-SQL to Conversational AI
Swiggy has released Hermes V3, a GenAI-powered text-to-SQL assistant that enables employees to query data in plain English. The Slack-native system combines vector retrieval, conversational memory, agentic orchestration, and explainability to improve SQL accuracy and support multi-turn analytical queries.
-
Open-Source Agent Sandbox Enables Secure Deployment of AI Agents on Kubernetes
The Agent Sandbox is an open-source Kubernetes controller that provides a declarative API for managing a single, stateful pod with stable identity and persistent storage. It is particularly well suited for creating isolated environments to execute untrusted, LLM-generated code, as well as for running other stateful workloads.
-
Cactus v1: Cross-Platform LLM Inference on Mobile with Zero Latency and Full Privacy
Cactus, a Y Combinator-backed startup, brings local AI inference to mobile phones, wearables, and other low-power devices through cross-platform, energy-efficient kernels and a native runtime. It delivers sub-50ms time-to-first-token for on-device inference, eliminates network latency, and defaults to complete privacy.
-
Target Improves Add to Cart Interactions by 11 Percent with Generative AI Recommendations
Target has deployed GRAM, a GenAI-powered accessory recommendation system for the Home category, using large language models to prioritize product attributes and capture aesthetic cohesion. The system helps shoppers find compatible accessories, integrates human-in-the-loop curation, and achieved measurable improvements in engagement and conversion.
-
Toad: a Unified CLI Tool for All Your LLMs That Promises Improved UX from Existing Ones
During his sabbatical, Will McGugan, creator of Rich and Textual, frameworks for building text-based user interfaces (TUIs), put his UI skills to work building Toad. The newly released tool aims to provide a unified, "beautiful" interface for multiple coding agents in your terminal, all accessible from a single tool via the Agent Client Protocol (ACP).
-
Neptune Combines AI‑Assisted Infrastructure as Code and Cloud Deployments
Now available in beta, Neptune is a conversational AI agent designed to act like an AI platform engineer, handling the provisioning, wiring, and configuration of the cloud services needed to run a containerized app. Neptune is both language- and cloud-agnostic, with support for AWS, GCP, and Azure.
-
Meta Details GEM Ads Model Using LLM-Scale Training, Hybrid Parallelism, and Knowledge Transfer
Meta released details about its Generative Ads Model (GEM), a foundation model designed to improve ads recommendation across its platforms. The model addresses core challenges in recommendation systems (RecSys) by processing billions of daily user-ad interactions where meaningful signals such as clicks and conversions are very sparse.
-
TornadoVM 2.0 Brings Automatic GPU Acceleration and LLM Support to Java
The TornadoVM project recently reached version 2.0, a major milestone for the open-source project that aims to provide a heterogeneous hardware runtime for Java. The project automatically accelerates Java programs on multi-core CPUs, GPUs, and FPGAs. This release is likely to be of particular interest to teams developing LLM solutions on the JVM.
-
Meta's Optimization Platform Ax 1.0 Streamlines LLM and System Optimization
Now stable, Ax is an open-source platform from Meta designed to help researchers and engineers apply machine learning to complex, resource-intensive experimentation. Over the past several years, Meta has used Ax to improve AI models, accelerate machine learning research, tune production infrastructure, and more.
-
AlphaEvolve Enters Google Cloud as an Agentic System for Algorithm Optimization
Google Cloud announced the private preview of AlphaEvolve, a Gemini-powered coding agent designed to discover and optimize algorithms for complex engineering and scientific problems. The system is now available through an early access program on Google Cloud, targeting use cases where traditional brute-force or manual optimization methods struggle due to vast search spaces.
-
Magika 1.0: Smarter, Faster File Detection with Rust and AI
Google has just released version 1.0 of Magika, a substantial rewrite of its open-source file type detection system. The new version leverages AI to support a broader range of file types and is built in Rust for maximum speed and security.
-
Replit Introduces New AI Integrations for Multi-Model Development
Replit has introduced Replit AI Integrations, a feature that lets users select third-party models directly inside the IDE and automatically generate the code needed to run inference.