Retrieval-Augmented Generation Content on InfoQ

News

AI, ML & Data Engineering

Pinecone Introduces Nexus Engine for Compiling Business Context into Structured Data for AI Agents

Now generally available, Pinecone Nexus is a "knowledge engine" for AI agents that transforms enterprise data into a structured layer agents can query directly. It lets teams ingest and curate business context once, making it reusable across agents and reducing token costs while improving accuracy.

Sergio De Simone
on Jul 18, 2026
Architecture & Design

How DoorDash Built an AI Shopping Assistant That Doesn’t Rely on the LLM Alone

DoorDash details the architecture behind Ask DoorDash, its AI-powered conversational shopping assistant, combining LLMs, specialized AI agents, MCP-based tooling, and an intelligence layer with persistent consumer memory and live backend data. Early results show up to 24% higher checkout conversion, 17% larger baskets, and improved intent accuracy using memory-backed sessions.

Leela Kumili
on Jul 13, 2026
Architecture & Design

Inside Target’s LLM-Based System for Semantic Matching in Marketing Forecast Pipelines

Target built a generative AI system to improve marketing campaign forecasting by retrieving and ranking similar historical campaigns. Using embeddings, vector search, and LLM ranking, it replaces rule-based workflows. Evaluation shows 75% top-1 and 100% top-3 coverage. The system reduces manual effort, improves consistency, and uses feedback loops based on campaign outcomes to refine retrieval.

Leela Kumili
on Jun 29, 2026
AI, ML & Data Engineering

InfoQ Online Certification Program: New AI Engineering and Organizational Architecture Cohorts

InfoQ expands its online certification portfolio with new AI Engineering and Organizational Architecture cohorts, giving senior practitioners a confidential peer group to pressure-test production AI, platform, team design, and architecture decisions.

Artenisa Chatziou
on May 26, 2026
AI, ML & Data Engineering

Six Sessions at QCon AI Boston 2026 That Take Productionizing AI Seriously

QCon AI Boston 2026 is close to selling out. Discover six sessions where speakers engage directly with the gap between AI working in a demo and AI working in production.

Artenisa Chatziou
on May 21, 2026
Cloud

Cloudflare Announces Agent Memory, a Managed Persistent Memory Service for AI Agents

Cloudflare announced Agent Memory in private beta, a managed service that extracts structured memories from AI agent conversations and retrieves them on demand using five-channel parallel retrieval with Reciprocal Rank Fusion. Shared memory profiles let teams of agents access common knowledge. Competitors include Mem0, Zep, LangMem, and Letta.

Steef-Jan Wiggers
on Apr 30, 2026
Architecture & Design

Designing Memory for AI Agents: Inside LinkedIn’s Cognitive Memory Agent

LinkedIn introduces Cognitive Memory Agent (CMA), a generative AI infrastructure layer enabling stateful, context-aware systems. It provides persistent memory across episodic, semantic, and procedural layers, supporting multi-agent coordination, retrieval, and lifecycle management. CMA addresses LLM statelessness and enables production-grade personalization and long-term context in AI applications.

Leela Kumili
on Apr 20, 2026
Architecture & Design

Cloudflare and ETH Zurich Outline Approaches for AI-Driven Cache Optimization

Cloudflare and ETH Zurich highlight how AI-driven crawler traffic challenges traditional caching in CDNs and databases. They propose AI-aware strategies including separate cache tiers, adaptive algorithms, and pay-per-crawl models to balance performance for human users and AI services while maintaining cache efficiency and system stability.

Leela Kumili
on Apr 08, 2026
AI, ML & Data Engineering

QCon London 2026: Reliable Retrieval for Production AI Systems

At QCon London 2026, Lan Chu, AI tech lead at Rabobank, shared lessons from deploying a production AI search system used internally by more than 300 users across 10,000 documents. Her experience shows that most failures in RAG systems stem from indexing and retrieval, rather than the language model itself.

Daniel Dominguez
on Mar 17, 2026
AI, ML & Data Engineering

Scaling Human Judgment: How Dropbox Uses LLMs to Improve Labeling for RAG Systems

To improve the relevance of responses produced by Dropbox Dash, Dropbox engineers began using LLMs to augment human labelling, which plays a crucial role in identifying the documents that should be used to generate the responses. Their approach offers useful insights for any system built on retrieval-augmented generation (RAG).

Sergio De Simone
on Mar 07, 2026
Architecture & Design

How Dropbox Built a Scalable Context Engine for Enterprise Knowledge Search

Dropbox engineers have detailed how the company built the context engine behind Dropbox Dash, revealing a shift toward index-based retrieval, knowledge graph-derived context, and continuous evaluation to support enterprise AI at scale.

Matt Foster
on Feb 18, 2026
AI, ML & Data Engineering

VillageSQL Launches as an Extension-Focused MySQL Fork

A new open-source project, VillageSQL, has been introduced as a tracking fork of MySQL aimed at expanding extensibility and addressing feature gaps increasingly relevant to AI and agent-based workloads.

Robert Krzaczyński
on Feb 13, 2026
AI, ML & Data Engineering

MongoDB Introduces Embedding and Reranking API on Atlas

MongoDB has recently announced the public preview of its Embedding and Reranking API on MongoDB Atlas. The new API gives developers direct access to Voyage AI’s search models within the managed cloud database, enabling them to create features such as semantic search and AI-powered assistants within a single integrated environment, with consolidated monitoring and billing.

Renato Losio
on Feb 03, 2026
Cloud

Amazon S3 Vectors Reaches GA, Introducing "Storage-First" Architecture for RAG

AWS has announced the general availability of Amazon S3 Vectors, increasing per-index capacity forty-fold to 2 billion vectors. By natively integrating vector search into the S3 storage engine, the service introduces a "Storage-First" architecture that decouples compute from storage, reducing total cost of ownership by up to 90% for large-scale RAG workloads.

Steef-Jan Wiggers
on Jan 02, 2026
Cloud

Microsoft Foundry Agent Service Simplifies State Management with Long-Term Memory Preview

Microsoft has launched a public preview of a managed long-term memory store for its Foundry Agent Service. The service automates the extraction, consolidation, and retrieval of user context, providing a native "state layer" that prevents intelligence decay in long-running interactions with AI agents.

Steef-Jan Wiggers
on Dec 30, 2025
