InfoQ Homepage Retrieval-Augmented Generation Content on InfoQ
-
Six Sessions at QCon AI Boston 2026 That Take Productionizing AI Seriously
QCon AI Boston 2026 is close to selling out. Discover six sessions where speakers engage directly with the gap between AI working in a demo and AI working in production.
-
Cloudflare Announces Agent Memory, a Managed Persistent Memory Service for AI Agents
Cloudflare announced Agent Memory in private beta, a managed service that extracts structured memories from AI agent conversations and retrieves them on demand using five-channel parallel retrieval with Reciprocal Rank Fusion. Shared memory profiles let teams of agents access common knowledge. Competitors include Mem0, Zep, LangMem, and Letta.
-
Designing Memory for AI Agents: inside Linkedin’s Cognitive Memory Agent
LinkedIn introduces Cognitive Memory Agent (CMA), generative AI infrastructure layer enabling stateful, context-aware systems. It provides persistent memory across episodic, semantic, and procedural layers, supporting multi-agent coordination, retrieval, and lifecycle management. CMA addresses LLM statelessness and enables production-grade personalization and long-term context in AI applications.
-
Cloudflare and ETH Zurich Outline Approaches for AI-Driven Cache Optimization
Cloudflare and ETH Zurich highlight how AI-driven crawler traffic challenges traditional caching in CDNs and databases. They propose AI-aware strategies including separate cache tiers, adaptive algorithms, and pay-per-crawl models to balance performance for human users and AI services while maintaining cache efficiency and system stability.
-
QCon London 2026: Reliable Retrieval for Production AI Systems
At QCon London 2026, Lan Chu, AI tech lead at Rabobank, shared lessons from deploying a production AI search system used internally by more than 300 users across 10,000 documents. Her experience shows that most failures in RAG systems stem from indexing and retrieval, rather than the language model itself.
-
Scaling Human Judgment: How Dropbox Uses LLMs to Improve Labeling for RAG Systems
To improve the relevance of responses produced by Dropbox Dash, Dropbox engineers began using LLMs to augment human labelling, which plays a crucial role in identifying the documents that should be used to generate the responses. Their approach offers useful insights for any system built on retrieval-augmented generation (RAG).
-
How Dropbox Built a Scalable Context Engine for Enterprise Knowledge Search
Dropbox engineers have detailed how the company built the context engine behind Dropbox Dash, revealing a shift toward index-based retrieval, knowledge graph-derived context, and continuous evaluation to support enterprise AI at scale.
-
VillageSQL Launches as an Extension-Focused MySQL Fork
A new open-source project, VillageSQL, has been introduced as a tracking fork of MySQL aimed at expanding extensibility and addressing feature gaps increasingly relevant to AI and agent-based workloads.
-
MongoDB Introduces Embedding and Reranking API on Atlas
MongoDB has recently announced the public preview of its Embedding and Reranking API on MongoDB Atlas. The new API gives developers direct access to Voyage AI’s search models within the managed cloud database, enabling them to create features such as semantic search and AI-powered assistants within a single integrated environment, with consolidated monitoring and billing.
-
Amazon S3 Vectors Reaches GA, Introducing "Storage-First" Architecture for RAG
AWS has announced the general availability of Amazon S3 Vectors, increasing per-index capacity forty-fold to 2 billion vectors. By natively integrating vector search into the S3 storage engine, the service introduces a "Storage-First" architecture that decouples compute from storage, reducing total cost of ownership by up to 90% for large-scale RAG workloads.
-
Microsoft Foundry Agent Service Simplifies State Management with Long-Term Memory Preview
Microsoft has launched a public preview of a managed long-term memory store for its Foundry Agent Service. The service automates the extraction, consolidation, and retrieval of user context, providing a native "state layer" that prevents intelligence decay in long-running interactions with AI agents.
-
QCon AI New York 2025 Schedule Published, Highlights Practical Enterprise AI
The QCon AI New York 2025 schedule is now live for its Dec 16-17 event. Focused on moving AI from PoC to production, the program offers a practical roadmap for senior engineers & tech leaders. It addresses the real-world challenges of building, scaling, and deploying reliable, enterprise-grade AI systems, helping organizations overcome the hurdles of productionizing their AI initiatives.
-
Claude Code Gains Support for Remote MCP Servers over Streamable HTTP
Anthropic has recently introduced support for connecting to remote MCP servers in Claude Code, allowing developers to integrate external tools and resources without manual local server setup.
-
Cloudflare AutoRAG Streamlines Retrieval-Augmented Generation
Cloudflare has launched a managed service for using retrieval-augmented generation in LLM-based systems. Now in beta, CloudFlare AutoRAG aims to make it easier for developers to build pipelines that integrate rich context data into LLMs.
-
UC Berkeley's Sky Computing Lab Introduces Model to Reduce AI Language Model Inference Costs
UC Berkeley's Sky Computing Lab has released Sky-T1-32B-Flash, an updated reasoning language model that addresses the common issue of AI overthinking. The model, developed through the NovaSky (Next-generation Open Vision and AI) initiative, "slashes inference costs on challenging questions by up to 57%" while maintaining accuracy across mathematics, coding, science, and general knowledge domains.