InfoQ Homepage Large language models Content on InfoQ
-
Why Vector Search Alone Isn't Enough: Hybrid Retrieval for RAG
In this article, author Aaditya Chauhan discusses the limitations of RAG pipelines based purely on vector search and how an internal omni-search application using Reciprocal Rank Fusion (RRF) that combines BM25 and vector results, can enhance the search solution.
-
The AI Productivity Paradox in Test Automation: Moving Beyond Structural Validation to Perception and Intent
The AI productivity paradox states that AI scales whatever abstraction it is built on. If that abstraction is structurally brittle, it scales structural brittleness. This article shows how, to build a future of reliable, AI-driven test automation, we must stop scaling DOM-centric abstractions and build a new testing paradigm grounded in perception and intent.
-
Local-First AI Inference: a Cloud Architecture Pattern for Cost-Effective Document Processing
The Local-First AI Inference pattern routes 70–80% of documents to deterministic local extraction at zero API cost, reserving Azure OpenAI calls for edge cases and flagging low-confidence results for human review. Deployed on 4,700 engineering drawing PDFs, it cut API costs by 75% and processing time by 55%, while bounding errors through a human review tier.
-
MCP in the Java World: Bringing Architectural Strategy to LLM Integrations
Discover how the Model Context Protocol (MCP) Java SDK is establishing a new architectural discipline for enterprise LLM integrations. By defining explicit contracts and leveraging MCP servers as anti-corruption layers, it ensures governance, loose coupling, and security alignment with the JVM ecosystem and existing operational practices, moving integrations beyond fragility to resilience.
-
Orchestrating Agentic and Multimodal AI Pipelines with Apache Camel
In this article, author Vignesh Durai discusses how agentic and multimodal AI systems can be engineered using Apache Camel and LangChain4j technologies. The key components in the solution include LLM-based reasoning, retrieval-augmented generation (RAG), and image classification.
-
Building Hierarchical Agentic RAG Systems: Multi-Modal Reasoning with Autonomous Error Recovery
In this article, the author explores how hierarchical agentic RAG systems coordinate specialized workers through structured orchestration to improve accuracy, reliability, and explainability in complex enterprise analytics workflows. The article uses Protocol-H as a to show how deterministic routing, reflective retry, and modality-aware reasoning support safer multi-source query execution.
-
Beyond RAG: Architecting Context-Aware AI Systems with Spring Boot
This article introduces Context-Augmented Generation (CAG) as an architectural refinement of RAG for enterprise systems. It shows how a Spring Boot-based context manager can incorporate user identity, session state, and policy constraints into AI workflows, improving traceability, consistency, and governance without altering existing retrievers or LLM infrastructure.
-
Building LLMs in Resource-Constrained Environments: a Hands-On Perspective
In this article, the author argues that infrastructure and compute limitations can drive innovation. It demonstrates how smaller, efficient models, synthetic data generation, and disciplined engineering enable the creation of impactful LLM-based AI systems despite severe resource constraints.
-
Article Series: AI-Assisted Development: Real World Patterns, Pitfalls, and Production Readiness
In this series, we examine what happens after the proof of concept and how AI becomes part of the software delivery pipeline. As AI transitions from proof of concept to production, teams are discovering that the challenge extends beyond model performance to include architecture, process, and accountability. This transition is redefining what constitutes good software engineering.
-
Agentic Terminal - How Your Terminal Comes Alive with CLI Agents
In this article author Sachin Joglekar discusses the transformation of CLI terminals becoming agentic where developers can state goals while the AI agents plan, call tools, iterate, ask for approval where needed, and execute the requests. He also explains the planning styles for three different CLI tools: Gemini, Claude, and Auto-GPT.
-
Reducing False Positives in Retrieval-Augmented Generation (RAG) Semantic Caching: a Banking Case Study
In this article, author Elakkiya Daivam discusses why Retrieval Augmented Generation (RAG) and semantic caching techniques are powerful levers for reducing false positives in AI powered applications. She shares the insights from a production-grade evaluation with 1,000 query variations tested across seven bi-encoder models.
-
Training Data Preprocessing for Text-to-Video Models
In this article, author Aleksandr Rezanov discusses the data preparation for generative text-to-image models to accelerate work on video generation services to be used in TV series and films. He explains how data is prepared and can serve as a starting point for creating custom datasets to develop proprietary models.