Large language models Content on InfoQ

Articles

AI, ML & Data Engineering

Why Vector Search Alone Isn't Enough: Hybrid Retrieval for RAG

In this article, author Aaditya Chauhan discusses the limitations of RAG pipelines based purely on vector search and how an internal omni-search application that uses Reciprocal Rank Fusion (RRF) to combine BM25 and vector results can improve search.

Aaditya Chauhan
on Jun 02, 2026

Web Development

The AI Productivity Paradox in Test Automation: Moving beyond Structural Validation to Perception and Intent

The AI productivity paradox states that AI scales whatever abstraction it is built on. If that abstraction is structurally brittle, it scales structural brittleness. This article shows that to build a future of reliable, AI-driven test automation, we must stop scaling DOM-centric abstractions and build a new testing paradigm grounded in perception and intent.

Amanul Chowdhury, Vinay Gummadavelli
on Jun 01, 2026

Cloud

Local-First AI Inference: a Cloud Architecture Pattern for Cost-Effective Document Processing

The Local-First AI Inference pattern routes 70–80% of documents to deterministic local extraction at zero API cost, reserving Azure OpenAI calls for edge cases and flagging low-confidence results for human review. Deployed on 4,700 engineering drawing PDFs, it cut API costs by 75% and processing time by 55%, while bounding errors through a human review tier.

Obinna Iheanachor
on May 11, 2026

Java

MCP in the Java World: Bringing Architectural Strategy to LLM Integrations

Discover how the Model Context Protocol (MCP) Java SDK is establishing a new architectural discipline for enterprise LLM integrations. By defining explicit contracts and leveraging MCP servers as anti-corruption layers, it ensures governance, loose coupling, and security alignment with the JVM ecosystem and existing operational practices, moving integrations beyond fragility to resilience.

Matteo Rossi
on Apr 27, 2026

AI, ML & Data Engineering

Orchestrating Agentic and Multimodal AI Pipelines with Apache Camel

In this article, author Vignesh Durai discusses how agentic and multimodal AI systems can be engineered using Apache Camel and LangChain4j technologies. The key components in the solution include LLM-based reasoning, retrieval-augmented generation (RAG), and image classification.

Vignesh Durai
on Apr 24, 2026

AI, ML & Data Engineering

Building Hierarchical Agentic RAG Systems: Multi-Modal Reasoning with Autonomous Error Recovery

In this article, the author explores how hierarchical agentic RAG systems coordinate specialized workers through structured orchestration to improve accuracy, reliability, and explainability in complex enterprise analytics workflows. The article uses Protocol-H to show how deterministic routing, reflective retry, and modality-aware reasoning support safer multi-source query execution.

Abhijit Ubale
on Apr 09, 2026

Java

Beyond RAG: Architecting Context-Aware AI Systems with Spring Boot

This article introduces Context-Augmented Generation (CAG) as an architectural refinement of RAG for enterprise systems. It shows how a Spring Boot-based context manager can incorporate user identity, session state, and policy constraints into AI workflows, improving traceability, consistency, and governance without altering existing retrievers or LLM infrastructure.

Syed Danish Ali
on Apr 02, 2026

AI, ML & Data Engineering

Building LLMs in Resource-Constrained Environments: a Hands-On Perspective

In this article, the author argues that infrastructure and compute limitations can drive innovation. It demonstrates how smaller, efficient models, synthetic data generation, and disciplined engineering enable the creation of impactful LLM-based AI systems despite severe resource constraints.

Olimpiu Pop
on Feb 09, 2026

AI, ML & Data Engineering

Article Series: AI-Assisted Development: Real World Patterns, Pitfalls, and Production Readiness

In this series, we examine what happens after the proof of concept and how AI becomes part of the software delivery pipeline. As AI transitions from proof of concept to production, teams are discovering that the challenge extends beyond model performance to include architecture, process, and accountability. This transition is redefining what constitutes good software engineering.

Arthur Casals
on Jan 21, 2026

AI, ML & Data Engineering

Agentic Terminal - How Your Terminal Comes Alive with CLI Agents

In this article, author Sachin Joglekar discusses how CLI terminals are becoming agentic, with developers stating goals while AI agents plan, call tools, iterate, ask for approval where needed, and execute the requests. He also explains the planning styles of three CLI tools: Gemini, Claude, and Auto-GPT.

Sachin Joglekar
on Jan 08, 2026

AI, ML & Data Engineering

Reducing False Positives in Retrieval-Augmented Generation (RAG) Semantic Caching: a Banking Case Study

In this article, author Elakkiya Daivam discusses why Retrieval-Augmented Generation (RAG) and semantic caching techniques are powerful levers for reducing false positives in AI-powered applications. She shares insights from a production-grade evaluation with 1,000 query variations tested across seven bi-encoder models.

Elakkiya Daivam
on Nov 14, 2025

AI, ML & Data Engineering

Training Data Preprocessing for Text-to-Video Models

In this article, author Aleksandr Rezanov discusses data preparation for generative text-to-image models to accelerate work on video generation services to be used in TV series and films. He explains how the data is prepared, an approach that can serve as a starting point for creating custom datasets to develop proprietary models.

Aleksandr Rezanov
on Nov 06, 2025
