InfoQ Homepage Artificial Intelligence Content on InfoQ
-
Local-First AI Inference: a Cloud Architecture Pattern for Cost-Effective Document Processing
The Local-First AI Inference pattern routes 70–80% of documents to deterministic local extraction at zero API cost, reserving Azure OpenAI calls for edge cases and flagging low-confidence results for human review. Deployed on 4,700 engineering drawing PDFs, it cut API costs by 75% and processing time by 55%, while bounding errors through a human review tier.
-
MCP in the Java World: Bringing Architectural Strategy to LLM Integrations
Discover how the Model Context Protocol (MCP) Java SDK is establishing a new architectural discipline for enterprise LLM integrations. By defining explicit contracts and leveraging MCP servers as anti-corruption layers, it ensures governance, loose coupling, and security alignment with the JVM ecosystem and existing operational practices, moving integrations beyond fragility to resilience.
-
Orchestrating Agentic and Multimodal AI Pipelines with Apache Camel
In this article, author Vignesh Durai discusses how agentic and multimodal AI systems can be engineered using Apache Camel and LangChain4j technologies. The key components in the solution include LLM-based reasoning, retrieval-augmented generation (RAG), and image classification.
-
Building Hierarchical Agentic RAG Systems: Multi-Modal Reasoning with Autonomous Error Recovery
In this article, the author explores how hierarchical agentic RAG systems coordinate specialized workers through structured orchestration to improve accuracy, reliability, and explainability in complex enterprise analytics workflows. The article uses Protocol-H as a to show how deterministic routing, reflective retry, and modality-aware reasoning support safer multi-source query execution.
-
Stateful Continuation for AI Agents: Why Transport Layers Now Matter
Agent workflows make transport a first-order concern. Multi-turn, tool-heavy loops amplify overhead that is negligible in single-turn LLM use. Stateful continuation cuts overhead dramatically. Caching context server-side can reduce client-sent data by 80%+ and improve execution time by 15–29% .
-
Beyond RAG: Architecting Context-Aware AI Systems with Spring Boot
This article introduces Context-Augmented Generation (CAG) as an architectural refinement of RAG for enterprise systems. It shows how a Spring Boot-based context manager can incorporate user identity, session state, and policy constraints into AI workflows, improving traceability, consistency, and governance without altering existing retrievers or LLM infrastructure.
-
Architecting Agentic MLOps: a Layered Protocol Strategy with A2A and MCP
In this article, the authors outline protocols for building extensible multi-agent MLOps systems. The core architecture deliberately decouples orchestration from execution, allowing teams to incrementally add capabilities via discovery and evolve operations from static pipelines toward intelligent, adaptive coordination.
-
From Prompts to Production: a Playbook for Agentic Development
In this article, author Abhishek Goswami shares a practitioner's playbook with development practices, that describes building agentic AI applications and scaling them in production. He also presents core architecture patterns for agentic application development.
-
Building LLMs in Resource-Constrained Environments: a Hands-On Perspective
In this article, the author argues that infrastructure and compute limitations can drive innovation. It demonstrates how smaller, efficient models, synthetic data generation, and disciplined engineering enable the creation of impactful LLM-based AI systems despite severe resource constraints.
-
From Alert Fatigue to Agent-Assisted Intelligent Observability
As systems grow, observability becomes harder to maintain and incidents harder to diagnose. Agentic observability layers AI on existing tools, starting in read-only mode to detect anomalies and summarize issues. Over time, agents add context, correlate signals, and automate low-risk tasks. This approach frees engineers to focus on analysis and judgment.
-
Why Most Machine Learning Projects Fail to Reach Production
In this article, the author diagnoses common failures in ML initiatives, including weak problem framing and the persistent prototype-to-production gap. The piece provides practical, experience-based guidance on setting clear business goals, treating data as a product, and aligning cross-functional teams for reliable, production-ready ML delivery.
-
Article Series: AI-Assisted Development: Real World Patterns, Pitfalls, and Production Readiness
In this series, we examine what happens after the proof of concept and how AI becomes part of the software delivery pipeline. As AI transitions from proof of concept to production, teams are discovering that the challenge extends beyond model performance to include architecture, process, and accountability. This transition is redefining what constitutes good software engineering.