InfoQ Homepage Large language models Content on InfoQ

Articles

RSS Feed

Newer Older

AI, ML & Data Engineering

Article Series: AI-Assisted Development: Real World Patterns, Pitfalls, and Production Readiness

In this series, we examine what happens after the proof of concept and how AI becomes part of the software delivery pipeline. As AI transitions from proof of concept to production, teams are discovering that the challenge extends beyond model performance to include architecture, process, and accountability. This transition is redefining what constitutes good software engineering.

Arthur Casals
on Jan 21, 2026
AI, ML & Data Engineering

Agentic Terminal - How Your Terminal Comes Alive with CLI Agents

In this article author Sachin Joglekar discusses the transformation of CLI terminals becoming agentic where developers can state goals while the AI agents plan, call tools, iterate, ask for approval where needed, and execute the requests. He also explains the planning styles for three different CLI tools: Gemini, Claude, and Auto-GPT.

Sachin Joglekar
on Jan 08, 2026
AI, ML & Data Engineering

Reducing False Positives in Retrieval-Augmented Generation (RAG) Semantic Caching: a Banking Case Study

In this article, author Elakkiya Daivam discusses why Retrieval Augmented Generation (RAG) and semantic caching techniques are powerful levers for reducing false positives in AI powered applications. She shares the insights from a production-grade evaluation with 1,000 query variations tested across seven bi-encoder models.

Elakkiya Daivam
on Nov 14, 2025
AI, ML & Data Engineering

Training Data Preprocessing for Text-to-Video Models

In this article, author Aleksandr Rezanov discusses the data preparation for generative text-to-image models to accelerate work on video generation services to be used in TV series and films. He explains how data is prepared and can serve as a starting point for creating custom datasets to develop proprietary models.

Aleksandr Rezanov
on Nov 06, 2025
Java

Building a RAG Application with Spring Boot, Spring AI, MongoDB Atlas Vector Search, and OpenAI

The RAG paradigm redefines AI: it combines generative models and business data for accurate, contextualised responses. The article shows how to integrate Spring Boot, Spring AI, MongoDB Atlas and OpenAI into a powerful and flexible pipeline capable of transforming the way businesses access and create value from data, with applications ranging from finance and healthcare to customer service.

Matteo Rossi
on Oct 27, 2025
AI, ML & Data Engineering

Disaggregation in Large Language Models: the Next Evolution in AI Infrastructure

Large Language Model (LLM) inference faces a fundamental challenge: the same hardware that excels at processing input prompts struggles with generating responses, and vice versa. Disaggregated serving architectures solve this by separating these distinct computational phases, delivering throughput improvements and better resource utilization while reducing costs.

Anat Heilper
on Sep 29, 2025
AI, ML & Data Engineering

InfoQ AI, ML and Data Engineering Trends Report - 2025

This InfoQ Trends Report offers readers a comprehensive overview of emerging trends and technologies in the areas of AI, ML, and Data Engineering. This report summarizes the InfoQ editorial team’s and external guests' view on the current trends in AI and ML technologies and what to look out for in the next 12 months.

Srini Penchikala Savannah Kunovsky Anthony Alford Daniel Dominguez Vinod Goje
on Sep 24, 2025
AI, ML & Data Engineering

Effective Practices for Architecting a RAG Pipeline

Hybrid search, smart chunking, and domain-aware indexing are key to building effective RAG pipelines. Context window limits and prompt quality critically affect LLM response accuracy. This article provides lessons learned from setting up a RAG pipeline.

Glenn Engstrand
on Sep 03, 2025
DevOps

How Causal Reasoning Addresses the Limitations of LLMs in Observability

Large language models excel at converting observability telemetry into clear summaries but struggle with accurate root cause analysis in distributed systems. LLMs often hallucinate explanations and confuse symptoms with causes. This article suggests how causal reasoning models with Bayesian inference offer more reliable incident diagnosis.

Dhairya Dalal
on Sep 02, 2025
AI, ML & Data Engineering

MCP: the Universal Connector for Building Smarter, Modular AI Agents

In this article, the authors discuss Model Context Protocol (MCP), an open standard designed to connect AI agents with tools and data they need. They also talk about how MCP empowers agent development, and its adoption in leading open-source frameworks.

Sanjay Surendranath Girija Lakshit Arora Shashank Kapoor
on Aug 29, 2025
Architecture & Design

The Virtual Think Tank: Using LLMs to Gain a Multitude of Perspectives

The virtual think tank leverages LLMs to simulate diverse stakeholder and expert perspectives, enabling architects to explore trade-offs, challenge biases, and refine decisions. By prompting personas of real industry experts, the method fosters rich, contextual debates—offering a scalable, low-cost alternative to a traditional think tank.

Avraham Poupko
on Aug 28, 2025
Development

Effective Practices for Coding with a Chat-Based AI

In this article, we explore how AI agents are reshaping software development and the impact they have on a developer’s workflow. We introduce a practical approach to staying in control while working with these tools by adopting key best practices from the discipline of software architecture, including defining an implementation plan, splitting tasks, and so on.

Enrico Piccinin
on Jul 04, 2025

Newer Articles

Older Articles

InfoQ Software Architects' Newsletter

Articles