InfoQ Homepage Articles

Articles

RSS Feed

Newer Older

AI, ML & Data Engineering

From Batch to Micro-Batch Streaming: Lessons Learned the Hard Way in a Delta Index Pipeline

This article describes how a production delta-index pipeline migrated from scheduled batch to micro-batch Spark Structured Streaming. It covers why record-level streaming was rejected, how partition-based watermarks replaced fragile S3 completion markers, overlap-window correctness, and restart-as-design strategies for better predictability in object-store–based ingestion systems.

Parveen Saini
on May 04, 2026
Cloud

Securing Autonomous AI Agents on Kubernetes: Trust Boundaries, Secrets, and Observability for a New Category of Cloud Workload

Autonomous AI agents break Kubernetes security assumptions with dynamic dependencies, multi-domain credentials, and unpredictable resource use. This article covers production-tested patterns: Job-based isolation, Vault for scoped short-lived credentials, a four-phase trust model from shadow mode to autonomous operation, and observability for non-deterministic reasoning cycles.

Nik Kale
on May 01, 2026
Web Development

The DPoP Storage Paradox: Why Browser-Based Proof-of-Possession Remains an Unsolved Problem

DPoP closes a real gap in OAuth 2.0. Sender-constrained tokens are a meaningful upgrade over bearer tokens for any client that can implement them. But RFC 9449's silence on browser key storage creates the need for an architectural decision that each team must confront deliberately — there is no safe default that works everywhere.

Dhruv Agnihotri
on Apr 30, 2026
AI, ML & Data Engineering

CodeGuardian: a Model Context Protocol Server for AI-Assisted Code Quality Analysis and Security Scanning

CodeGuardian is an MCP server that extends AI coding assistants with comprehensive code quality and security analysis capabilities. By implementing eleven specialized tools, CodeGuardian enables developers to access enterprise-grade analysis directly through their AI assistant, eliminating context-switching and reducing friction in adopting secure coding practices.

Madhvesh Kumar Deepika Singh
on Apr 28, 2026
Java

MCP in the Java World: Bringing Architectural Strategy to LLM Integrations

Discover how the Model Context Protocol (MCP) Java SDK is establishing a new architectural discipline for enterprise LLM integrations. By defining explicit contracts and leveraging MCP servers as anti-corruption layers, it ensures governance, loose coupling, and security alignment with the JVM ecosystem and existing operational practices, moving integrations beyond fragility to resilience.

Matteo Rossi
on Apr 27, 2026
AI, ML & Data Engineering

Orchestrating Agentic and Multimodal AI Pipelines with Apache Camel

In this article, author Vignesh Durai discusses how agentic and multimodal AI systems can be engineered using Apache Camel and LangChain4j technologies. The key components in the solution include LLM-based reasoning, retrieval-augmented generation (RAG), and image classification.

Vignesh Durai
on Apr 24, 2026
Cloud

When a Cloud Region Fails: Rethinking High Availability in a Geopolitically Unstable World

Sovereign fault domains are failure boundaries defined by legal, political, or physical jurisdiction rather than hardware topology. The article maps geopolitical events to known distributed-systems failure modes, argues multi-region should replace multi-AZ as the HA baseline for systems crossing jurisdictions, and outlines design patterns, chaos experiments, and an ALE model to justify the spend.

Rohan Vardhan
on Apr 22, 2026
Java

Redesigning Banking PDF Table Extraction: a Layered Approach with Java

PDF table extraction often looks easy until it fails in production. Real bank statements can be messy, with scanned pages, shifting layouts, merged cells, and wrapped rows that break standard Java parsers. This article shares how we redesigned the approach using stream parsing, lattice/OCR, validation, scoring, and selective ML to make extraction more reliable in real banking systems.

Mehuli Mukherjee
on Apr 21, 2026
Web Development

Building Production-Ready tRPC APIs: the TypeScript Alternative to Apollo Federation

This article details our migration from Apollo Federation to a TypeScript-based tRPC stack, which resulted in an 89% reduction in bugs and 67% faster response times. It also covers the mistakes we made, the unexpected performance gains, and an overview of the production architecture we use today to handle 2.4 million daily requests with 99.97% uptime.

Dinesh Kumar Elumalai
on Apr 20, 2026
AI, ML & Data Engineering

Lakehouse Tower of Babel: Handling Identifier Resolution Rules across Database Engines

Lakehouse architectures enable multiple engines to operate on shared data using open table formats such as Apache Iceberg. However, differences in SQL identifier resolution and catalog naming rules create interoperability failures. This article examines these behaviors and explains why enforcing consistent naming conventions and cross-engine validation is critical.

Maninder Parmar
on Apr 17, 2026
Cloud

Using AWS Lambda Extensions to Run Post-Response Telemetry Flush

At Lead Bank, synchronous telemetry flushing caused intermittent exporter stalls to become user-facing 504 gateway timeouts. By leveraging AWS Lambda's Extensions API and goroutine chaining in Go, flush work is moved off the response path, returning responses immediately while preserving full observability without telemetry loss.

Melvin Philips
on Apr 15, 2026
DevOps

Beyond One-Click: Designing an Enterprise-Grade Observability Extension for Docker

Docker Extensions boost developer speed but create a "visibility gap" by isolating telemetry. To meet enterprise needs, extensions must act as bridges to centralized platforms. This article details how to use OpenTelemetry, policy-as-code, and encryption to build secure pipelines. Learn to balance developer productivity with the governance required for scalable, compliant observability.

Pragya Keshap
on Apr 14, 2026

Newer Articles

Older Articles

InfoQ Software Architects' Newsletter

Articles