InfoQ Homepage Architecture & Design Content on InfoQ
-
Three Pillars of Platform Engineering: a Virtuous Cycle
Platform engineering succeeds when reliability and ergonomics reinforce each other rather than compete. This article explores three foundational pillars: automated reliability, developer ergonomics, and operator ergonomics. Together, they establish a virtuous cycle that strengthens system stability, reduces operational burden, and empowers teams to scale infrastructure with confidence.
-
Securing Autonomous AI Agents on Kubernetes: Trust Boundaries, Secrets, and Observability for a New Category of Cloud Workload
Autonomous AI agents break Kubernetes security assumptions with dynamic dependencies, multi-domain credentials, and unpredictable resource use. This article covers production-tested patterns: Job-based isolation, Vault for scoped short-lived credentials, a four-phase trust model from shadow mode to autonomous operation, and observability for non-deterministic reasoning cycles.
-
The DPoP Storage Paradox: Why Browser-Based Proof-of-Possession Remains an Unsolved Problem
DPoP closes a real gap in OAuth 2.0. Sender-constrained tokens are a meaningful upgrade over bearer tokens for any client that can implement them. But RFC 9449's silence on browser key storage creates the need for an architectural decision that each team must confront deliberately — there is no safe default that works everywhere.
-
When a Cloud Region Fails: Rethinking High Availability in a Geopolitically Unstable World
Sovereign fault domains are failure boundaries defined by legal, political, or physical jurisdiction rather than hardware topology. The article maps geopolitical events to known distributed-systems failure modes, argues multi-region should replace multi-AZ as the HA baseline for systems crossing jurisdictions, and outlines design patterns, chaos experiments, and an ALE model to justify the spend.
-
Redesigning Banking PDF Table Extraction: a Layered Approach with Java
PDF table extraction often looks easy until it fails in production. Real bank statements can be messy, with scanned pages, shifting layouts, merged cells, and wrapped rows that break standard Java parsers. This article shares how we redesigned the approach using stream parsing, lattice/OCR, validation, scoring, and selective ML to make extraction more reliable in real banking systems.
-
Lakehouse Tower of Babel: Handling Identifier Resolution Rules across Database Engines
Lakehouse architectures enable multiple engines to operate on shared data using open table formats such as Apache Iceberg. However, differences in SQL identifier resolution and catalog naming rules create interoperability failures. This article examines these behaviors and explains why enforcing consistent naming conventions and cross-engine validation is critical.
-
Building Hierarchical Agentic RAG Systems: Multi-Modal Reasoning with Autonomous Error Recovery
In this article, the author explores how hierarchical agentic RAG systems coordinate specialized workers through structured orchestration to improve accuracy, reliability, and explainability in complex enterprise analytics workflows. The article uses Protocol-H as a to show how deterministic routing, reflective retry, and modality-aware reasoning support safer multi-source query execution.
-
Stateful Continuation for AI Agents: Why Transport Layers Now Matter
Agent workflows make transport a first-order concern. Multi-turn, tool-heavy loops amplify overhead that is negligible in single-turn LLM use. Stateful continuation cuts overhead dramatically. Caching context server-side can reduce client-sent data by 80%+ and improve execution time by 15–29% .
-
Replacing Database Sequences at Scale without Breaking 100+ Services
The article discusses the challenges faced during a migration from a relational database to NoSQL, focusing on the importance of database sequences for unique identifiers. It outlines the development of a new sequence service using DynamoDB and a two-tier caching architecture.
-
Beyond RAG: Architecting Context-Aware AI Systems with Spring Boot
This article introduces Context-Augmented Generation (CAG) as an architectural refinement of RAG for enterprise systems. It shows how a Spring Boot-based context manager can incorporate user identity, session state, and policy constraints into AI workflows, improving traceability, consistency, and governance without altering existing retrievers or LLM infrastructure.
-
Event-Driven Patterns for Cloud-Native Banking: Lessons from What Works and What Hurts
Event-driven architecture helps banks decouple systems, scale services, and create clear activity trails. But it also introduces complexity, new failure modes, and operational challenges. Chris Tacey-Green explains where it adds value in banking systems and the practical patterns, such as inbox/outbox and stable event contracts, needed to make it reliable.
-
Architecting Autonomy at Scale: Raising Teams without Creating Dependencies
Modern engineering needs a shift from "gates" to "guardrails." Scale via decentralized architecture that treats teams like adults—building judgment through Socratic coaching, shared platforms, and automated drift detection. Move beyond bottlenecks to an interdependent model where AI governance and ADRs preserve context without killing velocity. Empower autonomy while maintaining alignment.