Development Content on InfoQ
-
The Schema Proliferation Problem in Kafka and Flink Pipelines: How to Solve It
Schema proliferation builds slowly and gets expensive fast. One schema per event type feels right until there are ten tables, union queries spanning all of them, and a single field rename touching every schema. Discriminator-based schema consolidation collapses that to two tables, turning multi-table unions into a single query, while new variants are additive and don't break existing consumers.
-
Local-First AI Inference: a Cloud Architecture Pattern for Cost-Effective Document Processing
The Local-First AI Inference pattern routes 70–80% of documents to deterministic local extraction at zero API cost, reserving Azure OpenAI calls for edge cases and flagging low-confidence results for human review. Deployed on 4,700 engineering drawing PDFs, it cut API costs by 75% and processing time by 55%, while bounding errors through a human review tier.
-
Implementing the Sidecar Pattern in Microservices-Based ASP.NET Core Applications
Today's applications require monitoring, logging, and configuration, each of which can be implemented as a component or a service. Integrating these cross-cutting concerns tightly into the application makes effective use of shared resources, but an outage in any one of them can take your application down. Enter the sidecar design pattern.
-
Beyond the Benchmark: a Metrics-Driven Approach to Sustained iOS Performance on Real Devices
iOS performance engineering often defaults to a mental model where performance is a property of a component. Performance is instead an emergent behavior of the interaction between application code, device hardware, OS resource management, network conditions, and user behavior patterns over time. This article gives a direct, first-party path to capturing performance issues using Xcode Instruments.
-
From Batch to Micro-Batch Streaming: Lessons Learned the Hard Way in a Delta Index Pipeline
This article describes how a production delta-index pipeline migrated from scheduled batch to micro-batch Spark Structured Streaming. It covers why record-level streaming was rejected, how partition-based watermarks replaced fragile S3 completion markers, overlap-window correctness, and restart-as-design strategies for better predictability in object-store–based ingestion systems.
-
Securing Autonomous AI Agents on Kubernetes: Trust Boundaries, Secrets, and Observability for a New Category of Cloud Workload
Autonomous AI agents break Kubernetes security assumptions with dynamic dependencies, multi-domain credentials, and unpredictable resource use. This article covers production-tested patterns: Job-based isolation, Vault for scoped short-lived credentials, a four-phase trust model from shadow mode to autonomous operation, and observability for non-deterministic reasoning cycles.
-
The DPoP Storage Paradox: Why Browser-Based Proof-of-Possession Remains an Unsolved Problem
DPoP closes a real gap in OAuth 2.0. Sender-constrained tokens are a meaningful upgrade over bearer tokens for any client that can implement them. But RFC 9449's silence on browser key storage leaves an architectural decision that each team must make deliberately: there is no safe default that works everywhere.
-
MCP in the Java World: Bringing Architectural Strategy to LLM Integrations
Discover how the Model Context Protocol (MCP) Java SDK is establishing a new architectural discipline for enterprise LLM integrations. By defining explicit contracts and leveraging MCP servers as anti-corruption layers, it ensures governance, loose coupling, and security alignment with the JVM ecosystem and existing operational practices, moving integrations beyond fragility to resilience.
-
When a Cloud Region Fails: Rethinking High Availability in a Geopolitically Unstable World
Sovereign fault domains are failure boundaries defined by legal, political, or physical jurisdiction rather than hardware topology. The article maps geopolitical events to known distributed-systems failure modes, argues that multi-region should replace multi-AZ as the HA baseline for systems crossing jurisdictions, and outlines design patterns, chaos experiments, and an annualized loss expectancy (ALE) model to justify the spend.
-
Redesigning Banking PDF Table Extraction: a Layered Approach with Java
PDF table extraction often looks easy until it fails in production. Real bank statements can be messy, with scanned pages, shifting layouts, merged cells, and wrapped rows that break standard Java parsers. This article shares how we redesigned the approach using stream parsing, lattice/OCR, validation, scoring, and selective ML to make extraction more reliable in real banking systems.
-
Building Production-Ready tRPC APIs: the TypeScript Alternative to Apollo Federation
This article details our migration from Apollo Federation to a TypeScript-based tRPC stack, which resulted in an 89% reduction in bugs and 67% faster response times. It also covers the mistakes we made, the unexpected performance gains, and an overview of the production architecture we use today to handle 2.4 million daily requests with 99.97% uptime.
-
Using AWS Lambda Extensions to Run Post-Response Telemetry Flush
At Lead Bank, synchronous telemetry flushing caused intermittent exporter stalls to become user-facing 504 gateway timeouts. By leveraging AWS Lambda's Extensions API and goroutine chaining in Go, flush work is moved off the response path, returning responses immediately while preserving full observability without telemetry loss.