Cloud Computing Content on InfoQ
-
Building a Secure MCP Server on AWS for a Million-Company B2B Platform
We wanted to expose a B2B intelligence platform built on more than one million company profiles to an LLM client through an MCP server, so that a user can ask “find SaaS companies in Germany with 50-200 employees” and get results directly in that client. The engineering problem was: how do you make that workflow useful without creating an unsafe bridge between an LLM and production data?
-
Local-First AI Inference: a Cloud Architecture Pattern for Cost-Effective Document Processing
The Local-First AI Inference pattern routes 70–80% of documents to deterministic local extraction at zero API cost, reserving Azure OpenAI calls for edge cases and flagging low-confidence results for human review. Deployed on 4,700 engineering drawing PDFs, it cut API costs by 75% and processing time by 55%, while bounding errors through a human review tier.
-
Securing Autonomous AI Agents on Kubernetes: Trust Boundaries, Secrets, and Observability for a New Category of Cloud Workload
Autonomous AI agents break Kubernetes security assumptions with dynamic dependencies, multi-domain credentials, and unpredictable resource use. This article covers production-tested patterns: Job-based isolation, Vault for scoped short-lived credentials, a four-phase trust model from shadow mode to autonomous operation, and observability for non-deterministic reasoning cycles.
-
The DPoP Storage Paradox: Why Browser-Based Proof-of-Possession Remains an Unsolved Problem
DPoP closes a real gap in OAuth 2.0. Sender-constrained tokens are a meaningful upgrade over bearer tokens for any client that can implement them. But RFC 9449 is silent on browser key storage, which leaves each team to make a deliberate architectural decision: there is no safe default that works everywhere.
-
Using AWS Lambda Extensions to Run Post-Response Telemetry Flush
At Lead Bank, synchronous telemetry flushing turned intermittent exporter stalls into user-facing 504 gateway timeouts. Using AWS Lambda's Extensions API and goroutine chaining in Go, the team moved flush work off the response path, so responses return immediately while full observability is preserved without telemetry loss.
-
Beyond One-Click: Designing an Enterprise-Grade Observability Extension for Docker
Docker Extensions boost developer speed but create a "visibility gap" by isolating telemetry. To meet enterprise needs, extensions must act as bridges to centralized platforms. This article details how to use OpenTelemetry, policy-as-code, and encryption to build secure pipelines. Learn to balance developer productivity with the governance required for scalable, compliant observability.
-
Replacing Database Sequences at Scale without Breaking 100+ Services
The article discusses the challenges faced during a migration from a relational database to NoSQL, focusing on the importance of database sequences for unique identifiers. It outlines the development of a new sequence service using DynamoDB and a two-tier caching architecture.
-
Event-Driven Patterns for Cloud-Native Banking: Lessons from What Works and What Hurts
Event-driven architecture helps banks decouple systems, scale services, and create clear activity trails. But it also introduces complexity, new failure modes, and operational challenges. Chris Tacey-Green explains where it adds value in banking systems and the practical patterns, such as inbox/outbox and stable event contracts, needed to make it reliable.
-
Configuration as a Control Plane: Designing for Safety and Reliability at Scale
Configuration has evolved from static deployment files into a live control plane that directly shapes system behavior. The evolution of configuration management highlights why misconfigurations can trigger large outages and how hyperscalers deploy changes safely using staged rollouts, validation, blast radius limits, and automated rollback at scale.
-
Building a Least-Privilege AI Agent Gateway for Infrastructure Automation with MCP, OPA, and Ephemeral Runners
This article presents a least-privilege AI Agent Gateway that places clear controls between AI agents and infrastructure. Agents do not access infrastructure APIs directly. Instead, every request is validated, authorized using policy as code with Open Policy Agent (OPA), and executed in short-lived, isolated environments, with built-in observability using OpenTelemetry.
-
Proactive Autoscaling for Edge Applications in Kubernetes
Kubernetes often reacts too late when traffic suddenly increases at the edge. A proactive scaling approach that considers response time, spare CPU capacity, and container startup delays can add or remove instances more smoothly, prevent sudden spikes, and keep performance stable on systems with limited resources.
-
Preventing Data Exfiltration: a Practical Implementation of VPC Service Controls at Enterprise Scale in Google Cloud Platform
Implementing VPC Service Controls is more about people and process than technology. Organizations must conduct extensive upfront discovery, use phased rollouts to avoid breaking production systems, and design VPC Service Controls that enable rather than block work. Success requires automation, clear exception processes, tracking both security and business metrics, and continuous improvement.