InfoQ Homepage Articles
-
From Alert Fatigue to Agent-Assisted Intelligent Observability
As systems grow, observability becomes harder to maintain and incidents harder to diagnose. Agentic observability layers AI on existing tools, starting in read-only mode to detect anomalies and summarize issues. Over time, agents add context, correlate signals, and automate low-risk tasks. This approach frees engineers to focus on analysis and judgment.
-
Working with Code Assistants: The Skeleton Architecture
Prevent AI-generated tech debt with Skeleton Architecture. This approach separates human-governed infrastructure (Skeleton) from AI-generated logic (Tissue) using Vertical Slices and Dependency Inversion. By enforcing security and flow control in rigid base classes, you constrain the AI to safe boundaries, enabling high velocity without compromising system integrity.
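As a rough illustration of the idea in this teaser, the sketch below shows a rigid base class that owns security and flow control while a subclass supplies only the business logic. It is a minimal sketch, not the article's implementation: the names `Request`, `SkeletonHandler`, `CreateInvoice`, `required_role`, and `_execute` are all hypothetical, and the role check stands in for whatever governance the real Skeleton would enforce.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical request type: caller identity, roles, and input data."""
    user: str
    roles: frozenset
    payload: dict


class SkeletonHandler(ABC):
    """Human-governed Skeleton: owns authorization and error handling."""

    required_role: str = "user"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Reject any subclass that tries to replace the governed entry point.
        if "handle" in cls.__dict__:
            raise TypeError(f"{cls.__name__} must not override handle()")

    def handle(self, request: Request) -> dict:
        # Security check runs before any generated logic is reached.
        if self.required_role not in request.roles:
            return {"status": 403, "body": "forbidden"}
        try:
            body = self._execute(request.payload)
        except Exception:
            # Failures in the Tissue never leak details to the caller.
            return {"status": 500, "body": "internal error"}
        return {"status": 200, "body": body}

    @abstractmethod
    def _execute(self, payload: dict) -> dict:
        """The only extension point left open to generated code."""


class CreateInvoice(SkeletonHandler):
    """Tissue: one vertical slice that contains business logic only."""

    required_role = "billing"

    def _execute(self, payload: dict) -> dict:
        return {"invoice_total": payload["quantity"] * payload["unit_price"]}
```

Because the slice depends only on the abstract `SkeletonHandler`, generated code in `_execute` cannot skip the authorization step or change how errors are reported.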
-
Why Most Machine Learning Projects Fail to Reach Production
In this article, the author diagnoses common failures in ML initiatives, including weak problem framing and the persistent prototype-to-production gap. The piece provides practical, experience-based guidance on setting clear business goals, treating data as a product, and aligning cross-functional teams for reliable, production-ready ML delivery.
-
Autonomous Big Data Optimization: Multi-Agent Reinforcement Learning to Achieve Self-Tuning Apache Spark
This article introduces a reinforcement learning (RL) approach grounded in Apache Spark that enables distributed computing systems to learn optimal configurations autonomously, much like an apprentice engineer who learns by doing. The author also implements a lightweight agent as a driver-side component that uses RL to choose configuration settings before a job runs.
-
Engineering Speed at Scale — Architectural Lessons from Sub-100-ms APIs
Sub‑100-ms APIs emerge from disciplined architecture using latency budgets, minimized hops, async fan‑out, layered caching, circuit breakers, and strong observability. But long‑term speed depends on culture, with teams owning p99, monitoring drift, managing thread pools, and treating performance as a shared, continuous responsibility.
-
One Cache to Rule Them All: Handling Responses and In-Flight Requests with Durable Objects
Traditional caching fails to stop "thundering herds" where multiple clients trigger the same work during a miss. This article proposes using Cloudflare Durable Objects to treat in-flight work and finished results as two states of one cache entry. By routing to a single owner, systems eliminate redundant tasks. This pattern replaces complex locks with simple promises, simplifying the system design.
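The "two states of one cache entry" idea can be sketched in-process with Python's asyncio. This is an analogy only and does not use the Cloudflare Durable Objects API: the class name `SingleFlightCache`, the `loader` callback, and the `demo` function are assumptions made for illustration. Each key maps to a single task, which is either still running (in-flight work) or already holds its result (finished response).

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SingleFlightCache:
    """One entry per key; the entry is a task that is in flight or finished."""

    def __init__(self) -> None:
        self._entries: Dict[str, asyncio.Task] = {}

    async def get(self, key: str, loader: Callable[[], Awaitable[str]]) -> str:
        task = self._entries.get(key)
        if task is None:
            # First caller on a miss starts the work and becomes its owner.
            task = asyncio.ensure_future(loader())
            self._entries[key] = task
        try:
            # Later callers await the same task instead of repeating the work.
            return await task
        except Exception:
            # Drop failed entries so a later call can retry.
            if self._entries.get(key) is task:
                del self._entries[key]
            raise


async def demo() -> tuple:
    calls = 0

    async def slow_loader() -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stands in for an expensive backend call
        return "payload"

    cache = SingleFlightCache()
    # Five concurrent requests for the same key during a cache miss.
    results = await asyncio.gather(
        *(cache.get("k", slow_loader) for _ in range(5))
    )
    return calls, results
```

Running `asyncio.run(demo())` should show the loader invoked once while all five callers receive the same value. No explicit lock is needed, because checking and storing the entry happens without an intervening `await`.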
-
The Friction Fix: Change What Matters
Friction is the invisible current that sinks every transformation. Friction isn’t one thing – it’s systemic. It arises in the relationships among people, teams, and technology. The fix isn’t Kubernetes, the Cloud, or AI. The fix is changing our patterns of thinking, communicating, and organizing.
-
Virtual Panel - AI in the Trenches: How Developers Are Rewriting the Software Process
This virtual panel brings together engineers, architects, and technical leaders to explore how AI is changing the landscape of software development. Practitioners share their insights on successes and failures when AI is incorporated into daily workflows, emphasizing the significance of context, validation, and cultural adaptation in making AI a sustainable element of modern engineering practices.
-
Article Series: AI-Assisted Development: Real World Patterns, Pitfalls, and Production Readiness
In this series, we examine what happens after the proof of concept and how AI becomes part of the software delivery pipeline. As AI moves into production, teams are discovering that the challenge extends beyond model performance to include architecture, process, and accountability. This transition is redefining what constitutes good software engineering.
-
Preventing Data Exfiltration: a Practical Implementation of VPC Service Controls at Enterprise Scale in Google Cloud Platform
Implementing VPC Service Controls is more about people and process than technology. Organizations must conduct extensive upfront discovery, use phased rollouts to avoid breaking production systems, and design VPC Service Controls that enable rather than block work. Success requires automation, clear exception processes, tracking both security and business metrics, and continuous improvement.
-
Platform-as-a-Product: Declarative Infrastructure for Developer Velocity
Declarative infrastructure config hides complexity, enabling developers to focus on application code. Unified YAML per service allows early cost validation, while independent CI with centralized CD balances team autonomy and deployment consistency. This standardized approach scales across organizations, making infrastructure invisible and operations automatic.
-
Spec Driven Development: When Architecture Becomes Executable
Spec-Driven Development inverts traditional architecture by making specifications executable and authoritative. It transforms declared intent into validated code through AI generation and provides architectural determinism. It eliminates drift through continuous enforcement, but demands new engineering discipline in schema design and contract-first reasoning.