Articles

DevOps

From Alert Fatigue to Agent-Assisted Intelligent Observability

As systems grow, observability becomes harder to maintain and incidents harder to diagnose. Agentic observability layers AI on existing tools, starting in read-only mode to detect anomalies and summarize issues. Over time, agents add context, correlate signals, and automate low-risk tasks. This approach frees engineers to focus on analysis and judgment.

Rohit Dhawan
on Feb 04, 2026
Development

Working with Code Assistants: The Skeleton Architecture

Prevent AI-generated tech debt with Skeleton Architecture. This approach separates human-governed infrastructure (Skeleton) from AI-generated logic (Tissue) using Vertical Slices and Dependency Inversion. By enforcing security and flow control in rigid base classes, you constrain the AI to safe boundaries, enabling high velocity without compromising system integrity.

Patrick Farry
on Feb 03, 2026
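
The split between a rigid base class and generated logic described in the summary above can be pictured with a small sketch. This is only an illustration under assumptions: `SkeletonHandler`, `RenameProjectHandler`, `User`, and the role names are hypothetical and not taken from the article, and the sketch covers only the base-class part of the idea, not Vertical Slices or Dependency Inversion in full.

```typescript
// Hypothetical sketch: all names below are illustrative, not from the article.
interface User {
  id: string;
  roles: string[];
}

// "Skeleton": human-governed base class that fixes the order of
// authorization, validation, and execution.
abstract class SkeletonHandler<Req, Res> {
  // Subclasses declare the role they need; they cannot skip the check.
  protected abstract readonly requiredRole: string;

  // "Tissue" hooks: the only parts a generated subclass fills in.
  protected abstract validate(req: Req): void;
  protected abstract execute(user: User, req: Req): Res;

  // Template method. TypeScript has no `final`, so treating this as
  // non-overridable is a convention to enforce in review or linting.
  handle(user: User, req: Req): Res {
    if (!user.roles.includes(this.requiredRole)) {
      throw new Error("forbidden");
    }
    this.validate(req);
    return this.execute(user, req);
  }
}

// "Tissue": a vertical slice that supplies only business logic.
class RenameProjectHandler extends SkeletonHandler<{ name: string }, string> {
  protected readonly requiredRole = "editor";

  protected validate(req: { name: string }): void {
    if (req.name.trim() === "") throw new Error("name must not be empty");
  }

  protected execute(user: User, req: { name: string }): string {
    return `project renamed to ${req.name} by ${user.id}`;
  }
}
```

Callers only ever invoke `handle`, so the authorization check runs before any slice-specific code regardless of what the subclass contains.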
AI, ML & Data Engineering

Why Most Machine Learning Projects Fail to Reach Production

In this article, the author diagnoses common failures in ML initiatives, including weak problem framing and the persistent prototype-to-production gap. The piece provides practical, experience-based guidance on setting clear business goals, treating data as a product, and aligning cross-functional teams for reliable, production-ready ML delivery.

Wenjie Zi
on Feb 02, 2026
AI, ML & Data Engineering

Autonomous Big Data Optimization: Multi-Agent Reinforcement Learning to Achieve Self-Tuning Apache Spark

This article introduces a reinforcement learning (RL) approach grounded in Apache Spark that enables distributed computing systems to learn optimal configurations autonomously, much like an apprentice engineer who learns by doing. The author also implements a lightweight agent as a driver-side component that uses RL to choose configuration settings before a job runs.

Hina Gandhi
on Jan 30, 2026
Architecture & Design

Engineering Speed at Scale — Architectural Lessons from Sub-100-ms APIs

Sub-100-ms APIs emerge from disciplined architecture using latency budgets, minimized hops, async fan-out, layered caching, circuit breakers, and strong observability. But long-term speed depends on culture, with teams owning p99, monitoring drift, managing thread pools, and treating performance as a shared, continuous responsibility.

Saranya Vedagiri
on Jan 29, 2026
Development

One Cache to Rule Them All: Handling Responses and In-Flight Requests with Durable Objects

Traditional caching fails to stop "thundering herds" where multiple clients trigger the same work during a miss. This article proposes using Cloudflare Durable Objects to treat in-flight work and finished results as two states of one cache entry. By routing to a single owner, systems eliminate redundant tasks. This pattern replaces complex locks with simple promises, simplifying the system design.

Gabor Koos
on Jan 28, 2026
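
The idea of treating in-flight work and finished results as two states of one cache entry can be sketched in plain TypeScript. This is a minimal in-process illustration, assuming a single owner of the map; `CoalescingCache`, `Entry`, and `load` are hypothetical names, and the sketch deliberately does not use the Cloudflare Durable Objects API, which the article relies on to provide that single owner across clients.

```typescript
// Hypothetical names throughout; this is not the Durable Objects API.
// One entry is either work in progress (a promise) or a finished value.
type Entry<V> =
  | { state: "pending"; promise: Promise<V> }
  | { state: "done"; value: V };

class CoalescingCache<V> {
  private entries = new Map<string, Entry<V>>();

  // Returns the cached value, joins the in-flight load, or starts one.
  get(key: string, load: () => Promise<V>): Promise<V> {
    const existing = this.entries.get(key);
    if (existing) {
      return existing.state === "done"
        ? Promise.resolve(existing.value)
        : existing.promise; // join the work already in flight
    }

    const promise = load().then(
      (value) => {
        // Promote the entry from "pending" to "done".
        this.entries.set(key, { state: "done", value });
        return value;
      },
      (err) => {
        // Drop failed work so a later caller can retry.
        this.entries.delete(key);
        throw err;
      },
    );
    this.entries.set(key, { state: "pending", promise });
    return promise;
  }
}
```

Two callers that miss on the same key at the same time share one promise, so `load` runs once rather than once per caller; no explicit lock is needed because the map has a single owner.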
Culture & Methods

The Friction Fix: Change What Matters

Friction is the invisible current that sinks every transformation. Friction isn’t one thing; it’s systemic. It arises in the relationships between people, teams, and technology. The fix isn’t Kubernetes, the Cloud, or AI. The fix is changing our patterns of thinking, communicating, and organizing.

Cat Morris, Diana Montalion
on Jan 27, 2026
AI, ML & Data Engineering

Virtual Panel - AI in the Trenches: How Developers Are Rewriting the Software Process

This virtual panel brings together engineers, architects, and technical leaders to explore how AI is changing the landscape of software development. Practitioners share their insights on successes and failures when AI is incorporated into daily workflows, emphasizing the significance of context, validation, and cultural adaptation in making AI a sustainable element of modern engineering practices.

Arthur Casals, Mariia Bulycheva, May Walter, Phil Calçado, Andreas Kollegger
on Jan 26, 2026
AI, ML & Data Engineering

Article Series: AI-Assisted Development: Real World Patterns, Pitfalls, and Production Readiness

In this series, we examine what happens after the proof of concept and how AI becomes part of the software delivery pipeline. As AI moves into production, teams are discovering that the challenge extends beyond model performance to include architecture, process, and accountability. This transition is redefining what constitutes good software engineering.

Arthur Casals
on Jan 21, 2026
DevOps

Preventing Data Exfiltration: a Practical Implementation of VPC Service Controls at Enterprise Scale in Google Cloud Platform

Implementing VPC Service Controls is more about people and process than technology. Organizations must conduct extensive upfront discovery, use phased rollouts to avoid breaking production systems, and design VPC Service Controls that enable rather than block work. Success requires automation, clear exception processes, tracking both security and business metrics, and continuous improvement.

Shijin Nair
on Jan 19, 2026
DevOps

Platform-as-a-Product: Declarative Infrastructure for Developer Velocity

Declarative infrastructure config hides complexity, enabling developers to focus on application code. Unified YAML per service allows early cost validation, while independent CI with centralized CD balances team autonomy and deployment consistency. This standardized approach scales across organizations, making infrastructure invisible and operations automatic.

Avinash Sabat
on Jan 14, 2026
Architecture & Design

Spec Driven Development: When Architecture Becomes Executable

Spec-Driven Development inverts traditional architecture by making specifications executable and authoritative. It transforms declared intent into validated code through AI generation and provides architectural determinism. It eliminates drift through continuous enforcement, but demands new engineering discipline in schema design and contract-first reasoning.

Leigh Griffin, Ray Carroll
on Jan 12, 2026