DevOps Content on InfoQ
-
From Central Control to Team Autonomy: Rethinking Infrastructure Delivery
Adidas engineers describe shifting from a centralized Infrastructure-as-Code model to a decentralized one. Five teams autonomously deployed over 81 new infrastructure stacks in two months, using layered IaC modules, automated pipelines, and shared frameworks. The redesign illustrates how to scale infrastructure delivery while maintaining governance.
-
Google Cloud Brings Full OpenTelemetry Support to Cloud Monitoring Metrics
Google Cloud recently unveiled broad support for the OpenTelemetry Protocol (OTLP) in Cloud Monitoring, marking a step toward unifying telemetry collection across its observability stack.
-
AWS Launches Agent Plugins to Automate Cloud Deployment
AWS launched Agent Plugins for AWS, providing AI coding agents with specialized deployment skills. The initial deploy-on-aws plugin accepts commands like "deploy to AWS" and generates complete pipelines with architecture recommendations, cost estimates, and infrastructure code. The plugins are supported in Claude Code and Cursor, and AWS claims deployments take 10 minutes compared with hours when done manually.
-
Google Enhances Node Pool Auto-Creation Speed for GKE Clusters
Google Cloud has optimised GKE's node pool auto-creation, significantly cutting "Time to Ready" for massive clusters. By improving control plane communication and request batching, GKE now provisions resources faster, rivalling tools like Karpenter. The update improves scaling reliability and stability for high-volume AI and batch workloads and is rolling out automatically across supported versions.
-
Argo CD 3.3 Brings Safer GitOps Deletions and Smoother Day‑to‑Day Operations
Argo CD, the popular GitOps continuous delivery tool for application deployment and lifecycle management, has reached a new milestone with the release of version 3.3, which extends its capabilities while addressing several long-standing pain points for operators.
-
Kubernetes Introduces Node Readiness Controller to Improve Pod Scheduling Reliability
The Kubernetes project recently announced a new core controller called the Node Readiness Controller, designed to enhance scheduling reliability and cluster health by making the API server’s view of node readiness more accurate.
-
Pinterest’s CDC-Powered Ingestion Slashes Database Latency from 24 Hours to 15 Minutes
Pinterest launched a next-generation database ingestion framework based on change data capture (CDC), built on Kafka, Flink, Spark, and Iceberg. The system reduces data availability latency from more than 24 hours to 15 minutes, processes only changed records, supports incremental updates and deletions, and scales to petabytes of data across thousands of pipelines while improving cost efficiency.
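The core idea behind CDC-based ingestion, applying only the changed records instead of reloading whole tables, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Pinterest's implementation: the `ChangeEvent` record shape, the `seq` ordering field, and the `apply_changes` helper are hypothetical, and an in-memory dict stands in for an Iceberg table.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC record (assumed shape, not Pinterest's schema)."""
    op: str                        # "insert", "update", or "delete"
    key: int                       # primary key of the affected row
    row: Optional[Dict[str, Any]]  # row image after the change; None for deletes
    seq: int                       # assumed log position used to order changes


def apply_changes(table: Dict[int, Dict[str, Any]],
                  events: Iterable[ChangeEvent]) -> Dict[int, Dict[str, Any]]:
    """Apply CDC events to a keyed snapshot, touching only the changed keys."""
    result = dict(table)  # shallow copy so the input snapshot is left untouched

    # Keep only the latest event per key; later changes supersede earlier ones,
    # even if events arrive out of order.
    latest: Dict[int, ChangeEvent] = {}
    for ev in events:
        current = latest.get(ev.key)
        if current is None or ev.seq > current.seq:
            latest[ev.key] = ev

    # Apply each surviving change; rows with no events are never read or rewritten.
    for key, ev in latest.items():
        if ev.op == "delete":
            result.pop(key, None)
        elif ev.op in ("insert", "update"):
            if ev.row is None:
                raise ValueError(f"{ev.op} for key {key} has no row image")
            result[key] = ev.row
        else:
            raise ValueError(f"unknown operation: {ev.op}")
    return result


if __name__ == "__main__":
    snapshot = {1: {"name": "a"}, 2: {"name": "b"}}
    changes = [
        ChangeEvent("update", 1, {"name": "a2"}, 10),
        ChangeEvent("delete", 2, None, 11),
        ChangeEvent("insert", 3, {"name": "c"}, 12),
    ]
    print(apply_changes(snapshot, changes))  # {1: {'name': 'a2'}, 3: {'name': 'c'}}
```

Collapsing events to the latest change per key before applying them illustrates why a CDC pipeline can handle updates and deletions incrementally. A production system would also need to deal with duplicate or late events and perform the merge at the table-format level rather than in memory.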
-
Cilium at Ten Years: Stronger Encryption, Safer Policies, and Clearer Visibility for Large Clusters
Cilium 1.19 has been released, marking ten years of development for the eBPF-based networking and security project. Rather than a single flagship feature, the release focuses on security hardening, tighter encryption, refined network policy behaviour, and improved scalability for large Kubernetes clusters.
-
Google Brings its Developer Documentation into the Age of AI Agents
Google has announced the public preview of the Developer Knowledge API, which comes with a Model Context Protocol (MCP) server. Together, they give AI development tools a simple, machine-readable way to access Google's official developer documentation.
-
AWS Drops Patent Infringement Protection for Video Encoding Services
AWS has removed its legal protections for customers using its video transcoding and streaming services, potentially exposing them to patent infringement claims from codec rights holders. The change affects six services, including the popular file-based video processing service MediaConvert and the live video encoding service MediaLive.
-
Platform Engineering Labs Expands formae with Multi-Cloud Support
Platform Engineering Labs announced a major update to its open source Infrastructure-as-Code (IaC) platform, formae, adding beta support for Google Cloud Platform (GCP), Microsoft Azure, Oracle Cloud Infrastructure (OCI), and OVHcloud.
-
Quesma Releases OTelBench to Evaluate OpenTelemetry Infrastructure and AI Performance
Quesma has launched OTelBench, an open-source suite to benchmark OpenTelemetry pipelines and AI-driven instrumentation. It evaluates collector performance under stress while testing how accurately LLMs handle complex SRE tasks like context propagation. Initial data shows AI agents often achieve success rates below 30%, highlighting the gap between code generation and production observability.
-
AWS Enables Lambda Function Triggers from RDS for SQL Server Database Events
In a recent blog post, AWS described an event-driven pattern for Amazon RDS for SQL Server that lets developers trigger Lambda functions in response to database events, using CloudWatch Logs and SQS.
-
OpenTelemetry Project Publishes “Demystifying OpenTelemetry” Guide to Broaden Observability Adoption
The OpenTelemetry open-source observability project recently published a comprehensive guide titled "Demystifying OpenTelemetry" aimed at helping organizations understand, adopt, and scale observability using the OpenTelemetry standard.
-
Reducing Onboarding from 48 Hours to 4: Inside Amazon Key’s Event-Driven Platform
Amazon Key modernized its event platform by adopting a centralized, event-driven architecture built on Amazon EventBridge. The new platform processes millions of daily events with millisecond latency, improves schema governance, automates cross-account routing, and reduces service onboarding time from 48 hours to four while maintaining 99.99 percent reliability.