DevOps Content on InfoQ
-
Pinterest’s CDC-Powered Ingestion Slashes Database Latency from 24 Hours to 15 Minutes
Pinterest launched a next-generation CDC-based database ingestion framework using Kafka, Flink, Spark, and Iceberg. The system reduces data availability latency from 24+ hours to 15 minutes, processes only changed records, supports incremental updates and deletions, and scales to petabyte-level data across thousands of pipelines, optimizing cost and efficiency.
-
Cilium at Ten Years: Stronger Encryption, Safer Policies, and Clearer Visibility for Large Clusters
Cilium 1.19 has been released, marking ten years of development for the eBPF-based networking and security project. There isn’t a flagship feature in this release; instead, it focuses on security hardening, tightening encryption, refining network policy behaviour, and improving scalability for large Kubernetes clusters.
-
Google Brings its Developer Documentation into the Age of AI Agents
Google has announced the public preview of the Developer Knowledge API, which comes with a Model Context Protocol (MCP) server. Together they give AI development tools a simple, machine-readable way to reach Google's official developer documentation.
-
AWS Drops Patent Infringement Protection for Video Encoding Services
AWS has removed its legal protections for customers using its video transcoding and streaming services, potentially exposing them to patent infringement claims from codec rights holders. The change affects six services, including the popular file-based video processing service MediaConvert and live video encoding service MediaLive.
-
Platform Engineering Labs Expands formae with Multi-Cloud Support
Platform Engineering Labs today announced a major update to its open-source Infrastructure-as-Code (IaC) platform, formae, adding beta support for Google Cloud Platform (GCP), Microsoft Azure, Oracle Cloud Infrastructure (OCI), and OVHcloud.
-
Quesma Releases OTelBench to Evaluate OpenTelemetry Infrastructure and AI Performance
Quesma has launched OTelBench, an open-source suite to benchmark OpenTelemetry pipelines and AI-driven instrumentation. It evaluates collector performance under stress while testing how accurately LLMs handle complex SRE tasks like context propagation. Initial data shows AI agents often achieve success rates below 30%, highlighting the gap between code generation and production observability.
-
AWS Enables Lambda Function Triggers from RDS for SQL Server Database Events
In a blog post, AWS recently described an event-driven pattern for Amazon RDS for SQL Server, allowing developers to trigger Lambda functions in response to database events via CloudWatch Logs and SQS.
-
OpenTelemetry Project Publishes “Demystifying OpenTelemetry” Guide to Broaden Observability Adoption
The OpenTelemetry open-source observability project recently published a comprehensive guide titled "Demystifying OpenTelemetry" aimed at helping organizations understand, adopt, and scale observability using the OpenTelemetry standard.
-
Reducing Onboarding from 48 Hours to 4: Inside Amazon Key’s Event-Driven Platform
Amazon Key modernized its event platform by adopting a centralized, event-driven architecture built on Amazon EventBridge. The redesign processes millions of daily events with millisecond latency, improves schema governance, automates cross-account routing, and reduces service onboarding time from 48 hours to 4 hours, while maintaining 99.99 percent reliability.
-
Uber and OpenAI Retool Rate Limiting Systems
Uber and OpenAI are replacing static rate limits with adaptive, infrastructure-level platforms. Uber’s Global Rate Limiter uses probabilistic shedding to manage 80 million requests per second (RPS), while OpenAI’s Access Engine implements a credit waterfall to prevent user interruptions. Both architectures use distributed enforcement and soft controls to maintain system stability and service continuity at scale.
-
Leapwork Research Shows Why AI in Testing Still Depends on Reliability, Not Just Innovation
Leapwork recently released new research showing that while confidence in AI-driven software testing is growing rapidly, accuracy, stability, and ongoing manual effort remain decisive factors in how far teams are willing to trust automation.
-
Does AI Make the Agile Manifesto Obsolete?
Capgemini's Steve Jones argues AI agents building apps in hours have killed the Agile Manifesto, as its human-centric principles don't fit agentic SDLCs. While Forrester reports 95% still find Agile relevant, Kent Beck proposes "augmented coding" and AWS suggests "Intent Design" over sprint planning. The debate: Is Agile dead, or evolving for AI collaboration?
-
LocalStack for AWS Drops Community Edition, Raising Developer Concerns
LocalStack has recently announced changes to the delivery of its AWS Cloud emulators, dropping the popular open-source Community Edition and creating a single image that requires registration. Projects that currently pull the latest community image will need to update their workflows.
-
From Paging to Postmortem: Google Cloud SREs on Using Gemini CLI for Outage Response
A recent article by Google Cloud SREs describes how they use the AI-powered Gemini CLI internally to resolve real-world outages. This approach improves reliability in critical infrastructure operations and reduces incident response time by integrating intelligent reasoning directly into terminal-based operational tools.
-
Firestore Adds Pipeline Operations with over 100 New Query Features
Google has overhauled Firestore’s query engine, introducing "Pipeline operations" that enable complex server-side aggregations and array unnesting. The update shifts Firestore Enterprise toward an optional indexing model, allowing architects to prioritize write speed and lower costs. While it brings parity with MongoDB-style aggregations, the preview currently lacks real-time and emulator support.