DevOps Content on InfoQ
-
Discord Rebuilds Database Operations Around Automation to Manage ScyllaDB at Massive Scale
Discord has detailed how it rebuilt its database operations around a new internal orchestration framework called the Scylla Control Plane (SCP), enabling its small infrastructure team to automate large-scale ScyllaDB cluster management tasks that previously took days of manual work.
-
Cloudflare Completes Its Agent Infrastructure Stack with Browser Run Rebuild and Six-Layer Platform
Cloudflare rebuilt Browser Run on its own Containers platform, delivering 4x higher concurrency and 50% faster response times. The upgrade completes a six-layer agent infrastructure stack whose layers include compute (Dynamic Workers + Sandboxes), orchestration (Dynamic Workflows), memory (Agent Memory), browsing (Browser Run), and commerce (Stripe Projects).
-
Bintrail: MySQL Time-Travel Queries Using Indexed Binlogs
Bintrail is a recently introduced layer that brings point-in-time queries and row-history lookups to MySQL, the only major relational database lacking native temporal querying. Using indexed binlogs behind ProxySQL and without modifying MySQL or application code, Bintrail supports querying data as of a past timestamp and reviewing change history, primarily for recovery and audit scenarios.
-
OpenTofu 1.12: the Feature Terraform Never Shipped
The OpenTofu community released version 1.12.0 on May 14, 2026. The update is not a complete rewrite, but it resolves several long-standing issues for infrastructure teams.
-
OpenAI Outlines WebRTC Architecture for Low-Latency Voice AI at Scale
OpenAI recently outlined how it adapted WebRTC for low-latency voice AI at global scale. The new architecture replaced a conventional media termination model with a relay-transceiver design better suited to Kubernetes and cloud load balancers. It keeps WebRTC session state in a dedicated transceiver layer while using relays to reduce public UDP exposure and keep media routing close to users.
-
TanStack Details Sophisticated npm Supply Chain Attack That Compromised 42 Packages
TanStack has released a detailed postmortem describing a sophisticated supply-chain attack that compromised 42 npm packages and published 84 malicious package versions in just six minutes, exposing developers and CI/CD systems to credential theft and malware propagation.
-
Ubuntu Embraces Local AI instead of Cloud-First OS Integration
Ubuntu has outlined its AI strategy, describing it as a deliberate departure from industry trends towards cloud-centric, AI-first operating systems. Instead, the company says, Ubuntu will focus future releases on local intelligence, modular design, and strict user control.
-
Discord Reveals How a Hidden Circular Dependency Triggered Its March Voice Outage
Discord has released a detailed postmortem on its March 25, 2026, voice outage, revealing that a previously undetected circular dependency in its voice infrastructure triggered a cascading failure that disrupted voice services across the platform.
-
Benchmarking AI Agents on Kubernetes
Brandon Foley published a benchmarking study on the CNCF blog showing that AI coding agents can find and fix isolated bugs but often struggle to understand system-wide impacts. The finding challenges the idea that improved code retrieval is the main way to enhance automated bug fixing.
-
Pinterest Engineers Eliminate CPU Zombies to Resolve Production Bottlenecks
Pinterest identified and resolved CPU starvation issues that affected machine learning training jobs on its Kubernetes-based platform, PinCompute. The engineers traced the problem to an unused Amazon ECS agent, which caused memory cgroup leaks. By disabling the agent, they stabilised performance. This case illustrates the importance of understanding system defaults for effective troubleshooting.
-
Kubernetes v1.36 Released: Security Defaults Tighten as AI Workload Support Matures
Kubernetes v1.36, released in 2026, includes 70 enhancements focused on security, AI workloads, and API scalability. Key features graduating to General Availability are User Namespaces, Mutating Admission Policies, and Fine-Grained Kubelet API Authorization. The release also addresses workload management and introduces new features for AI resource allocation.
-
Grafana's Pyroscope 2.0 Makes Continuous Profiling Practical at Scale
Grafana Labs has launched Pyroscope 2.0, a rearchitected open-source continuous profiling database. This version reduces storage costs and operational complexity and improves query performance. Key changes include single write paths for profiles, stateless query processing, and enhanced capabilities for profiling data. It supports the OpenTelemetry Protocol, aligning with current trends in observability.
-
AWS WorkSpaces Now Lets AI Agents Operate Legacy Desktop Applications without APIs
AWS announced that Amazon WorkSpaces can now serve, in public preview, as managed virtual desktops for AI agents. Agents authenticate through IAM and operate legacy applications via computer vision and input simulation, without APIs. Reflex benchmarks show vision agents consume 45x more tokens than API agents.
-
GitHub Expands Secret Scanning with General Availability of MCP Server Integration
GitHub has announced the general availability of secret scanning support through its MCP Server, extending automated credential detection and remediation capabilities into AI-assisted and agent-driven development workflows.
-
Copy Fail and Dirty Frag: Linux Page-Cache Exploits Target Every Major Distribution
Two Linux kernel exploits have recently been disclosed: Copy Fail (CVE-2026-31431) on April 29, 2026, and Dirty Frag (CVE-2026-43284 and CVE-2026-43500) on May 7, 2026. Both allow local users to gain root access and affect multiple Linux distributions. They exploit page-cache flaws via different subsystems, and affected organizations should patch immediately.