DevOps Content on InfoQ
-
Inside Google’s System for Coordinated A/B Testing across its Global Service Fleet
Google has shared details of its fleet-wide, large-scale A/B experimentation system, designed to standardize experiment assignment, exposure logging, and configuration propagation across distributed services. The approach enables consistent measurement across products, reduces experiment conflicts, and improves the reliability of data-driven decision making at scale.
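As a rough illustration of what standardized experiment assignment can mean, the sketch below hashes an experiment name together with a unit ID so that every service computes the same variant without coordinating. This is a common general technique, not a description of Google's system; the function name `assign_variant`, the variant labels, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def assign_variant(unit_id: str, experiment: str,
                   variants: tuple = ("control", "treatment")) -> str:
    """Deterministically map a unit (e.g. a user ID) to a variant.

    Hypothetical helper: salting the hash with the experiment name keeps
    assignments in different experiments independent of each other.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an unsigned integer bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    # Any service that runs this with the same inputs gets the same answer.
    print(assign_variant("user-123", "new-checkout-flow"))
```

Because the result depends only on the inputs, two services that see the same unit agree on its variant, which is one way to keep exposure logs consistent across a fleet.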
-
OpenTelemetry Launches “Blueprints” Initiative to Simplify Enterprise Observability Adoption
OpenTelemetry has introduced a new "Blueprints" initiative aimed at reducing the growing complexity of deploying and operating observability systems at scale.
-
BadHost Vulnerability Exposes AI Agents, Evaluators, and LLM Gateways
BadHost is a high-severity authentication bypass vulnerability in Starlette, a widely used Python web framework with 325 million weekly downloads. The flaw allows attackers to use malformed HTTP Host headers to bypass path-based access controls and access sensitive AI agent infrastructure, among other systems.
-
A Trailing Slash Bypassed AWS API Gateway Authorization
A security researcher found that adding a trailing slash to AWS HTTP API paths bypassed Lambda authorizer authentication entirely, enabling unauthenticated wire transfers at a fintech. The root cause is a path normalization mismatch between HTTP API's greedy route matching and its authorization layer. The same vulnerability class appeared in gRPC-Go via CVE-2026-33186.
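The vulnerability class can be sketched in a few lines. The example below is a simplified, hypothetical model and not AWS API Gateway code: the router is assumed to tolerate a trailing slash while the authorization check compares the raw path exactly, so the two layers disagree about which route is being requested. All names (`PROTECTED_PATHS`, `route_matches`, `requires_auth_naive`, `requires_auth_fixed`) are invented for illustration.

```python
# Hypothetical model of a path normalization mismatch between a router
# and an authorization layer. Not the actual AWS implementation.

PROTECTED_PATHS = {"/transfers"}  # assumed exact-match authorization table


def route_matches(path: str) -> bool:
    """Router (assumed lenient): ignores trailing slashes when matching."""
    return path.rstrip("/") == "/transfers"


def requires_auth_naive(path: str) -> bool:
    """Authorization check on the raw path: '/transfers/' slips through."""
    return path in PROTECTED_PATHS


def normalize(path: str) -> str:
    """Canonicalize the path the same way the router does."""
    stripped = path.rstrip("/")
    return stripped or "/"


def requires_auth_fixed(path: str) -> bool:
    """Authorization check on the normalized path."""
    return normalize(path) in PROTECTED_PATHS


if __name__ == "__main__":
    request_path = "/transfers/"
    print(route_matches(request_path))        # router serves the request
    print(requires_auth_naive(request_path))  # naive check skips auth
    print(requires_auth_fixed(request_path))  # normalized check enforces auth
```

The general remedy in this sketch is to have routing and authorization agree on a single canonical form of the path before either makes a decision.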
-
Arm Open-Sources Metis, an AI Security Framework Outperforming Traditional SAST Tools
Arm has open-sourced Metis, an agentic AI security framework designed to autonomously uncover complex software vulnerabilities. Unlike traditional pattern-based tools, Metis applies semantic reasoning to analyze cross-component dependencies and provides clear, natural language explanations for its findings.
-
Google Cloud Suspends Railway's Production Account, Causing Eight-Hour Platform-Wide Outage
Google Cloud's automated systems suspended Railway's production account without notice, triggering an eight-hour platform-wide outage affecting 3 million users. The cascade took down workloads across all providers including AWS and bare metal because Railway's control plane was hosted on GCP. Railway is demoting GCP to backup-only status.
-
AI-Assisted Migration Tool Helps Teams Move from ingress-nginx to Higress in Minutes
The Cloud Native Computing Foundation has highlighted a new AI-assisted migration approach that enabled engineers to migrate 60 ingress-nginx resources to Higress in roughly 30 minutes, demonstrating how artificial intelligence is increasingly being applied to modernize Kubernetes networking and gateway infrastructure.
-
GitHub Slashes Agent Workflow Token Spend up to 62% with Daily Audits and MCP Pruning
GitHub reports cutting token costs in agentic CI workflows by up to 62% by pruning unused MCP tools, swapping some MCP calls for gh CLI, and running daily “auditor” and “optimizer” agents. A token-usage.jsonl artefact and an Effective Tokens metric help track spend across models and spot regressions.
-
Microsoft Announces Azure Linux 4.0, Its First General-Purpose Server Linux Distribution
Microsoft announced Azure Linux 4.0 and Azure Container Linux at Open Source Summit. Azure Linux 4.0 is a Fedora-based general-purpose server distribution for Azure VMs, the first time Microsoft has offered a supported Linux beyond container hosting. Azure Container Linux is an immutable container-optimized host built on Flatcar.
-
Azure Logic Apps Adds Sandboxed Code Interpreters to Agent Workflows
Microsoft added sandboxed code interpreters to Azure Logic Apps, enabling agents within integration workflows to generate and execute Python, JavaScript, C#, and PowerShell in Hyper-V isolated sessions. Architects get full control over model selection per workflow. The capability positions Logic Apps as an agent platform for integration alongside Foundry and Copilot Studio.
-
Platform Engineering Labs Expands formae with Kubernetes Support, Native Helm Integration
Platform Engineering Labs has announced a major update to its open-source Infrastructure-as-Code platform, formae, introducing full Kubernetes support, native Helm integration, direct .tfvars compatibility, and a new public plugin hub aimed at simplifying cloud-native infrastructure management.
-
Discord Rebuilds Database Operations around Automation to Manage ScyllaDB at Massive Scale
Discord has detailed how it rebuilt its database operations around a new internal orchestration framework called the Scylla Control Plane (SCP), enabling its small infrastructure team to automate large-scale ScyllaDB cluster management tasks that previously took days of manual work.
-
Cloudflare Completes Its Agent Infrastructure Stack with Browser Run Rebuild and Six-Layer Platform
Cloudflare rebuilt Browser Run on its own Containers platform, delivering 4x higher concurrency and 50% faster response times. The upgrade completes a six-layer agent infrastructure stack: compute (Dynamic Workers + Sandboxes), orchestration (Dynamic Workflows), memory (Agent Memory), browsing (Browser Run), and commerce (Stripe Projects).
-
Bintrail: MySQL Time-Travel Queries Using Indexed Binlogs
Bintrail is a recently introduced layer that brings point-in-time queries and row-history lookups to MySQL, the only major relational database lacking native temporal querying. Using indexed binlogs behind ProxySQL and without modifying MySQL or application code, Bintrail supports querying data as of a past timestamp and reviewing change history, primarily for recovery and audit scenarios.
-
OpenTofu 1.12: the Feature Terraform Never Shipped
The OpenTofu community released version 1.12.0 on May 14, 2026. This update isn’t a complete rewrite, but it does resolve some issues that infrastructure teams have faced for a while.