DevOps Content on InfoQ
-
Claude Opus 4.6 Introduces Adaptive Reasoning and Context Compaction for Long-Running Agents
Anthropic’s Claude Opus 4.6 introduces "Adaptive Thinking" and a "Compaction API" to address context rot in long-running agents. The model supports a 1M-token context window with 76% multi-needle retrieval accuracy. Although the model leads agentic-coding benchmarks, independent tests show a 49% detection rate for binary backdoors, highlighting the gap between state-of-the-art benchmark claims and production security.
-
Running Ray at Scale on AKS
The Azure Kubernetes Service (AKS) team at Microsoft has shared guidance for running Anyscale's managed Ray service at scale. The guidance focuses on three key issues: GPU capacity limits, scattered ML storage, and credential expiry.
-
From Minutes to Seconds: Uber Boosts MySQL Cluster Uptime with Consensus Architecture
Uber redesigned its MySQL fleet using a consensus-driven architecture based on MySQL Group Replication, reducing cluster failover time from minutes to seconds. By moving leader election and failure detection into the database layer, Uber improved availability, simplified external orchestration, and strengthened consistency across thousands of production clusters.
-
AI-Powered Bot Compromises GitHub Actions Workflows across Microsoft, Datadog, and CNCF Projects
An AI-powered bot, hackerbot-claw, exploited GitHub Actions workflows across Microsoft, Datadog, and CNCF projects over seven days using five attack techniques. The bot achieved remote code execution (RCE) in five of seven targets, stole a GitHub token from awesome-go (140k stars), and fully compromised Aqua Security's Trivy. The campaign included the first documented AI-on-AI attack, in which the bot attempted a prompt injection against Claude Code.
-
GitLab Suggests AI Can Detect Vulnerabilities, but It's AI Governance That Determines Risk
Artificial intelligence is rapidly transforming how software vulnerabilities are detected, but questions about who governs the risks AI exposes, and how those risks are acted on, are becoming increasingly urgent, according to a new blog post by GitLab.
-
AWS Introduces Nested Virtualization on EC2 Instances
AWS recently announced support for nested virtual machines within virtualized EC2 instances running KVM or Hyper-V. Long awaited by the community, the new option enables use cases such as app emulation and hardware simulation on supported C8i, M8i, and R8i instances.
-
Standardizing Post-Quantum IPsec: Cloudflare Adopts Hybrid ML-KEM to Replace Ciphersuite Bloat
Cloudflare has extended hybrid post-quantum encryption to IPsec and WAN traffic, standardizing its SASE stack ahead of NIST's 2030 deadline. By adopting a streamlined ML-KEM key exchange, Cloudflare addresses the long-standing "ciphersuite bloat" in quantum-resistant IPsec. The update aims to neutralize "harvest now, decrypt later" threats without requiring specialized hardware upgrades.
-
CNCF Graduates Dragonfly, Marking Major Milestone for Cloud-Native Image Distribution
The Cloud Native Computing Foundation (CNCF) recently announced that Dragonfly, an open-source image and file distribution system, has reached graduated status, the highest maturity level in the CNCF project lifecycle.
-
OpenAI Secures AWS Distribution for Frontier Platform in $110B Multi-Cloud Deal
OpenAI's $110B funding round makes AWS the exclusive third-party distributor for the Frontier agent platform, introducing an architectural split: Azure retains exclusivity for stateless APIs, while AWS gains stateful runtime environments via Bedrock. The deal expands the existing $38B AWS agreement by $100B and commits 2GW of Trainium capacity.
-
From Central Control to Team Autonomy: Rethinking Infrastructure Delivery
Adidas engineers describe shifting from a centralized Infrastructure-as-Code (IaC) model to a decentralized one. Five teams autonomously deployed over 81 new infrastructure stacks in two months, using layered IaC modules, automated pipelines, and shared frameworks. The redesign illustrates how to scale infrastructure delivery without sacrificing governance.
-
Google Cloud Brings Full OpenTelemetry Support to Cloud Monitoring Metrics
Google Cloud recently unveiled broad support for the OpenTelemetry Protocol (OTLP) in Cloud Monitoring, marking a step toward unifying telemetry collection across its observability stack.
-
AWS Launches Agent Plugins to Automate Cloud Deployment
AWS launched Agent Plugins for AWS, giving AI coding agents specialized deployment skills. The initial deploy-on-aws plugin accepts commands such as "deploy to AWS" and generates complete pipelines with architecture recommendations, cost estimates, and infrastructure code. The plugins are supported in Claude Code and Cursor, and AWS claims deployments take 10 minutes instead of the hours required when done manually.
-
Google Enhances Node Pool Auto-Creation Speed for GKE Clusters
Google Cloud has optimized GKE's node pool auto-creation, significantly cutting "Time to Ready" for very large clusters. By improving control plane communication and request batching, GKE now provisions resources faster, rivaling tools like Karpenter. The update, which rolls out automatically across supported versions, improves scaling reliability and stability for high-volume AI and batch workloads.
-
Argo CD 3.3 Brings Safer GitOps Deletions and Smoother Day‑to‑Day Operations
Argo CD, the popular GitOps continuous delivery tool for application deployment and lifecycle management, has reached a new milestone with the release of version 3.3, which extends its capabilities while addressing several long-standing pain points for operators.
-
Kubernetes Introduces Node Readiness Controller to Improve Pod Scheduling Reliability
The Kubernetes project recently announced a new core controller called the Node Readiness Controller, designed to enhance scheduling reliability and cluster health by making the API server’s view of node readiness more accurate.