DevOps Content on InfoQ
-
Cloudflare Optimizes Edge Stack for High-Core CPUs instead of Large Cache
Cloudflare recently introduced its Gen 13 servers, marking a shift in how its network handles traffic. Instead of relying on large CPU caches for speed, the company redesigned its software to leverage many more processor cores working in parallel in its latest AMD-based servers.
-
Yelp Achieves Zero-Downtime Upgrade of over 1,000 Cassandra Nodes
Yelp has completed a large-scale upgrade of its Apache Cassandra infrastructure, spanning more than 1,000 nodes, without any service downtime, offering a blueprint for managing stateful systems at scale.
-
HashiCorp Vault 2.0 Marks Shift to IBM Lifecycle with New Identity Federation
HashiCorp has released Vault 2.0, moving to the IBM versioning and support model following its acquisition. The update introduces Workload Identity Federation for secret syncing without static credentials, SCIM 2.0 provisioning, and performance gains in the storage engine. It also prioritises identity-based security and certificate automation while removing legacy architectural components.
-
Grafana Rearchitects Loki with Kafka and Ships a CLI to Bring Observability into Coding Agents
At GrafanaCON 2026 in Barcelona, Grafana Labs announced Grafana 13, a new Kafka-backed architecture for Loki's ingestion layer, and AI Observability in Grafana Cloud for monitoring and evaluating AI systems in real time. The company also announced GCX, a new CLI designed to surface Grafana Cloud data inside agentic development environments.
-
Dropbox Collaborates with GitHub to Reduce Monorepo Size from 87GB to 20GB
Dropbox reduced its backend monorepo from 87GB to 20GB by optimizing Git delta compression in collaboration with GitHub. The changes improved clone times, CI performance, and developer velocity, highlighting how repository storage inefficiencies can impact large-scale engineering workflows.
-
Cloudflare Sandboxes Reach General Availability, Giving AI Agents Persistent Isolated Environments
Cloudflare has released Sandboxes and Containers into general availability, providing persistent isolated Linux environments for AI agent workloads. New capabilities include secure credential injection via egress proxy, PTY terminal support, persistent code interpreters, filesystem watching, and snapshot-based session recovery. Active CPU pricing charges only for used cycles.
-
Anthropic Introduces Managed Agents to Simplify AI Agent Deployment
Anthropic has introduced Managed Agents on Claude, a managed execution layer for agent-based workflows. It separates agent logic from runtime concerns such as orchestration, sandboxing, state management, and credentials. The system supports long-running, multi-step workflows with external tools, error recovery, and session continuity via a meta-harness architecture.
-
GitHub Acknowledges Recent Outages, Cites Scaling Challenges and Architectural Weaknesses
GitHub has publicly addressed a series of recent availability and performance issues that disrupted services across its platform, attributing the incidents to rapid growth, architectural coupling, and limitations in handling system load.
-
AWS Announces General Availability of DevOps Agent for Automated Incident Investigation
AWS has announced the general availability of DevOps Agent, a generative AI–powered assistant designed to help developers and operators troubleshoot issues, analyze deployments, and automate operational tasks across AWS environments.
-
Pulumi Adds Full Bun Runtime Support
Pulumi has announced that Bun is now a fully supported runtime, going beyond its previous role as a package manager option. With the release of Pulumi 3.227.0, developers can set runtime: bun in their Pulumi.yaml and have Bun execute their entire infrastructure program, with no Node.js installation required.
-
CNCF Warns Kubernetes Alone Is Not Enough to Secure LLM Workloads
A new blog post from the Cloud Native Computing Foundation highlights a critical gap in how organizations deploy large language models (LLMs) on Kubernetes: while Kubernetes excels at orchestrating and isolating workloads, it does not inherently understand or control the behavior of AI systems, which creates a fundamentally different and more complex threat model.
-
AWS Launches Agent Registry in Preview to Govern AI Agent Sprawl across Enterprises
AWS released Agent Registry in preview as part of Amazon Bedrock AgentCore, providing a centralized catalog for discovering, governing, and reusing AI agents, tools, and MCP servers across organizations. The registry indexes agents regardless of where they run and supports both MCP and A2A protocols natively. Microsoft, Google Cloud, and the ACP Registry offer competing solutions.
-
AWS Introduces S3 Files, Bringing File System Access to S3 Buckets
AWS recently introduced S3 Files, which lets users mount an Amazon S3 bucket and access its data through a standard file system interface. Applications can read and write files using standard file operations, while the system automatically translates them into S3 requests, allowing compute services to work directly with data stored in S3.
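To illustrate what "standard file operations" means for application code, here is a minimal TypeScript sketch. It is an illustration under stated assumptions, not AWS's API: the mount point is hypothetical (a temporary directory stands in for a mounted bucket), and the helper names writeReport and readReport are invented for this example. The point is only that the application uses ordinary Node.js file calls and no S3 SDK.

```typescript
import { mkdtempSync, readFileSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// ASSUMPTION: in a real deployment this would be the directory where the
// bucket is mounted. A temporary directory stands in for it here so the
// sketch runs anywhere.
const mountPoint: string = mkdtempSync(join(tmpdir(), "s3-files-demo-"));

// Hypothetical helper: write a text file with ordinary file I/O.
function writeReport(dir: string, name: string, body: string): string {
  const filePath = join(dir, name);
  writeFileSync(filePath, body, "utf8");
  return filePath;
}

// Hypothetical helper: read the file back the same way.
function readReport(filePath: string): string {
  return readFileSync(filePath, "utf8");
}

const reportPath = writeReport(mountPoint, "report.txt", "hello from a file API\n");
console.log(readReport(reportPath));
console.log(readdirSync(mountPoint));
```

In this sketch the application never constructs an S3 request itself; per the announcement, that translation is handled by the system behind the mount.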
-
OpenTelemetry Declarative Configuration Reaches Stability Milestone
The OpenTelemetry project has announced that key portions of its declarative configuration specification have reached stable status. Declarative configuration gives the observability framework a vendor-neutral, language-agnostic way to configure telemetry collection.
-
New Rowhammer Attacks on NVIDIA GPUs Enable Full System Takeover
Security researchers have demonstrated a new class of Rowhammer attacks targeting NVIDIA GPUs that can escalate from memory corruption to full system compromise, marking a significant shift in hardware-level security risks.