DevOps Content on InfoQ
-
Anthropic Reveals Three Infrastructure Bugs behind Claude Performance Issues
Anthropic recently published a postmortem revealing that three distinct infrastructure bugs intermittently degraded the output quality of its Claude models in recent weeks. While the company states it has now resolved those issues and is modifying its internal processes to prevent similar disruptions, the community highlights the challenges of running the service across three hardware platforms.
-
Microsoft Announces General Availability of AKS Automatic
Microsoft has released Azure Kubernetes Service (AKS) Automatic to general availability, introducing a fully-managed Kubernetes offering designed to eliminate operational overhead while maintaining the full power and flexibility of the platform.
-
Pulumi Launches Neo: an Agentic AI Platform Engineer for Multi-Cloud Infrastructure
Infrastructure automation company Pulumi has introduced what's claimed to be the first artificial intelligence-based platform engineering agent for the industry, named Neo. The tool works to resolve some of the infrastructure bottlenecks that develop as a side effect of AI tools speeding up software development.
-
DORA Report Finds AI Is an Amplifier in Software Development, But Trust Remains Low
Nearly 90% of technology professionals now use artificial intelligence in their work. But according to the 2025 DORA State of AI-assisted Software Development report, there's still a significant gap in trust between developers and the tools they increasingly rely upon. The report found that while AI adoption has become "nearly universal," some organisational challenges remain.
-
Imagine Learning Highlights Linkerd’s Role in Cloud-Native Scale and Cost Savings
Innovative education technology provider Imagine Learning relies on Linkerd as the backbone of its cloud-native infrastructure, enabling rapid growth and ensuring reliability, scalability, and security. With a reduction of over 80% in compute needs and a 40% cut in networking costs, Linkerd offers a proven solution that enhances efficiency across diverse sectors.
-
Report Finds LLMs Not Yet Ready to Replace SREs in Incident Management
A study by ClickHouse found that large language models (LLMs) can't yet replace Site Reliability Engineers (SREs) for tasks such as finding the root causes of incidents. The study tested five leading models against real-world observability data to determine whether AI could autonomously identify production issues.
-
Kubernetes 1.34 Released with KYAML, Traffic Routing Controls, and Improved Observability
The Cloud Native Computing Foundation (CNCF) released Kubernetes 1.34, named "Of Wind & Will" (O’ WaW), last month. The release introduced features such as dynamic resource allocation and production-grade tracing for the kubelet and API server.
-
AWS CDK Refactor Feature: Safe Infrastructure as Code Renaming
AWS's new Cloud Development Kit (CDK) refactor command allows engineers to safely rename and reorganize infrastructure as code without forcing a destructive rebuild. The feature, leveraging a similar AWS CloudFormation capability, automatically computes the necessary mappings to preserve resources like databases, solving a major pain point that previously led to data loss and downtime.
-
Gitpod Rebrands to Ona, Aiming to Become the AI-Powered Center of Software Development
Gitpod, known for offering browser-based cloud development environments, has rebranded as Ona, signaling a major shift in its vision from IDE-centric workflows to AI-driven software engineering.
-
Google Cloud Observability Adopts OpenTelemetry Protocol for Native Trace Ingestion
Google Cloud has announced native support for the OpenTelemetry Protocol (OTLP) in its Cloud Trace service, marking a significant step toward vendor-neutral observability infrastructure. The new capability allows developers to send trace data directly using OTLP through the telemetry.googleapis.com endpoint, eliminating the need for vendor-specific exporters and custom data transformations.
-
Microsoft Introduces Logic Apps as MCP Servers in Public Preview
Microsoft has unveiled a public preview of Azure Logic Apps (Standard) as Model Context Protocol (MCP) servers, enabling developers to build and manage AI agents easily. This new capability promotes seamless integration with diverse systems, enhancing scalability and reusability while simplifying the development process for enterprise workflows.
-
AWS Integrates LocalStack with VS-Code Toolkit to Streamline Serverless Development
AWS has announced the integration of LocalStack with the AWS Toolkit for Visual Studio Code, addressing a long-standing challenge in serverless development where developers needed to juggle multiple tools and complex configurations for local testing of event-driven applications.
-
Temporal and OpenAI Launch AI Agent Durability with Public Preview Integration
Temporal has unveiled a public preview integration with the OpenAI Agents SDK, introducing durable execution capabilities to AI agent workflows built using OpenAI's framework.
-
Open Practices for Architecture and AI Adoption
Andrea Magnorsky presented Byte-Sized Architecture at Cloud Native Summit 2025 as a format for building shared understanding through small, recurring workshops. Ahilan Ponnusamy and Andreas Spanner discussed the Technology Operating Model for AI adoption. Both approaches drew on the Open Practice Library for human-centred collaboration and driving architectural evolution.
-
Linux Security Tools Bypassed by io_uring Rootkit Technique, ARMO Research Reveals
Security researchers at ARMO have uncovered a significant vulnerability in Linux runtime security tools that stems from the io_uring interface, an asynchronous I/O mechanism that can completely bypass traditional system call monitoring. The research demonstrates how attackers can exploit this blind spot to operate undetected by most existing security solutions.