DevOps Content on InfoQ
-
Sauce Labs Launches AI Tool for Faster Test Analysis
Sauce Labs has launched Sauce AI for Insights, an AI-driven tool that accelerates test analysis by providing natural-language explanations, visual summaries and faster root cause detection. The company claims that it reduces debugging time, improves release readiness, and addresses the growing complexity of test data.
-
Groundcover Takes Aim at Datadog with Observability Migration Tool
Observability platform company Groundcover has launched a new migration tool to help organisations move their observability stacks from other vendors (such as Datadog) to its own platform. The company claims that organisations can migrate metrics, dashboards and monitors with full automation, without downtime or consultants.
-
Nexla Launches Express: a Conversational Platform for AI Data Engineering
Nexla recently introduced Express, a conversational data engineering platform designed to dramatically lower the barrier for building data pipelines for AI applications.
-
Cloudflare Global Outage Traced to Internal Database Change
Cloudflare’s recent global outage, linked to a database update, caused widespread disruption and highlighted the risks of single-vendor reliance. While service was restored, the incident sparked discussions on the importance of multi-vendor strategies in tech. Cloudflare's CEO vowed to enhance system resilience, emphasizing that outages can impact even the largest providers.
-
Grafana Unveils Smarter Logs, an MCP Server, and TraceQL Upgrades in Latest Releases
Grafana Labs has published major updates to two of its core observability products: Grafana 12.3 and Grafana Tempo 2.9. Each release brings its own improvements to monitoring, logs, and tracing for Grafana users.
-
Grafana Labs Releases Mimir 3.0 with Redesigned Architecture for Enhanced Performances
Grafana Labs has released Grafana Mimir 3.0. This is a significant advancement for the open-source, horizontally scalable time series database. The release features a new design that separates read and write operations. This change greatly boosts performance, reliability, and cost efficiency for organizations handling metrics at scale.
-
AWS Lambda Rust Support Reaches General Availability
AWS has elevated Rust support in Lambda from experimental to generally available, empowering developers to create high-performance, memory-safe serverless applications. The milestone enhances developer confidence, as Rust functions are now backed by AWS support and an SLA. While Rust offers speed comparable to C++, challenges such as lengthy SDK compile times and increased binary sizes remain key considerations.
-
Developing and Deploying Software in a Sustainable Way
Sustainable APIs benefit most from minimalism, Jochen Joswig said at OOP Conference. Deployment should consider energy, usage, carbon intensity, and hardware acquisition. Remote work, long device lifespans, and green office practices can lower emissions. Efficient CI, selective builds, smaller artefacts, and optimized assets can further reduce energy use.
-
Enhancing Reliability Using Service-Level Prioritized Load Shedding: Netflix at QCon SF 2025
At QCon San Francisco, Netflix engineers unveiled their advanced Service-Level-Prioritized Load-Shedding strategy, enhancing reliability during traffic spikes. By prioritizing high-value requests and automating management across microservices, they safeguard user experience and system stability. Key insights stress prioritization, automation, and structured load shedding for optimal resilience.
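The general idea of priority-based load shedding can be illustrated with a minimal Python sketch. This is not Netflix's implementation: the `Request` type, the four-level priority scale, the `shed_start` threshold, and the linear tightening rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str
    priority: int  # hypothetical scale: 0 = most critical, larger = less important


def max_admitted_priority(utilization: float, levels: int = 4,
                          shed_start: float = 0.7) -> int:
    """Return the largest priority value still admitted at this utilization.

    Below `shed_start` every level is admitted. Between `shed_start` and 1.0
    the cutoff tightens linearly. At or above 1.0 only priority 0 is served.
    """
    if utilization < shed_start:
        return levels - 1
    if utilization >= 1.0:
        return 0
    fraction = (utilization - shed_start) / (1.0 - shed_start)
    shed_levels = int(fraction * (levels - 1))  # number of low-priority levels dropped
    return levels - 1 - shed_levels


def admit(request: Request, utilization: float) -> bool:
    """Admit the request only if its priority is within the current cutoff."""
    return request.priority <= max_admitted_priority(utilization)


if __name__ == "__main__":
    traffic = [Request("playback", 0), Request("browse", 2), Request("prefetch", 3)]
    for load in (0.5, 0.85, 1.2):
        served = [r.name for r in traffic if admit(r, load)]
        print(f"utilization={load:.2f} served={served}")
```

In this sketch the lowest-priority traffic is rejected first as utilization rises, so the most critical requests keep being served even when the service is saturated.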
-
AWS Disruption Exposes Fragility in Critical Cloud Infrastructure
On October 20, 2025, Amazon Web Services (AWS) experienced a major outage that disrupted global internet services, affecting millions of users and thousands of companies across more than 60 countries. The incident originated in the US-EAST-1 region and was traced to a DNS resolution failure affecting the DynamoDB endpoint, which cascaded into outages across multiple dependent services.
-
Parting the Clouds: the Rise of Disaggregated Systems by Murat Demirbas at QCon SF 2025
Cloud computing is evolving through disaggregation, addressing inefficiencies of traditional architectures by decoupling compute and storage. This shift enhances scalability, fault isolation, and operational simplicity, driven by advancements in networking. As seen in cloud databases such as Amazon Aurora, embracing these principles enables true economic optimization and innovative design.
-
Cloudflare Workflows Adds Python Support for Durable AI Pipelines
Cloudflare Workflows now supports both TypeScript and Python, enabling developers to orchestrate complex applications in either language. With durable execution and state persistence, it simplifies the development of robust data pipelines and AI/ML models. Python developers also gain enhanced concurrency and a more intuitive way to express orchestration.
-
AWS Introduces Remote Build Cache in ECR to Accelerate Docker Image Builds
Amazon Web Services has announced enhancements to its CodeBuild service, allowing teams to use Amazon ECR as a remote Docker layer cache, significantly reducing image build times in CI/CD pipelines. By leveraging ECR repositories to persist and reuse build layers across runs, organisations can skip rebuilding unchanged parts of containers and accelerate delivery.
-
Race Condition in DynamoDB DNS System: Analyzing the AWS US-EAST-1 Outage
On October 19th and 20th, AWS experienced an extended outage triggered by a failure in Amazon DynamoDB that affected most services in its most popular region, Northern Virginia. The cloud provider released an analysis of the incident, sparking discussions in the community about redundancy on AWS, moving out of public cloud, and multi-region approaches.
-
Microsoft Addresses Data Residency with Private Cloud Expansion
Microsoft has strengthened its Sovereign Cloud offering to meet stringent global data-residency and control regulations, particularly in Europe. New capabilities include a commitment to the EU Data Boundary, expanded in-country data processing, and enhanced Sovereign Private Cloud features.