InfoQ Homepage DevOps Content on InfoQ
-
PagerDuty's Kafka Outage Silences Alerts for Thousands of Companies
PagerDuty, the incident management platform used by thousands of organisations to alert them to problems on their systems, suffered a major outage itself on 28th August, 2025. In a comprehensive outage report, the company detailed the scope of the problem, the customer impact, and how it is working to prevent a recurrence.
-
Pinterest Unifies Engineering Tools with New Pinconsole Platform
Pinterest has introduced PinConsole, a unified internal developer platform (IDP) that centralizes engineering workflows. Built to address fragmented tools for deployment, monitoring, and service management, PinConsole provides a consistent layer that lets engineers focus on business logic instead of infrastructure complexity.
-
Azure Service Groups Enter Public Preview Offering New Abstraction Layer for Resource Management
Microsoft has launched Azure Service Groups in public preview, a new feature designed to simplify resource management and administration. Acting as a flexible, tenant-level container, Service Groups allow users to organize Azure resources from anywhere within their tenant without affecting RBAC or policy inheritance.
-
Honeycomb Hosted MCP Brings Observability Data into the IDE
Honeycomb has launched its hosted Model Context Protocol (MCP), giving developers real-time access to observability data inside IDEs and AI tools like GitHub Copilot. Available as a managed service on AWS Marketplace, it removes the need for self-hosting and streamlines debugging by surfacing traces, metrics, and logs without context-switching.
-
Uber Shares Strategy for Controlling Risk in Monorepo Changes That Affect 3,000+ Microservices
Uber has published details on their approach to controlling rollouts of large-scale changes across monorepos that serve thousands of microservices, addressing one of the key challenges in continuous deployment at massive scale.
-
System Initiative Launches “AI Native” Platform to Simplify Infrastructure Automation
System Initiative recently released its AI Native Infrastructure Automation platform, aiming to offer DevOps teams a new way to manage infrastructure through natural language.
-
Impulse, Airbnb’s New Framework for Context-Aware Load Testing
Airbnb has developed Impulse, an internal load testing framework to improve microservice reliability and performance. It enables distributed, large-scale testing and lets teams run self-service, context-aware load tests integrated with CI pipelines. By simulating production-like traffic, Impulse helps engineers identify bottlenecks and errors before changes reach production.
-
Google Spanner Unifies OLTP and OLAP with Columnar Engine
Google Spanner now features a columnar engine, allowing its distributed database to handle both OLTP and OLAP workloads on a single platform. This hybrid architecture eliminates the need for separate data warehouses and ETL pipelines. The engine's columnar storage and vectorized execution accelerate analytical queries up to 200x on live data, which is especially beneficial for AI applications.
-
Researcher Unearths Thousands of Leaked Secrets in GitHub’s “Oops Commits”
Security researcher Sharon Brizinov, in collaboration with Truffle Security, has conducted a sweeping investigation of GitHub's "oops commits", force-pushed or deleted commits that remain archived, and uncovered thousands of secrets left behind, including high-value tokens and admin-level credentials
-
InfoQ Dev Summit Munich 2025: Master the 'How' with Deep-Dive, Practitioner-Led Guidance
At InfoQ Dev Summit Munich (Oct 15-16), learn directly from the senior engineers building complex systems. This practitioner-led conference offers deep dives on real-world implementation patterns from software leaders at Allianz, Skyscanner, Zalando, and Delivery Hero.
-
Amazon EKS Enables Ultra-Scale AI/ML Workloads with Support for 100K Nodes per Cluster
Amazon Web Services has announced a significant breakthrough in container orchestration with Amazon Elastic Kubernetes Service (EKS) now supporting clusters with up to 100,000 nodes, a 10x increase from previous limits.
-
Advanced Autoscaling Helps Companies Reduce AWS Costs by 70%
The next generation of Kubernetes autoscaling techniques and tools is enabling organisations to make substantial cost savings in their cloud infrastructure. Svetlana Burninova recently used Karpenter to build a multi-architecture EKS cluster and managed a 70% reduction in cost whilst also improving performance.
-
GitLab Unveils Duo Agent Platform in Public Beta, Introducing Agent-Orchestrated DevSecOps
GitLab has launched the public beta of its GitLab Duo Agent Platform, an orchestration tool that enables developers to collaborate asynchronously with AI agents across the DevSecOps lifecycle.
-
Airbnb Executes Istio Upgrades at Massive Scale
Airbnb engineering has published a detailed account of how it maintains high availability during Istio upgrades across tens of thousands of pods and thousands of VMs, all without downtime.
-
AWS Launches Memory-Optimized EC2 R8i and R8i-flex Instances with Custom Intel Xeon 6 Processors
AWS has launched its eighth-generation Amazon EC2 R8i and R8i-flex instances, powered by custom Intel Xeon 6 processors. Designed for memory-intensive workloads, these instances offer up to 15% better price performance and enhanced memory throughput, making them ideal for real-time data processing and AI applications.