Monitoring Content on InfoQ
-
Airbnb Rebuilt Alert Development After Discovering It Wasn’t a Culture Problem
Airbnb has revealed how it significantly improved its observability practices by rethinking how alerts are developed and validated, concluding that what appeared to be a "culture problem" was actually a tooling and workflow gap.
-
Google Cloud Brings Full OpenTelemetry Support to Cloud Monitoring Metrics
Google Cloud recently unveiled broad support for the OpenTelemetry Protocol (OTLP) in Cloud Monitoring, marking a step toward unifying telemetry collection across its observability stack.
-
Quesma Releases OTelBench to Evaluate OpenTelemetry Infrastructure and AI Performance
Quesma has launched OTelBench, an open-source suite to benchmark OpenTelemetry pipelines and AI-driven instrumentation. It evaluates collector performance under stress while testing how accurately LLMs handle complex SRE tasks like context propagation. Initial data shows AI agents often achieve success rates below 30%, highlighting the gap between code generation and production observability.
-
What Testers Can Do to Ensure Software Security
A secure software development life cycle means baking security into planning, design, build, test, and maintenance, rather than sprinkling it on at the end, Sara Martinez said in her talk Ensuring Software Security. Testers aren’t just bug finders but early defenders, building security and quality in from the first sprint. Her priorities: culture first, automation second, and continuous testing and monitoring throughout.
-
Inside Uber’s Query Architecture: Simplifying Layers and Improving Observability
Uber rebuilt its Apache Pinot query architecture, replacing the Presto-based Neutrino system with a lightweight proxy called Cellar and Pinot’s Multi-Stage Engine Lite Mode. The redesign simplifies SQL execution, improves resource management, and ensures predictable performance for large-scale analytics workloads.
-
Datadog Launches Monocle, a Unified Rust-Powered Real-Time Metrics Engine
Datadog has launched Monocle, a new real-time time series storage engine written in Rust. The system unifies the company’s metrics storage infrastructure, delivering higher ingestion throughput and lower query latency while reducing operational complexity. Monocle replaces several generations of storage backends, addressing concurrency challenges and scaling limits that accumulated over time.
-
Improved Application Insights Code Optimizations Identify .NET Performance Bottlenecks Automatically
Microsoft is expanding .NET developers’ toolset with enhancements to Code Optimizations. This feature is part of the Azure Monitor offering and now works with the .NET Profiler in Application Insights to automatically detect CPU, memory, and threading issues in production apps and give code‑level recommendations to fix them.
-
Honeycomb Hosted MCP Brings Observability Data into the IDE
Honeycomb has launched its hosted Model Context Protocol (MCP) server, giving developers real-time access to observability data inside IDEs and AI tools like GitHub Copilot. Available as a managed service on AWS Marketplace, it removes the need for self-hosting and streamlines debugging by surfacing traces, metrics, and logs without context-switching.
-
Grafana 12.1 Brings Built-in Diagnostics and Enhanced Alerting
Grafana 12.1 focuses on system reliability and alert management, adding Grafana Advisor for health checks, a revamped alerting interface, and trendline transformations for data visualization. The release also improves dashboard interactivity and variable handling to help teams scale, and is available on Grafana Cloud or self-hosted.
-
Microsoft Azure Enhances Observability with OpenTelemetry Support for Logic Apps and Functions
Microsoft has expanded OpenTelemetry support in Azure Logic Apps and Functions, improving observability and interoperability across platforms. The open-source OpenTelemetry framework standardizes how telemetry data is generated and correlated, enabling diagnostics beyond standard telemetry. With simplified configuration and integration, Azure aims for standardized observability across its cloud services.
-
Grafana 12 Launches with Observability as Code and Dynamic Dashboard Features
Grafana Labs have launched Grafana 12, bringing significant updates to its visualisation and dashboarding platform. Several key features are now generally available, including Git Sync, dynamic dashboards, a Cloud Migration assistant, and improvements to Drilldown, which provides code-free, point-and-click insights into data.
-
Prometheus 3.0 Brings New UI, OpenTelemetry Support and More
Version 3.0 of the popular open-source monitoring system Prometheus has been released, marking the tool's first major update in seven years. The release adds a variety of new features, along with improvements aimed at enhancing the user experience and streamlining workflows.
-
Distributed Tracing Tool Jaeger Releases Version 2 with OpenTelemetry at the Core
Version 2 of the Jaeger project, a leading open-source distributed tracing platform, has been released. The release marks a significant architectural transformation, bringing Jaeger and its components into the OpenTelemetry framework.
-
Stripe Rearchitects Its Observability Platform with Managed Prometheus and Grafana on AWS
Stripe replaced its observability platform, previously built on a third-party vendor solution, with a new architecture based on managed services on AWS. The company made the move because of scalability limits, reliability issues, and rising costs it encountered while transitioning to microservices. The migration involved dual-writing metrics, translating assets, validation, and user training.
-
Leveraging eBPF for Improved Infrastructure Observability
To investigate multi-tenant system performance efficiently, Netflix has been experimenting with eBPF, instrumenting the Linux kernel to gain continuous, deeper insight into how processes are scheduled and to detect "noisy neighbors".