Monitoring Content on InfoQ
-
Amazon Introduces Live Tail in CloudWatch Logs for Real-Time Exploration of Logs
Amazon recently announced CloudWatch Logs Live Tail, an option to analyze logs in near real time. Currently available only in the AWS console, the interactive log analytics feature helps developers detect and debug application anomalies.
-
Grafana Adds Service Accounts and Improves Debugging Experience
Grafana Labs has released version 9.5 of Grafana, including improvements to Grafana Alerting, service accounts, and dashboards. Support bundles were also released, providing a simpler way to gather and share debugging information about the Grafana stack. AWS has announced support for Grafana 9.4 within its Amazon Managed Grafana service.
-
New CloudWatch Metrics for AWS Lambda Asynchronous Invocations
AWS recently added three new Amazon CloudWatch metrics for AWS Lambda: AsyncEventsReceived, AsyncEventAge, and AsyncEventsDropped, to monitor the performance of asynchronous event processing.
-
Azure Announces Native New Relic Service for Full-Stack Observability
Azure recently announced a native New Relic service for full-stack observability. The performance monitoring service allows monitoring and troubleshooting of cloud applications in real time, providing metrics, traces, and logs.
-
Microsoft’s Fully-Managed Azure Load Testing Service Now Generally Available
Microsoft recently announced the general availability of Azure Load Testing, a fully-managed load-testing service allowing customers to test the resiliency of their applications regardless of where they are hosted.
-
Log Analytics Feature in Cloud Logging Now Generally Available
Google recently made its Cloud Logging Log Analytics feature generally available (GA), allowing users to search, aggregate, and transform all log data types, including application, network, and audit logs.
-
Prometheus Adds Long Term Support Model and Improved Remote Write Mode
Prometheus, the open-source monitoring tool, has added a number of new features, including a reduced-functionality remote write mode. Additional improvements include a new HTTP service discovery mechanism, native histogram support, additional integrations for Alertmanager, and a new long-term support model.
-
AWS Lambda Telemetry API Provides Enhanced Observability Data
AWS has released the AWS Lambda Telemetry API, a new way for extensions to receive enhanced function telemetry from the Lambda service. The new API simplifies collecting traces, logs, and custom and enhanced metrics from Lambda functions. In addition to several example extensions, third-party extensions are available from Datadog, Dynatrace, Serverless, and Sumo Logic, among others.
-
Can MTTR Be an Effective Business Metric?
In a recent blog post, Sidu Ponnappa shared how MTTR should be a key business metric to measure engineering efficiency. Ponnappa notes that only tracking uptime provides no goals to target for improvements. In a recent talk at SREcon22, Courtney Nash, senior research analyst at Verica, shared that MTTR can misrepresent what is actually happening during incidents and can be an unreliable metric.
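One way to see how MTTR can misrepresent incidents is to compare the mean with the median on a skewed set of recovery times. The sketch below is only an illustration: the incident durations are made-up values, and `mttr_minutes` is a hypothetical helper, not something taken from either the blog post or the talk.

```python
from statistics import mean, median


def mttr_minutes(durations):
    """Mean time to recovery: the arithmetic mean of incident durations (minutes)."""
    if not durations:
        raise ValueError("need at least one incident duration")
    return mean(durations)


# Hypothetical recovery times in minutes: four short incidents and one long outage.
incidents = [12, 15, 9, 14, 240]

mttr = mttr_minutes(incidents)   # mean of the five durations
typical = median(incidents)      # middle value, unaffected by the single long outage

print(f"MTTR: {mttr} min, median: {typical} min")
```

With these assumed numbers the mean is 58 minutes while the median is 14 minutes, so a single long outage dominates the MTTR figure even though most incidents were resolved quickly.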
-
New Grafana Releases Tighten Integration between Metrics and Tracing
Grafana Labs has recently released two new minor versions of its multi-platform open source analytics and interactive visualization web application. The release of version 9.1 back in August was followed by 9.2 this week. These two new versions bring a variety of improvements on the major milestone 9.0 release and tighten the integration between metrics and tracing.
-
Kubernetes Control Plane Metrics Now Available in Google Kubernetes Engine
Google has announced the general availability of Kubernetes control plane metrics in Google Kubernetes Engine (GKE). These metrics are directly integrated with Google Cloud Monitoring, providing a single solution for troubleshooting issues with GKE. Integration with third-party observability tooling is also possible via the Cloud Monitoring API.
-
Grafana 9 Brings Big Improvements to Alerting and User Experience
Grafana, an open-source graphing tool, has reached its version 9 release. The key goals behind version 9 are improving the user experience, making observability and data visualization easy and accessible, and improving alerting.
-
Effectively Monitoring Your Monitoring - Miedwar Meshbesher on Using Vigilance Controls
With many open-source and paid tools available to do the job, it can be relatively straightforward to make sure that your systems are monitored properly. But how does a team make sure that the monitoring itself is working as described, and that the team is alerted effectively when there is a problem with the system that is supposed to be keeping an eye on things?
-
Full-Stack Observability with Grafana and Azure Monitor
Microsoft recently introduced Azure Managed Grafana in preview, including new Grafana integrations with Azure Monitor. With Azure Managed Grafana, customers can now view their Azure monitoring data in Grafana dashboards and have new out-of-the-box Azure Monitor dashboards.
-
Service Overload Detection and Remediation at LinkedIn
LinkedIn recently published how it handles overload detection and remediation in its microservices. Its solution, Hodor, is adaptive and works out of the box with no configuration. It is a platform-agnostic mechanism that runs overload detectors and load shedders inside the monitored process, sampling load and shedding traffic from within the application's processing chain.
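The general idea of detecting overload and shedding traffic inside the serving process can be sketched as follows. This is not Hodor's algorithm, which the summary above does not describe. It is a minimal illustration that assumes a latency-based detector and an additive-increase, multiplicative-decrease concurrency limit; the class name `LoadShedder` and all of its parameters are hypothetical.

```python
from collections import deque


class LoadShedder:
    """In-process load shedder: rejects requests once in-flight work exceeds an adaptive limit."""

    def __init__(self, latency_threshold_ms=100.0, window=20, min_limit=1, max_limit=100):
        self.threshold = latency_threshold_ms
        self.samples = deque(maxlen=window)  # sliding window of recent latencies
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.limit = max_limit               # current concurrency limit
        self.in_flight = 0

    def overloaded(self):
        """Detector: mean latency over the window exceeds the threshold."""
        return bool(self.samples) and sum(self.samples) / len(self.samples) > self.threshold

    def try_acquire(self):
        """Return False (shed the request) when the concurrency limit is reached."""
        if self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def release(self, latency_ms):
        """Record the finished request's latency and adapt the limit."""
        self.in_flight -= 1
        self.samples.append(latency_ms)
        if self.overloaded():
            self.limit = max(self.min_limit, self.limit // 2)  # back off quickly
        else:
            self.limit = min(self.max_limit, self.limit + 1)   # recover slowly


# Usage: wrap each request; a False result means the request should be rejected early.
shedder = LoadShedder(latency_threshold_ms=100.0, window=5, max_limit=4)
if shedder.try_acquire():
    shedder.release(latency_ms=500.0)  # a slow request halves the limit
```

Halving the limit on overload and raising it by one otherwise is a common congestion-control pattern; it is used here only because it is easy to follow, and a production detector would likely combine several signals.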