Monitoring Content on InfoQ
-
Grafana Cloud Kubernetes Monitoring with Machine Learning Predictions
Managing cloud costs can be challenging as Kubernetes fleets scale. To address this issue, Grafana Cloud has introduced a cost-monitoring feature within Kubernetes Monitoring. In particular, Grafana Cloud’s Kubernetes Monitoring now offers ML predictions for CPU and memory usage.
-
eBPF Kubernetes Security Tool Tetragon Improves Performance and Stability
Isovalent has announced the 1.0 release of Cilium Tetragon, their eBPF-based Kubernetes security observability and runtime enforcement tool. Policies and filters can be applied directly via eBPF to monitor process execution, privilege escalations, and file and network activity.
-
Monzo Employs Targeted Traffic Shedding against Stampeding Herd Effect from the Mobile App
Monzo developed a solution for shedding traffic in case its platform comes under intense and unexpected load that could lead to an outage. Traffic spikes can be generated by the mobile app and triggered by push notifications or other bursts in user activity. The solution can reduce the read traffic by almost 50% with 90% overall accuracy without noticeable customer impact.
-
Contentsquare Uses Microservices and Apache Kafka for Notification Delivery
Contentsquare needed notification functionality for many use cases within its platform. The company created a generic solution spanning multiple services as part of its microservice architecture. During the implementation, the developers had to improve observability and overcome some scalability challenges.
-
Grafana Introduces ML Tool Sift to Improve Incident Response
Grafana Labs has introduced "Sift," a feature for Grafana Cloud designed to enhance incident response management (IRM) by automating system checks and expediting issue resolution. Sift automates various aspects of incident investigation and provides insights into potential issues within Kubernetes environments, helping engineers focus on resolving incidents.
-
Amazon Introduces Live Tail in CloudWatch Logs for Real-Time Exploration of Logs
Amazon recently announced CloudWatch Logs Live Tail, an option to analyze logs in near real-time. Currently only available in the AWS console, the interactive log analytics feature helps developers detect and debug application anomalies.
-
Grafana Adds Service Accounts and Improves Debugging Experience
Grafana Labs has released version 9.5 of Grafana, including improvements to Grafana Alerting, service accounts, and dashboards. Support bundles were also released, providing a simpler way to gather and share debugging information about the Grafana stack. AWS has announced support for Grafana 9.4 within its Amazon Managed Grafana service.
-
New CloudWatch Metrics for AWS Lambda Asynchronous Invocations
AWS recently added three new Amazon CloudWatch metrics for AWS Lambda: AsyncEventsReceived, AsyncEventAge, and AsyncEventsDropped, to monitor the performance of asynchronous event processing.
-
Azure Announces Native New Relic Service for Full-Stack Observability
Azure recently announced a native New Relic service for full-stack observability. The performance monitoring service allows monitoring and troubleshooting of cloud applications in real-time, providing metrics, traces, and logs.
-
Microsoft’s Fully-Managed Azure Load Testing Service Now Generally Available
Microsoft recently announced the general availability of Azure Load Testing, a fully-managed load-testing service allowing customers to test the resiliency of their applications regardless of where they are hosted.
-
Log Analytics Feature in Cloud Logging Now Generally Available
Google recently made its Cloud Logging Log Analytics feature generally available (GA), allowing users to search, aggregate, and transform all log data types, including application, network, and audit logs.
-
Prometheus Adds Long Term Support Model and Improved Remote Write Mode
Prometheus, the open-source monitoring tool, has added a number of new features, including a reduced-functionality remote write mode. Additional improvements include a new HTTP service discovery mechanism, native histogram support, additional integrations for Alertmanager, and a new long-term support model.
-
AWS Lambda Telemetry API Provides Enhanced Observability Data
AWS has released the AWS Lambda Telemetry API, a new way for extensions to receive enhanced function telemetry from the Lambda service. The new API simplifies collecting traces, logs, and custom and enhanced metrics from Lambda functions. In addition to several example extensions, extensions are available from third parties including Datadog, Dynatrace, Serverless, and Sumo Logic.
-
Can MTTR Be an Effective Business Metric?
In a recent blog post, Sidu Ponnappa argued that MTTR should be a key business metric for measuring engineering efficiency. Ponnappa notes that tracking uptime alone provides no goals to target for improvement. In a recent talk at SREcon22, Courtney Nash, senior research analyst at Verica, argued that MTTR can misrepresent what is actually happening during incidents and can be an unreliable metric.
-
New Grafana Releases Tighten Integration between Metrics and Tracing
Grafana Labs has recently released two new minor versions of its multi-platform open source analytics and interactive visualization web application. The release of version 9.1 back in August was followed by 9.2 this week. These two versions bring a variety of improvements over the major 9.0 milestone release and tighten the integration between metrics and tracing.