-
Grafana Labs Announces Updates to Its Grafana Cloud with a Free Tier, New Pricing and Features
Grafana Cloud is a fully managed observability platform from Grafana Labs for applications and infrastructure. The company recently announced a new version of Grafana Cloud, including a free tier, a different pricing structure, and several significant new features such as enhanced alerting and synthetic monitoring.
-
AWS Introduces Amazon Managed Service for Grafana and Amazon Managed Service for Prometheus
Among its latest re:Invent 2020 announcements, AWS introduced a preview of Amazon Managed Service for Grafana, a managed Grafana service that automatically scales compute and database infrastructure and applies version updates and security patches automatically. AWS also introduced a preview of Amazon Managed Service for Prometheus.
-
AWS Announces Gateway Load Balancer
AWS Gateway Load Balancer is a new fully managed network gateway and load balancer. The service is tailored to deploying, scaling and managing third-party virtual appliances in the cloud, such as firewalls, intrusion detection and prevention systems, and deep packet inspection systems.
-
Amazon CloudWatch Dashboards Support Sharing
AWS recently introduced the ability to share Amazon CloudWatch Dashboards with users who do not have access to the AWS account. This feature opens up new use cases for dashboards, including sharing metrics and information on big screens, or embedding real-time information in public pages.
-
Netflix Presents Telltale, an Application Health Monitoring Tool
The Netflix Engineering team recently blogged about Telltale, a monitoring and alerting tool that uses a variety of data sources to learn the typical health of an application. Telltale shows only the data relevant to the application, and it also surfaces important events, such as nearby deployments and regional traffic evacuations.
-
Brenda - an Artificial Intelligence Team Member
Brenda uses artificial intelligence and machine learning to monitor infrastructure, run quality assurance checks, support troubleshooting, handle alerts, communicate critical issues, and apply auto-healing. At Swiss Testing Day 2020, Sree Rama Murthy Pakkala and Collin Mendons from Swisscom will talk about Brenda, an AI/ML framework that helps their teams increase quality.
-
Periskop: SoundCloud's Exception Monitoring Service
SoundCloud's engineering team wrote about Periskop, their exception monitoring service, which collects and aggregates exceptions across servers and reports them to a central server for analysis.
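To make "aggregation" concrete, here is a minimal sketch of the general idea: group exception occurrences on each server, then merge the per-server counts centrally. It is not Periskop's code or API; the `ExceptionReport` type, the `aggregate` and `merge` functions, and the choice to group on exception class plus message are all hypothetical assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ExceptionReport:
    # Hypothetical shape: one occurrence of an exception on one server.
    server: str
    exc_class: str
    message: str


def aggregate(reports: Iterable[ExceptionReport]) -> Counter:
    """Count occurrences per (exception class, message) on one server.

    Grouping on class + message is an assumption for this sketch; a real
    system might also fingerprint the stack trace.
    """
    return Counter((r.exc_class, r.message) for r in reports)


def merge(per_server: Iterable[Counter]) -> Counter:
    """Combine per-server aggregates into a single central view."""
    total = Counter()
    for counts in per_server:
        total.update(counts)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    server_a = aggregate([
        ExceptionReport("a", "TimeoutError", "upstream timed out"),
        ExceptionReport("a", "TimeoutError", "upstream timed out"),
        ExceptionReport("a", "KeyError", "'user_id'"),
    ])
    server_b = aggregate([
        ExceptionReport("b", "TimeoutError", "upstream timed out"),
    ])
    for (exc_class, message), count in merge([server_a, server_b]).most_common():
        print(f"{count:>3}  {exc_class}: {message}")
```

Aggregating before reporting means the central server receives one count per distinct error rather than every individual occurrence, which keeps the volume manageable when a single bug fires thousands of times.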
-
Logz.io Survey Finds Tool Sprawl and Complex Architecture Key Challenges for Observability
Logz.io released its annual survey of the DevOps industry, with this year's spotlight on observability. Key findings include that DevOps and observability tool sprawl is becoming a problem and that complex architectures are the main challenge in implementing an observability solution. For the coming year, the survey predicts greater investment in observability, with a focus on distributed tracing.
-
Amazon Announces AWS FireLens: a New Way to Manage Container Logs
Recently, Amazon announced a new log aggregation service called AWS FireLens. The service unifies log filtering and routing across all AWS container services, including Amazon ECS, Amazon EKS, and AWS Fargate.
-
Full Stack Monitoring of JVM Applications, Using Micrometer
Clint Checketts, a core committer on the Micrometer project, recently spoke at the SpringOne Platform 2019 conference about the Micrometer monitoring and alerting framework.
-
Twitter Open Sources Its Telemetry Tool Rezolus for Detection of Short-Lived Anomalies
Twitter Engineering has open sourced its telemetry tool Rezolus, which detects short-lived anomalies in system performance metrics by sampling them at a higher rate than conventional telemetry collection.
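Why sampling rate matters can be shown with a little arithmetic. The sketch below uses synthetic data (not Rezolus output): a CPU series with a 10% baseline and a 2-second burst to 100%. At one-second resolution the burst is obvious, but a single 60-second average reports only 13%. The `coarse_average` helper is a hypothetical name introduced for this illustration.

```python
def coarse_average(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


# Synthetic per-second CPU utilisation (%): a steady 10% baseline with a
# 2-second burst to 100%, the kind of short-lived anomaly described above.
per_second = [10.0] * 60
per_second[30] = per_second[31] = 100.0

minute_avg = coarse_average(per_second, 60)
print(max(per_second))  # 100.0: the burst is obvious at 1 s resolution
print(minute_avg)       # [13.0]: (58 * 10 + 2 * 100) / 60, mostly hidden
```

A coarse collector would see a slightly elevated minute, while a high-rate sampler can attribute the spike to a specific moment.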
-
Microsoft Releases a Preview of the Integration of Prometheus with Azure Monitor for Containers
Microsoft recently announced the integration of Prometheus, a popular open-source metrics monitoring solution and part of the Cloud Native Computing Foundation, with Azure Monitor for containers. The integration is currently available as a preview for testing.
-
Athena: Automated Build Health Monitoring at Dropbox Engineering
Dropbox’s engineering team runs ~35,000 builds and millions of automated tests, many of which can fail either because of bad commits or because of environmental conditions. The team built Athena, a build monitoring system that minimizes the manual intervention needed to detect and quarantine flaky tests and to notify code authors.
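One common way to separate flaky tests from genuine regressions is to look at repeated runs on the same commit. The sketch below applies that heuristic; it is an assumption for illustration, not Athena's documented logic, and the `classify` function and the `(test_name, commit, passed)` record shape are hypothetical.

```python
from collections import defaultdict


def classify(results):
    """Classify tests from (test_name, commit, passed) run records.

    Heuristic (an assumption, not Dropbox's documented method): a test with
    both passing and failing runs on the same commit is flaky; a test that
    only fails on some commit is a candidate regression for that commit.
    """
    outcomes = defaultdict(set)  # (test, commit) -> set of observed outcomes
    for test, commit, passed in results:
        outcomes[(test, commit)].add(passed)

    flaky, failing = set(), set()
    for (test, _commit), seen in outcomes.items():
        if seen == {True, False}:
            flaky.add(test)
        elif seen == {False}:
            failing.add(test)
    # A test flagged as flaky anywhere is quarantined rather than blamed on a commit.
    return flaky, failing - flaky


runs = [
    ("test_upload", "c1", True),
    ("test_upload", "c1", False),
    ("test_login", "c2", False),
    ("test_login", "c2", False),
    ("test_search", "c2", True),
]
flaky, failing = classify(runs)
print(sorted(flaky))    # ['test_upload']
print(sorted(failing))  # ['test_login']
```

In a real pipeline, a flaky classification would trigger quarantine, while a consistent failure would lead to notifying the author of the commit.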
-
Expo: Real-Time A/B Testing and Monitoring with Spark Streaming and Kafka at Walmart Labs
The Walmart Labs engineering team developed a real-time A/B testing tool called Expo that collects and analyzes user engagement metrics. It uses Spark Structured Streaming to process the incoming data and stores the metrics in KairosDB.
-
Scaling Graphite at Booking.com
Booking.com's engineering team scaled their Graphite deployment from a small cluster to one that handles millions of metrics per second. Along the way, they modified and optimized Graphite's core components: carbon-relay, carbon-cache, and the rendering API.
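carbon-relay can distribute metrics across several carbon-cache instances using consistent hashing, so that each metric name always lands on the same backend. The following is a simplified hash ring in that spirit. It is not Graphite's implementation, and the `HashRing` class, the node names, and the `replicas` parameter are hypothetical choices for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring (illustrative, not Graphite's code)."""

    def __init__(self, nodes, replicas=100):
        # Place each node on the ring `replicas` times (virtual nodes)
        # so that metric names spread more evenly across nodes.
        self._ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        # First 32 bits of an MD5 digest as an integer ring position.
        return int(hashlib.md5(key.encode()).hexdigest()[:8], 16)

    def node_for(self, metric):
        """Return the owner of `metric`: the next ring point clockwise."""
        idx = bisect.bisect(self._keys, self._hash(metric)) % len(self._ring)
        return self._ring[idx][1]


caches = ["cache-a", "cache-b", "cache-c"]  # hypothetical carbon-cache hosts
ring = HashRing(caches)
print(ring.node_for("servers.web01.cpu.user"))  # stable for a given metric name
```

The design choice matters when scaling: removing or adding one cache only remaps the metrics owned by that node's ring points, instead of reshuffling every metric as a simple `hash(name) % n` scheme would.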