Monitoring Content on InfoQ
-
Amazon Releases Container Monitoring for Amazon ECS, EKS, and Kubernetes via CloudWatch
Recently, Amazon announced that customers can now monitor, isolate, and diagnose their containerized applications and microservices environments using Amazon CloudWatch Container Insights. Container Insights is part of Amazon CloudWatch, a fully managed monitoring and observability service in AWS aimed at DevOps engineers, developers, site reliability engineers (SREs), and IT managers.
-
Instana Pipeline Feedback for Release Performance
Application performance management service provider Instana launched Pipeline Feedback for release performance tracking and analysis. Pipeline Feedback provides automatic tracking of application releases, feedback on release performance, and integration with Jenkins.
-
Microsoft Releases a Preview of the Integration of Prometheus with Azure Monitor for Containers
Recently, Microsoft announced the integration of Prometheus, a popular open-source metrics monitoring solution and part of the Cloud Native Computing Foundation, with Azure Monitor for containers. This integration is currently available in preview for testing.
-
Vector Performance Monitoring Tool Adds eBPF, Unified Host-Container Metrics Support
Vector, the open source performance monitoring tool from Netflix, added support for eBPF-based tools using a PCP daemon, a unified view of container and host metrics, and UI improvements.
-
Scaling, Incident Management and Collaboration at New York Times Engineering
The New York Times Engineering Team wrote about their approach to scaling and incident management against the backdrop of increased traffic during the November 2018 US midterm elections.
-
Testing Complex Distributed Systems at FT.com: Sarah Wells Shares Lessons Learned
The complexity in complex distributed systems isn’t in the code; it’s between the services or functions. Testing means balancing finding problems against delivering value, said Sarah Wells at the European Testing Conference. Testers often have the best understanding of what the system does; they have a good hypothesis about what went wrong, and are able to validate it pretty quickly.
-
Amazon Introduces AWS Cloud Map: "Service Discovery for Cloud Resources"
In a recent blog post, Amazon introduced a new service called AWS Cloud Map, which discovers and tracks cloud resources. With the rise of microservice architectures, it has become increasingly difficult to manage dynamic resources in these architectures. With AWS Cloud Map, developers can monitor the health of databases, queues, microservices, and other cloud resources with custom names.
-
Grafana Adds Log Data Correlation to Time Series Metrics
The Grafana team announced an alpha version of Loki, their logging platform that ties in with other Grafana features like metrics query and visualization. Loki adds a new client agent, promtail, and server-side components for log metadata indexing and storage.
-
Inside Stack Overflow’s Monitoring Systems
Nick Craver, architecture lead at Stack Exchange, wrote about their monitoring systems in a recent article. He discussed the philosophy and motivation behind their monitoring strategy and talked about their toolset - mainly Bosun, Grafana and Opserver.
-
Scaling Observability at Uber: Building In-House Solutions, uMonitor and Neris
Uber’s infrastructure consists of thousands of microservices supporting mobile applications, infrastructure, and internal services. To provide high observability of these services, Uber’s Observability team built two in-house monitoring solutions: uMonitor for time-series metrics-based alerting, and Neris for host-level checks and metrics.
-
Q&A with the Creator of Checkless, a Low-Cost, Simple Site Monitoring Tool
Steve Elliott wanted a simple, cheap way to monitor uptime for his websites. He found most off-the-shelf tooling to be either too complex or too costly. This led him to build Checkless, a serverless tool that can monitor sites for uptime via ping-based checks and, depending on your usage, can potentially be free to use.
-
Confluent Platform 5.0 Supports LDAP Authorization and MQTT Proxy for IoT Integration
Confluent Platform 5.0, the enterprise streaming platform built on Apache Kafka, supports LDAP authorization, Kafka topic inspection, and Confluent MQTT Proxy for Internet of Things (IoT) integration.
-
Pinterest Switches from OpenTSDB to Their Own Time Series Database
The Pinterest engineering team has used OpenTSDB for storing and querying metrics since 2014. Recently, they developed and switched to their own time series database called Goku to mitigate various performance issues in OpenTSDB caused by growth in the volume of metrics data.
-
Auth0's Move to a Single-Cloud Architecture on AWS
Auth0, a provider of authentication, authorization and single sign-on services, moved their infrastructure from multiple cloud providers (AWS, Azure and Google Cloud) to just AWS. An increasing dependency on AWS services necessitated this, and today their systems are spread across four AWS regions, with services replicated across zones.
-
Prometheus Monitoring Platform "Graduates" from the Cloud Native Computing Foundation (CNCF)
On August 9th, the Cloud Native Computing Foundation (CNCF) announced that the open source monitoring toolkit Prometheus has graduated from its incubation status. To achieve this rating, projects must demonstrate growth, documentation, organized governance processes, and a commitment to community sustainability and inclusivity.