Monitoring Content on InfoQ
-
InfoQ Live: Practical Ways to Integrate Observability into Your Distributed System Architecture
On Feb 16th, InfoQ Live, the one-day virtual event for software engineers, will explore practical ways you can use and integrate observability into your distributed system architecture.
-
AWS Launches Amazon DevOps Guru
Amazon Web Services (AWS) recently introduced Amazon DevOps Guru, one of several new machine learning-driven services. DevOps Guru detects operational issues, generates reports and notifications, and offers insights and recommendations on how to take action.
-
AWS Introduces Amazon Managed Service for Grafana and Amazon Managed Service for Prometheus
In one of the latest announcements of re:Invent 2020, AWS introduced the preview of Amazon Managed Service for Grafana, a managed Grafana that automatically scales compute and database infrastructure, with automated version updates and security patching. AWS also introduced a preview for Amazon Managed Service for Prometheus.
-
Google Releases Monitoring Query Language for Cloud Monitoring into General Availability
In a recent blog post, Google announced the general availability of Monitoring Query Language (MQL) in Cloud Monitoring.
-
AWS Announces Gateway Load Balancer
AWS Gateway Load Balancer is a new fully managed network gateway and load balancer. The service is tailored to deploy, scale, and manage third-party virtual appliances in the cloud, such as firewalls, intrusion detection and prevention systems, and deep packet inspection systems.
-
AWS Publishes Best Practices Guide for Operational Dashboards
AWS recently added to the Amazon Builders' Library their best practices for building dashboards for operational visibility. The document includes a detailed description of the different types of dashboards that exist at Amazon as well as a discussion of the design best practices used to create dashboards.
-
Amazon CloudWatch Dashboards Supports Sharing
AWS recently introduced the ability to share Amazon CloudWatch Dashboards with users who do not have access to the AWS account. This feature opens up new use cases for dashboards, including sharing metrics and information on big screens, or embedding real-time information in public pages.
-
Observability Strategies for Distributed Systems - Lessons Learned at InfoQ Live
A good observability strategy makes it easy for teams to share their data, and uses data from across a distributed system to identify if business goals are being achieved. These were some of the ideas discussed during the InfoQ Live roundtable discussion on observability patterns for distributed systems, held on August 25.
-
Brenda - an Artificial Intelligence Team Member
Brenda uses artificial intelligence with machine learning to monitor the infrastructure, do quality assurance checks and support troubleshooting, handle alerts and communicate critical issues, and apply auto-healing. At Swiss Testing Day 2020, Sree Rama Murthy Pakkala and Collin Mendons from Swisscom will talk about Brenda, an AI/ML framework that helps their teams increase quality.
-
How Netlify’s Infrastructure Team Improved Observability While Increasing Deployment Speed
Netlify's infrastructure team shared how they increased customer deployment speeds by up to 2x by optimizing their deployment algorithm, and how they increased observability into their systems in the process.
-
Moogsoft Adds Virtual Network Operations Centre Capability
AIOps platform vendor Moogsoft has announced the release of Moogsoft Enterprise 8.0, featuring a capability for technology teams to build a virtual Network Operations Centre (NOC). Moogsoft Enterprise consolidates monitoring tools with the intention of helping technology teams reduce noise, prioritize incidents, reduce escalations, and ensure uptime.
-
Periskop: SoundCloud's Exception Monitoring Service
SoundCloud's engineering team wrote about their exception monitoring software called Periskop, which collects and aggregates exceptions across servers and reports to a central server for analysis.
-
Grafana Labs Announces GA of Cortex v1.0 and Discusses Architectural Changes
Grafana Labs, the company behind popular open-source monitoring projects Grafana and Loki, announced the General Availability of Cortex v1.0. Cortex is a clustered Prometheus implementation that includes features such as horizontal scalability, multi-tenancy, durability, and long-term storage.
-
NGINX Releases Controller 3.0 with Major Redesign Providing Consolidated Application View
NGINX announced the release of NGINX Controller 3.0, their control-plane solution for managing the NGINX data plane. The 3.0 release is a full redesign of Controller, moving it to an "app-centric experience" that allows for interacting with the infrastructure at the application level. This includes a full configuration API, a role-based self-service portal, and a built-in certificate manager.
-
Logz.io Survey Finds Tool Sprawl and Complex Architecture Key Challenges for Observability
Logz.io released their annual survey of the DevOps industry, with the spotlight this year on observability. The key findings are that DevOps and observability tool sprawl is becoming an issue and that complex architectures present the main challenge in implementing an observability solution. For the next year, they predict greater investment in observability, with a focus on distributed tracing.