Monitoring Content on InfoQ

News

DevOps

Q&A with the Creator of Checkless, a Low-Cost, Simple Site Monitoring Tool

Steve Elliott wanted a simple, cheap way to monitor uptime for his websites. He found most off-the-shelf tooling to be either too complex or too costly. This led him to build Checkless, a serverless tool that monitors sites for uptime via ping-based checks and, depending on your usage, can potentially be free to use.

Matt Campbell
on Sep 19, 2018
AI, ML & Data Engineering

Confluent Platform 5.0 Supports LDAP Authorization and MQTT Proxy for IoT Integration

Confluent Platform 5.0, the enterprise streaming platform built on Apache Kafka, supports LDAP authorization, Kafka topic inspection, and Confluent MQTT Proxy for Internet of Things (IoT) integration.

Srini Penchikala
on Sep 17, 2018
DevOps

Pinterest Switches from OpenTSDB to Their Own Time Series Database

The Pinterest engineering team has used OpenTSDB for storing and querying metrics since 2014. Recently, they developed and switched to their own time series database, called Goku, to mitigate various performance issues in OpenTSDB caused by growth in the volume of metrics data.

Hrishikesh Barua
on Sep 16, 2018
DevOps

Auth0's Move to a Single-Cloud Architecture on AWS

Auth0, a provider of authentication, authorization and single sign-on services, moved their infrastructure from multiple cloud providers (AWS, Azure and Google Cloud) to just AWS. An increasing dependency on AWS services necessitated this, and today their systems are spread across four AWS regions, with services replicated across zones.

Hrishikesh Barua
on Aug 25, 2018
DevOps

Prometheus Monitoring Platform "Graduates" from the Cloud Native Computing Foundation (CNCF)

On August 9th, the Cloud Native Computing Foundation (CNCF) announced that the open source monitoring toolkit Prometheus has graduated from incubation status. To achieve this rating, projects must demonstrate growth, documentation, organized governance processes, and a commitment to community sustainability and inclusivity.

Kent Weare
on Aug 19, 2018
DevOps

Uber Open Sources Its Large Scale Metrics Platform M3

Uber’s engineering team released M3, the metrics platform it has been using internally for some years, as open source. The platform was built to replace its Graphite-based system, and provides cluster management, aggregation, collection, storage management, a distributed time series database (TSDB) and a query engine with its own query language, M3QL.

Hrishikesh Barua
on Aug 18, 2018
DevOps

How Coinbase Handled Scaling Challenges on Their Cryptocurrency Trading Platform

Coinbase, a digital currency exchange, faced scaling challenges on their platform during the 2017 cryptocurrency boom. To resolve them, the engineering team focused on upgrading and optimizing MongoDB and segregating traffic for hotspots, and built capture and replay tools to prepare for future surges.

Hrishikesh Barua
on Aug 12, 2018
Architecture & Design

O11ycon Discusses Benefits and Challenges of Observability

The first o11ycon provides a comprehensive look at the emerging concept of observability in software and systems, which allows people to understand whether things are working as expected, to diagnose problems, and to identify solutions.

Dylan Schiemann
on Aug 09, 2018
DevOps

Plaid.com’s Monitoring System for 9600+ Integrations

Plaid.com has integrations with over 9600 financial institutions, and their monitoring challenges arise from the heterogeneous nature of these integrations as well as their large number. They rebuilt their monitoring system on Kinesis, Prometheus, Alertmanager and Grafana to solve the challenges of scalability and low latency.

Hrishikesh Barua
on Aug 01, 2018
DevOps

How SendGrid Scales Its Email Delivery Systems

SendGrid, a cloud-based email service, has seen its backend architecture evolve from a small Postfix installation to a system hosted in their own data centers as well as on the public cloud. Rewriting services in Go, a gradual move to AWS, and a distributed Ceph-based queue allow the team to handle over 40 billion emails per month.

Hrishikesh Barua
on Jul 28, 2018
DevOps

Bloomberg’s Standardization and Scaling of Its Monitoring Systems

One outcome of Bloomberg’s adoption of SRE practices across its development teams is the monitoring system they put in place, backed by the Cassandra-based Metrictank time-series database.

Hrishikesh Barua
on Jul 21, 2018
Cloud

AWS Config Gains Cross-Account, Cross-Region Data Aggregation

Amazon Web Services (AWS) recently added the capability to aggregate compliance data produced by AWS Config rules across multiple accounts and/or regions to enable centralized auditing and governance of AWS resources. A new aggregated dashboard view displays non-compliant rules across the organization. Users can then drill down to view details about resources that are violating any rules.

Steffen Opel
on Jun 30, 2018
DevOps

Understanding Production with DevOps Archeology

Lee Fox spoke at Continuous Lifecycle London about tools and methods to help make sense of today’s complex systems and infrastructure; he calls it DevOps archeology.

Manuel Pais
on Jun 14, 2018
DevOps

Thanos - a Scalable Prometheus with Unlimited Storage

The Improbable engineering team open sourced Thanos, a set of components that adds high availability to Prometheus installations through cross-cluster federation, unlimited storage and global querying across clusters.

Hrishikesh Barua
on Jun 09, 2018
DevOps

Google's Stackdriver Monitoring Announces Better Support for Kubernetes Deployments

At the recently concluded KubeCon, Google announced the beta release of Stackdriver monitoring for Kubernetes. The key features include central visibility of Kubernetes-orchestrated container metrics and logs along with other metrics in the existing Stackdriver dashboard, and better Prometheus support.

Hrishikesh Barua
on May 19, 2018
