DevOps

What It Means to Be a Site Reliability Engineer According to a Survey from Catchpoint

by Helen Beal on Apr 13, 2018

Site Reliability Engineering, an approach that sits at the intersection of software engineering and IT operations, was created at Google in 2003 and described in detail in Google's 2016 book, Site Reliability Engineering: How Google Runs Production Systems. Digital experience intelligence provider Catchpoint surveyed 416 Site Reliability Engineers (SREs) to understand what it means to be an SRE.

DevOps

Monitoring Microservices at Scale at Crisp

by Hrishikesh Barua on Mar 24, 2018

Crisp’s engineering team shared its experience monitoring its microservices stack. Vigil, the team's open-source project written in Rust, uses a set of pull and push probes to collect health data, and offers support for multiple languages, a status dashboard, and integration with some external alerting tools.
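The push/pull distinction can be sketched in a few lines. This is not Vigil's code or API; it is a minimal Python illustration, assuming a hypothetical `HealthRegistry` in which pushing services report heartbeats (and are marked dead after a timeout) while pull probes are callables the registry polls itself. The node names are made up.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class HealthRegistry:
    """Hypothetical registry combining push probes and pull probes."""

    push_timeout: float = 30.0  # seconds of silence before a pushing node is "dead"
    last_push: Dict[str, float] = field(default_factory=dict)
    pull_checks: Dict[str, Callable[[], bool]] = field(default_factory=dict)

    def report(self, node: str, now: Optional[float] = None) -> None:
        """Push probe: the service itself calls this periodically to say it is alive."""
        self.last_push[node] = time.monotonic() if now is None else now

    def add_pull_check(self, node: str, check: Callable[[], bool]) -> None:
        """Pull probe: register a callable that the registry polls (e.g. an HTTP GET)."""
        self.pull_checks[node] = check

    def status(self, now: Optional[float] = None) -> Dict[str, str]:
        """Return "healthy" or "dead" for every known node."""
        now = time.monotonic() if now is None else now
        result = {
            node: "healthy" if now - seen <= self.push_timeout else "dead"
            for node, seen in self.last_push.items()
        }
        for node, check in self.pull_checks.items():
            try:
                ok = bool(check())
            except Exception:  # a probe that raises counts as a failed check
                ok = False
            result[node] = "healthy" if ok else "dead"
        return result


registry = HealthRegistry(push_timeout=30.0)
registry.report("chat-api-1", now=0.0)
registry.add_pull_check("website", lambda: True)
print(registry.status(now=10.0))  # {'chat-api-1': 'healthy', 'website': 'healthy'}
print(registry.status(now=45.0))  # {'chat-api-1': 'dead', 'website': 'healthy'}
```

Push probes suit services that can report on their own (and reveal silent failures through missing heartbeats), while pull probes cover components that cannot be instrumented, such as a public website.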

DevOps

How Observability Impacts Testing: Q&A with Amy Phillips at QCon London

by Ben Linders on Mar 07, 2018

Observability gives you a picture of the system’s current health and can replace certain types of testing. For low-risk application areas, you can rely on observability instead of testing, provided you have a continuous delivery pipeline that gives fast feedback and lets you release changes quickly.

DevOps

Monitoring Distributed Task Queues at MeilleursAgents

by Hrishikesh Barua on Feb 18, 2018

MeilleursAgents, a website that lets property sellers list their property and get an estimated price for it, shared details of how it monitors its Celery-based distributed task queue. A pipeline built from Python, StatsD, Bucky, Graphite, and Grafana tracks the task lifecycle and execution rates.
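The first hop of such a pipeline is a process emitting StatsD metrics. Below is a minimal sketch, not MeilleursAgents' code, of a client speaking the plain-text StatsD UDP line protocol (`name:value|c` for counters, `name:value|ms` for timers). The `StatsdClient` class, the `celery` prefix, and the `estimate_price` task name are hypothetical; in a real Celery deployment these calls would more likely be wired to Celery signals such as `task_prerun` and `task_postrun` than to a context manager.

```python
import socket
import time
from contextlib import contextmanager


class StatsdClient:
    """Minimal, hypothetical StatsD client using the plain-text UDP protocol."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125, prefix: str = "celery"):
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, name: str, value: int, kind: str) -> str:
        line = f"{self.prefix}.{name}:{value}|{kind}"
        try:
            self.sock.sendto(line.encode("ascii"), self.addr)
        except OSError:
            pass  # metrics are fire-and-forget; never let them break the task
        return line

    def incr(self, name: str, count: int = 1) -> str:
        return self._send(name, count, "c")  # counter

    def timing(self, name: str, ms: float) -> str:
        return self._send(name, int(ms), "ms")  # timer in milliseconds

    @contextmanager
    def timed_task(self, task_name: str):
        """Count task starts, successes and failures, and time each execution."""
        self.incr(f"{task_name}.started")
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.incr(f"{task_name}.failed")
            raise
        else:
            self.incr(f"{task_name}.succeeded")
        finally:
            self.timing(f"{task_name}.runtime", (time.perf_counter() - start) * 1000)


client = StatsdClient()
with client.timed_task("estimate_price"):
    time.sleep(0.01)  # stand-in for real task work
```

Downstream, a StatsD-compatible daemon aggregates these counters and timers into rates and percentiles before they are stored in Graphite and graphed in Grafana.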

DevOps

How MakeMyTrip Monitors Its Large-Scale E-Commerce Website

by Hrishikesh Barua on Jan 29, 2018

MakeMyTrip, an online travel company, describes its monitoring philosophy and setup in a series of articles. Its hybrid infrastructure is monitored across the whole stack, mostly with open-source tools.

DevOps

Monitoring Cloudflare's Global Network Using Prometheus

by Hrishikesh Barua on Oct 28, 2017

Matt Bostock’s SREcon 2017 Europe talk covers how Cloudflare, a CDN, DNS, and DDoS mitigation provider, uses Prometheus, a metric-based monitoring tool, to monitor its globally distributed infrastructure and network.
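Prometheus pulls metrics by scraping an HTTP endpoint that serves its text exposition format. The sketch below renders that format by hand to show its shape; in practice the official `prometheus_client` library generates it. The `render_metric` helper and the `dns_queries_total` metric with its `colo` and `rcode` labels are hypothetical, not taken from Cloudflare's setup.

```python
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]  # (label set, value)


def _escape(value: str) -> str:
    """Escape backslashes, double quotes and newlines in label values."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, help_text: str, metric_type: str, samples: List[Sample]) -> str:
    """Render one metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_metric(
    "dns_queries_total",
    "DNS queries answered.",
    "counter",
    [({"colo": "lhr", "rcode": "NOERROR"}, 1027)],
))
```

Each label combination becomes its own time series, which is what lets a single query aggregate or slice a metric across many locations.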

Cloud

Amazon CloudWatch Dashboards Gains API and CloudFormation Support

by Steffen Opel on Sep 30, 2017

Amazon Web Services (AWS) recently added programmatic creation and manipulation of CloudWatch dashboards and widgets to support use cases such as dynamic resource lifecycle tracking and consistent cross-account dashboard maintenance.
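Programmatic dashboards are defined as a JSON document (the dashboard body) containing a list of widgets placed on a 24-column grid. The sketch below generates such a body for a set of EC2 instances, which is the kind of dynamic, repeatable definition the new API enables. The helper names and instance IDs are hypothetical; the resulting string could be passed to the CloudWatch `PutDashboard` API, for example via boto3's `put_dashboard(DashboardName=..., DashboardBody=...)`, which is not called here.

```python
import json
from typing import List


def metric_widget(title: str, metrics: List[list], region: str,
                  x: int = 0, y: int = 0, width: int = 12, height: int = 6,
                  stat: str = "Average", period: int = 300) -> dict:
    """Build one metric widget for a CloudWatch dashboard body."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": width, "height": height,
        "properties": {
            "title": title,
            "metrics": metrics,  # [namespace, metric name, dimension name, dimension value]
            "region": region,
            "stat": stat,
            "period": period,  # seconds
            "view": "timeSeries",
        },
    }


def dashboard_body(instance_ids: List[str], region: str) -> str:
    """One CPU widget per instance, laid out two per row on the 24-column grid."""
    widgets = [
        metric_widget(
            title=f"CPU {instance_id}",
            metrics=[["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]],
            region=region,
            x=(i % 2) * 12,
            y=(i // 2) * 6,
        )
        for i, instance_id in enumerate(instance_ids)
    ]
    return json.dumps({"widgets": widgets})


body = dashboard_body(["i-0aaa", "i-0bbb", "i-0ccc"], "us-east-1")
print(body)
```

Because the body is generated rather than hand-edited, the same function can be run against every account and rerun whenever instances come and go.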

DevOps

NGINX Releases Microservices Platform, OpenShift Ingress Controller, and Service Mesh Preview

by Daniel Bryant on Sep 12, 2017

NGINX Inc. has released three offerings: the NGINX Application Platform, which aims to be a “one-stop shop” for microservice developers; a Kubernetes Ingress Controller solution for load balancing on the Red Hat OpenShift Container Platform; and an implementation of NGINX as a service proxy for the Istio service mesh control plane.

DevOps

A Comparison of Mapping Approaches for Distributed Cloud Applications

by Hrishikesh Barua on Jun 29, 2017

An application map is a topology view of the components of a distributed application and the network or interprocess interactions between them. A recent article surveys the application mapping approaches adopted by tools such as AppDynamics, OpenTracing, and Netsil.
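One common way to derive such a map is from distributed-tracing data: whenever a span's parent belongs to a different service, that parent-child pair is an edge in the topology. The sketch below shows this tracing-based approach only (other tools infer the map from observed network traffic instead); the `Span` fields and service names are hypothetical and not tied to any particular tool's data model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of a trace
    service: str


def build_app_map(spans: Iterable[Span]) -> Counter:
    """Count calls between services by following each span to its parent span."""
    spans = list(spans)
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    edges: Counter = Counter()
    for s in spans:
        if s.parent_id is None:
            continue
        parent = by_id.get((s.trace_id, s.parent_id))
        # Only cross-service hops become edges; calls inside one service are skipped.
        if parent is not None and parent.service != s.service:
            edges[(parent.service, s.service)] += 1
    return edges


spans = [
    Span("t1", "s1", None, "gateway"),
    Span("t1", "s2", "s1", "orders"),
    Span("t1", "s3", "s2", "db"),
    Span("t1", "s4", "s2", "orders"),  # internal call, not an edge
    Span("t2", "s1", None, "gateway"),
    Span("t2", "s2", "s1", "orders"),
]
print(build_app_map(spans))
```

The edge counts can then be drawn as a weighted graph; a tracing-based map only shows services that are instrumented, which is one of the trade-offs between mapping approaches.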

DevOps

Metrics Collection and Monitoring at Robinhood Engineering

by Hrishikesh Barua on May 23, 2017

The Robinhood server operations team published a series of articles describing its metrics collection, monitoring, and alerting infrastructure. OpenTSDB, Grafana, Kafka, and Riemann form the core of the stack, with Kafka acting as a proxy layer from which data is pushed into Riemann for stream processing and into OpenTSDB for storage.
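The value of Kafka as a proxy layer is that each downstream system reads the same log at its own pace by tracking its own offset. The sketch below models that fan-out in memory: `MetricLog` is a hypothetical stand-in for a Kafka topic with per-consumer-group offsets, and the two consumer functions merely imitate the roles Riemann (stream processing) and OpenTSDB (storage) play. It uses no Kafka, Riemann, or OpenTSDB API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MetricLog:
    """In-memory stand-in for a Kafka topic: an append-only list of records."""

    records: List[dict] = field(default_factory=list)
    offsets: Dict[str, int] = field(default_factory=dict)  # next offset per consumer group

    def append(self, record: dict) -> None:
        self.records.append(record)

    def poll(self, group: str, max_records: int = 100) -> List[dict]:
        """Return unread records for a group and advance only that group's offset."""
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


def stream_alerts(batch: List[dict], threshold: float) -> List[dict]:
    """Riemann-like role: flag each sample above a threshold as it streams past."""
    return [r for r in batch if r["value"] > threshold]


def store(batch: List[dict], tsdb: Dict[Tuple[str, str], List[tuple]]) -> None:
    """OpenTSDB-like role: append samples to a series keyed by metric and host tag."""
    for r in batch:
        tsdb.setdefault((r["metric"], r["host"]), []).append((r["ts"], r["value"]))


log = MetricLog()
log.append({"metric": "cpu", "host": "web-1", "ts": 1, "value": 0.42})
log.append({"metric": "cpu", "host": "web-2", "ts": 1, "value": 0.97})
log.append({"metric": "mem", "host": "web-1", "ts": 1, "value": 0.61})

alerts = stream_alerts(log.poll("riemann"), threshold=0.9)
tsdb: Dict[Tuple[str, str], List[tuple]] = {}
store(log.poll("opentsdb"), tsdb)
print(alerts)
print(tsdb)
```

Because the offsets are independent, a slow or restarted storage consumer does not delay alerting, and neither consumer needs to know the other exists.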

Cloud

DigitalOcean Adds Monitoring and Alerting Features

by Richard Seroter on Apr 24, 2017

Cloud infrastructure provider DigitalOcean recently released capabilities for monitoring servers and sending alerts. While not novel, this free feature reflects the industry's growing attention to server and application insight.

DevOps

Avoiding Alerts Overload from Microservices: Sarah Wells at QCon London

by Daniel Bryant on Apr 02, 2017

At QCon London, Sarah Wells presented “Avoiding Alerts Overload from Microservices” and cautioned that developers and operators must fundamentally change how they think about monitoring when building a microservice system. Key takeaways included: build a system that can be supported; focus on ‘stuff that matters’ when creating monitoring and alerts; and cultivate and improve alerts.

DevOps

Honeycomb - A Tool for Debugging Complex Systems

by Hrishikesh Barua on Oct 31, 2016

Honeycomb is a tool for observing and correlating events in distributed systems. It differs from existing tools such as Zipkin by moving away from the single-request tracing model toward a more free-form model of collecting and querying data across layers and dimensions.
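The free-form model can be illustrated with "wide" events: each event is an arbitrary set of key-value pairs, and any field can later be used to filter or group. The sketch below is not Honeycomb's API or query engine; `query` is a hypothetical helper and the event fields (`endpoint`, `build`, `duration_ms`) are invented for the example.

```python
from collections import defaultdict
from statistics import median
from typing import Any, Dict, Iterable, List, Tuple


def query(
    events: Iterable[Dict[str, Any]],
    where: Dict[str, Any],
    group_by: List[str],
    field: str,
) -> Dict[Tuple[Any, ...], Tuple[int, float]]:
    """Filter events, group them by arbitrary fields, and return (count, median) per group."""
    groups: Dict[Tuple[Any, ...], List[float]] = defaultdict(list)
    for event in events:
        if field not in event:
            continue  # events are free-form, so some may lack the measured field
        if all(event.get(k) == v for k, v in where.items()):
            key = tuple(event.get(g) for g in group_by)
            groups[key].append(event[field])
    return {key: (len(vals), median(vals)) for key, vals in groups.items()}


events = [
    {"service": "api", "endpoint": "/login", "status": 200, "duration_ms": 20, "build": "v1"},
    {"service": "api", "endpoint": "/login", "status": 500, "duration_ms": 300, "build": "v2"},
    {"service": "api", "endpoint": "/login", "status": 500, "duration_ms": 500, "build": "v2"},
    {"service": "api", "endpoint": "/cart", "status": 500, "duration_ms": 40, "build": "v1"},
]
print(query(events, where={"status": 500}, group_by=["endpoint", "build"], field="duration_ms"))
```

Nothing about the grouping fields has to be decided when the data is collected, which is the contrast with a per-request trace view: here a question such as "which build is producing slow errors on /login?" is just another query.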

Architecture & Design

Adrian Cockcroft on Analyzing Response Time Distributions for Microservices

by Daniel Bryant on Feb 07, 2016

At the microXchg conference in Berlin, Adrian Cockcroft presented “Analyzing Response Time Distributions for Microservices”. Cockcroft demonstrated how his Spigo microservice architecture simulation tool, combined with the online Guesstimate Monte Carlo simulation tool, can be used to visualise and experimentally simulate request response times within a complex microservice system.
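The Monte Carlo idea itself is simple: sample each component's latency from a distribution, combine the samples according to the call graph, repeat many times, and read percentiles off the result. The sketch below is a Python stand-in for what Guesstimate does in a spreadsheet-like UI, not Cockcroft's model; the call graph (a gateway, two parallel downstream calls, then a database call) and every distribution parameter are assumptions chosen for illustration.

```python
import random
from typing import List


def simulate(n: int = 100_000, seed: int = 42) -> List[float]:
    """Sample end-to-end response times (ms) for a hypothetical request path."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n):
        gateway = rng.uniform(1, 3)  # assumed: 1-3 ms of gateway overhead
        # Two downstream calls made in parallel: the request waits for the slower one.
        fanout = max(rng.lognormvariate(2.0, 0.5), rng.lognormvariate(2.0, 0.5))
        db = rng.expovariate(1 / 5)  # assumed: exponential with a 5 ms mean
        totals.append(gateway + fanout + db)  # sequential steps add up
    return totals


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank style percentile from the sorted samples."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(p / 100 * len(ordered)))
    return ordered[idx]


totals = simulate()
for p in (50, 90, 99):
    print(f"p{p}: {percentile(totals, p):.1f} ms")
```

Sequential steps add their latencies while parallel fan-out takes the maximum, so the tail of the end-to-end distribution grows faster than any single component's average would suggest.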

Web API

Interview with Runscope on API Testing and Monitoring

by Jeevak Kasarkod on Oct 22, 2015

Runscope, an API monitoring and testing vendor, announced the general availability of Live Traffic Alerts, a real-time performance monitoring solution for key API transactions in live production traffic. InfoQ used the opportunity to speak with Runscope about its vision and the value its platform brings to its users.
