DevOps

OpsRamp Introduces an AIOps Inference Engine

by Helen Beal on Jun 17, 2018

OpsRamp, a provider of a SaaS-based IT operations management platform, has announced OpsRamp 5.0, a new release featuring an artificial intelligence for IT operations (AIOps) inference engine for alerting and event correlation. The release also includes a multi-cloud visibility dashboard.

What It Means to Be a Site Reliability Engineer According to a Survey from Catchpoint

by Helen Beal on Apr 13, 2018

Site Reliability Engineering sits at the intersection of software engineering and IT operations. The approach was created at Google in 2003 and is described in detail in the company's 2016 book, Site Reliability Engineering: How Google Runs Production Systems. Digital experience intelligence provider Catchpoint surveyed 416 Site Reliability Engineers (SREs) with the goal of understanding what it means to be an SRE.

Monitoring Microservices at Scale at Crisp

by Hrishikesh Barua on Mar 24, 2018

Crisp’s engineering team shared their experience of monitoring their microservices stack. Vigil, their open source project written in Rust, provides a set of pull and push probes that collect health data, with support for multiple languages, a status dashboard, and integration with some external alerting tools.

Monitoring Distributed Task Queues at MeilleursAgents

by Hrishikesh Barua on Feb 18, 2018

MeilleursAgents, a website that lets property sellers list their property and get an estimated price for it, shared details of how it monitors its Celery-based distributed task queue. A combination of Python, StatsD, Bucky, Graphite and Grafana forms the pipeline that monitors task lifecycle and execution rates.

Monitoring Cloudflare's Global Network Using Prometheus

by Hrishikesh Barua on Oct 28, 2017

Matt Bostock’s SREcon 2017 Europe talk covers how Prometheus, a metrics-based monitoring tool, is used to monitor the globally distributed infrastructure and network of Cloudflare, a CDN, DNS and DDoS mitigation provider.

Leveraging Data Science to Improve Monitoring

by João Miranda on Jun 30, 2015

At the recent devopsdays Amsterdam 2015, Patrick Roelke contended that monitoring still has lots of issues. Roelke believes that data science can help by eliminating static thresholds and coalescing information from various data sources into a single metric. The talk included a quick overview of monitoring tools that leverage data science: Kale, Bosun and AnomalyDetection.

Handling Incidents and Outages

by João Miranda on Jun 29, 2015

David Mytton, CEO at Server Density, shared with the devopsdays Amsterdam 2015 crowd how the company handles incidents and outages. The process is grounded in a key set of principles: frequent public updates; exhaustive logging of the response activities; team effort and effective escalation. Server Density draws a lot of inspiration from the aviation industry, renowned for its safety procedures.
