"Site Reliability Engineering - How Google Runs Production Systems" is an open window into Google's experience and expertise on running some of the largest IT systems in the world. The book describes the principles that underpin the Site Reliability Engineering discipline. It also details the key practices that allow Google to grow at breakneck speed without sacrificing performance or reliability.
“This web page is slow” is a common complaint about web sites, especially since web applications started replacing desktop applications. While the web brings desirable characteristics such as global delivery, it also brings its share of performance challenges.
Tests should always keep the end user view in mind. But how do you test web services that are not directly customer-facing, and in particular, how do you performance test them in a meaningful way? This article outlines performance split testing, a performance test approach that relies on real-time production traffic.
This article series explains how containers are actually being used within the enterprise.
The adoption of containers is causing a paradigm shift within the monitoring space. InfoQ recently sat down with a series of container monitoring experts and explored the associated challenges.
The book Toolbox for the Agile Coach - Visualization Examples by Jimmy Janlén can be used by agile software development teams to visualize and improve their collaboration and communication.
ticketea is a large online ticket selling platform in Spain. This article describes their growing pains and how DevOps and an API-based distributed architecture allowed them to cope with growth.
Len Bass on the motivation for "DevOps: A Software Architect's Perspective", what looking at DevOps from an architectural perspective means, DevOps education, microservices and more.
The book Real World Kanban by Mattias Skarin provides four case studies where kanban is used to visualize, provide insight and improve product development.
This series explores some of the patterns of behavior of healthy organizations through testimonies from their practitioners and through analysis by consultants in the field.
We take a look at Etsy's blameless postmortems in terms of philosophy, process, and practical measures/guidance to avoid blame and better prepare for the next outage. Learning is the key outcome.
If you are building or designing your next monitoring system, take a look at this short list of habits exhibited by the most successful monitoring systems in the world today.