Resilience Content on InfoQ
-
Testing Resiliency at PagerDuty Without a Simian Army
Doug Barth of PagerDuty spoke at DevOps Days London about how the company began resiliency testing its systems without dedicating a lot of automation effort upfront. The goal was to quickly start learning about failure points and to discuss openly how to fix them, with only one hour per week of effort.
-
Amazon Web Services Stability and the September 13th US East 1 Outage
Amazon Web Services (AWS) suffered another outage of its US East 1 region during the morning of Friday 13th September. A number of popular services such as Heroku, GitHub and CMSWire were disrupted, along with many other customers in Amazon’s largest, oldest and busiest location.