Resilience Content on InfoQ

News

DevOps

Amplifying Sources of Resilience: John Allspaw at QCon London

At QCon London, John Allspaw presented “Amplifying Sources of Resilience: What Research Says”. Key takeaways from the talk: resilience is something a system does, not something a system has; creating and sustaining “adaptive capacity” within an organisation is resilient action; and learning how people cope with surprise is the path to finding sources of resilience.

Daniel Bryant
on Apr 23, 2019
Java

Failsafe 2.0 Released with Composable Resilience Policies

Failsafe, a zero-dependency Java library for handling failures, has released version 2.0 with support for resilience policy composition and a pluggable architecture that enables custom policy service providers.

Uday Tatiraju
on Apr 11, 2019
Architecture & Design

Designing and Building a Resilient Serverless System: John Chapin at QCon London

In a presentation at QCon London 2019, John Chapin explained the basics of serverless technologies and how to architect and build a resilient serverless system. He also ran a demo of how a globally distributed, highly available application can be built and run in multiple regions on AWS.

Jan Stenberg
on Mar 12, 2019
DevOps

Building Production-Ready Applications: Michael Kehoe Shares Lessons Learned from LinkedIn

At QCon San Francisco, Michael Kehoe presented “Building Production-Ready Applications”. Drawing on his experience with site reliability engineering (SRE), he introduced the tenets of “production-readiness” that all engineers across the organisation should focus on: stability and reliability; scalability and performance; fault tolerance and disaster recovery; monitoring; and documentation.

Daniel Bryant
on Nov 12, 2018
DevOps

Gremlin Releases Application Level Fault Injection (ALFI) Platform for Targeted Chaos Experiments

Gremlin Inc has released its second product offering in the “Failure-as-a-Service” domain: Application-Level Fault Injection (ALFI). Building on its initial platform, which helped engineers create and run chaos experiments at the infrastructure level, ALFI enables failure injection at the application level via a native language library.

Daniel Bryant
on Oct 07, 2018
Architecture & Design

How to Achieve a Resilient Architecture

To manage systems at scale, you must push your system almost to the breaking point while still being able to recover, and you must embrace failures, Adrian Hornsby writes in two blog posts. In them he shares his experience from more than a decade of working with large-scale systems, along with the patterns he has found useful.

Jan Stenberg
on Sep 13, 2018
Culture & Methods

Ben Gracewood on Learning from an Organisational Train Wreck

At the recent JAFAC conference, Ben Gracewood told the story of how POS developer Vend transformed their development organisation following catastrophic disruption and losses. He explored what happened after they reduced headcount by over 30%, what they had in place that enabled them to survive, and what they did differently as a result of the changes.

Shane Hastie
on Jul 16, 2018
DevOps

From Darwin to DevOps: John Willis and Gene Kim Talk about Life after The Phoenix Project

IT Revolution recently published “Beyond the Phoenix Project – the Origins and Evolution of DevOps”, an audiobook with nearly eight hours of conversation between Gene Kim and John Willis.

Helen Beal
on May 23, 2018
DevOps

What Resiliency Means at Sportradar

At this year's QCon London conference, Pablo Jensen, CTO at Sportradar, talked about the practices and procedures in place at Sportradar to ensure their systems meet expected resiliency levels. Jensen explained that reliability is influenced not only by technical concerns but also by organizational structure, governance, and client support, and that it requires ongoing effort to continuously improve.

Manuel Pais
on Apr 06, 2018
DevOps

Serverless Challenges in Hybrid Environments

Sam Newman, independent consultant and author of the book "Building Microservices", spoke at the Velocity conference in London about the challenges faced when hybrid systems rely on both serverless architectures and traditional infrastructure. In particular, Newman discussed how serverless changes our notion of resiliency and how the two paradigms clash at times of high load in the system.

Manuel Pais
on Nov 30, 2017
DevOps

Chaos Monkey 2.0 Runs via Spinnaker

Netflix has recently made available the source code of Chaos Monkey 2.0. The latest iteration of the resilience tool is fully integrated with Spinnaker and event tracking systems, but SSH support has been removed.

Abel Avram
on Oct 24, 2016
DevOps

DevOps Days Kiel Day 2

A roundup of the talks from the second day of DevOps Days Kiel.

Manuel Pais
on May 19, 2016
Java

Google Kick-Starts Git Ketch: A Fault-Tolerant Git Management System

Although development has only just started, Google has announced the first commits of Git Ketch, a multi-master Git management system that replicates information across multiple Git servers for resilience and scalability. The changes are based on JGit, a Java-based Git server, although other Git servers may be part of the multi-master cluster.

Abraham Marín Pérez
on Feb 02, 2016
Microsoft Makes Available Their Platform for Building Microservices

Microsoft has announced and made available a preview of Azure Service Fabric (ASF), a cloud platform that includes a runtime and lifecycle management tools for creating, deploying, running and managing microservices. ASF microservices can be deployed on Azure or on-premises on Windows Server private or hosted clouds. Support for Linux is planned for the future.

Abel Avram
on Apr 30, 2015
Anti-patterns for Handling Failure

Oliver Hankeln shares the anti-patterns he has found for handling failure in organizations: hiding mistakes, engaging in the blame game, the arc of escalation, and cowardice. He then suggests corrective actions for each of them.

Manuel Pais
on Apr 04, 2015
