DevOps

Gremlin Releases Application Level Fault Injection (ALFI) Platform for Targeted Chaos Experiments

by Daniel Bryant on Oct 07, 2018

Gremlin Inc has released its second product offering in the “Failure-as-a-Service” domain: Application-Level Fault Injection (ALFI). Building upon the company's initial platform, which helped engineers create and run chaos experiments at the infrastructure level, ALFI enables failure injection at the application level via a native language library.

Architecture & Design

How to Achieve a Resilient Architecture

by Jan Stenberg on Sep 13, 2018

To manage systems at scale, you must embrace failures and push your system almost to the breaking point while still being able to recover, Adrian Hornsby writes in two blog posts that share his experience of working with large-scale systems for more than a decade and the patterns he has found useful.

DevOps

Chaos Engineering at LinkedIn: The “LinkedOut” Failure Injection Testing Framework

by Daniel Bryant on Jun 24, 2018

The LinkedIn Engineering team has recently discussed their “LinkedOut” failure injection testing framework. Hypotheses about service resilience can be formulated and failure triggers injected via the LinkedIn LiX A/B testing framework or via data in a cookie that is passed through the call stack using the Invocation Context (IC) framework. Failure scenarios include errors, delays and timeouts.

AI, ML & Data Engineering

Microservices Resiliency and Fault Tolerance Using Istio and Kubernetes

by Srini Penchikala on Jan 15, 2018

Animesh Singh and Tommy Li from IBM spoke at the recent KubeCon + CloudNativeCon North America 2017 conference about microservices resiliency and fault tolerance using the Istio framework. They also showed how to configure and use circuit breakers and other resiliency features in Istio.

DevOps

Chaos Engineering at Twilio

by Hrishikesh Barua on Dec 25, 2017

The Twilio team describes their foray into Chaos Engineering, in which they use Gremlin to inject failures into the shards of their homegrown queuing system to test automated recovery.

Java

What's New in MicroProfile 1.2

by Michael Redlich on Nov 30, 2017

The Eclipse Foundation recently released MicroProfile version 1.2. New APIs added in this release cover improved communication among microservices, response to system faults, and JSON Web Token (JWT). Emily Jiang, CDI and MicroProfile development lead at IBM, and Michael Croft, Java middleware consultant at Payara, spoke to InfoQ about this latest release.

DevOps

Expedia's Journey toward Site Resiliency: Embracing Chaos Testing in Dev and Production at QCon SF

by Daniel Bryant on Nov 19, 2017

At QCon SF, Sahar Samiei and Willie Wheeler presented “Expedia’s Journey Toward Site Resiliency”, and discussed the building of a community of practice around resilience testing within Expedia. The results have generally been positive: Netflix’s Chaos Monkey has been running daily in production since May 15th; and resilience tests have been added to four Tier 1 service pipelines.

Architecture & Design

Relearning Functional Service Design for Microservices: Uwe Friedrichsen at microXchg

by Daniel Bryant on Feb 19, 2017

The opening talk of the microXchg microservices conference was delivered by Uwe Friedrichsen, and discussed “Resilient Functional Service Design”. Key takeaways included: microservice developers should learn about fault-tolerant design patterns and caching; understanding Domain-Driven Design (DDD) and modularity is vital; and developers should aim for replaceability of components rather than reuse.

Java

Google Kick-Starts Git Ketch: A Fault-Tolerant Git Management System

by Abraham Marín Pérez on Feb 02, 2016

Although development has only just started, Google has announced its first commits of Git Ketch, a multi-master Git management system that replicates information across multiple Git servers for resilience and scalability. The changes are based on JGit, a Java-based Git server, although other Git servers may be part of the multi-master cluster.

FoundationDB SQL Layer: Storing SQL Data in a NoSQL Database

by Abel Avram on Sep 10, 2014

FoundationDB has announced the general availability of SQL Layer, an ANSI SQL engine that runs on top of their key-value store. The result is a relational database backed by a scalable, fault-tolerant, shared-nothing, distributed NoSQL store with support for multi-key ACID transactions.

Refreshed AWS Trusted Advisor Offers Several Free Checks

by Steffen Opel on Aug 31, 2014

Amazon Web Services (AWS) has recently integrated the AWS Trusted Advisor into the AWS Management Console and made four security and service limit checks available at no charge. Additional checks from the security, performance, fault tolerance and cost optimization categories remain part of their Business and Enterprise support tiers.

The Netflix API Optimization Story

by Jeevak Kasarkod on Feb 08, 2013

The Netflix API optimization story is an interesting journey from a generic one-size-fits-all static REST API architecture to a more dynamic architecture that lends power to the client team to define and deploy their custom service endpoints. InfoQ spoke to Ben Christensen regarding this client adapter layer as well as the services layer redesign.

10gen: MongoDB’s Fault Tolerance Is Not Broken

by Abel Avram on Feb 07, 2013

A Cornell University professor claims MongoDB’s fault tolerance system is “broken by design”. 10gen responds through its Technical Director, rejecting the claims.

Netflix Hystrix - Latency and Fault Tolerance for Complex Distributed Systems

by Bienvenido David on Dec 21, 2012

Netflix has released Hystrix, a library designed to control points of access to remote systems, services and 3rd party libraries, providing greater tolerance of latency and failure. Hystrix features thread and semaphore isolation with fallbacks and circuit breakers, request caching and request collapsing, and monitoring and configuration.

Introducing Windows New File System: ReFS

by Jonathan Allen on Jan 17, 2012

For the first time since 1993, Microsoft is poised to offer a new file system architecture. ReFS, or Resilient File System, is designed both to improve reliability and to drop obsolete features offered by NTFS.
