Reliability Content on InfoQ

Presentations

Culture & Methods

Shifting Left for Better Engineering Efficiency

Ying Dai discusses how reliability and productivity drove two critical migrations at Roblox, improving telemetry and automating change rollouts to boost engineering efficiency.

Ying Dai
on Jun 24, 2025

47:19
Architecture & Design

How We Created a High-Scale Notification System at Duolingo

Vitor Pellegrino and Zhen Zhou discuss how they built and tested Duolingo's high-scale on-demand notification system, including what it takes to manage resources and site reliability concurrently.

Vitor Pellegrino, Zhen Zhou
on Sep 24, 2024

49:01
Architecture & Design

How Netflix Ensures Highly-Reliable Online Stateful Systems

Joseph Lynch discusses the architecture of Netflix's stateful caches and databases, including how they capacity plan, bulkhead, and deploy software to their global, full-active data topology.

Joseph Lynch
on Feb 12, 2024

49:31
Architecture & Design

Reliable Architectures through Observability

Kent Quirk gives an overview of observability tools and techniques, along with specific recommendations for how teams can fit observability into their system designs and day-to-day development process.

Kent Quirk
on Jan 11, 2024

48:59
Architecture & Design

How to Build a Reliable Kafka Data Processing Pipeline, Focusing on Contention, Uptime and Latency

Lily Mara shares how OneSignal improved the performance and maintainability of its highest-throughput HTTP endpoints (backed by a Kafka consumer in Rust) by making them an asynchronous system.

Lily Mara
on Jan 02, 2024

49:33
Architecture & Design

Architecting a Production Development Environment for Reliability

At Meta, developers use servers (devservers) to perform their daily work. This talk discusses the devservers' software architecture and the mechanisms employed to keep them reliable and available.

Henrique Andrade
on Dec 19, 2023

56:52
DevOps

Building Reliability One Step at a Time

Ana Margarita Medina shares how she has been using Chaos Engineering and how it can be used to uncover a system’s weak points, learn from incidents, and improve monitoring and observability.

Ana Margarita Medina
on Aug 29, 2021

39:01
Culture & Methods

Less Mess, Less Stress: the Reliability Benefits of Custom Tools

Daniel Hochman discusses how an overreliance on vendor tooling leads to worse reliability outcomes, how Lyft lowered MTTR for its most common alerts using custom tooling, and how Clutch can help.

Daniel Hochman
on Jul 27, 2021

27:07
DevOps

InfoQ Live Roundtable: Production Readiness: Building Resilient Systems

The panelists discuss observability, security, the software supply chain, CI/CD, chaos engineering, and deployment techniques such as canaries and blue-green deployments, all in pursuit of production resiliency.

Wesley Reisz, Adam Zimman, Holly Cummins, Anastasiia Voitova, Haley Tucker, Charity Majors
on Dec 03, 2020

46:00
Architecture & Design

Chaos Engineering: the Path to Reliability

Kolton Andrus shares examples of what works, what doesn’t, and what the future holds in using Chaos Engineering to build reliability into a system.

Kolton Andrus
on Nov 26, 2020

26:42
DevOps

Reliability Matters More Than Ever

Tammy Butow discusses why reliability and resilience matter now more than ever, and how one can achieve them.

Tammy Butow
on May 22, 2020

25:46
Architecture & Design

High Performance Cooperative Distributed Systems in Adtech

Stan Rosenberg explores a set of core building blocks exhibited by Adtech platforms and applies them to building a fraud detection platform.

Stan Rosenberg
on Oct 23, 2019

52:09