Kolton Andrus on Breaking Things at Netflix

16:40

Bio

Kolton (@KoltonAndrus) is a Chaos Engineer on Netflix’s Edge Platform team. He designed and built FIT, a failure injection service. Prior to Netflix, he worked at Amazon Retail, where he built Gremlin, Amazon’s failure service. At both companies he has served as a ‘Call Leader’, managing the resolution of large-scale incidents.

About the conference

Software is Changing the World. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Sep 18, 2015
