DevOps Content on InfoQ
-
AI-Powered SRE for Autonomous Incident Response
The presenters discuss incident response and how AI-enhanced SRE platforms connect signals from logs, metrics, traces, and historical incidents to enable autonomous decisions.
-
Week-Long Outage: Lifelong Lessons
Molly Struve shares a "murder mystery" outage story from a massive Elasticsearch upgrade. She explains why you need a rollback plan, how to check biases, and why leadership support is a stabilizer.
-
Building a Future-Proof Observability Platform to Empower Engineers
Wayne Bell and Dan Gomez Blanco explain how Skyscanner transitioned from siloed telemetry to a unified OpenTelemetry standard, treating their internal platform as a product to drive adoption.
-
From VR to Flat Screens: Bridging the Input and Immersion Gap
Dany Lepage explains how Lucky VR scaled "Vegas Infinite" from Meta Quest to PS5, PC, and mobile. He shares the technical hurdles of cross-play, dual avatar systems, and the "product fit" trap.
-
Platform Engineering: Lessons from the Rise and Fall of eBay Velocity
Randy Shoup shares how eBay doubled engineering productivity but failed to pivot the business. He explains the technical wins of the Velocity Initiative and the cultural hurdles that remained.
-
Duolingo's Kubernetes Leap
Franka Passing explains Duolingo's migration from AWS ECS to EKS, discussing how they built a foundation with Argo CD and Karpenter to enable blue-green deployments for 128M+ active users.
-
Platform Engineering as a Practice of Sociotechnical Excellence
Lesley Cordero explains how platform engineering serves as a sociotechnical solution for scaling orgs. She shares strategies for joint optimization, communal learning, and distributed leadership.
-
No QA Environment? No Problem: How ClassPass Enables Testing on a Single Environment in ECS
Po Linn Chia explains how ClassPass eliminated environment contention using ECS, Traefik, and OpenTelemetry baggage to enable scalable, ephemeral testing without a dedicated QA environment.
-
Thinking Like a Detective: Solving Cloud Infrastructure Mysteries
Brendan McLoughlin discusses a "cloud mystery" methodology for debugging complex systems. He explains how to use HTTP status codes and request flow diagrams to track down elusive infrastructure bugs.
-
Fix SLO Breaches before They Repeat: an SRE AI Agent for Application Workloads
Bruno Borges explains how to automate SLO breach diagnostics using SRE agents and MCP tools. He shares methodologies for identifying bottlenecks and balancing speed, cost, and reliability.
-
DevOps Is for Product Engineers, Too
Lesley Cordero explains how DevOps and platform engineering drive sociotechnical excellence. She shares strategies for joint optimization, distributed leadership, and organizational sustainability.
-
Securing AI Assistants: Strategies and Practices for Protecting Data
Andra Lezza reviews the OWASP Top 10 for LLMs and contrasts security controls for independent vs. integrated copilot architectures.