DevOps Content on InfoQ
-
An Open Source Infrastructure for PyTorch
Mark Saroufim discusses tools and techniques to deploy PyTorch in production.
-
How Did It Make Sense at the Time? Understanding Incidents as They Occurred, Not as They are Remembered
Jacob Scott explores the basics of failure in complex systems, the theory and practice of how it made sense at the time, and actions to take.
-
Effective and Efficient Observability with OpenTelemetry
Daniel Gomez Blanco shares his experience leading a large-scale observability initiative at Skyscanner, based on the adoption of OpenTelemetry across hundreds of services.
-
Taming Configuration Complexity Made Fun with CUE
Marcel van Lohuizen discusses configuration at scale including the design of CUE, how configuration can go wrong, the need for testing and validation, and how CUE does holistic configuration.
-
Rethinking Reliability: What You Can (and Can't) Learn from Incidents
Courtney Nash discusses research collected from the VOID, challenging standard industry practices for incident response and analysis, like tracking MTTR and using RCA methodology.
-
Sprinkling eBPF onto Your Observability
Frederic Branczyk discusses eBPF's capabilities and demonstrates the real-world use of eBPF in next-generation observability tooling.
-
Cloud Provider Sustainability, Current Status and Future Directions
Adrian Cockcroft explains what is available now in terms of green energy, and reviews the public roadmap statements and commitments made by AWS, Azure, and GCP.
-
Infrastructure as Code: Past, Present, Future
Joe Duffy discusses the challenges met while running IaC, the solutions to them, and how both shape the future of IaC.
-
The Endgame of SRE
Amy Tobey discusses sociotechnical thinking, exploring ways SREs can impact reliability at scale.
-
Azure Cosmos DB: Low Latency and High Availability at Planet Scale
Mei-Chin Tsai and Vinod Sridharan discuss the internal architecture of Azure Cosmos DB and how it achieves high availability, low latency, and scalability.
-
Tesla's Virtual Power Plant
The speakers explore the architecture of the Tesla Energy Platform, including the use of asset hierarchies, functional programming techniques, and trade-offs in edge vs. cloud computing.
-
Beyond Default Settings: Evaluating the Security of Kubernetes and Cloud Native Environments
The panelists discuss default configurations, authentication, and access control mechanisms in the context of what Kubernetes brings to the table in terms of security.