DevOps Content on InfoQ
-
Paving the Road to Production
Graham Jenson shares his experience of creating "paved roads" and deploying pipelines at Coinbase over the past five years, and explains the advantages of that approach.
-
Greenwater, Washington: an Availability Story
Marc Brooker discusses defining and designing for availability that takes people into account, including examples of massive-scale cloud systems designed using these principles.
-
Failing Fast: the Impact of Bias When Speeding up Application Security
Laura Bell explores how bias impacts the security of a development lifecycle and examines three common biases that lead to big issues in this space.
-
Cloud Native Is about Culture, Not Containers
Holly Cummins shares stories of customers struggling to get cloud native and all the ways things can go wrong.
-
Leading Technical Projects - and How to Get Them Done
Sarah Wells shares stories on how the Operations and Reliability team at the FT built tools that are used by lots of their development teams: the challenges they faced, the things they tried and more.
-
Production & Debugging in a Serverless World
Tal Weiss covers some of the main things to watch out for and the advanced techniques we can put in place to make sure that we'll be prepared to debug even the nastiest Serverless production issues.
-
Scaling Culture of Resiliency in the Enterprise
Nate Vogel shares how he grew the data engineering team with an emphasis on building a culture of reliability, discussing processes and tools used.
-
IBM’s Principles of Chaos Engineering
Haytham Elkhoja discusses the process of getting engineers from across the company to agree on a list of Chaos Engineering principles, adapting existing principles to customer requirements and internal services.
-
Armor CLAD Functions
Guy Podjarny talks about how to properly secure our cloud functions. He uses a model called CLAD to remember what's left to protect, and discusses concrete practices to scale our defences.
-
Top Five Things You Can Do to Reduce Operational Load
Rachel Obstler discusses the things one can do to make a big difference in reducing operational work from incidents, reducing duplicate efforts, surfacing issues, and improving response times.
-
Self-Service Chaos Engineering: Fitting Gremlin into a DevOps Culture
Doug Campbell shares how they rolled out Gremlin at Grubhub and how they educated and enabled all engineering teams to use it.
-
Continuous Resilience
Adrian Cockcroft talks about how to build robust systems by being more systematic about hazard analysis, and including the operator experience in the hazard model.