Moving Ops from black to white box
During his talk at DevOps Days in Gothenburg, Mitchell Hashimoto, co-author of Vagrant and system admin at Kiip, proposed an experience-based roadmap for moving organizations from a traditional black box ops culture to an (ideal) white box culture in which developers are free to change the production environment.
Mitchell’s roadmap aims to keep applications (and environments) stable while promoting faster delivery cycles with a quick feedback loop. The proposed roadmap is composed of 5 steps:
· Metrics and monitoring
· High-level documentation
· Mirroring production environments in dev
· DevOps office hours
· Automated infrastructure tests
Measuring the operational environment gives developers insight into operational performance and stability. Although many monitoring tools are available, they are often foreign to developers. Extracting data and providing visual feedback, such as graphs that depict server load or the evolution of response times, makes developers aware of what is actually going on in a running system.
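As a rough illustration of turning raw measurements into visual feedback, the sketch below averages response times per minute and prints a text bar chart. It is not taken from the talk: the function names, the sample data and the 50 ms scale are all assumptions made for this example, and a real setup would more likely feed a graphing tool.

```python
from collections import defaultdict
from statistics import mean


def response_time_by_minute(samples):
    """Group (timestamp_seconds, response_ms) samples into per-minute averages."""
    buckets = defaultdict(list)
    for timestamp, response_ms in samples:
        buckets[timestamp // 60].append(response_ms)
    return {minute: mean(values) for minute, values in sorted(buckets.items())}


def text_chart(series, scale_ms=50):
    """Render one bar per minute; each '#' stands for roughly scale_ms milliseconds."""
    lines = []
    for minute, avg_ms in series.items():
        bar = "#" * max(1, round(avg_ms / scale_ms))
        lines.append(f"minute {minute}: {bar} {avg_ms:.0f} ms")
    return "\n".join(lines)


# Hypothetical samples: (seconds since start, response time in ms).
samples = [(5, 100), (30, 140), (65, 300), (90, 500), (130, 120)]
series = response_time_by_minute(samples)
print(text_chart(series))
```

Even a crude chart like this makes a spike in the second minute visible at a glance, which is the kind of awareness the metrics step is after.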
Documenting the infrastructure with high-level runtime architecture diagrams or other meaningful artifacts (e.g. deployment process, failure resolution, tool guides) provides insight into the internals of the production environment and into the impact of changes on system-wide qualities such as scalability or performance. Regular short tech talks also help improve visibility into the running side of application delivery and can provide more in-depth explanations of specific technologies or tools.
Mirroring production in dev environments allows developers to get familiar with production scripts and start experimenting without fear of failure. Effort is saved by reusing scripts and tools to manage development and staging environments in the same way as production. Furthermore, the deployment process gets exercised and tested dozens or hundreds of times before it is actually used in production.
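One way to picture the reuse described above is a single deployment routine that takes the environment as data, so dev and production differ only in their settings. The sketch below is illustrative only: `Environment`, `deployment_plan`, the host names and the database URLs are hypothetical, and the steps are returned as text rather than executed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Settings that differ between environments; the deploy logic does not."""
    name: str
    hosts: tuple
    database_url: str


def deployment_plan(env, version):
    """Return the ordered deploy steps for `version`; same logic for every environment."""
    steps = []
    for host in env.hosts:
        steps.append(f"copy app-{version} to {host}")
        steps.append(f"configure {host} with DATABASE_URL={env.database_url}")
        steps.append(f"restart app on {host}")
    return steps


# Hypothetical environments: only the data changes, never the procedure.
dev = Environment("dev", ("localhost",), "postgres://localhost/app_dev")
production = Environment(
    "production",
    ("web1.example.com", "web2.example.com"),
    "postgres://db.example.com/app",
)

for env in (dev, production):
    print(f"[{env.name}]")
    for step in deployment_plan(env, "1.2.0"):
        print("  " + step)
```

Because every run in dev goes through the same `deployment_plan` code path as production, each local deployment doubles as a rehearsal of the production one.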
Further cultural changes that promote a DevOps culture include weekly office hours held by both dev and ops to explain and clarify all kinds of topics on either side, or even to perform some code reviews, thus fostering a collaborative learning environment. A final technical change is the automation of infrastructure tests (at unit, integration or system level), which provides a safety net for developers contributing changes to ops. At this point, developer changes to ops are controlled and easy for ops to verify.
Mitchell stresses that all these changes need to be brought in slowly and sequentially so they can be digested. In particular, alternating a technical change with a cultural change gives each change breathing space to settle in.
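To make the idea of a unit-level infrastructure test concrete, here is a minimal sketch using Python's standard `unittest` module. The `render_upstream` helper is a hypothetical piece of infrastructure code invented for this example (it renders an nginx-style `upstream` block); the talk does not prescribe a particular tool or test framework.

```python
import unittest


def render_upstream(name, hosts, port=8080):
    """Hypothetical infrastructure code: render an nginx-style upstream block."""
    if not hosts:
        raise ValueError("an upstream needs at least one host")
    servers = "\n".join(f"    server {host}:{port};" for host in hosts)
    return f"upstream {name} {{\n{servers}\n}}\n"


class UpstreamConfigTest(unittest.TestCase):
    def test_every_host_is_listed(self):
        config = render_upstream("app", ["10.0.0.1", "10.0.0.2"])
        self.assertIn("server 10.0.0.1:8080;", config)
        self.assertIn("server 10.0.0.2:8080;", config)

    def test_empty_host_list_is_rejected(self):
        # A load balancer with no backends would take the site down.
        with self.assertRaises(ValueError):
            render_upstream("app", [])


# Run the suite explicitly so the result can be inspected programmatically.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(UpstreamConfigTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With tests like these in place, a developer who changes the config generator gets immediate feedback, and ops can review a passing test run instead of re-checking the change by hand.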