Bol.com's DevOps Journey
On the first day of DevOpsDays Amsterdam 2014, bol.com, an online store, reported on its DevOps journey. Full automation, careful team building and an agile mindset that cross-cuts the organisation were the keys to success.
Jos Houtman and Niels van de Wall, engineers at bol.com, explained that the whole company has an agile mindset. The development department manages 50+ applications, supported by 150 engineers who are organised into 30 Scrum teams. The business side also follows agile management practices: daily stand-ups, scrum boards and roadmap planning with product backlogs are common. Even so, the last two years brought a profound transformation of the company's web operations. At the start of that transformation, the team defined a set of principles to manage web operations:
- A single source of truth for their infrastructure
- Technical solutions mustn't compromise high availability
- Defined, conditional boundaries - if a change crosses those boundaries, it does not reach production
- Measure and monitor everything
- No manual actions to set up environments
- Manage all the environments (i.e., development, staging, production) the same way
- All the changes are peer reviewed
To enact the initiative, bol.com built a team around two ideas: the right attitude towards the DevOps mindset and CAMS (culture, automation, measurement and sharing), and a desire to automate and make structural improvements guided by measurements.
From a technical perspective, bol.com is now able to build a complete environment from scratch in two hours. It uses Puppet for configuration management, following Craig Dunn's Roles/Profiles pattern, and Rundeck for workflow automation. As for aspects to improve, bol.com cautions against Puppet's dependency hell and the slowdown when the number of resources gets large.
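The layering idea behind the Roles/Profiles pattern can be sketched outside Puppet. The snippet below is a minimal illustration in Python, not bol.com's code and not Puppet syntax: the `Profile` and `Role` classes and all resource names are hypothetical. It assumes that a profile bundles the resources for one technology and that a role describes a node purely as a list of profiles.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Profile:
    """A hypothetical profile: the resources needed for one technology."""
    name: str
    resources: Tuple[str, ...]


@dataclass(frozen=True)
class Role:
    """A hypothetical role: what a node is, expressed only as profiles."""
    name: str
    profiles: Tuple[Profile, ...]

    def resources(self) -> List[str]:
        # Flatten the profiles into one ordered list, dropping duplicates
        # so a resource shared by two profiles is declared only once.
        seen: List[str] = []
        for profile in self.profiles:
            for resource in profile.resources:
                if resource not in seen:
                    seen.append(resource)
        return seen


# Illustrative names only; none of these come from the talk.
base = Profile("base", ("ntp", "ssh"))
monitoring = Profile("monitoring", ("nagios_nrpe",))
webserver = Profile("webserver", ("nginx", "ssh"))

web_role = Role("web_frontend", (base, monitoring, webserver))
print(web_role.resources())
```

In this sketch a node is assigned a single role, and anything technology-specific lives in a profile, so monitoring can be added to every role by including one more profile.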
Hiera, Puppet's key/value lookup tool for configuration data, is used as the single source of truth. Jos and Niels report that it wasn't the best choice: Hiera isn't a good data source for complex information, which led them to build custom solutions on top of it.
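To make the key/value model concrete, here is a minimal Python sketch of a hierarchical lookup in the spirit of Hiera's default behaviour, where the most specific level that defines a key wins. It is not Hiera's API: the `lookup` function, the three levels and every key and value are assumptions made for illustration.

```python
from typing import Any, Dict, List


def lookup(key: str, hierarchy: List[Dict[str, Any]], default: Any = None) -> Any:
    """Return the value from the first (most specific) level that defines key."""
    for level in hierarchy:
        if key in level:
            return level[key]
    return default


# Hypothetical data, ordered from most specific to least specific.
node_data = {"nginx::worker_processes": 8}
environment_data = {
    "nginx::worker_processes": 4,
    "ntp::servers": ["ntp.example.org"],
}
common_data = {"nginx::worker_processes": 2, "ssh::permit_root_login": False}

hierarchy = [node_data, environment_data, common_data]

print(lookup("nginx::worker_processes", hierarchy))
print(lookup("ntp::servers", hierarchy))
```

A flat lookup like this works well for simple settings. Data that describes relationships between many items has to be encoded into keys and nested values, which may be the kind of limitation the speakers had in mind.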
Nagios monitors bol.com's infrastructure. The monitoring configuration relies as much as possible on the standard checks provided by the tool and its plugins. All monitoring, logging and metrics are integral to the Puppet profiles, as per the Roles/Profiles pattern.
On team dynamics, bol.com found good parts that helped it succeed. The company was able to find the right people by carefully observing would-be team members' behaviour and by demanding technical expertise. The teams are responsible for both building and running their applications, which encourages ownership and focus.
Time pressure, combined with building a new team, a new platform and new ways of working, also produced bad and ugly parts. Although they found the right people in a short time, Jos and Niels feel they were lucky. New team members were put in a "pressure cooker", with an aggressive ramp-up period. Keeping the team happy and motivated was a delicate balancing act that could have gone wrong.
Anatole Tresch Mar 03, 2015