Use of Kanban in the Operations Team at Spotify
InfoQ spoke with Mattias Jansson, Operations Engineer at Spotify. Spotify is a music streaming service for desktops and smartphones, which aims to provide a wide-ranging music collection that is served up with virtually no lag. The company behind Spotify uses Kanban in its Operations team, where Jansson introduced it. Kanban has been featured at InfoQ several times already, though primarily from the software engineering point of view. Jansson spoke with InfoQ in detail about the experience the Operations team at Spotify has gained while implementing a Kanban-based approach to dealing with their workload.
Q: What initial issues were encountered when moving to a Kanban-based approach?
Our Operations team here at Spotify consists of seven members, and we deal with a range of different kinds of tasks. I believe this is rather usual in operations teams. One moment you're designing a new site which will cost heaps of money, and the next you're modifying a firewall rule for a developer.
This posed a problem for us when we started to test Kanban: we would have cards representing multiple-month projects next to tiny 15-minute jobs. Giving these stories sizes was difficult, to say the least, since estimation is often hard. Will updating a lib on a server affect other, seemingly unrelated areas? Writing more than a title on a story was a painful process, given that some stories were so small that writing the story could take longer than fixing the problem itself.
But I'm getting ahead of myself.
Long before we started testing Kanban, we noticed that although we were really good at what we did, we couldn't plan very far in advance: we were reactive, not proactive. Heaps of 'urgent' jobs from other departments always got in the way of our internal projects, and we context-switched way too often because of all the interruptions from people and systems. We realized that the company was growing faster than we could accommodate.
So, funny as it sounds, our team's main problem was scalability. Not in the sense of our streaming service (though that's another very interesting story), but rather that our team and our working methods did not scale well with the growth and needs of the company.
Q: So how did you deal with this problem?
We decided to have a few meetings away from the office, where we sat and talked through what we actually did. The aim was to get a (hopefully) clearer view of where we stood and where we wanted to be. Here are some questions we asked ourselves:
- What kind of work do we actually do?
- How much time is spent on the various kinds of work?
- Is it possible to categorize our work into domains in a meaningful way?
- Where do the jobs come from? Do we initiate them, or do others? If others, who?
- How are we sharing knowledge with each other?
- How can we ensure that operations development gets the time it needs?
- Is it possible to lower the amount of context-switching we do?
- and more...
After our meetings (both planned and ad-hoc), we came to the conclusion that we could get more done by introducing a goalie who would catch all ad-hoc requests and categorize them as appropriate - small tasks would be done immediately and larger tasks would be written out as a proper card. Some advantages of having this goalie were:
- A higher and snappier service level to other departments
- Knowledge gets spread across our team
- Less context-switching for the rest of us
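The goalie's triage rule described above can be sketched roughly as follows. This is only an illustration, not Spotify's actual tooling: the `Request` and `Goalie` names are hypothetical, and the 30-minute cutoff is an assumption, since the interview does not say where the line between "small" and "larger" tasks was drawn.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: the interview gives no cutoff between "small" and "larger"
# tasks; 30 minutes is a made-up threshold for illustration only.
QUICK_FIX_MINUTES = 30


@dataclass
class Request:
    title: str
    source: str             # who asked, e.g. "developer" or "product dev"
    estimated_minutes: int  # the goalie's gut-feel estimate


@dataclass
class Goalie:
    name: str
    done_immediately: List[str] = field(default_factory=list)
    backlog_cards: List[str] = field(default_factory=list)

    def triage(self, request: Request) -> str:
        """Handle small requests on the spot; turn larger ones into cards."""
        if request.estimated_minutes <= QUICK_FIX_MINUTES:
            self.done_immediately.append(request.title)
            return "done immediately"
        self.backlog_cards.append(request.title)
        return "card written"


goalie = Goalie("this week's goalie")
firewall = goalie.triage(Request("Modify firewall rule", "developer", 15))
new_site = goalie.triage(Request("Design new site", "internal", 60 * 40))
```

The point of the sketch is that every ad-hoc request passes through a single person, so the rest of the team only ever sees work that has already become a card.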
We decided to keep it as simple as possible, at least to begin with. Thus:
- We have three vertical lanes: todo, doing and done.
- In addition to the above, we have two horizontal lanes for the two main types of service: standard type and intangible stories. (Most of our proactive work falls under this latter category.) By creating a separate lane for this, we can ensure that this work is actually done.
- We use only three sizes (aka T-shirt sizes): Small, medium and large. Small jobs are tasks which will probably take a day to do. Medium jobs are jobs which by gut-feeling will take half a week. Large jobs are jobs which will probably take something in the order of a week to complete. We skipped 'real' estimation since it is mostly just a waste of resources.
- We set a low WIP on the Todo lane, split between the standard horizontal lane and the intangible lane. We only pull new stories into the Todo lane once a week (or when it is empty) to ensure that the internal, intangible tasks are actually done.
- We set a Doing WIP equal to the number of ops team members minus one (for the goalie). We are considering lowering it even more to encourage shared stories, but only after we get more comfortable with Kanban.
What about larger tasks? Well, we decided to call anything that takes more than a week a 'project'. Since projects are big by definition, we detail each one on a wiki page and divide it into stories of sizes small, medium and large. These stories then go into the backlog, often under the task type "intangible".
Later on, we realized that we needed support for expedite issues, so we added a third horizontal lane in the todo and doing columns for this. A WIP of 1 was set for this lane, since we want to discourage the use of this category.
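To make the WIP rules concrete, here is a minimal sketch of the board as data plus a pull check. The numbers come from the interview (seven team members, a Todo WIP of 8 split evenly, an expedite WIP of 1), but the `Board` class and its methods are hypothetical. It also assumes that the expedite limit applies per column and that expedite cards do not count against the Doing limit, neither of which the interview states.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

TEAM_SIZE = 7              # ops team members, one of whom is goalie
DOING_WIP = TEAM_SIZE - 1  # Doing limit: everyone except the goalie
TODO_WIP_PER_LANE = 4      # Todo WIP of 8, split evenly between two lanes
EXPEDITE_WIP = 1           # kept low to discourage use of this lane


@dataclass
class Board:
    # Maps (column, lane) to the card titles currently in that cell.
    cards: Dict[Tuple[str, str], List[str]] = field(default_factory=dict)

    def count(self, column: str, lane: str) -> int:
        return len(self.cards.get((column, lane), []))

    def can_add(self, column: str, lane: str) -> bool:
        """Return True if adding a card would respect the WIP limits."""
        if column == "done":
            return True  # finished work is never limited
        if lane == "expedite":
            # Assumption: the limit of 1 applies to each column separately.
            return self.count(column, lane) < EXPEDITE_WIP
        if column == "doing":
            # Assumption: standard and intangible share one Doing limit.
            in_progress = (self.count("doing", "standard")
                           + self.count("doing", "intangible"))
            return in_progress < DOING_WIP
        return self.count("todo", lane) < TODO_WIP_PER_LANE

    def add(self, column: str, lane: str, title: str) -> bool:
        if not self.can_add(column, lane):
            return False
        self.cards.setdefault((column, lane), []).append(title)
        return True


board = Board()
todo_results = [board.add("todo", "standard", f"story {i}") for i in range(5)]
intangible_ok = board.add("todo", "intangible", "automate backups")
```

With these limits, a fifth standard story is refused while the intangible half of the Todo lane still has room, which is how the split protects internal work from being crowded out.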
A screenshot of the Kanban board used by the Spotify Operations team.
Q: What changes had to be made from the software development-focused idea of Kanban to the Operations-focused idea of Kanban?
Not much, since Kanban is a pretty simple system. However, as far as I can tell, Kanban needs to be adapted to every organization regardless of type, and our adaptations might differ a bit from a classic dev environment.
In a well-functioning development team, you would normally have one story source, perhaps the product development department, which defines each story together with the dev leads or the individual devs. An intended consequence of this is that the stories are in some way isomorphic: they have comparable sizes, the story texts all fulfill some common criteria, and so on.
A typical operations team has many sources of tasks: individual developers, product dev, other departments, as well as internally spawned stories. This was a problem, since these external tasks interrupted our long-term projects all the time, for each of us. That the more senior engineers in our team got interrupted more often just exacerbated the situation.
To make it even worse, we wanted to get these interruptions out of the way as quickly as possible, so we often forgot to document these changes, or to inform our colleagues of them.
Introducing Kanban without dealing with these often small external tasks wouldn't remove this particular problem, so an adaptation was necessary.
Our adaptation was, as I mentioned earlier, setting aside one person every week who acted as goalie for the team. With the goalie in place, we could finally use Kanban properly, since most smallish external interruptions were gone from our day-to-day work.
Another difference between dev teams and ops teams is, I think, that operations teams spawn quite a few internally generated tasks and projects. Our problem was that we prioritized tasks from external sources over these internal ones. This was probably due to a combination of human nature (helping others comes first) and the fact that the external issues were often more concretely defined.
As I touched upon earlier, we adapted our Kanban board by introducing the 'intangible' type of story, as recommended by David Anderson. With a WIP of 8 split evenly between the two parts of the Todo lane, we could ensure that 50% of our work was actually intangible. This has worked nicely for us so far.
Q: How did you stumble upon Kanban?
A few years ago, prior to my employment at Spotify, I had heard about Agile and Lean, and really wanted to test it in my group.
However, since we were a support & operations team, we couldn't get XP and scrum to work that well.
Then, around two years ago, I was introduced to Kanban by Henrik Kniberg. We experimented a bit with Kanban, but it never really took off. In retrospect, I believe that it was because we didn't spend the time necessary to analyze our work, and because we didn't put our hearts into it.
Anyway, once at Spotify, I discussed Kanban with my colleagues and our Operations director, and we decided to give it a shot.
Q: Why did you choose Kanban for managing operations at Spotify?
We chose Kanban because it was flexible and had a relatively small implementation cost. That is, with small changes to our way of working we could get a significant boost in effectiveness.
Q: Did you try any other agile methods than Kanban, like Scrum, XP,...?
With regard to Scrum: no, since our work flow does not easily fit into the time slots of Scrum.
We never tested XP, but I think we have cherry-picked some parts of it, like pair programming. When we make scary, sensitive changes to infrastructure, we normally work in pairs.
Q: When starting to work with Kanban, did you receive any training?
Yes. In addition to reading Henrik Kniberg's book (see below), some of us have attended a two-day course held by David Anderson (aka the Father of Kanban). It was quite enlightening, and if others have this opportunity they should take it.
We also attended an evening seminar held by Mary Poppendieck. She talked about lean software development, and though parts of the topic did not immediately apply to operations teams, it was very inspiring!
Q: Which tools are you using for managing your Kanban stuff and why did you choose them?
We tried AgileZen, which was an excellent and very pretty tool, but found the lack of support for horizontal lanes limiting.
As soon as we discovered LeanKit Kanban we switched from AgileZen, since not only can you have horizontal lanes, but you have much more freedom in designing your Kanban board there. We also make use of our wiki for describing projects.
Q: How long did it take you to get this Kanban process up and going?
It took roughly one month to analyze our situation, find task sources, bottlenecks and so on, and train ourselves to use Kanban as it was implemented in AgileZen.
However, since the Kanban process is evolutionary in nature, once up, the process gives us new insights regularly, which we try to incorporate into our work-flow. In this sense, we are nowhere near a 'complete' solution which will not need fiddling with again.
It will be interesting to see where Kanban leads us in the near future. The question is how I will answer these questions a year from now.
Q: How happy are you with Kanban and do you think it's worth a look for people in operations?
Yes! We're quite happy with it. It enables us to become more agile without really having to change very much of our day-to-day work.
We've noticed that our lead times are shorter, we get more internal tasks done, and the departments we interface with are happier.
Q: Which resources can you recommend for people interested in Kanban?
- http://en.wikipedia.org/wiki/Lean_software_development - Everyone should read this, at least the Basic Principles of Lean
- "Kanban: Successful Evolutionary Change for Your Technology Business" by David Anderson
- "Kanban and Scrum: Making the Most of Both" by Henrik Kniberg and Mattias Skarin (free download version available at InfoQ)
- LimitedWIPSociety.org - a gathering place where like-minded Kanban evangelists write
- AgileZen.com - a web based Kanban board
- LeanKitKanban.com - another web based Kanban board