The Story of a Project
This is the story of a project, neither more complex nor simpler than others: an application that communicates with a database and two other systems. Something quite mainstream from a technical and architectural point of view, and something standard from the management side: all the work should have been done yesterday and there is a lot to do. In short, “it’s gonna be hard!” (a sentiment often expressed by the developers, though nobody shouts it too loudly).
So we build the team. Forty people are hired and assigned specialized roles. The teams are organized in pools, and a kind of contract is set up between the different pools. Each pool is responsible for addressing certain kinds of demands. A flow of demands appears. Certain pools come under pressure and become bottlenecks: a stock of demands builds up upstream while the downstream pools wait. For those in the pools under pressure, important things become urgent things. Choices must be made among the urgent things in order to treat the most immediate ones. Task-switching becomes the way of working and, in the end, the flow slows down.
Then the “go live” deadline nears: it is two months away. The user acceptance tests are only just starting, delayed by the tedious and painful integration of the different components. Maybe the contracts set up between the teams have complicated the integration: some mandatory parameters are missing, the dates do not respect the agreed format, the error codes are only partially interpreted…
In any case, the user acceptance tests detect more bugs than the development team can resolve, and not everything has been tested yet.
So we add more manpower: one team to resolve the bugs, another team, in another location, to finish the development, and a third to integrate the different components. But all these teams share the same code base, and the changes made by some impact the corrections made by others.
Other predictable results followed: the developers worked nights and weekends; the deadline was postponed; the initial scope was modified (reduced, of course); and, thanks to the miracle of computer science, the application was finally delivered and running!
This was a couple of years ago. I thought this was “the way to go” and that all projects were managed the same way. Since then, I have experienced different contexts, read a lot and talked with many people. I now see this story with a different state of mind.
Building an IT system is a complex undertaking that mixes technical and architectural skills with management and human skills. There is a wealth of literature on both fields but, concerning the management and human side of building systems, I have to admit that Tom DeMarco is certainly one of my favorite authors. I remember two of his publications in particular. In the first one, titled “Software Engineering: An Idea Whose Time Has Come and Gone?”, DeMarco talks about the (illusion of) control of software projects. In the second one, “Slack”, he explores classical management issues and explains how more slack can help organizations become more adaptable and, eventually, more efficient.
But let’s get back to our story with these two publications in mind and see how we could have avoided a couple of classical issues.
No priorities, or the famous “we need it all”
Tom DeMarco shows a way to avoid the trap of the monolithic, “every feature is critical” project:
“You say to your team leads, for example, «I have a finish date in mind, and I’m not even going to share it with you. When I come in one day and tell you the project will end in one week, you have to be ready to package up and deliver what you’ve got as the final product. Your job is to go about the project incrementally, adding pieces to the whole in the order of their relative value, and doing integration and documentation and acceptance testing incrementally as you go.»”
In short: work so that you are always able to go live tomorrow.
Beyond the real organizational and technical issues, you need to be willing to build the software incrementally. Contractual responsibilities between the teams should be kept to a minimum, and the teams should be organized around business features rather than around technologies or tasks. From the technical point of view, each team is then responsible for the whole feature working properly. From the management point of view, the managers and business people have to make choices: which feature is THE one that is absolutely needed? In my own experience, the more you work in environments where the deadline must be met, the more this kind of feature-team organization helps.
Management pressure and fear culture
Again, let us look at what DeMarco has to say about the culture of fear and its characteristics:
“…among the characteristics of the culture of fear organization are these:
- It is not safe to say certain things (e.g. I have serious doubts that this quota can be met).
- Goals are set so aggressively that there is virtually no chance of achieving them.
- Power is allowed to trump common sense.…”
When I think of management by fear, I imagine a physically imposing despot shouting at his collaborators from behind his desk and banging his fist on it… a fine cartoon, in brief. Reality is a little more insidious, but we have to admit that there are contexts where people are under such pressure, where it is difficult to raise an alert, where deadlines are fixed without any consideration of the team’s capacity to deliver and where, in the end, the team bears the pressure of commitments its managers have made to their own hierarchy.
Understanding the problem is certainly the first step. But how can we solve it? What can we do when a manager does not understand the risks and refuses to accept the unavoidable: you need to choose, prioritize and negotiate the scope in order to keep the deadline, or else push the deadline back?
This is far from easy, and the best answer I have for now is the “backlog” coupled with a “burn-down chart”. In my opinion, they bring several benefits in this kind of situation:
- Bring together all the tasks (technical tasks, functional tasks…). These tasks can, of course, be organized or consolidated by use case or feature.
- Share all the tasks with all the project participants. In other words, make the sheer size of what remains to be done visible.
- Show a credible and realistic deadline, thus enabling the managers to prioritize effectively between the tasks.
- Show every added task, which will necessarily push back the initial deadline.
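To make the last two points concrete, here is a minimal sketch of the arithmetic behind a burn-down chart. The `Sprint` record, the function names and the figures are hypothetical, chosen only for illustration; a real team would feed in its own backlog estimates.

```python
from dataclasses import dataclass
from math import ceil
from typing import Optional


@dataclass
class Sprint:
    completed: float     # work finished during the sprint (e.g. story points)
    added: float = 0.0   # work added to the backlog during the same sprint


def remaining_after(initial_backlog: float, sprints: list[Sprint]) -> list[float]:
    """Return the burn-down series: work remaining after each sprint."""
    remaining = initial_backlog
    series = []
    for sprint in sprints:
        remaining = remaining - sprint.completed + sprint.added
        series.append(remaining)
    return series


def sprints_to_finish(remaining: float, sprints: list[Sprint]) -> Optional[int]:
    """Project the number of sprints still needed at the observed net velocity.

    Net velocity is work completed minus work added, averaged per sprint.
    Returns None when the backlog grows at least as fast as it is burned,
    which means no deadline can honestly be announced.
    """
    if remaining <= 0:
        return 0
    if not sprints:
        return None
    net_velocity = sum(s.completed - s.added for s in sprints) / len(sprints)
    if net_velocity <= 0:
        return None
    return ceil(remaining / net_velocity)


# Hypothetical history: 200 points at the start, three sprints done.
history = [Sprint(completed=30), Sprint(completed=25, added=10), Sprint(completed=35, added=20)]
series = remaining_after(200, history)
print(series)                                  # work remaining after each sprint
print(sprints_to_finish(series[-1], history))  # projected sprints left
```

The point of the sketch is the `added` field: every task slipped into the backlog lowers the net velocity, and the projected end date moves accordingly, in full view of everyone.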
A few weeks before the “go live” and while the current organization is under pressure, a decision is made to add more manpower
Brooks’ law was formulated in 1975 (I was not yet born) and states:
“Adding manpower to a late software project makes it later”
We have all experienced it. Yet we have to admit that we still tend to add manpower to meet the deadline instead of changing the initially defined scope and keeping an optimal, adapted team size. Brooks explains his law with two major points. The first is that new team members have to be trained, which consumes the productive time of the people already in place. The second is the myth that development tasks can be split up “as you go”, which ignores the intellectual part of the work and the communication needed between all the developers. We can add the difficulties linked to organizing the development and to sharing the same code between all the developers. All of these details drag the teams’ productivity down.
Tom DeMarco offers his own take on Brooks’ law, which he calls “overstaffing”:
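The communication cost can be illustrated with simple arithmetic: if every developer may need to coordinate with every other, the number of pairwise channels is n(n-1)/2, which grows much faster than the head count. The sketch below only counts pairs; it is an illustration of the order of magnitude, not a productivity model, and the function name is mine.

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct pairs of people in a team: n * (n - 1) / 2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


for size in (5, 10, 20, 40):
    print(f"{size:>2} people -> {communication_channels(size):>3} channels")
#  5 people ->  10 channels
# 10 people ->  45 channels
# 20 people -> 190 channels
# 40 people -> 780 channels
```

Doubling a team from 20 to 40 people doubles the hands available but roughly quadruples the number of potential conversations.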
“Meeting the deadline is not what this is all about. What this is about is looking like you are trying your damnedest to meet the deadline. In this age of “lean and mean” it is positively unsafe for you to run the project with a lean (optimal) staff.”
Please read it again, more slowly. Don’t you think this is an interesting point of view? What DeMarco suggests, or maybe what I want to understand, is that we do not add manpower because we think the project will go faster or better. We add manpower as an excuse in case of failure, just to be able to say, as any child would: “it is not my fault, I did my best”.
What a natural response! Changing this behavior would first require modifying our education (learning to learn from our failures and mistakes) and developing organizations where failure is an option (not too often, of course)…
“Everything fails all the time”. Why don’t you play with it instead of ignoring it?
As usual, criticism is easy and art is difficult. We thus see the same “errors” occur again and again, whereas alternatives exist (which, to be sure, have limitations of their own) but are rarely tried. “Risk Management is a discipline of planning for failure” (Slack, Tom DeMarco), and this is maybe what we are not good at. “Everything fails all the time”, states Werner Vogels.
What both of them are telling us is that failure is hard to avoid. It is part of the deal, just as a rock climber must accept falling if he wants to climb well and go further. So embrace it, deal with it and learn from it.
Tom DeMarco teaches us that managing risks first requires identifying them, monitoring them and setting indicators that alert us when failure is approaching. Sometimes alternatives will have to be found. Some people will have to be trained. Parallel versions of the software will be developed so that the best-suited solution can be chosen at the last possible moment (Lean Management calls this the “set-based design” principle).
But risk management is not only about planning. From the architectural point of view, risk management of course implies reliability management: fail-over and the like. But it also implies architecting our systems, in close collaboration with people from the business, to handle and embrace all these errors. In other words, to anticipate in our architecture as many of the likely error cases as possible:
- How do we manage a degraded mode if a subpart of the whole system becomes unavailable, or if there are more visitors than expected?
- What procedures (manual ones if needed) allow a business process to be completed in case of error?
- What information is mandatory to finish the business process?
- What alerting mechanisms let us be proactive with the end user, informing him that an error has occurred and helping him properly finish his work in progress?
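As one possible illustration of the first and last questions, the sketch below wraps a call to a subsystem so that a failure raises an alert, switches to a degraded answer and carries a message for the end user. All names here (`Outcome`, `with_degraded_mode`, the recommendation example) are hypothetical; the right fallback for each feature is a business decision, not a technical one.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Outcome:
    value: Any
    degraded: bool     # True when the fallback answered instead of the primary
    message: str = ""  # what to tell the end user, if anything


def with_degraded_mode(
    primary: Callable[[], Any],
    fallback: Callable[[], Any],
    alert: Callable[[Exception], None],
    user_message: str,
) -> Outcome:
    """Try the primary subsystem; on failure, alert the team and degrade."""
    try:
        return Outcome(value=primary(), degraded=False)
    except Exception as exc:  # a real system would catch narrower error types
        alert(exc)            # be proactive: the team learns about it now
        return Outcome(value=fallback(), degraded=True, message=user_message)


# Hypothetical example: a recommendation service that is down.
def live_recommendations() -> list[str]:
    raise ConnectionError("recommendation service unavailable")


def static_best_sellers() -> list[str]:
    return ["item-1", "item-2"]


alerts: list[Exception] = []
outcome = with_degraded_mode(
    live_recommendations,
    static_best_sellers,
    alerts.append,
    "Personalized suggestions are temporarily unavailable.",
)
print(outcome.degraded, outcome.value, outcome.message)
```

The user still gets a working page and an honest message, while the alert gives the team a chance to react before anyone complains.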
On an existing system, evolution (and not revolution) will be needed to make sure that the error cases already detected are fixed for good.
But let’s be honest. If you are cost-driven, you will find all the non-business requirements useless. Yet, in the end, isn’t our trust in an application based more on its ability to handle errors (resiliency and reliability) than on any other criterion?
To conclude: So what?
Here is another story, another project, neither more complex nor simpler than others. And as in all projects, there is a lot to do. At the beginning, the IT and the marketing teams defined a clear roadmap.
- “We would like to launch our new platform in 4 months with these minimum marketable features. We expect about 3000 visits per hour, and 80% of the visitors will use the new feature,” said the marketing team.
- “OK,” said the CTO. “We will need to add a couple of non-functional features to limit access to the platform in case our forecasts are wrong. There is also a technical risk on this other feature that we would like to check ASAP. We may have to challenge the content of the next release.”
- “Of course. Tell us ASAP about these risks. For the month after, we will need these new features…”
- …
The platform grew incrementally, with feature teams responsible for delivering whole user stories, fixing their bugs and deploying their code.
Of course, IT encountered technical issues, so business and IT had to work together to refine the roadmap. Deadlines were met but, of course, some features were postponed: the lowest-priority ones.
Of course, the marketing forecasts were not that good. The non-functional features the CTO added to the platform (for instance, logging all user behavior or limiting access when the known platform limits were reached) were very helpful during the platform launch and helped the marketing team refine its forecasts. The users were loyal and trusted the nascent platform.
In the open space, the walls were full of information: the burn-down chart and cumulative flow diagrams that showed the team’s velocity or troubles, the objectives of the next release…
During this project, one technical person was hired: an opportunity the CTO did not want to miss.
Anyway, this project worked well and the platform is still in production. The technical staff were proud of their work…
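The story does not say how access was limited, so the following is only a sketch of one common approach: a sliding-window counter that admits a visit while fewer than a maximum number have been admitted during the last window, and counts the refusals. The class name, the limit of 3 visits and the injected clock are assumptions made to keep the example small; the 3000 visits per hour in the story is a marketing forecast, not a measured platform limit.

```python
import time
from collections import deque
from typing import Callable


class VisitLimiter:
    """Admit at most `max_visits` visits per sliding window of `window_seconds`."""

    def __init__(
        self,
        max_visits: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_visits = max_visits
        self._window = window_seconds
        self._clock = clock
        self._admitted: deque[float] = deque()  # timestamps of admitted visits
        self.rejected = 0  # logged so that forecasts can be refined later

    def try_admit(self) -> bool:
        now = self._clock()
        # Forget the visits that have left the sliding window.
        while self._admitted and now - self._admitted[0] >= self._window:
            self._admitted.popleft()
        if len(self._admitted) >= self._max_visits:
            self.rejected += 1
            return False
        self._admitted.append(now)
        return True


# Demonstration with a fake clock so that the behavior is deterministic.
fake_now = [0.0]
limiter = VisitLimiter(max_visits=3, window_seconds=3600, clock=lambda: fake_now[0])
print([limiter.try_admit() for _ in range(4)])  # the fourth visit is refused
fake_now[0] = 3600.0                            # one hour later
print(limiter.try_admit(), limiter.rejected)    # admitted again, one refusal logged
```

The `rejected` counter is the interesting part for the business: it measures how wrong the forecast was, which is exactly the information the marketing team needed.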