Applying Software Delivery Metrics and Analytics to Recover a Problem Project

Key Takeaways

  • Problem software delivery projects can be recovered mid-flight if Value Stream Management (VSM) analytics are used in a forensic way to uncover the root-cause of the issues
  • Value Stream Management (VSM) analytics platforms adopt a root-cause framework to surface metrics in a forensic way that considers all possible causes of the problem (separating causes within the control of the delivery team from those in the control of external stakeholders).
  • The seven root-cause metrics areas in control of the delivery team are: People Availability and Focus; Team Makeup and Stress; Backlog Health and Estimation Accuracy; Dependability and Sprint Accuracy; Delivery Process Efficiency; Story Management and Complexity; and Defect Generation and Rework
  • The two root-cause metric areas in control of stakeholders are: Changing Scope and Requirements; and Delayed Feedback and Project Input
  • The approach provides a quantitative root-cause RAG Report which enables mitigations to be put in place in flight, which are based on a detailed understanding of the underlying causes of the project’s problems. This greatly increases the chance of project recovery.

Despite the increasing prevalence of Agile software delivery methodologies and related Agile DevOps toolsets and processes, many projects still end in 'failure'.

Indeed, according to a 2017 report from the Project Management Institute (PMI), 14% of IT projects 'fail' and a further 31% 'didn't meet their goals'. In addition, 43% 'exceeded their initial budgets', and 49% were 'late'.

So how can delivery teams apply software delivery metrics and analytics to help diagnose the problem in a problem project and take steps to recover in flight?

Problem project diagnosis – root cause analytics framework

The problem project that we consider here is the following typical example:

"It was sold in as a relatively easy feature enhancement requiring a small team size and limited time. It ended up ballooning in scope, people requirement and time taken - and the forecast delivery date was changed multiple times ..."

Software delivery projects like these can be recovered mid-flight if analytics are brought to bear in a forensic way in order to uncover the root-cause of the issues.

With this in mind, a root-cause framework can be very helpful to quickly analyse the project in a systematic and logical way to identify all possible problem causes.  

Our preferred root-cause analytics framework is presented in Figure 1 below.

If followed, it should quite rapidly lead you to the cause (or causes) of the project’s problems.

Figure 1: Software Delivery Project - Root Cause Analytics Framework

As the framework shows, the majority of the root-causes are likely to be within the responsibility of the delivery team itself, though there are also important stakeholder responsibilities that can be causes of serious delays.

A Value Stream Management (VSM) analytics platform can be used to surface the relevant metrics that relate to each potential problem area, to enable rapid forensic analysis of the problem project in question.

Applying the Root Cause Analytics Framework – key delivery metrics and analytics  

1. Delivery team issue identification

First we consider the seven root-cause areas relating to the delivery teams themselves, as shown in Figure 1.  

1.1 People availability and focus

One of the most common causes of project delay is a lack of people actually working on the project in hand due to unplanned work and/or lack of availability (e.g. due to holidays, rescheduling etc).  

So, key metrics to investigate in this regard are:

  • Planned versus Actual People Availability (%) - historical data can be sourced from time trackers and forecast data can be extracted from HR and scheduling software to reveal the proportion of planned FTEs actually available to work on the project in the coming weeks. This metric is best viewed in the full knowledge of the actual people in question (rather than generic 'FTEs') as clearly there will be certain skill-sets and individuals who will be critical to the project’s success at any particular point in time
  • People Variance (FTEs plan v actual). The above metric can be expressed as a variance in workdays to quantify the deficit (or surplus). The variance itself reflects project management’s ability to plan for and anticipate human resource fluctuation (through forecasting and contingency planning).
  • Unplanned Work % - if a scrum Agile methodology is being adopted, this metric is extremely useful as it tracks the percentage of story points/tickets added after the sprint has started. It therefore tracks over time the proportion of unplanned work preventing focus on the core work in hand. Anything over 10% is a red light.
  • Unplanned Work % Completion (also known as Sprint Work Added Completion %) – a further indication of the negative impact of unplanned competing priorities. Anything less than 90% completion is a red light as it indicates that not only is unplanned work being added, but it is not being completed and hence accumulating as backlog.

Figure 2 – Example Sprint Work Added Completion graphic (Source: Plandek Sprint dashboard)
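The two sprint-level metrics above reduce to simple ratios. The following is a minimal sketch, not Plandek's implementation: the `Ticket` fields and the sample sprint are hypothetical, and it assumes that Unplanned Work % is measured in story points against the total sprint scope (the article does not specify the denominator).

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    points: float
    added_after_start: bool  # hypothetical flag: ticket joined the sprint after it began
    completed: bool


def unplanned_work_pct(tickets):
    """Share of sprint story points added after the sprint started (assumed denominator: all sprint points)."""
    total = sum(t.points for t in tickets)
    added = sum(t.points for t in tickets if t.added_after_start)
    return 100.0 * added / total if total else 0.0


def added_work_completion_pct(tickets):
    """Share of the added (unplanned) story points that were actually completed."""
    added = [t for t in tickets if t.added_after_start]
    total = sum(t.points for t in added)
    done = sum(t.points for t in added if t.completed)
    return 100.0 * done / total if total else 100.0


# Illustrative data only
sprint = [
    Ticket(5, False, True),
    Ticket(8, False, True),
    Ticket(3, False, False),
    Ticket(2, True, True),
    Ticket(2, True, False),
]

print(unplanned_work_pct(sprint))         # 20.0 -> above the 10% red-light level
print(added_work_completion_pct(sprint))  # 50.0 -> below the 90% red-light level
```

In this made-up sprint both measures would be red lights: a fifth of the scope arrived after the sprint started, and only half of it was finished.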

1.2 Team makeup and stress

Another key area to consider in any diagnostic is the wellbeing of the teams themselves. Stressed teams are both a symptom of and a potential cause of problem projects.  As such there are several key people-oriented metrics to consider:

  • Out-of-Hours Commits is a little-used but interesting metric to consider. Teams that are regularly working late and committing code outside normal office hours are likely to be under stress. If the percentage of Out-of-Hours Commits increases markedly and on a sustained basis, it is a red flag.
  • Commit Points versus Capacity (based on team average velocity) is another metric that seeks to identify whether teams are becoming over-burdened, with likely negative consequences for the project outcome. This metric looks at 'how hot the teams are running'. If story points delivered markedly exceed the identified 'capacity' based on the previous 12 weeks' velocity, then teams are likely to be delivering at an unsustainable pace, which will result in burn-out and major related problems.
  • WIP (at individual level) is another key measure of team stress and looks at the number of tasks 'in progress' at any one time. For optimal efficiency each team member should be working on a single task at one time. Otherwise, context switching can result in ineffective use of time. If the WIP measure rises above 2 tasks per person you should explore the source of the requests and how to enable the teams to maintain better focus.
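As a rough illustration of how two of these stress indicators could be derived from raw data, the sketch below computes the percentage of out-of-hours commits from commit timestamps and counts in-progress tickets per person. The 09:00 to 18:00 weekday window, the function names and the sample data are assumptions chosen for the example; a real team would use its own working pattern and time zones.

```python
from collections import Counter
from datetime import datetime


def out_of_hours_pct(commit_times, start_hour=9, end_hour=18):
    """Percentage of commits made at weekends or outside start_hour..end_hour (assumed office hours)."""
    if not commit_times:
        return 0.0

    def is_out_of_hours(ts):
        return ts.weekday() >= 5 or not (start_hour <= ts.hour < end_hour)

    return 100.0 * sum(is_out_of_hours(ts) for ts in commit_times) / len(commit_times)


def overloaded_people(in_progress_assignees, max_wip=2):
    """Names of people with more than max_wip tickets currently 'in progress'."""
    wip = Counter(in_progress_assignees)
    return sorted(name for name, count in wip.items() if count > max_wip)


# Illustrative data only
commits = [
    datetime(2024, 3, 4, 10, 0),   # Monday morning
    datetime(2024, 3, 4, 22, 30),  # Monday late evening
    datetime(2024, 3, 5, 14, 0),   # Tuesday afternoon
    datetime(2024, 3, 9, 11, 0),   # Saturday
]
assignees = ["ana", "ana", "ana", "ben", "cho", "cho"]

print(out_of_hours_pct(commits))     # 50.0
print(overloaded_people(assignees))  # ['ana']
```

The absolute out-of-hours percentage matters less than its trend, so in practice the function would be applied per week or per sprint and the series compared over time.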

1.3 Backlog health and estimation accuracy

An inability to manage the backlog of work and estimate accurately is among the most common causes of problems in Agile software delivery. If teams are unable (or unwilling to expend the time) to plan future workload and estimate accurately, timing and capacity planning become very unreliable and short-term goals become very difficult to meet.

Analysis of backlog health metrics often provides a good proxy for likely estimation problems.

  • Story Points Ready for Development – is a good basic measure of backlog health. If the number of Story Points (SPs) ready for dev is anything less than two times the average sprint throughput (i.e. velocity), then this is a red flag.

    A shortage of SPs ready for development shows that teams are at risk of running out of work. And if the backlog of ready work cannot be replenished quickly enough, time and capacity will be lost.

    Figure 3 – Example Story Points Ready for Development graphic: Plandek Backlog Health Dashboard

  • Story Backlog Distribution – In the event teams are running low on work, it’s important to understand where upcoming work is in the design and estimation process. Story Backlog Distribution is another useful measure of backlog health which shows how diligently teams are analysing the backlog and preparing for coming sprints. If there is a visible decline in stories actively being designed and/or estimated, then it is likely that future velocity will decrease, making delivery timing/resourcing less predictable.

    Figure 4 – Example Story Backlog Distribution graphic: Plandek Backlog Health Dashboard
  • Time to Design – is an important metric to consider as it reflects the time it takes to replenish the backlog.  A nimble design process enables a quick response to shortfalls before they impact the team.  On the other hand, a lengthy design process may be a contributing factor to teams running out of work and subsequent project delays.
  • Planned versus Actual Story Points (or time) – is an important measure of estimation accuracy.  It shows the variance in actual time taken/story points for scoped tickets in the backlog.  Variance between estimated and actual complexity is a common contributing factor in problem projects. 
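The backlog health rule of thumb and the estimation variance described above can be sketched in a few lines. This is an illustrative example under stated assumptions: the function names and figures are hypothetical, velocity is taken as the mean of recent sprints, and variance is expressed as a percentage of the estimate.

```python
def backlog_is_healthy(ready_points, recent_velocities, multiple=2.0):
    """True if story points ready for development cover at least `multiple` sprints of average velocity."""
    avg_velocity = sum(recent_velocities) / len(recent_velocities)
    return ready_points >= multiple * avg_velocity


def estimation_variance_pct(planned, actual):
    """Actual minus planned effort, as a percentage of planned (positive means under-estimated)."""
    return 100.0 * (actual - planned) / planned


# Illustrative data only
velocities = [30, 34, 32]  # story points delivered in the last three sprints (average 32)

print(backlog_is_healthy(50, velocities))  # False: 50 ready points is less than 2 x 32
print(backlog_is_healthy(70, velocities))  # True
print(estimation_variance_pct(40, 52))     # 30.0
```

A persistent positive variance suggests systematic under-estimation, which is worth separating from one-off overruns on individual tickets.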

1.4 Dependability and sprint accuracy

Project delivery relies on the dependability of the scrum teams delivering the software. If individual teams are unable to reliably deliver their sprint goals (typically a two-week commitment), then it is highly unlikely that longer term project goals (often involving multiple teams and sprints), can be accurately met.

Team dependability is a key building block of successful projects and hence a key focus in project recovery.

There are many potential sprint accuracy metrics; our preferred metrics are:

  • Sprint Completion (%) – the key measure of a team’s ability to hit its sprint commitments. This metric shows the proportion of tickets completed, including any stories added after the sprint started. Anything less than 80% is a red light as it shows a lack of dependability that requires further investigation. It may be due to an inability to estimate accurately, but may be due to other process issues such as unplanned work, poorly defined tickets etc.
  • Sprint Target Completion (%) – is a more specific measure looking at the proportion of planned tickets delivered (excluding any unplanned work added after the sprint started).  It is another key measure of a team’s ability to hit its sprint commitments. Anything less than 80% is again a red flag as it shows a lack of dependability that requires further investigation.  

Figure 5 – Example Sprint Completion graphics: Plandek Dependability Dashboard
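The difference between the two completion metrics is only in which tickets are counted. A minimal sketch, assuming each ticket is represented as a hypothetical `(planned, completed)` pair and that completion is counted by tickets rather than story points:

```python
def sprint_completion_pct(tickets):
    """Completed tickets as a percentage of all sprint tickets, including those added mid-sprint."""
    done = sum(1 for _, completed in tickets if completed)
    return 100.0 * done / len(tickets) if tickets else 0.0


def sprint_target_completion_pct(tickets):
    """Completed planned tickets as a percentage of planned tickets only."""
    planned = [completed for is_planned, completed in tickets if is_planned]
    return 100.0 * sum(planned) / len(planned) if planned else 0.0


# Illustrative sprint: 8 planned tickets (6 done) plus 2 added after the start (1 done)
sprint = (
    [(True, True)] * 6
    + [(True, False)] * 2
    + [(False, True), (False, False)]
)

print(sprint_completion_pct(sprint))         # 70.0
print(sprint_target_completion_pct(sprint))  # 75.0
```

Both values fall below the 80% level described above, so this example sprint would be flagged on either measure.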

1.5 Delivery process efficiency

Another area that may be a contributing factor in a project’s problems is the overall efficiency of the delivery process. There are several metrics that are important to review in this regard.

  • Delivered Story Points is a basic measure of throughput and as such an important metric to view when trying to diagnose the causes of a problem project. A marked decrease over time (or at a particular point in time) is an obvious red flag, as is a marked change for any particular team within the project.  
  • Flow Efficiency is a great measure of delivery efficiency at team level. Its analysis enables Teams to isolate and analyse 'waste' in the process and consider if there is scope to reduce or eliminate it. The analysis shows the relative size of each ‘inactive’ status opportunity in terms of time spent in the inactive state and volume of tickets affected. Typical opportunities to remove inactive bottlenecks include time spent with tickets awaiting definition (e.g. sizing), QA testing and PO sign off. Where QA queueing is considered excessive, team leads may reconsider resource allocation or enhanced collaboration upstream to maximise understanding and minimise the time tickets are waiting.

    When in project recovery mode, these metrics can be adopted by each team and related Scrum Masters, Team Leads and Delivery Managers, so that they are tracked and analysed in daily stand-ups, sprint retrospectives and management review meetings.
  • First Time Pass Rate (%) is an excellent measure of overall team health and process efficiency. As the name suggests, it measures the percentage of tickets that pass QA first time (without triggering a return transition or defect sub-task). Too often this metric is seen as an engineering quality metric, when in fact it is a better reflection of how well a team is working together and supporting one another. A first-time pass requires the interdependent elements of an agile development team to be working well and is therefore a great team-level value stream metric. A declining First Time Pass Rate is a clear red light.
  • Code Cycle Time provides a deeper understanding of your Pull Request process (from open to merged/closed), which is often found to be a key bottleneck and hence a potential area to save time and increase velocity. Very significant variations in time to resolve PRs are often seen between teams and individuals, with waits of over 50 hours not uncommon in problem projects.

    Figure 6 – Example Code Cycle Time metric within Plandek dashboard

  • Deployment Frequency is an important DevOps process effectiveness measure. Successful project outcomes will require the ability to develop and deploy to live small software increments rapidly. Deployment Frequency tracks that basic competence and is an important metric to understand if the root cause of a project’s problems relates to the deployment process.

    Figure 7 – Example Deployment Frequency metric view

There are a number of DevOps metrics that help explain changes in Deployment Frequency which may also be reviewed. These include Number of Builds, Build Failure Rate, Failed Build Recovery Time and Mean Time for Failed Builds which are useful to reduce the impact of build failures.  
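To make Flow Efficiency and First Time Pass Rate more concrete, the sketch below uses the common definition of flow efficiency (time in active statuses divided by total elapsed time) and ranks the inactive statuses by time lost. The status names, the choice of which statuses count as active, and the sample figures are all assumptions for illustration; they would need to be mapped to a team's actual workflow.

```python
from collections import defaultdict

# Assumption: which workflow statuses count as 'active' work
ACTIVE_STATUSES = {"In Progress", "In QA"}


def flow_efficiency_pct(status_hours):
    """Active time as a percentage of total time, given (status, hours) records."""
    total = sum(hours for _, hours in status_hours)
    active = sum(hours for status, hours in status_hours if status in ACTIVE_STATUSES)
    return 100.0 * active / total if total else 0.0


def inactive_hours_by_status(status_hours):
    """Hours spent in each inactive status, largest first, to show where waiting accumulates."""
    waits = defaultdict(float)
    for status, hours in status_hours:
        if status not in ACTIVE_STATUSES:
            waits[status] += hours
    return dict(sorted(waits.items(), key=lambda item: item[1], reverse=True))


def first_time_pass_rate_pct(qa_returns_per_ticket):
    """Percentage of tickets never sent back from QA (zero return transitions)."""
    if not qa_returns_per_ticket:
        return 0.0
    passed = sum(1 for returns in qa_returns_per_ticket if returns == 0)
    return 100.0 * passed / len(qa_returns_per_ticket)


# Illustrative data only
history = [
    ("Awaiting Sizing", 20),
    ("In Progress", 16),
    ("Awaiting QA", 30),
    ("In QA", 4),
    ("Awaiting PO Sign-off", 10),
]

print(flow_efficiency_pct(history))                # 25.0
print(list(inactive_hours_by_status(history)))     # ['Awaiting QA', 'Awaiting Sizing', 'Awaiting PO Sign-off']
print(first_time_pass_rate_pct([0, 0, 1, 0, 2]))   # 60.0
```

Here the QA queue is the largest source of waiting time, which is the kind of finding that would prompt the resource allocation discussion described above.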

1.6 Story management and complexity

Another area worthy of investigation is the teams' approach to managing stories and tickets. It is a common feature of problem projects that tickets are ill-defined and tend to balloon in size.  

This may be because certain individuals become siloed, working to solve engineering problems alone (which in itself is a major red flag that needs identifying and resolving).  Whatever the circumstance, overly complex stories are usually a recipe for quality and timing problems, as they introduce complexity not only for engineers but for the whole delivery team through QA and release.

Relevant metrics here therefore include:

  • Story Complexity – Lines of Code per Story – lines of code/story will vary by project/situation, but experienced managers will recognise excessive lines of code per story as a major red light, especially within the context of a project that is already seen as problematic.
  • Story Complexity – Repos touched per Story – similarly code repositories touched by a story will vary according to the project context, but well-informed managers should be able to identify outliers that look worrying
  • Story Complexity - Developers per Story – again the optimal number of developers per story will vary and collaboration is always to be encouraged. But there will be circumstances where it is clear that there are too many individuals involved in a single story (often seen with junior, under-supported developers), resulting in inefficiency and illustrating the fact that the story should have been broken down into better defined elements and knowledge shared more proactively.
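Because acceptable values vary by project, one pragmatic way to surface worrying stories is to compare each story against the project's own norm. The sketch below flags stories whose value exceeds a multiple of the median; the threshold of three times the median and the story keys are hypothetical choices for illustration, not figures from the framework.

```python
from statistics import median


def complexity_outliers(value_by_story, multiple=3.0):
    """Stories whose metric (e.g. lines of code, repos touched) exceeds `multiple` x the median."""
    if not value_by_story:
        return []
    typical = median(value_by_story.values())
    return sorted(story for story, value in value_by_story.items() if value > multiple * typical)


# Illustrative data only: lines of code changed per story
lines_of_code = {"ST-1": 120, "ST-2": 90, "ST-3": 150, "ST-4": 1400, "ST-5": 110}

print(complexity_outliers(lines_of_code))  # ['ST-4']
```

The same function can be applied to repos touched or developers per story; the median is used rather than the mean so that a single very large story does not distort the baseline it is compared against.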

1.7 Defect generation and rework

It is a common scenario in problem projects that the delivery team start to generate bugs at such a rate that they generate excessive unplanned work to resolve them.  

This is therefore a key area of analysis in problem project diagnosis. Important metrics to consider include:

  • Escaped Defects can be tracked in a number of ways, but most involve tracking defects in production by criticality/priority as per the example in Figure 8.  If these are trending upwards it can be a clear sign of a problem project in that it indicates the teams may be having unforeseen engineering or testing problems, which in turn are generating unplanned work to resolve when they should be focused on the next iteration of features.
  • Similarly insightful quality metrics include P1 Resolution Time and Unresolved P1 & P2 Bugs. Both represent time expended fixing bugs thereby reducing time available for feature development and hence the delivery of project goals.

Figure 8 – Example Escaped Defects metric view

Figure 9 - Example P1 Resolution Time and Unresolved Bugs views

2. Stakeholder issue identification

2.1 Changing scope and requirements

It is very common for the root cause of a project’s problems to lie outside the responsibilities of the delivery team.  Stakeholders moving the goal posts or changing the scope during the project is a very common source of angst.

This may be hard to identify without time consuming investigation, but some simple metrics to consider here are:

  • Ex-Scope Epics Added – this metric identifies the epics that clearly represent features not originally envisaged in the agreed project scope. This can be presented as Ex-Scope Workdays Added if the average velocity is applied to the ex-scope epics added.
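The conversion from ex-scope epics to workdays can be sketched as follows. This assumes the epics have story point estimates, that velocity is expressed in story points per sprint, and that a sprint is ten working days; the function name and figures are hypothetical.

```python
def ex_scope_workdays(ex_scope_epic_points, avg_velocity_per_sprint, workdays_per_sprint=10):
    """Team workdays implied by out-of-scope epics, at the team's average velocity."""
    total_points = sum(ex_scope_epic_points)
    return total_points * workdays_per_sprint / avg_velocity_per_sprint


# Illustrative data only: two epics added outside the agreed scope
print(ex_scope_workdays([40, 24], avg_velocity_per_sprint=32))  # 20.0
```

In this example the added epics represent roughly two additional sprints of team time, which gives stakeholders a concrete figure for the cost of the scope change.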

2.2 Delayed feedback and input

Similarly, it is common for process delays to emerge from delayed inputs from stakeholders – e.g. in the definition of user requirements in the preparation of tickets.  

This is another area which is hard to measure quantitatively, but can be assessed through sustained delays in the delivery process (e.g. in Cycle Time) or team surveys.

Summarising the project diagnostic – RAG reporting

A common frustration with problem projects is the lack of meaningful progress reporting. RAG (Red, Amber, Green) reports are often too high level to be of value – they may illustrate the problem, but do little to identify the cause of the problem and the most appropriate mitigation.

A RAG report based on the diagnostic metrics discussed in this paper becomes a much more useful tool as it highlights the underlying causes of the problem project (and hence points towards effective mitigations).  

Figure 10 below shows a template RAG report that is relatively easy to create if a metrics tool is in place.

Figure 10 – Root Cause RAG report
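A root-cause RAG report of this kind can be generated by applying a threshold rule to each metric. The sketch below is illustrative only: the red thresholds follow the levels quoted earlier in the article (unplanned work above 10%, added work completion below 90%, sprint completion below 80%, WIP above 2 tasks per person), while the amber thresholds, the rule table and the observed values are assumptions that each organisation would set for itself.

```python
def rag_status(value, red_at, amber_at, higher_is_better):
    """Classify a metric value as 'Red', 'Amber' or 'Green' against two thresholds."""
    if higher_is_better:
        if value < red_at:
            return "Red"
        return "Amber" if value < amber_at else "Green"
    if value > red_at:
        return "Red"
    return "Amber" if value > amber_at else "Green"


# metric name -> (red threshold, assumed amber threshold, higher_is_better)
RULES = {
    "Unplanned Work %": (10, 5, False),
    "Sprint Work Added Completion %": (90, 95, True),
    "Sprint Completion %": (80, 90, True),
    "WIP per person": (2, 1, False),
}


def rag_report(observed):
    """Map each observed metric to its RAG status using RULES."""
    return {name: rag_status(value, *RULES[name]) for name, value in observed.items()}


# Illustrative data only
observed = {
    "Unplanned Work %": 20.0,
    "Sprint Work Added Completion %": 92.0,
    "Sprint Completion %": 93.0,
    "WIP per person": 1.0,
}

for metric, status in rag_report(observed).items():
    print(f"{metric}: {status}")
```

Keeping the thresholds in a single table makes the report easy to audit and lets teams adjust the amber bands as they learn what 'normal' looks like for their project.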

About the Author

After completing his degree at Cambridge University, Charlie Ponsonby started his career as an economist working on World Bank and Asian Development Bank development projects across Asia and Africa. He then worked as a strategy consultant for Andersen Consulting (Accenture), before spells in senior leadership positions at the UK retailer Selfridges and Open Interactive TV. He was Marketing Director at Sky between 2001 and 2007, before founding Simplifydigital in 2007, which was three times in the UK Tech Track 100 and grew to become the UK’s largest broadband comparison service. Following Simplifydigital’s acquisition, Charlie Ponsonby co-founded Plandek in 2017, based in London. He is married with three teenage children and lives in London.
