Google's Study Provides Insights into Programmers' Build Errors
Google engineers have recently published a research paper presenting an empirical study of 26.6 million builds produced over a period of nine months by thousands of developers at Google. The paper describes the build workflow and analyzes failure frequency, compiler error types, and resolution efforts. According to the authors, the results provide insights useful for understanding how the build process works in large organizations and how developers could be more effectively supported.
The paper's authors describe their study as "quite novel" for its approach to characterizing how programmers in industry interact with their compiler and build tools. Furthermore, they stress the importance of the build process, which is a central step within the “edit-compile-debug” cycle:
Slow compiles may cause the programmer to be distracted by other tasks or lose context. [...] Any delay widens the gap between the programmer deciding on the next change to perform and viewing the effect of that change. Keeping the build process fast and understanding when and how it fails is a key part of improving programmer productivity.
The researchers try to answer a few questions based on the analysis of four metrics:
- Number of builds performed by each developer.
- Build failure ratio.
- Number of errors for each error kind.
- Time a developer spends fixing an error.
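To make the first two metrics concrete, here is a minimal sketch of how build counts and failure ratios could be derived from a build log. The record layout, the developer names, and the helper functions are assumptions made for illustration; the paper does not publish its data format or tooling.

```python
from collections import defaultdict
from statistics import median

# Hypothetical build log: (developer, language, succeeded). Made-up data.
BUILDS = [
    ("alice", "java", True),
    ("alice", "java", False),
    ("alice", "java", True),
    ("alice", "java", True),
    ("bob", "cpp", False),
    ("bob", "cpp", True),
    ("carol", "cpp", False),
    ("carol", "cpp", False),
    ("carol", "cpp", True),
    ("carol", "cpp", True),
]


def failure_ratios(builds):
    """Return {(developer, language): (build_count, failure_ratio)}."""
    counts = defaultdict(lambda: [0, 0])  # [total builds, failed builds]
    for developer, language, succeeded in builds:
        entry = counts[(developer, language)]
        entry[0] += 1
        if not succeeded:
            entry[1] += 1
    return {key: (total, failed / total) for key, (total, failed) in counts.items()}


def median_failure_ratio(builds, language):
    """Median of the per-developer failure ratios for one language."""
    ratios = [
        ratio
        for (_, lang), (_, ratio) in failure_ratios(builds).items()
        if lang == language
    ]
    return median(ratios)


if __name__ == "__main__":
    for key, (count, ratio) in sorted(failure_ratios(BUILDS).items()):
        print(key, count, f"{ratio:.0%}")
    print("median cpp failure ratio:", median_failure_ratio(BUILDS, "cpp"))
```

Computing the ratio per developer first and then taking the median, rather than pooling all builds, keeps a handful of very active developers from dominating the figure.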
How often do builds fail?
The build failure ratio turned out to be "approximately normally distributed, with the median percentage of build failures higher for C++ (38.4%) than Java (28.5%)." The researchers attribute the difference between the two languages, at least in part, to IDE usage, as most Java developers benefit from the IDEs’ built-in checks.
"Developers with either very low or very high failure rate seem to be rare," and in both cases they seem not to be regular contributors in the specific language or projects.
No strong correlation was found between build counts and build failure ratio, which argues against the hypothesis that developers who build more frequently have a higher failure ratio.
No correlation was found between developer experience and build failure ratio either, which in part "may be due to the difficulties in precisely characterizing experience or expertise."
Why do builds fail?
The paper identifies a large number of build errors and measures their frequency, as shown in figure 1.
Results are further classified into five categories: dependency, type mismatch, syntax, semantic, other. The category distribution is shown in figure 2.
Dependency-related errors are the most common error type for both C++ (52.68%) and Java (64.71%). C++ showed more syntax errors than Java. Once again the authors found this consistent with the greater IDE usage for Java.
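As a rough illustration of how such a category distribution can be computed, the sketch below maps error kinds to the five categories and reports each category's share. The error kind names and the mapping are hypothetical placeholders, not the taxonomy the authors built from real compiler messages.

```python
from collections import Counter

# Hypothetical mapping from error kind to category (illustrative only).
CATEGORY_BY_KIND = {
    "unresolved_symbol": "dependency",
    "missing_package": "dependency",
    "incompatible_types": "type mismatch",
    "missing_semicolon": "syntax",
    "missing_return": "semantic",
}

# Made-up sample of error kinds observed in failed builds.
SAMPLE_ERRORS = (
    ["unresolved_symbol"] * 5
    + ["missing_package"]
    + ["incompatible_types"] * 2
    + ["missing_semicolon"]
    + ["some_unmapped_kind"]
)


def category_distribution(error_kinds):
    """Return {category: percentage of errors}; unmapped kinds count as 'other'."""
    counts = Counter(CATEGORY_BY_KIND.get(kind, "other") for kind in error_kinds)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


if __name__ == "__main__":
    for category, share in sorted(category_distribution(SAMPLE_ERRORS).items()):
        print(f"{category}: {share:.1f}%")
```

Anything the mapping does not know falls into "other", so the percentages always add up to 100 even when new error kinds appear.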
How long does it take to fix builds?
Overall, the study found that the median resolution times of build errors were 5 and 12 minutes for C++ and Java, respectively.
Those times can vary by an order of magnitude across error kinds, though. It also appears that C++ resolution time is lower than Java's on average, although some C++ build errors show a higher median resolution time because they are harder to diagnose.
In terms of build attempts until the errors are fixed, it turns out that 75% of build errors are resolved within at most two builds for all of the 25 most common error kinds for both Java and C++.
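The two measures discussed here, resolution time and number of builds until a fix, can be sketched as follows. This is a simplification under stated assumptions: it treats a whole build as failed or fixed instead of tracking individual errors, and it measures from the first failing build to the next successful one, which may differ from the paper's exact definition. The timeline is made up.

```python
from statistics import median

# Hypothetical timeline for one developer: (minute, succeeded), sorted by time.
TIMELINE = [
    (0, True),
    (10, False),
    (16, True),
    (30, False),
    (34, False),
    (45, True),
    (50, False),  # still broken at the end of the log, so it is not counted
]


def failure_episodes(builds):
    """Return a list of (resolution_minutes, builds_to_fix) per resolved failure."""
    episodes = []
    first_fail = None  # minute of the build that started the current episode
    attempts = 0       # builds made after the first failing one
    for minute, succeeded in builds:
        if first_fail is None:
            if not succeeded:
                first_fail = minute
                attempts = 0
        else:
            attempts += 1
            if succeeded:
                episodes.append((minute - first_fail, attempts))
                first_fail = None
    return episodes


def share_fixed_within(episodes, max_builds):
    """Fraction of resolved episodes that needed at most max_builds builds."""
    if not episodes:
        return 0.0
    return sum(1 for _, n in episodes if n <= max_builds) / len(episodes)


if __name__ == "__main__":
    eps = failure_episodes(TIMELINE)
    print("episodes:", eps)
    print("median resolution (min):", median(t for t, _ in eps))
    print("fixed within two builds:", share_fixed_within(eps, 2))
```

Unresolved failures at the end of the log are dropped rather than guessed at, which keeps the medians honest but slightly favors quick fixes.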
Findings and Implications
The key takeaways of this study, as identified by the authors, are the following:
- Independent of programming language, approximately 10% of the error types account for 90% of the build failures.
- Dependency errors are the most common.
- On average it takes one build iteration to fix a build error, and most errors are fixed in two build iterations.
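The first takeaway is a concentration measurement that is easy to reproduce on any error log. The sketch below finds how many of the most frequent error kinds are needed to cover a given share of all failures. The error kind names and counts are invented for illustration and are not the study's data.

```python
# Hypothetical failure counts per error kind (made-up numbers).
ERROR_COUNTS = {
    "k0": 90, "k1": 2, "k2": 1, "k3": 1, "k4": 1,
    "k5": 1, "k6": 1, "k7": 1, "k8": 1, "k9": 1,
}


def kinds_needed_for_coverage(counts, target_percent=90):
    """Return (number_of_kinds, fraction_of_kinds) needed to reach the target.

    Kinds are taken in descending order of frequency. Integer arithmetic is
    used for the threshold check to avoid floating-point rounding surprises.
    """
    if not counts:
        return 0, 0.0
    total = sum(counts.values())
    covered = 0
    for rank, n in enumerate(sorted(counts.values(), reverse=True), start=1):
        covered += n
        if covered * 100 >= target_percent * total:
            return rank, rank / len(counts)
    return len(counts), 1.0


if __name__ == "__main__":
    kinds, fraction = kinds_needed_for_coverage(ERROR_COUNTS)
    print(f"{kinds} kind(s), i.e. {fraction:.0%} of all kinds, cover 90% of failures")
```

In this toy data a single kind out of ten accounts for 90% of the failures, which mirrors the shape of the finding; a uniform distribution would instead require nearly all kinds.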
These results, conclude the researchers, are useful to both practitioners and tool builders.
For practitioners, they provide a means to identify areas where additional expertise, tool usage, or development activity (e.g., reducing dependencies) may be most beneficial.
For tool builders, on the other hand, "better tools to resolve dependency errors have the greatest potential payoff." Similarly, the quantification by error message and type can help a compiler team identify error messages that could be revisited to make them more meaningful to developers.
On a final note, it is worth noting that this study, like any study, has limited validity. The paper's authors identify the following factors that could threaten its validity:
- The study is performed within a single company with particular processes, constraints, resources, and tools. Still, the magnitude of the study in terms of the number of builds, developers, and systems involved provides a valuable baseline for the community.
- The study focuses on two languages, C++ and Java.
- Finally, choices regarding data gathering, error classification, mapping errors to a taxonomy, and the cutoffs to remove noise from the data could also reduce its applicability.
The study was conducted by Google engineers Caitlin Sadowski, Edward Aftandilian, and Robert Bowdidge in collaboration with Hong Kong University of Science and Technology researcher Hyunmin Seo and University of Nebraska researcher Sebastian Elbaum.