Measuring Architecture Sustainability
This article first appeared in IEEE Software magazine and is brought to you by InfoQ & IEEE Computer Society.
"There’s no single metric for architecture sustainability; such information is spread among various people and development artifacts. Morphosis is a multiperspective measuring approach for architecture sustainability that includes evolution scenario analysis, architecture compliance checking, and tracking of architecture-level code metrics."
In many application domains, software systems are maintained and evolved over decades. In industrial automation, for example, longevity is necessary because industrial devices have long life cycles. Software architectures play a major role in large-scale systems’ sustainability (that is, economical longevity), vastly influencing maintenance and evolution costs. It’s thus desirable to measure the sustainability of a software architecture to determine refactoring actions and avoid poor evolution decisions. However, it’s difficult to express a software architecture’s sustainability in a single metric: relevant information is spread across requirements, architecture design documents, technology choices, source code, system context, and software architects’ implicit knowledge. Many aspects influence economic sustainability, including the design decisions facilitating evolutionary changes, the adherence to good modularization practices, and all related technology choices. An approach that focuses on a single artifact or perspective is likely to neglect at least some of these important factors.
At ABB, we’re measuring and tracking the architecture sustainability of a large-scale, distributed industrial control system currently under development that’s based on Microsoft technologies and has a layered architecture. A former version of the system grew to several million LOC and suffered from architecture erosion and high maintenance costs.1 We adopted a multiperspective approach called Morphosis to keep such a situation from occurring again.2 Morphosis focuses on requirements, architecture design, and source code. It includes evolution scenario analysis, scoring of technology choices, architecture compliance checks, and tracking of architecture-level code metrics. This article reports our experiences with tracking selected sustainability measurements over the course of two years.
Architecture Sustainability Perspectives
When we look at software architecture sustainability, we should consider multiple perspectives, including change-prone requirements, technology choices, architecture erosion, and modularization best practices.
Disregarding volatile requirements in architecting can lead to poor design decisions. We should design software modules around potentially changing requirements to limit ripple effects (changes to modules that trigger changes to other modules),3 which in turn will encourage system renovation through localized module replacement. For example, security requirements frequently change in industrial software systems,4 so encapsulating security-related concerns into modules can limit any related ripple effects. Thus, one measure for architecture sustainability is the degree to which an architecture is prepared for the change of volatile requirements.5
Technology choices such as chosen frameworks, third-party components, and programming languages are significant sustainability factors. A system that’s built on a fashionable but transient technology that doesn’t have long-term customer support could later require expensive renovations. For example, in the 1990s, ABB incorporated Visual Basic 6 into several products, but Microsoft discontinued its support for it, which led to costly replacements. So, in architecture sustainability, we must consider the included technologies’ expected longevity as well.6
When long-living software systems suffer from architecture erosion, the implementation violates architectural constraints such as prescribed module dependencies or separation of concerns. This situation can render a whole architecture design invalid when it becomes economically infeasible to replace modules. In that case, the architecture is no longer useful for understanding a system on a higher abstraction level. At ABB, for example, architectural analyses revealed layering violations and unwanted component dependencies in long-term, large-scale industrial software, which prompted costly architecture refactorings.1
A system implementation that starts to violate best practices for software modularization indicates decreasing architecture sustainability. Software modularization best practices could concern acyclic dependencies, layering organization, API usage, encapsulation, concern dispersion, and testability.7 Although architecture erosion usually refers to violations of an explicitly prescribed architecture, such best practices sometimes aren’t explicitly encoded in architecture documents. Numerous architecture-level code metrics have been proposed (see the sidebar “Related Work in Architecture-Level Code Metrics”), so another measure of architecture sustainability is the degree to which a software system is within desirable thresholds of such metrics.
The measures for architecture sustainability we’ve described so far primarily refer to requirements, architecture design, and source code. Additional indirect measures for architecture sustainability include documentation quality and development process maturity. Another important factor is the development organization, after which a software architecture is often modeled. Organizational changes could compromise architecture sustainability if, for example, teams working on specific modules are restructured. However, these indirect and organizational measures for architecture sustainability are out of this article’s scope.
To analyze the industrial control system at ABB, we first assessed volatile requirements and technology choices with an evolution scenario analysis and then prepared for architecture erosion by setting up architecture compliance checks. Finally, we set up a metrics dashboard to track modularization best practices in the source code.
Evolution Scenario Analysis
To analyze the architecture’s sustainability regarding volatile requirements and technology, we created evolution scenarios—changes connected to important system interfaces and components—at the architecture level. Such scenarios are valuable because we can study them for their impact on the architecture and rank them according to their likelihood to give a measure of architecture sustainability. They also let us identify sustainability sensitivity points and prepare architectural mitigation measures such as creating abstraction layers to decouple the system from expected changes.
Top-down scenarios. To elicit scenarios, we applied an extended version of the Architecture-level Modifiability Analysis (ALMA) method.5 (We ruled out another popular method, Architecture Trade-off Analysis Method [ATAM]8 because we weren’t able to conduct a workshop with all stakeholders, but instead relied on individual interviews as suggested in ALMA.) We first interviewed eight domain experts for top-down scenario elicitation according to ALMA. The domain experts all worked on similar products and had backgrounds in software engineering. In addition to providing various perspectives from their experiences with former systems, they also shed light on business and technical trends that could require changes in the system under analysis. In total, this analysis yielded short descriptions of 31 evolution scenarios, which we categorized according to multiple criteria, including the kind of change (for example, there were 13 perfective and 18 adaptive scenarios), source of change, and potentially affected subsystems. To help with prioritization, we also requested an initial ranking from the domain experts on each scenario’s likelihood and expected impact.
Bottom-up scenarios. For bottom-up scenario elicitation, we visited the development unit and conducted several interviews with managers, architects, and developers. We analyzed the requirements specification, the architecture documentation, and initial parts of the source code. We performed tasks such as tracing specific evolution scenarios to requirements to enable future impact analysis in case the requirement would be changed later. Additionally, we used dependency analyzer tools such as NDepend and CppDepend to trace evolution scenarios to the source code and reveal potential ripple effects. Thus, for several evolution scenarios we could state the directly affected amount of source code and the indirectly affected modules due to dependencies. For each evolution scenario, we documented numerous criteria such as change history, architecture sensitivity points, occurrence probability, and potential risks involved in the change.2
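The tracing of ripple effects described above can be sketched as a reverse walk over a module dependency graph. This is a minimal illustration in Python, not the NDepend or CppDepend tooling used in the study; the module names and the `affected_modules` helper are hypothetical.

```python
from collections import deque


def affected_modules(depends_on, changed):
    """Return all modules that directly or transitively depend on `changed`."""
    # Invert the "A depends on B" edges so we can walk from the changed
    # module to everything that uses it.
    dependents = {}
    for module, targets in depends_on.items():
        for target in targets:
            dependents.setdefault(target, set()).add(module)

    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user not in seen and user != changed:
                seen.add(user)
                queue.append(user)
    return seen


# Hypothetical module names, for illustration only.
DEPENDS_ON = {
    "Gui": {"Services"},
    "Services": {"PortabilityLayer"},
    "Diagnostics": {"PortabilityLayer"},
    "PortabilityLayer": {"OsApi"},
    "Reporting": set(),
}

if __name__ == "__main__":
    # Which modules could be touched if the OS API changes?
    print(sorted(affected_modules(DEPENDS_ON, "OsApi")))
```

In this toy graph, a change to `OsApi` reaches every module except `Reporting`, which has no dependency path to it.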
Results. After the top-down and bottom-up scenario elicitation, we found that seven evolution scenarios had a medium to high assessment for the combination of impact and occurrence probability over the next five years. We analyzed these seven in detail. Table 1 lists these scenarios anonymously with a condensed assessment statement. (We documented a more detailed scoring in our internal assessment report.)
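One way to picture the combined assessment of impact and occurrence probability is a simple ordinal scoring. The sketch below is an assumption for illustration: the three-level scale, the multiplication of the two ratings, the threshold, and the scenario names are all hypothetical and are not taken from our internal assessment report.

```python
from dataclasses import dataclass

# Hypothetical ordinal scale for both likelihood and impact.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class Scenario:
    name: str
    likelihood: str
    impact: str

    @property
    def score(self) -> int:
        # Combine both ratings into a single rank value.
        return LEVELS[self.likelihood] * LEVELS[self.impact]


def shortlist(scenarios, min_score=4):
    """Keep scenarios at or above `min_score`, highest score first."""
    selected = [s for s in scenarios if s.score >= min_score]
    return sorted(selected, key=lambda s: s.score, reverse=True)


# Hypothetical scenarios, for illustration only.
SCENARIOS = [
    Scenario("OS change in one subsystem", "high", "medium"),
    Scenario("New GUI framework", "low", "medium"),
    Scenario("Higher system workload", "medium", "medium"),
    Scenario("Hardware platform replacement", "low", "high"),
]

if __name__ == "__main__":
    for s in shortlist(SCENARIOS):
        print(s.score, s.name)
```

Lowering `min_score` to 3 also admits the unlikely but high-impact hardware scenario, which mirrors the decision to keep some less likely scenarios because of their potential impact.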
In 2011, for example, one subsystem’s OS implementation was expected to change in the near future, so the development team had added a portability layer to the affected subsystem. However, through dependency analysis with CppDepend, we found that some parts of the code already circumvented the layer in 2011. We recommended automatically checking the code’s compliance to the portability layer in the build process. Other scenarios concerned changes in system workloads, hardware components, platforms, APIs, and GUIs. Although not all of these scenarios are highly likely to occur, we still considered some of the less likely ones because if they did occur, they would considerably impact the system. We extrapolated several of these scenarios from evolution histories of similar former systems. We reassessed the evolution scenarios from 2011 again in 2012 and 2013. We found that two of the 2011 scenarios actually occurred in 2012. The anticipated changes of three other scenarios were mitigated by corresponding test, refinement, and research activities. The remaining evolution scenarios didn’t occur but are still possible. We weren’t able to identify any additional evolution scenarios other than the ones we identified in 2011, suggesting that we were successful in covering the system’s most important potential architectural changes.
Architecture Compliance Checking
To prevent architecture erosion, we checked ABB’s system’s implementation against the prescribed architecture. The system under study has a layered architecture that allows only certain dependencies between modules. Violated dependency rules can severely increase maintenance costs because they negate the modularization benefits and complicate independent module compilability, extensibility, and testability.7 Dependency violations usually stem from developers who either aren’t aware of the prescribed architecture or work under excessive time pressure. Such violations don’t have an immediate impact on system functionality, so they’re sometimes neglected. Additionally, the architecture documentation can be out of sync with the implementation. Several tools exist for architecture compliance checking and enforcement (such as SARTool, Bauhaus, DiscoTect, Symphony, and Lattix). Once such tools are set up correctly, they can regularly check dependency rules (for example, during build time) so that violations show up early, don’t accumulate, and can be fixed during regular refactoring.
For the system under study, we based architecture compliance checks on given dependency rules defined in UML diagrams. The architects restricted the allowed dependencies between layers and modules; for example, upper-level layers weren’t permitted to call lower-level layers. We specified corresponding dependency rules (for example, between .NET assemblies) in the declarative Code Query Language (CQL) provided by the tools NDepend for C# and CppDepend for C++. These tools produce dependency graphs and design structure matrices based on fact databases extracted from source code, enabling developers to visually check for violations. We also configured the development unit’s build server to check rules weekly.
Figure 1. Architecture-level metrics selected to measure sustainability concerns. We chose the set of metrics to cover complementary areas of sustainability and to avoid optimizing for a single concern.
During some of the initial analysis runs, the tools identified multiple violations of the source code’s layering structure. This was surprising because the system was in its earliest development stages. One violation was an undesired dependency from a lower to an upper architecture layer, as developers had assigned classes to the wrong modules. The development team moved the affected classes to another module and cleaned up the structure. Another violation involved classes from an upper layer directly accessing platform modules, even though the architecture prescribed routing all platform calls through an intermediate layer. Besides these issues, we also experienced dependency violations that resulted from an outdated architecture model (the developers had updated the source code during redesign but forgot to update the architecture model).
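The two kinds of violations described above (a lower layer depending on an upper layer, and an upper layer bypassing an intermediate layer) can be illustrated with a small rule check. This is a Python sketch, not CQL and not the actual rule set; it assumes strict layering in which a module may use only its own layer or the layer directly beneath it, and all layer and module names are hypothetical.

```python
# Hypothetical layers, ordered from top (index 0) to bottom.
LAYERS = ["Application", "Services", "Platform"]

# Hypothetical assignment of modules to layers.
MODULE_LAYER = {
    "OperatorUi": "Application",
    "AlarmService": "Services",
    "OsAdapter": "Platform",
}


def check_layering(dependencies, module_layer=MODULE_LAYER, layers=LAYERS):
    """Return (source, target, reason) for each dependency that breaks
    the assumed strict layering."""
    rank = {name: index for index, name in enumerate(layers)}
    violations = []
    for source, target in dependencies:
        src = rank[module_layer[source]]
        dst = rank[module_layer[target]]
        if dst < src:
            # The target sits above the source in the layer stack.
            violations.append((source, target, "lower layer depends on upper layer"))
        elif dst - src > 1:
            # The source reaches past the layer directly beneath it.
            violations.append((source, target, "intermediate layer bypassed"))
    return violations


if __name__ == "__main__":
    deps = [
        ("OperatorUi", "AlarmService"),  # allowed: next layer down
        ("AlarmService", "OsAdapter"),   # allowed: next layer down
        ("OsAdapter", "AlarmService"),   # upward dependency
        ("OperatorUi", "OsAdapter"),     # skips the Services layer
    ]
    for violation in check_layering(deps):
        print(violation)
```

Running such a check in the build makes both kinds of violations visible as soon as they are introduced, rather than after they accumulate.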
Architecture Metrics Tracking
To complement evolution scenario analysis and architecture compliance checking, we created a dashboard to track best practices for modularization using architecture-level code metrics. Clean modularization reduces system complexity, allows developers to understand the system faster, and enables easier replacement of modules during system evolution. It thus contributes to a system’s longevity.7 We applied the goal/question/metric (GQM) approach,2 broke down sustainability into modifiability, reusability and layering, modularity, and testability, and selected the subsequent metrics listed in Figure 1. The metrics in Figure 1 are formally defined in literature9,10 but aren’t widely used in practice. All metrics are normalized between zero and one, where one is the best achievable value. Some of the metrics conflict with each other—blindly optimizing for one metric leads to a decrease in the other metric. For instance, optimizing module size uniformity with only two large modules of similar sizes and poor inner structures yields worse module size boundedness.
There was no tool available to validate these metrics, so we implemented a calculation tool and reporting dashboard. NDepend and CppDepend provide basic statistics, such as LOC per module, classes, and methods, as well as raw module dependencies, stability, and modules’ abstractness indices. We added several custom CQL queries to provide intermediate results for calculating metrics. Our tool chain imports the XML output into an Excel workbook. Spreadsheets combine the advantages of user familiarity, existing data import functions (for example, from Team Foundation Server), and reporting facilities (for example, Kiviat diagrams). We calculate the architecture-level metrics with Excel macros.
Figure 2. Trends of architecture-level code metrics from the case study as a measure for architecture sustainability for (a) the Module Size Uniformity Index, which increased slowly over two years; (b) the Module Interaction Stability Index and Cyclic Dependency Index, showing limited variations; (c) four additional design-level metrics; and (d) the Well-Sized Methods Index, which decreased while LOC increased. The index value numbers aren’t listed because they’re confidential to the company. Instead, these graphs visually represent the index values relative to each other—higher values are better.
As of this writing, we’ve tracked the architecture-level metrics for almost two years. Figure 2 shows trends of selected metrics for the industrial system under study. The Module Size Uniformity Index (MSUI) penalizes heterogeneous module size. For example, a large number of very small modules generally increases system complexity and decreases understandability as observed in other long-living systems.1,7 In the industrial system under study, the MSUI started on a low level, but increased slowly over the two years (see Figure 2a). The developers had defined several modules from the outset as code stubs, whose small sizes lowered the MSUI. However, the MSUI increased over the two years because the developers provided the implementations for the stubs. Although the MSUI hasn’t yet required restructuring actions, this index can become more important as the system grows and enters the maintenance phase.
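To make the effect of stub modules on the MSUI concrete, the sketch below uses one formulation from the modularization-metrics literature, the mean module size divided by the sum of mean and standard deviation. Whether the dashboard uses exactly this formula is an assumption, and the module sizes are invented for illustration.

```python
from statistics import mean, pstdev


def module_size_uniformity_index(sizes):
    """MSUI sketch: mu / (mu + sigma), where sizes are module sizes in LOC.

    Yields 1.0 when all modules have the same size and moves toward 0
    as sizes become more heterogeneous.
    """
    if not sizes:
        raise ValueError("at least one module size is required")
    mu = mean(sizes)
    sigma = pstdev(sizes)
    if mu + sigma == 0:
        # All modules are empty; treat them as perfectly uniform.
        return 1.0
    return mu / (mu + sigma)


if __name__ == "__main__":
    # Hypothetical LOC per module: two implemented modules and two stubs.
    with_stubs = [1000, 1000, 50, 50]
    # The same modules after the stubs have been implemented.
    implemented = [1000, 1000, 800, 900]
    print(module_size_uniformity_index(with_stubs))
    print(module_size_uniformity_index(implemented))
```

With these invented sizes, the index rises once the stubs are filled in, which is the same direction as the trend in Figure 2a.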
The Module Interaction Stability Index (MISI) as well as the Cyclic Dependency Index (CDI) show only limited variations (Figure 2b). We compute these indices based on dependencies between higher-level code modules, which don’t change as often as module sizes. The MISI increased after approximately one year, when one of the modules was removed from the system, which altered the interaction stability. After some more time, the index again decreased when the developers introduced a new dependency that was negative from a structural viewpoint.
The CDI increased after one year. At that point, approximately 1,000 LOC and one cyclic module dependency were removed from the code base. Later, a new cyclic dependency was introduced, lowering the CDI again. However, both the MISI and the CDI had high values in comparison with other systems. Figure 2c provides a visualization of the trends of four additional architecture-level metrics. The State Access Violation Index shows high and stable values, meaning violations were rare in the code base. A slightly increasing trend is visible for the API Function Usage Index, which could again be traced back to the progressing state of the implementation. The Module Size Boundedness Index was rather stable with a decreasing trend. Finally, the Layer Organization Index penalizes cyclic dependencies over layer boundaries and showed a positive trend. Figure 2d shows another notable trend on code design level (the trend for one subsystem’s LOC is shown in blue) and an index measuring the percentage of well-sized methods in the code base. The developers introduced multiple large methods into the system but didn’t refactor them, which had a negative impact on how the code design level affected sustainability.
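The behavior of the CDI when a cyclic dependency is removed or introduced can be illustrated with a simplified index. The normalization below (the share of modules that do not take part in any dependency cycle) is an assumption chosen for brevity, not the formal definition from the literature, and the module names are hypothetical.

```python
def reachable(graph, start):
    """Modules reachable from `start` by following at least one dependency."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def cyclic_dependency_index(graph):
    """Simplified CDI: fraction of modules that are not part of any cycle.

    1.0 means the module graph is acyclic; lower values mean more modules
    are entangled in cycles.
    """
    modules = set(graph) | {t for targets in graph.values() for t in targets}
    if not modules:
        raise ValueError("graph must contain at least one module")
    # A module is in a cycle if it can reach itself again.
    in_cycle = {m for m in modules if m in reachable(graph, m)}
    return 1.0 - len(in_cycle) / len(modules)


if __name__ == "__main__":
    # Hypothetical module graph with one cycle: A -> B -> C -> A.
    with_cycle = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}}
    # The same graph after the dependency C -> A has been removed.
    without_cycle = {"A": {"B"}, "B": {"C"}, "C": set(), "D": {"A"}}
    print(cyclic_dependency_index(with_cycle))
    print(cyclic_dependency_index(without_cycle))
```

Removing the single dependency that closes the cycle lifts the index to its best value, and reintroducing it lowers the index again.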
When we reassessed our evolution scenarios one and two years later, we found that some of them had actually occurred, and those that didn’t were still valid for the near future. We had to update the probabilities and impacts for some scenarios due to changing assumptions. Contrary to former studies,11 we didn’t identify new likely evolution scenarios. While some components had been replaced or extended to cover new features, the overall architecture remained stable. Therefore, we learned that the scenario analysis findings weren’t as volatile as in former studies and that the invested efforts for deeper analysis and requirements tracing paid off. The architects acknowledged that the analysis gave them a frame of reference for the technical risks they previously considered informally.
While checking the rules for architecture compliance in 2012 and 2013, we didn’t find new dependency violations in the source code. The developers now check architecture compliance regularly and fix problems accordingly after code reviews. The architecture compliance checks created a higher awareness for the architecture specification. However, maintenance remains a challenge. Currently, new dependency rules require manual specification in the CQL in addition to the specification in the UML model, which creates overhead. It would be useful to automate this step to reduce maintenance effort.
Setting up our measuring dashboard for the system under study yielded some interesting effects. Some of our early reports triggered the developers to schedule respective refactoring sessions. A couple of months later, the developers had cleaned up the source code for several class-level metrics. The developers restructured all classes that had a cyclomatic complexity of 20 or more methods and additionally assigned all classes to namespaces. Thus, the code quality improved at the design level simply because a measurement instrument was in place.
Although the architecture-level code metrics haven’t yet led to major restructurings, managers and architects regularly monitor the metrics and support architecture review meetings. We learned that it’s not practical to optimize each metric to the optimal value of one. Thus, there’s still a need to define target thresholds for the metrics. Currently, stakeholders use the metrics to show relative sustainability improvement or decline. The importance of the architecture-level metrics is expected to increase once the system enters maintenance and evolves.
Our integrated approach of scenario analysis, compliance checks, and metrics tracking yielded several useful synergy effects between the activities. During scenario analysis, we used the same dependency checking tools as for the compliance checks to measure the impact of an evolution scenario and to identify ripple effects. Additionally, we used the evolution scenario analysis findings to prioritize the tracking of the dependency violations so that critical sensitivity points received more attention. We then condensed the dependency rules for compliance checking into an architectural metric and integrated it into the metrics dashboard to be considered in the context of other metrics. Overall, the approach had a perceived good cost-benefit ratio: scenario analyses and architecture compliance checks could be set up in a matter of days. The effort for establishing a metrics framework is comparatively high (four to five person-months), but the metrics and tools can be reused later for other systems.
Architecture sustainability needs to be measured and controlled from multiple perspectives. This includes mining different sources of architecture information, ranging from humans to highly automated code reporting. Whereas focusing on scenario analysis wouldn’t prepare us for architecture erosion, focusing on architecture-level code metrics would neglect changing requirements and technologies.
Our Morphosis method for measuring architecture sustainability can be applied with manageable efforts. Still, we can only plan the evolution of a complex software system to a limited extent owing to rapid changes in the IT world. In addition to methods focusing on technical factors, organizational and human factors are also important. For short-term future work, we plan to refine Morphosis by applying it to other ABB systems, which could improve the scenario analysis templates and lead to a revised set of architectural metrics. We will also compare our findings on architecture metrics tracking to current empirical studies to gain an overall understanding of their usefulness.12 We still need longitudinal studies that correlate software maintenance costs with the architectural metrics to enable quantitative cost-benefit analyses. Another direction to explore is augmenting Morphosis with constructive methods for planning software architecture evolution.13
Related Work in Architecture-Level Code Metrics
Most work in the area of architecture-level metrics is derived from the module concept described by David Parnas1 and the notions of coupling and cohesion.2 We categorized more than 40 architecture-level code metrics, the full list of which is available elsewhere.3
John Lakos defined a metric called Cumulative Component Dependency (CCD), which is the sum of required dependencies by a component within a subsystem.4 CCD provides a numerical measure of the module coupling in a system, where low values represent better maintainability and testability. Derived metrics are the average component dependency and the normalized CCD. They can be determined via tools such as SonarJ or STAN.
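As a small worked example of CCD, the sketch below follows the usual reading of Lakos’s definition: for each component, count the components it needs transitively (including itself), then sum over all components. The average component dependency divides that sum by the number of components. The component graphs are hypothetical.

```python
def depends_on_closure(graph, component):
    """The component itself plus everything it depends on transitively."""
    seen = {component}
    stack = [component]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def all_components(graph):
    """Every component that appears as a source or target of a dependency."""
    return set(graph) | {d for deps in graph.values() for d in deps}


def ccd(graph):
    """Cumulative Component Dependency of the whole graph."""
    return sum(len(depends_on_closure(graph, c)) for c in all_components(graph))


def acd(graph):
    """Average Component Dependency: CCD divided by the component count."""
    return ccd(graph) / len(all_components(graph))


if __name__ == "__main__":
    # Hypothetical graphs: a chain A -> B -> C and a flatter design
    # in which A and B both depend only on C.
    chain = {"A": {"B"}, "B": {"C"}, "C": set()}
    flat = {"A": {"C"}, "B": {"C"}, "C": set()}
    print(ccd(chain), acd(chain))
    print(ccd(flat), acd(flat))
```

The chain scores higher than the flat design because component A drags in both B and C, which is the kind of coupling the metric is meant to expose.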
Robert Martin defined several metrics for software packages, or groups of related classes (such as Java packages and C++ projects).5 These include afferent coupling, efferent coupling, abstractness, instability, distance from main sequence, and package dependency cycles. For example, distance from the main sequence measures how usable and maintainable a module is. Several tools support these metrics, including CppDepend and STAN.
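Three of Martin’s package metrics can be computed directly from a few counts. The sketch below uses the standard formulas: instability I = Ce / (Ca + Ce), abstractness A = abstract types / total types, and distance from the main sequence D = |A + I - 1|. The convention of returning 0.0 for an empty denominator and the example numbers are assumptions made for illustration.

```python
def instability(afferent, efferent):
    """I = Ce / (Ca + Ce); 0.0 is maximally stable, 1.0 maximally unstable."""
    total = afferent + efferent
    # Assumed convention: a package with no couplings counts as stable.
    return efferent / total if total else 0.0


def abstractness(abstract_types, total_types):
    """A = abstract types / all types in the package."""
    # Assumed convention: an empty package has abstractness 0.0.
    return abstract_types / total_types if total_types else 0.0


def distance_from_main_sequence(a, i):
    """D = |A + I - 1|; 0.0 lies on the main sequence, 1.0 is farthest away."""
    return abs(a + i - 1.0)


if __name__ == "__main__":
    # Hypothetical package: used by 3 packages, uses 1 package,
    # and 1 of its 4 types is abstract.
    i = instability(afferent=3, efferent=1)
    a = abstractness(abstract_types=1, total_types=4)
    print(i, a, distance_from_main_sequence(a, i))
```

A package that many others depend on but that contains few abstractions lands away from the main sequence, which flags it as rigid and hard to change.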
Santonu Sarkar and his colleagues created a set of 12 API-based and information-theoretic metrics for modularization quality.6 The metrics rely on the definition of APIs between modules, module size thresholds, and concept term maps.
Raghvinder Sangwan and his colleagues introduced the complexity measurement framework Structure 101, which uses a metric called Excessive Structural Complexity.7 This is computed as the product of the degree of cyclic dependencies violations and a multilevel complexity metric, which can also be determined on the package or module level.
Finally, Eric Bouwers and his colleagues proposed a metric called component balance, which combines the number of components and their relative sizes.8 This research group evaluated the usefulness of their metrics on a large number of industrial systems, which is also an interesting direction for future research.
1D.L. Parnas, “On the Criteria to Be Used in Decomposing Systems into Modules,” Comm. ACM, vol. 15, no. 12, 1972, pp. 1053–1058.
2W.P. Stevens, G.J. Myers, and L.L. Constantine, “Structured Design,” IBM Systems J., vol. 13, no. 2, 1974, pp. 115–139.
3H. Koziolek, “Sustainability Evaluation of Software Architectures: A Systematic Review,” Proc. 7th ACM SIGSOFT Int’l Conf. Quality of Software Architectures (QoSA 11), ACM, 2011, pp. 3–12.
4J. Lakos, Large-Scale C++ Software Design, Addison-Wesley, 1996.
5R.C. Martin, Agile Software Development: Principles, Patterns, and Practices, Prentice Hall, 2003.
6S. Sarkar, G.M. Rama, and A.C. Kak, “API-Based and Information-Theoretic Metrics for Measuring the Quality of Software Modularization,” IEEE Trans. Software Eng., vol. 33, no. 1, 2007, pp. 14–32.
7R.S. Sangwan, P. Vercellone-Smith, and P.A. Laplante, “Structural Epochs in the Complexity of Software over Time,” IEEE Software, vol. 25, no. 4, 2008, pp. 66–73.
8E. Bouwers et al., “Quantifying the Analyzability of Software Architectures,” Proc. 9th Working IEEE/IFIP Conf. Software Architecture (WICSA 11), IEEE CS, 2011, pp. 83–92.
About the Authors
Heiko Koziolek is a principal scientist with the Industrial Software Systems program at ABB Corporate Research Germany. His research interests include performance engineering, software architecture, model-driven software development, and empirical software engineering. Koziolek received a PhD in computer science from the University of Oldenburg. Contact him at firstname.lastname@example.org.
Dominik Domis is a scientist at ABB Corporate Research Germany. His research interests include architectural approaches for the integration and systematic reuse of industrial software systems. Domis received a PhD in computer science from the University of Kaiserslautern. Contact him at email@example.com.
Thomas Goldschmidt is a principal scientist at ABB Corporate Research Germany. His research interests include domain-specific language engineering and software architectures in the automation domain. Goldschmidt received a PhD in computer science from the Karlsruhe Institute of Technology. Contact him at firstname.lastname@example.org.
Philipp Vorst is a scientist in the Industrial Software Systems program at ABB Corporate Research Germany. His research interests include software architecture methods with applications in automation. Vorst received a PhD in computer science from the University of Tübingen. Contact him at email@example.com.
1T. Kettu et al., “Using Architecture Analysis to Evolve Complex Industrial Systems,” Proc. Workshop Architecting Dependable Systems V, LNCS 5135, Springer, 2008, pp. 326–341.
2H. Koziolek et al., “MORPHOSIS: A Lightweight Method Facilitating Sustainable Software Architectures,” Proc. 2012 Joint Working IEEE/IFIP Conf. Software Architecture and European Conf. Software Architecture, IEEE CS, 2012, pp. 253–257.
3D.L. Parnas, “On the Criteria to Be Used in Decomposing Systems into Modules,” Comm. ACM, vol. 15, no. 12, 1972, pp. 1053–1058.
4D. Dzung et al., “Security for Industrial Communication Systems,” Proc. IEEE, vol. 93, no. 6, 2005, pp. 1152–1177.
5P. Bengtsson et al., “Architecture-Level Modifiability Analysis (ALMA),” J. Systems and Software, vol. 69, nos. 1–2, 2004, pp. 129–147.
6A. Jansen, A. Wall, and R. Weiss, “TechSuRe: A Method for Assessing Technology Sustainability in Long Lived Software Intensive Systems,” Proc. 37th EUROMICRO Conf. Software Eng. and Advanced Applications, IEEE CS, 2011, pp. 426–434.
7S. Sarkar et al., “Modularization of a Large-Scale Business Application: A Case Study,” IEEE Software, vol. 26, no. 2, 2009, pp. 28–35.
8P. Clements, R. Kazman, and M. Klein, Evaluating Software Architectures: Methods and Case Studies, Addison-Wesley Professional, 2001.
9R.C. Martin, Agile Software Development: Principles, Patterns, and Practices, Prentice Hall, 2003.
10S. Sarkar, G.M. Rama, and A.C. Kak, “API-Based and Information-Theoretic Metrics for Measuring the Quality of Software Modularization,” IEEE Trans. Software Eng., vol. 33, no. 1, 2007, pp. 14–32.
11N. Lassing, D. Rijsenbrij, and H. van Vliet, “How Well Can We Predict Changes at Architecture Design Time?,” J. Systems and Software, vol. 65, no. 2, 2003, pp. 141–153.
12E. Bouwers, A. van Deursen, and J. Visser, “Evaluating Usefulness of Software Metrics: An Industrial Experience Report,” Proc. Int’l Conf. Software Eng. (ICSE 13), IEEE, 2013, pp. 921–930.
13D. Garlan et al., “Evolution Styles: Foundations and Tool Support for Software Architecture Evolution,” Proc. Joint Working IEEE/IFIP Conf. Software Architecture (WICSA 09), IEEE, 2009, pp. 131–140.