
The Holistic Approach: Preventing Software Disasters

Apr 28, 2016 10 min read

When it comes to protecting a business against software catastrophes, most CIOs dump vast sums into traditional IT testing. In some IT shops, testing can even account for as much as 30 percent of the total application development and maintenance spending. Yet despite this spending spree, devastating software glitches continue to make headlines weekly, almost becoming a daily occurrence as digital transformations expose more and more IT systems to the world via online portals, mobile apps and more.

These ongoing system crashes point to a discrepancy in the value and effectiveness of IT testing methods currently used by CIOs and IT administrators.

To the majority of the IT community, testing is mainly about functional and load testing, both of which focus on the run-time behavior of software. However, Margo Visitacion, Forrester’s VP and principal analyst, points out that “Functional testing tools are not enough.” Indeed, about one third of the glitches experienced by end users are not functional but are directly rooted in structural flaws that are not revealed by unit testing alone.

Structural flaws are different, just as muscle pains are different from tired legs. They require a holistic diagnostic, not simply a localized massage. Most testing teams focus on functional testing rather than technical methods, so their tools do not account for software architecture and source code. This allows structural flaws to go undetected, left to plague internal operations and external entry points that support customer service functions.

The Symptoms: True Stories, True Examples

Structural analysis is no simple issue, and Global 2000 companies remain at odds with incumbent practices and legacy systems that must be optimized for today’s digital economy.

For example, a large U.S. investment bank recently suffered a series of severe outages followed by a tricky data corruption issue. Recovery took weeks and cost millions of dollars. The investigation found that C++ and Java front-end programs (both with embedded and stored SQL) were making transactional update-insert-delete operations on critical data without appropriate rollback mechanisms. A series of unfortunate events, including an inappropriate attempt to fix the issue ‘on the fly’, crashed the system. This type of event is impossible to detect through functional testing, and it took weeks to realize that something had gone wrong with the data.
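To make the missing safeguard concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not the bank's code: the accounts and audit_log tables and the transfer function are hypothetical, chosen only to show a multi-statement update that either commits as a whole or is rolled back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing unit.

    Table and column names are assumptions made for this illustration.
    """
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "INSERT INTO audit_log (src, dst, amount) VALUES (?, ?, ?)",
            (src, dst, amount),
        )
        conn.commit()
        return True
    except Exception:
        # Without this rollback, the first UPDATE could survive a failure
        # and leave the data half-modified.
        conn.rollback()
        return False

def make_demo_db():
    """Build a throwaway in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("CREATE TABLE audit_log (src INTEGER, dst INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn
```

A functional test that only exercises the successful path would pass with or without the rollback branch, which is why this kind of flaw tends to surface only through inspection of the code itself.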

In the transportation industry, a large European company sustained a crash of their online reservation system, leaving millions of users stranded and subjecting the company to national ridicule.

Quality assurance tests, functional tests and intensive load testing detected no problem, so the company invested in upgrading its hardware. Only after millions of euros had been spent on equipment upgrades did the Oracle Database Administrator (DBA) team discover the root cause. A hard-to-inspect database table was aggregating big data, stats and history information. This resulted in exponential growth of the database, which was expanding by 10x a week for several months, eventually crashing the system. While not the most common cause of a systems crash, the problem was a simple one that could have been easily fixed if detected early on.

In another instance, a giant consumer services company discovered that a high volume of customers were abandoning their shopping carts without making purchases. The company's system was based on a service-oriented architecture and a hastily revamped legacy back-end system, which caused excessive latencies, particularly during holiday shopping periods. Although every test came back clean, a structural investigation showed that the front-end JavaScript layer was problematic. The customer-facing portal was accessing individual Java services and triggering inquiries on legacy code through CICS transactions, rapidly consuming the entire buffer time. This debacle is a perfect example of small details hindering “big picture” software performance, and ultimately customer satisfaction.

ERP has long been the source of many headaches for businesses, but in this example a manufacturing company had to dig itself out of an enormous hole. After an important systems upgrade, it was no longer able to perform all of its back-office batch operations overnight. It turned out that a set of programs developed specifically for this company, on top of the previous version of the ERP, had accumulated several bad practices. In addition, they were bypassing the regular API provided by the ERP to access the managed data. It took several days for the IT department to identify the source of the performance issue and several days more for the outsourcer to accept the issues as related and then fix them. Had they been able to get a big picture of the transactions performed by the set of programs, diagnosing and fixing the root problem would have been much easier.

The Diagnosis: Prevention.

All of the examples discussed above could have been avoided. To use a simple analogy, the structural resiliency, efficiency and safety of a brick building depends on three parameters:

The quality of the bricks that make up the building
The quality of the “construction,” provided the overall architecture has been properly designed and enforced
Exposure to the elements - human interaction, business changes, weather, natural disasters, etc.

Obviously, the intrinsic quality of the bricks matters less than the overall quality of the construction. Even the best bricks won’t prevent a wall from collapsing if improper forces are applied. The same thing is true in software engineering. Richard Soley, a well-known software engineer and CEO of the Object Management Group, explains:

The code quality of an individual software ‘brick’ accounts for less than 10% of the structural defects occurring in production. The way the bricks are held together, mostly to resist assault from unexpected events or abnormal end-user behaviors, represents 90% of the defects damaging the business.

The same findings were confirmed in a software engineering study published by the Empirical Software Engineering journal in 2011.

The good news is that appropriate resiliency, efficiency, security, data integrity and safety measures for complex IT systems have been standardized by the Consortium for IT Software Quality (CISQ) over the past eight years.

CISQ, founded by the Software Engineering Institute (SEI) at Carnegie Mellon University and the Object Management Group (OMG), recently saw its standards approved by OMG. They cover more than 80 system-level checkpoints under the “Good Architectural Practices at System Level” umbrella.

Taking a cue from CISQ, CIOs would do well to set up “structural testing gates” that mirror the functionality of a modern X-Ray machine, seeing through complex software systems to identify structural flaws not visible to the naked eye.

How does the holistic X-Raying software work?

X-Ray technology was invented to let scientists and doctors see behind our skin. In the very same way, structural software analysis enables CIOs and IT administrators to see “behind the surface” of source code. Structural analysis leverages static analysis, combined with data flow analysis and other technologies.

Structural source code analysis must cover at least:

source code at the front-end (including JavaScript, HTML and CSS) in addition to legacy applications using Java Swing GUI, .NET WPF and Visual Basic
source code that supports the business logic (including Java, .NET, C/C++ and Cobol)
source code that manages the persistence of the data (this includes custom code embedded in the business logic or database that leverages extensive SQL stored procedures and functions)
scripts that describe the data structure (including DDL and DML)
configuration scripts and property files for frameworks and web services

Understanding each kind of source code and script, interpreting the configuration files, evaluating the values of variables throughout the execution cycle, and finally piecing all these findings together to reverse-engineer the system blueprint gives CIOs an “X-Ray view” into the inner workings of their organization’s software systems. It empowers the CIO to make data-informed decisions to fortify overall software quality. This is very different from the traditional use of static analysis in quality-control-oriented tools that analyze each component and spit out a set of best practice violations.

To achieve structural analysis, each source code type still needs to be analyzed with a dedicated static source code analyzer. Language and technology aspects should be measured individually, since concepts and programming techniques are not implemented identically across languages. For example, the lambda expressions introduced in Java 8 are different from lambda expressions in C#.

But looking at the unit-level source code is not everything. It is just the beginning, especially with modern architectures where loose coupling between the different layers is a must. Hence CIOs must also X-Ray the “glue” between software layers and components, which is sometimes defined in configuration files, property files or annotations stored directly inside the source code files.

Sometimes this structural view needs to be derived from information stored in a database or it can depend on users' input.

Examples

For example, when implementing Struts, one has to comply with some constraints: services have to inherit from the Action class; the mapping between derived Action classes and the references used in the presentation layer has to be defined in a configuration file; the configuration file, which is usually called struts-config.xml, can be given a different name through web.xml; and so forth.
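The kind of cross-layer check this implies can be sketched in a few lines of Python. The XML below follows the Struts 1 action-mappings layout, but the package names, the JAVA_CLASSES dictionary (standing in for the output of a separate Java source parser) and the broken_links helper are all hypothetical. For brevity the check only looks at the direct superclass, whereas a real analyzer would follow the full inheritance chain.

```python
import xml.etree.ElementTree as ET

# Illustrative configuration; paths and class names are made up.
STRUTS_CONFIG = """\
<struts-config>
  <action-mappings>
    <action path="/login" type="com.example.web.LoginAction">
      <forward name="success" path="/home.jsp"/>
    </action>
    <action path="/checkout" type="com.example.web.CheckoutAction"/>
  </action-mappings>
</struts-config>
"""

# Hypothetical result of parsing the Java sources: class name -> superclass.
JAVA_CLASSES = {
    "com.example.web.LoginAction": "org.apache.struts.action.Action",
    "com.example.web.CartHelper": "java.lang.Object",
}

ACTION_BASE = "org.apache.struts.action.Action"

def action_mappings(config_xml):
    """Return {request path: implementing class} as declared in the config."""
    root = ET.fromstring(config_xml)
    return {a.get("path"): a.get("type") for a in root.iter("action")}

def broken_links(config_xml, classes):
    """List mappings whose target class is missing or is not an Action."""
    problems = []
    for path, cls in sorted(action_mappings(config_xml).items()):
        if cls not in classes:
            problems.append((path, cls, "class not found in analyzed sources"))
        elif classes[cls] != ACTION_BASE:
            problems.append((path, cls, "class does not extend Action"))
    return problems
```

With the sample data, broken_links(STRUTS_CONFIG, JAVA_CLASSES) reports the /checkout mapping, whose class is absent from the analyzed sources. Neither the XML file nor the Java code is wrong when inspected alone; the defect only appears when the two are read together.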

To properly understand interactions between the different layers of the application, source code analysis must take each of these elements into account. The same holds true for ERP systems like SAP. Depending on the value stored in specific tables, a process can trigger different technical components, all of which need to be “X-Rayed” and analyzed by IT.

Holistic source code analysis also helps to unite the many development teams IT-intensive organizations employ. For example, the front-end team might be separated from the business logic team since it requires a different technical skill set, and a centralized DBA team usually manages the database structure.

Results

The holistic X-Raying software should be used when the application is ready for integration testing. As with an integration test, the holistic X-Ray proves useful once at least the set of components that support a given functionality in the application is deemed ready. Teams working with Scrum will trigger a holistic X-Ray at least at the end of each sprint, and even better a couple of times during the sprint once a user story is marked done. This makes sure they are on the right track and that they won’t have to come back to this user story “after hours,” thus avoiding the creation of waste. Teams that still practice waterfall will probably get their first holistic X-Ray much later but will still benefit, since they usually have a significant stabilization phase.

Once a holistic X-Ray is complete, development teams will be left with a list of possible structural flaws and, even more importantly, the list will be organized by transactions (a set of source code elements involved between a user entry and its final destination – usually a table in a database or a file). Depending on the technical context and expectations (timeline, risks) some of these structural flaws will be tolerated in production, while others will have to be fixed before shipping. In both cases, this makes the development team more aware and accountable, which is always a good thing when it comes to risk management.
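Organizing findings by transaction can be pictured as a graph traversal from each user entry point down to the data it touches. The following Python sketch assumes a dependency graph and a list of flaws have already been extracted by the analyzers; the CALLS and FLAWS data and both function names are invented for illustration and do not describe any particular product.

```python
from collections import deque

# Hypothetical dependency graph: element -> elements it calls or accesses.
CALLS = {
    "checkout.jsp": ["CheckoutAction"],
    "CheckoutAction": ["OrderService"],
    "OrderService": ["OrderDao", "StockDao"],
    "OrderDao": ["table:ORDERS"],
    "StockDao": ["table:STOCK"],
    "search.jsp": ["SearchAction"],
    "SearchAction": ["StockDao"],
}

# Hypothetical structural flaws already detected on individual elements.
FLAWS = {
    "OrderDao": ["update without rollback"],
    "StockDao": ["SQL query inside a loop"],
}

def transaction_elements(entry, calls):
    """Collect every element reachable from a user entry point (breadth-first)."""
    seen, queue = {entry}, deque([entry])
    while queue:
        node = queue.popleft()
        for callee in calls.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

def flaws_by_transaction(entries, calls, flaws):
    """Group known flaws under each transaction that passes through them."""
    report = {}
    for entry in entries:
        elements = transaction_elements(entry, calls)
        report[entry] = sorted(
            (element, flaw) for element in elements for flaw in flaws.get(element, [])
        )
    return report
```

In this toy data the StockDao flaw shows up under both the checkout and the search transactions. That is the kind of information a team needs when deciding which flaws can be tolerated in production and which must be fixed before shipping.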

From an IT budget perspective, holistic source code analysis is a no-brainer. For a fraction of the total cost of functional testing, and for a potentially much bigger positive impact, the payback is huge: prevention of software catastrophes, higher end-user performance, and increased customer satisfaction. Going back and forth between QA and development with a list of critical violations will eventually lead to a continuous improvement of the engineering quality of the deliverables.

In today’s world, where everything gets X-Rayed, from our luggage before boarding a plane to the welding joints of nuclear reactors, the time has come for CIOs to take hold of their own X-Ray tool: source code analysis.

About the Author

Olivier Bonsignour is an Executive Vice President at CAST. Responsible for Research & Development and Product Management, he is an expert in precision software parsing and the software analysis industry, and a pioneer in the development of distributed systems and object-oriented development. Prior to CAST, Bonsignour was the CIO for DGA, the advanced research division of the French Ministry of Defense. Most recently, he co-authored the book “The Economics of Software Quality.” Olivier holds a graduate degree in engineering and computer science from the National Institute of Applied Sciences (INSA), Lyon, and a Master’s in Management from the executive program at IAE Aix-en-Provence.
