Validation of Autonomous Systems

Key Takeaways

  • Autonomous and automated systems are increasingly being used across domains.
  • Yet, the distrust in their reliability is growing due to lack of transparency, which will reduce acceptance and slow down usage.
  • Autonomous systems have complex interactions with the real world, which raises many questions about their validation: how to trace back decision-making and judge it afterwards, how to supervise learning, adaptation, and especially correct behavior, and how to define reliability in the event of failure.
  • Different validation methods for autonomous systems exist, each with its own strengths, weaknesses, and contexts in which it is effective and/or efficient.
  • Classic validation methods are relevant for coverage in defined situations but must be supplemented with cognitive test methods and scenario-based testing.

Autonomous Systems

Society today depends on autonomous systems. Often we do not even recognize them, which is the ultimate proof of the Turing test. They are increasingly being used in IT domains such as finance, but also in transport, medical surgery, and industrial automation [1]. The potential of automated and autonomous systems is enormous; for example, autonomous mobility systems are projected to eliminate up to 90% of accidents and to reduce commuting time by up to 50% per user per day [2].

Yet, the distrust in their reliability is growing. There are many open questions about the validation of autonomous systems: How to define reliability? How to trace back decision-making and judge it afterwards? How to supervise correct behavior to achieve trust and control learning? And how to define liability in the event of failure? The underlying algorithms are difficult to understand and thus not transparent.

Traditional validation methods are complex and therefore expensive [1,2,3]. In addition, there is no repeatable, effective coverage for regression strategies for upgrades and updates, which limits software updates and continuous deployment. Functional safety requirements further affect the approval of such systems on top of IT constraints.

Figure 1 indicates the five steps from automation to autonomy, as we also know them from human learning, where we advance from novice to expert [1,5]. These steps exemplify the path from simple “assisted behavior” in terms of low-level sensing and control towards “full cognitive systems” with a very high degree of autonomy.

Fig. 1: From assisted systems to fully automated (autonomous) systems

For instance, a completely autonomous vehicle on level 5 is expected to drive with no human intervention even in dire situations [4,5]. This implies that such vehicles must have intelligence on par with or even better than that of humans to handle not just regular traffic scenarios, but also unexpected ones. Although several players such as Google and Uber have been granted permission to operate their self-driving services, incidents such as the death of Elaine Herzberg put our faith in these vehicles to the test [1,3,4]. It is therefore quite apparent that existing validation measures aren’t enough! We need new test methods that can envision fatal traffic situations that humans haven’t encountered yet. In addition, testing cannot simply be confined to the final stages. It must be part of every stage in the product lifecycle. Hence, a sensible engineering process that lays enough emphasis on verification and validation must be adopted in developing autonomous systems.

Unlike an automated system, which cannot reflect on the consequences of its actions and cannot change a predefined sequence of activities, an autonomous system is meant to understand and decide how to act based on its goals, skills, and learning experience.

The robo-test incubator at University of Stuttgart has created a pilot environment to derive viable industry solutions [4]. With cognitive testing methods, situational awareness, and appropriate indexing, the test efficiency can be enhanced. The conclusion: systematic validation needs to integrate classic coverage with scenario-based testing and novel cognitive test methods.

This article introduces validation and certification as well as the general approval (homologation) of autonomous systems and their components. We provide insights into the validation of autonomous systems, such as those used in automation technology and robotics. We give an overview of methods for verification and validation of autonomous systems, sketch current tools and show the evolution towards AI-based techniques for influence analysis of continuous changes.

Validation of Autonomous Systems

Autonomous systems have complex interactions with the real world. This raises many questions about their validation: How to trace back decision-making and judge it afterwards? How to supervise learning, adaptation, and especially correct behavior, specifically when critical corner cases are observed? Another challenge is how to define reliability in the event of failure.

With artificial intelligence and machine learning, we need to satisfy algorithmic transparency. For instance, what are the rules in a neural network, which is obviously no longer algorithmically tangible, that determine how an autonomous system might react to several hazards at the same time? Classic traceability and regression testing will certainly not work. Rather, future verification and validation methods and tools will include more intelligence based on the exploitation of big data, business intelligence, and their own learning, so that they learn about and improve software quality in a dynamic way.

Fig. 2 provides an overview of validation technologies for autonomous systems. Horizontally, we distinguish the transparency of the validation method. Black box means that we have no insight into the method and coverage, while white box provides transparency. The vertical axis classifies the degree to which we can automate the validation techniques and thus, for instance, facilitate regression strategies with software updates and upgrades. Many such validation techniques are already in use today, such as automatic function tests. The grey boxes show these more traditional techniques. The red upper right box is our focus, namely how to run black-box validation so that it achieves the efficiency and transparency necessary for trusted systems.

Fig. 2 Validation Technologies for Autonomous Systems

Various new intelligent validation methods and tools are evolving that can assist in a smart validation of autonomous systems. Verification and validation depend on many factors. Every organization implements its own methodology and development environment, based on a combination of several of the tools presented in fig. 2. It is, however, important not only to deploy tools, but also to build the necessary verification and validation competences. Too often we see solid tool chains, but no tangible test strategy. To mitigate these purely human risks, software must increasingly be capable of automatically detecting its own defects and failure points.

Fig. 3 provides a complete evaluation of static and dynamic validation technologies for autonomous systems. We have mentioned some tools in it, but they are meant as a starting point rather than a complete list or even a recommendation. Every company today implements its own methodology and development environment. Too often one sees ambitious development teams and complex tool chains, but no tangible, sustainable testing strategy.

Fig. 3 Validation Technologies for Autonomous Systems

Positive testing methods aim to ensure that all functional requirements of the system are met, while negative testing methods ensure that the system is tested against quality requirements [9]. Negative requirements (such as safety and cybersecurity) are typically implied requirements and are not explicitly specified in system requirement specifications.

Fault Injection. Fault injection techniques use external hardware to inject faults into the target system’s hardware. Faults are injected either with or without direct contact with the physical hardware. With direct contact, faults such as forced current addition or forced voltage variations can be injected to observe the behavior of the system. From a security perspective, faults can be injected through interfaces and direct access to components, but also through software manipulation and sensor spoofing.
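The same idea can be tried at the software level before any hardware is involved. The following is a minimal sketch, not the authors' tooling: it wraps a sensor reader so that it delivers an offset, a stuck-at value, or no value at all, and records how a controller reacts. The names `inject_fault`, `brake_controller`, and `run_campaign`, the fault types, and the 10 m braking threshold are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

Reading = Optional[float]


def inject_fault(read_sensor: Callable[[], float], fault: str,
                 magnitude: float = 0.0) -> Callable[[], Reading]:
    """Wrap a sensor reader so that it delivers a faulty value instead."""
    def faulty_read() -> Reading:
        value = read_sensor()
        if fault == "offset":      # e.g. a forced voltage variation on the line
            return value + magnitude
        if fault == "stuck_at":    # signal frozen at a fixed level
            return magnitude
        if fault == "dropout":     # sensor delivers nothing at all
            return None
        raise ValueError(f"unknown fault type: {fault}")
    return faulty_read


def brake_controller(distance_m: Reading) -> str:
    """Hypothetical system under test: reacts to a distance reading in meters."""
    if distance_m is None or distance_m < 0:
        return "safe_stop"         # implausible input: fall back to a safe state
    return "brake" if distance_m < 10.0 else "cruise"


def run_campaign(read_sensor: Callable[[], float]) -> Dict[str, str]:
    """Observe the controller once without faults and once per injected fault."""
    faults = [("offset", 5.0), ("stuck_at", -1.0), ("dropout", 0.0)]
    results = {"nominal": brake_controller(read_sensor())}
    for fault, magnitude in faults:
        faulty = inject_fault(read_sensor, fault, magnitude)
        results[fault] = brake_controller(faulty())
    return results


results = run_campaign(lambda: 8.0)   # obstacle 8 m ahead
for name, reaction in results.items():
    print(f"{name:10s} -> {reaction}")
```

In this toy setup the dropout and the stuck-at fault are caught by the plausibility check, while the offset fault silently turns a braking situation into cruising. Finding such undetected faults is the purpose of a fault injection campaign.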

Functionality-based test methods categorize the intelligence of a system into three categories: 1. Sensing functionality, 2. Decision functionality and 3. Action functionality. The idea behind such methods is that the autonomous system should be able to retrieve various functionalities for a given task analogous to human beings. For example, an autonomous mobility system should be able to recognize other vehicles, obstacles, pedestrians, etc. for vision-based functionality. Combinations of these recognized objects can then act as inputs to decision functionality and several decisions can then lead to actions. Functionality-based testing therefore breaks down the scenarios into various functional components which can be tested individually.
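A minimal sketch of this decomposition is shown below. It assumes a deliberately simple pipeline in which sensing, decision, and action are separate functions, so that each can be checked on its own. All names, thresholds, and the command format are hypothetical and only illustrate the structure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DetectedObject:
    kind: str
    distance_m: float


def sense(raw_frame: List[dict]) -> List[DetectedObject]:
    """Sensing: keep only detections above an assumed confidence threshold."""
    return [DetectedObject(d["label"], d["range"])
            for d in raw_frame if d["score"] >= 0.5]


def decide(objects: List[DetectedObject]) -> str:
    """Decision: map recognized objects to a driving decision."""
    if any(o.kind == "pedestrian" and o.distance_m < 20 for o in objects):
        return "emergency_stop"
    if any(o.distance_m < 10 for o in objects):
        return "slow_down"
    return "keep_speed"


def act(decision: str) -> Dict[str, float]:
    """Action: translate a decision into actuator commands."""
    commands = {
        "emergency_stop": {"throttle": 0.0, "brake": 1.0},
        "slow_down": {"throttle": 0.2, "brake": 0.3},
        "keep_speed": {"throttle": 0.5, "brake": 0.0},
    }
    return commands[decision]


# Each functionality is tested individually ...
frame = [{"label": "pedestrian", "range": 12.0, "score": 0.9},
         {"label": "vehicle", "range": 30.0, "score": 0.3}]
assert sense(frame) == [DetectedObject("pedestrian", 12.0)]
assert decide([DetectedObject("vehicle", 8.0)]) == "slow_down"
assert act("keep_speed") == {"throttle": 0.5, "brake": 0.0}

# ... and then once in combination.
assert act(decide(sense(frame))) == {"throttle": 0.0, "brake": 1.0}
```

Testing the stages separately makes it easier to attribute a failure: a missed pedestrian is a sensing problem, while a wrong reaction to a correctly recognized pedestrian is a decision problem.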

Simulators are closed indoor cubicles that act as a substitute for physical systems. They can simulate the behavior of any system either by using physical hardware or by using a software model. The behavior of a driver can then be captured by presenting them with a simulated external environment. Since the simulators employ hydraulic actuators and electric motors, the inertial effects they generate feel nearly the same as in the real system. Simulators are used for robots in industrial automation, for surgery planning in medicine, and for train and automotive systems.

Nothing comes closer to the real world than the real world itself. This is perhaps the final validation phase, in which the completely ready system is driven out onto real roads with real traffic. The sensor data is recorded and logged to capture the behavior in critical situations. It is later analyzed to adapt and fine-tune the system according to real-world scenarios. The challenge in this stage, however, lies in the sheer amount of test data that is generated. A stereo video camera alone is found to generate 100 gigabytes of data for each minute of operation. In such situations, big data analysis becomes extremely important.
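To get a feeling for the scale, the quoted rate can be extrapolated with a few lines of arithmetic. The 100 gigabytes per minute come from the text above; the drive durations, the number of cameras, and the decimal conversion (1 TB = 1000 GB) are assumptions chosen only for illustration.

```python
GB_PER_MINUTE = 100  # rate quoted above for a single stereo video camera


def recorded_terabytes(hours: float, cameras: int = 1) -> float:
    """Raw data volume in terabytes (decimal, 1 TB = 1000 GB)."""
    return GB_PER_MINUTE * 60 * hours * cameras / 1000


for hours in (1, 8):
    print(f"{hours} h of driving: {recorded_terabytes(hours):.0f} TB")
```

At that rate, one hour of driving already yields about 6 TB from a single camera, and an assumed eight-hour test day yields about 48 TB, before radar, lidar, or bus traffic are counted.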

The approval or homologation (in the case of safety-critical systems) of autonomous systems therefore requires regression validation, i.e., a test that performs a new check after the control algorithms have changed and ensures that the function still works. In this way, safety and reliability can be maintained in development, in testing, and in use, even when the system adapts, i.e., is changed.

Truly transparent validation methods and processes are becoming of utmost relevance and will be challenged by the progress of technology over the five sketched steps towards autonomous behavior. Although still relevant, traditional validation methods aren’t enough to completely test the growing complexity of autonomous systems. Machine learning with situational adaptations, together with software updates and upgrades, demands novel regression strategies.

Intelligent validation techniques tend to automate complete testing or certain aspects of testing (fig. 4). This eliminates the potential errors associated with manual derivations of test cases since humans may fail to derive or think about certain scenarios. It also eliminates the enormous amount of time that needs to be invested to derive the test cases.

Fig. 4 Test Environment with Indexing, Test Selection, Simulation Engine and AV KPI

Cognitive Testing

With artificial intelligence and machine learning, we need to satisfy algorithmic transparency. For instance, what are the rules in a neural network, which is obviously no longer algorithmically tangible, that determine who is granted credit or how an autonomous system might react to several hazards at the same time? Classic traceability and regression testing will certainly not work. Rather, future verification and validation methods and tools will include more intelligence based on the exploitation of big data, business intelligence, and their own learning, so that they learn about and improve software quality in a dynamic way.

Cognitive test procedures are based on a database that transparently depicts scenarios and disruptions, so that a target behavior for critical situations, boundary conditions, etc. is defined. In the signal path, signals are generated from the scenarios for the interfaces of the autonomous system or its components. For example, if a person suddenly appears in front of an autonomous robot, the reaction of the system and the action of its components must be tested deterministically. These tests can involve simulations for camera and radar sensors, but also communication signals, correlations of functions, corner cases, and the display of disturbances. Through factorized parameterization, corner cases such as different lighting conditions can be systematically synthesized. From the behavior of the system under test, actual rules are automatically extracted and compared with known and accepted target rules that describe how the system under test should behave in the scenario.
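One simple reading of factorized parameterization is to describe a scenario by independent factors and to synthesize every combination of their values. The sketch below does this with a Cartesian product; the factor names and values are invented for illustration, and a real scenario database would hold far more factors and would usually prune the combinations.

```python
from itertools import product
from typing import Dict, List

# Hypothetical scenario factors for "a person appears in front of the robot".
FACTORS: Dict[str, list] = {
    "lighting": ["day", "dusk", "night"],
    "weather": ["clear", "rain"],
    "pedestrian_distance_m": [5, 15, 40],
}


def synthesize_scenarios(factors: Dict[str, list]) -> List[dict]:
    """Build one scenario per combination of factor values."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in product(*(factors[name] for name in names))]


scenarios = synthesize_scenarios(FACTORS)
print(len(scenarios), "scenarios, for example:", scenarios[0])
```

Three lighting conditions, two weather conditions, and three distances already yield 18 scenarios, which is why the number of combinations has to be controlled through selection rather than brute force as more factors are added.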

The target rules are derived from laws, experience, human expertise, and guidelines from ethics committees, but also from simulations. They should be transparent and therefore accessible to human inspection. Specifically, they should cover corner cases, which is exactly the advantage compared to brute force [1,5]. Today there are already mechanisms to achieve transparency in machine learning, such as extracting fuzzy rules [1]. The challenge remains to use these typically hard-to-understand rules for a more systematic validation, as is required in a safety case and in homologation. Rules are extracted from the behavior of the autonomous system under test to make transparent the opaque learned behavior stored in implicit rules or neuron links. These now transparent but quite fuzzy rules are compared with the target rules for the behavior. Validation and certification are based on the deviations between the two [1,3,4,5,6].
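The comparison of actual and target rules can be sketched in a strongly simplified form. Instead of fuzzy rules, the example below uses a crisp lookup table: the "actual rules" are simply the observed reaction of a black-box system to each situation, and the deviations are the situations in which that reaction differs from the target. The system under test, the situations, and the target table are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Situation = Tuple[str, str]          # (object in front, lighting)
Rules = Dict[Situation, str]         # situation -> action


def extract_rules(system: Callable[[Situation], str],
                  situations: Iterable[Situation]) -> Rules:
    """Make black-box behavior explicit by observing it per situation."""
    return {s: system(s) for s in situations}


def deviations(actual: Rules, target: Rules) -> Dict[Situation, Tuple[str, str]]:
    """Return (actual, expected) for every situation that violates the target."""
    return {s: (actual.get(s), expected)
            for s, expected in target.items()
            if actual.get(s) != expected}


# Target rules, e.g. derived from laws and expert knowledge (assumed here).
TARGET: Rules = {
    ("pedestrian", "day"): "brake",
    ("pedestrian", "night"): "brake",
    ("none", "day"): "cruise",
    ("none", "night"): "cruise",
}


def learned_system(situation: Situation) -> str:
    """Stand-in for a learned component that misses pedestrians at night."""
    obstacle, lighting = situation
    return "brake" if obstacle == "pedestrian" and lighting == "day" else "cruise"


actual = extract_rules(learned_system, TARGET)
print(deviations(actual, TARGET))
```

The single reported deviation, the missed pedestrian at night, is the kind of evidence a safety case would need. Extracting real fuzzy rules from a neural network is considerably harder than this table lookup suggests.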

Fig. 5 gives an overview of the cognitive testing we are currently using for networked components of autonomous systems. Unlike brute force, it considers the dependencies between the white box and the black box, balancing efficiency and effectiveness. Autonomous systems consist of many interacting components, such as controllers, sensors, and actuators, which are distributed across the system. In a distributed overall system, undesirable behavior and basic malfunctions can arise because a software change at one point propagates to other components. This raises numerous questions: How can the function of a system be ensured if changes take place in the subcomponents? How can safe and reliable behavior be guaranteed if software changes are made to individual components during operation?
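A minimal sketch of dependency-oriented test selection is given below, under the assumption that component dependencies and the coverage of each test are known. After a change, it walks the dependency graph to find every component the change can propagate to and selects only the tests that touch one of them. The component names, the graph, and the test names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# component -> components that consume its output (assumed architecture)
DEPENDENTS: Dict[str, List[str]] = {
    "camera_driver": ["object_detection"],
    "object_detection": ["path_planner"],
    "path_planner": ["brake_controller"],
    "brake_controller": [],
    "infotainment": [],
}

# test case -> components it exercises (assumed coverage data)
TESTS: Dict[str, Set[str]] = {
    "test_camera_frames": {"camera_driver"},
    "test_pedestrian_detection": {"object_detection"},
    "test_emergency_brake": {"path_planner", "brake_controller"},
    "test_radio_presets": {"infotainment"},
}


def impacted_components(changed: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search over everything downstream of the changed component."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        component = queue.popleft()
        for downstream in dependents.get(component, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen


def select_tests(changed: str, dependents: Dict[str, List[str]],
                 tests: Dict[str, Set[str]]) -> List[str]:
    """Select the regression tests that cover at least one impacted component."""
    impacted = impacted_components(changed, dependents)
    return sorted(name for name, covers in tests.items() if covers & impacted)


print(select_tests("object_detection", DEPENDENTS, TESTS))
```

For a change in `object_detection`, the emergency brake test is selected even though the brake controller itself was not modified, while the unrelated infotainment test is skipped.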

A key question is how artificial intelligence can support the process of validation. Several AI approaches for testing are already in use, ranging from rule-based systems, fuzzy logic, and Bayesian nets to the many neural network approaches of deep learning [1,6]. However, validating an autonomous system is much more challenging than testing classic IT systems. Autonomous systems are adaptive, multilayered, and rich in (mostly hidden) details. Various levels of validation tests can be distinguished, such as the system level and the level of components or modules.

The potential for intelligent testing is manifold [4,5,6]. On the system level, the questions are which test cases must be executed, and to what extent. This means that intelligent validation is needed to help select or even create test cases. In a first step, an assistance functionality helps to identify priorities in an existing set of cases. As a result, the validation expert can test more quickly and with better coverage of situationally relevant scenarios. On the level of component or module testing, it is also necessary to identify relevant cases. This can range from simple support on how to feed the system with adequate inputs and check the outputs to complex algorithms that automatically create test cases based on the code or user interface. Fig. 4 provides an overview of intelligent testing as we are currently ramping it up for autonomous systems. Unlike brute force, it considers both white-box and black-box dependencies and thus balances efficiency and effectiveness. See the sidebar for a concrete case study.
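The first step mentioned above, an assistant that prioritizes an existing set of cases, can be illustrated with a simple scoring heuristic. This is only a sketch under stated assumptions: the scoring formula (criticality weighted by the recent failure rate), the 1 to 5 criticality scale, and all case names are invented and are not the method used by the authors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValidationCase:
    name: str
    criticality: int       # assumed scale: 1 (low) to 5 (safety-critical)
    recent_failures: int
    runs: int


def priority(case: ValidationCase) -> float:
    """Higher score means run earlier; never-run cases count as fully uncertain."""
    failure_rate = case.recent_failures / case.runs if case.runs else 1.0
    return case.criticality * (1.0 + failure_rate)


def prioritize(cases: List[ValidationCase], budget: int) -> List[str]:
    """Return the names of the `budget` highest-priority cases."""
    ranked = sorted(cases, key=priority, reverse=True)
    return [case.name for case in ranked[:budget]]


cases = [
    ValidationCase("pedestrian_at_night", criticality=5, recent_failures=0, runs=10),
    ValidationCase("lane_keeping_rain", criticality=3, recent_failures=5, runs=10),
    ValidationCase("seat_heating", criticality=1, recent_failures=0, runs=10),
    ValidationCase("new_cut_in_scenario", criticality=4, recent_failures=0, runs=0),
]
print(prioritize(cases, budget=2))
```

With a budget of two runs, the never-executed scenario and the safety-critical pedestrian case are scheduled first. A learning-based assistant would replace the fixed formula with a model trained on test history.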

Fig. 5 Dependency-oriented test case selection for cognitive regression test


Trust in autonomous systems depends on their validation. With their growing importance, and hence the growing concerns of users and policymakers regarding the impact of autonomous systems on our lives and society, software engineers must ensure that autonomous functions and systems work properly and reliably. To build trust, the quality of the technical system is expected to be at least an order of magnitude higher than that of human-operated systems.

Building trust is closely linked to issues of validation. However, such validations depend on many factors. Autonomous systems provide efficiency and safety by relieving the operator of tedious and error-prone manual tasks. The question "Can we trust autonomous systems?" will continue to grow in importance in the coming years. Public trust in autonomous systems depends heavily on algorithmic transparency and continuous validation.

An accident caused by software errors is discussed more intensively today than the many accidents caused by alcohol. On the other hand, recent fatal software errors in aviation also show a certain "habituation". The number of passengers does not decrease because of crashes, as everyone knows that aircraft are, on the whole, developed to be safe.

This learning curve of acceptance can be seen in all autonomous systems, historically for example in smartphones, bots with automatic speech processing and in social networks. An increasingly informed society accepts that while software is never error-free and so there is a residual risk, there are still many advantages over the past.

AV validation must be efficient, transparent, and reliable. This article addresses this magic triangle and thus the question “Can we trust autonomous systems?” Public trust in autonomous systems depends heavily on algorithmic transparency and continuous validation. The art of systematic AV testing is to consider the unthinkable. Murphy’s law gives us the direction: “Everything that can possibly go wrong will go wrong.” It might sound pessimistic, but this is exactly the attitude necessary for effective and transparent testing.


  1. Ebert, C. and M. Weyrich: Validation of Autonomous Systems. IEEE Software, ISSN: 0740-7459, vol. 36, no. 5, pp. 15-23, Sep 2019.
  2. Kalra, N., and S. M. Paddock: Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Transportation Research Part A: Policy and Practice, vol. 94, pp. 182-193, 2016.
  3. Rodriguez, M., M. Piattini, and C. Ebert: Software Verification and Validation Technologies and Tools. IEEE Software, ISSN: 0740-7459, vol. 36, no. 2, pp. 13-24, Mar. 2019.
  4. robo-test at University of Stuttgart, continuously enhanced. Last accessed: 6 Jun. 2021.
  5. Ebert, C., M. Weyrich et al.: “Systematic Testing for Autonomous Driving”. In: ATZ electronics worldwide. Last accessed: 6 Jun. 2021.
  6. Shalev-Shwartz, S. et al.: On a Formal Model of Safety and Scalable Self-Driving Cars. Intel, continuously enhanced. Last accessed: 6 Jun. 2021.
  7. Ebert, C.: Requirements Engineering. dPunkt, 6th edition, 2019.

About the Authors

Christof Ebert is the managing director of Vector Consulting Services and professor at the University of Stuttgart and the Sorbonne in Paris. Contact him at


Michael Weyrich is the director of the University of Stuttgart’s Institute for Automation and Software Systems. Contact him at
