The Case for Explainable AI (XAI)

Key Takeaways

  • Artificial Neural Networks offer significant performance benefits compared to other methodologies, but often at the expense of interpretability
  • Problems and controversies arising from the use of and reliance on black box algorithms have given rise to increasing calls for more transparent prediction technologies
  • Hybrid architectures attempt to solve the problem of performance and explainability sitting in tension with one another
  • Current approaches to enhancing the interpretability of AI models focus on either building inherently explainable prediction engines or conducting post-hoc analysis
  • The research and development seeking to provide more transparency in this regard is referred to as Explainable AI (XAI)

Modern machine learning architectures are growing increasingly sophisticated in pursuit of superior performance, often leveraging black box-style architectures which offer computational advantages at the expense of model interpretability. 

A number of companies have already been caught on the wrong side of this “performance-explainability trade-off”.

Source: DARPA

In August 2019, Apple and Goldman Sachs co-launched a credit card poised to disrupt the market and offer consumers a sleek, next-gen payment experience. Controversy struck almost instantly when users noticed that women were being offered significantly smaller credit lines than men, even within couples who filed taxes jointly. Despite assertions by Goldman Sachs that the models exclude gender as a feature and that the data were vetted for bias by a third party, many prominent names in tech and politics, including Steve Wozniak, publicly commented on the potentially "misogynistic algorithm".

Two months later, a study revealed concerns over an algorithm being leveraged by UnitedHealth Group to optimize the outcome of hospital visits given certain cost constraints. The study found that this algorithm was assigning comparable risk scores to white and black patients, even though the black patients were significantly sicker, so they received disproportionately less care. State regulators in New York called on the country's largest healthcare provider “to either prove that a company-developed algorithm used to prioritize patient care in hospitals isn't racially discriminatory against black patients, or stop using it altogether.” The purveyor of the algorithm, UnitedHealth Group-owned Optum, attempted to explain and contextualize the results, but much of the headline damage was already done.

Unintended consequences appear to arise frequently when algorithms run without human oversight. During the 2010 "flash crash", major stock averages plunged 9% in a matter of minutes when high-frequency trading algorithms fell into a recursive cycle of panic selling. The following year, an unremarkable copy of Peter Lawrence's book, The Making of a Fly, was found inexplicably listed for nearly $24 million on Amazon. It turned out that two sellers of the book had set their prices to update automatically each day. The first seller pegged their price to 0.9983 times the second seller’s, while the second seller pegged their price to 1.270589 times the first’s. Each round of updates therefore multiplied both prices by roughly 1.27.
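The arithmetic behind that runaway price is easy to reproduce. The sketch below is only an illustration: the two multipliers come from the account above, while the $100 starting price and the one-update-per-day schedule for both sellers are assumptions made for the example.

```python
from itertools import count

SELLER_A_RATIO = 0.9983     # first seller: slightly below the second seller's price
SELLER_B_RATIO = 1.270589   # second seller: well above the first seller's price

def days_until_price_exceeds(start_price, threshold):
    """Run the daily repricing loop; return (days elapsed, second seller's price)."""
    price_b = start_price
    for day in count(1):
        price_a = SELLER_A_RATIO * price_b   # first seller reprices against the second
        price_b = SELLER_B_RATIO * price_a   # second seller reprices against the first
        if price_b > threshold:
            return day, price_b

# Combined effect of one full round of updates on either price.
daily_growth = SELLER_A_RATIO * SELLER_B_RATIO

# Hypothetical starting price of $100; the real starting price is not given above.
days, final_price = days_until_price_exceeds(start_price=100.0, threshold=23_000_000)

print(f"combined daily multiplier: {daily_growth:.4f}")
print(f"days for a $100 listing to pass $23 million: {days}")
```

Neither rule is unreasonable in isolation. The problem only appears when the two rules interact, which is why nobody noticed until the price was absurd.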

Incidents like these, together with concerns about how such systems will be used in the future, have led to a surge in interest in explainable AI (XAI).

The Relevance of Explainability

A wide range of stakeholders stands to benefit from a focus on more interpretable AI infrastructure. From a societal perspective, great emphasis is placed on safeguarding against bias in order to prevent the proliferation of harmful feedback loops and the reinforcement of undesirable conditions. From a regulatory perspective, adherence to current frameworks such as GDPR and CCPA, in addition to those that arise in the future, is thought to be aided by explainability features. Finally, from the user perspective, providing an understanding of why AI models make certain decisions is also likely to increase their confidence in products built on those models.

Avoiding Spurious Correlations

There are also benefits relating to the model’s performance and robustness that should be of interest to data scientists. Explainability features can be leveraged not only in model validation but also in debugging. Furthermore, they can help practitioners avoid drawing conclusions from spurious correlations.

Source: “Why Should I Trust You?” Explaining the Predictions of Any Classifier

For example, it was shown that a trained logistic regression classifier, built to distinguish between images of wolves and huskies, could do so with ostensible accuracy, despite basing that classification on features that are conceptually divorced from the use case. In particular, because most of the images of wolves contained snowy backgrounds, the classifier treated snow as a principal feature, causing it to misclassify the image shown above.

Because human practitioners usually have prior knowledge pertaining to the important features within their datasets, they can assist in gauging and establishing the trustworthiness of AI models. For example, a doctor working with a model that predicts whether a patient has the flu would be able to look at the relative contributions of various symptoms and see whether the diagnosis adheres to conventional medical wisdom.
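For a linear model, this kind of inspection is straightforward, because each symptom's contribution to the log-odds is simply its weight multiplied by its value. The sketch below uses made-up weights and symptom names purely for illustration; they are not taken from any real diagnostic model.

```python
import math

# Hypothetical weights for illustration only; not taken from any real model.
WEIGHTS = {"sneeze": 1.2, "headache": 0.9, "fatigue": -0.7}
BIAS = -0.5

def explain(patient):
    """Return the flu probability and each symptom's contribution to the log-odds."""
    contributions = {name: WEIGHTS[name] * patient[name] for name in WEIGHTS}
    log_odds = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions

# A patient who sneezes and has a headache but reports no fatigue (1 = present).
probability, contributions = explain({"sneeze": 1, "headache": 1, "fatigue": 0})

print(f"predicted flu probability: {probability:.2f}")
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>9}: {value:+.2f}")
```

A doctor reading this output can check whether the ranking of symptoms matches clinical experience before trusting the prediction.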

Source: “Why Should I Trust You?” Explaining the Predictions of Any Classifier

In one example, shown above, competing classifiers attempted to determine whether a particular text document contained subject matter that pertained to “Christianity” or “Atheism”. Explainability features allowed for an intuitive visualization of the factors that led to respective predictions, revealing important distinctions in performance that would otherwise not have been obvious.

Challenges to Explainability

Despite the numerous benefits to developing XAI, many formidable challenges persist.

A significant hurdle, particularly for those attempting to establish standards and regulations, is the fact that different users will require different levels of explainability in different contexts. Models that drive decisions directly affecting human life, such as those in hospitals or military environments, will carry different needs and constraints than ones used in low-risk situations.

There are also nuances within the performance-explainability trade-off. Infrastructure and systems designers are constantly balancing the demands of competing interests. 

Explainability can exist not only in tension with predictive accuracy, but also with user privacy. For example, a model used to assess the creditworthiness of mortgage applicants is likely to utilize data points that those applicants consider private. Functionality that offers insight into a particular input-output pairing could result in deanonymization and begin to erode protections that best practices surrounding personally identifiable information (PII) are structured to enforce.

Risks of Explainability

There are also a number of risks associated with explainable AI. Most consumers would struggle to detect systems that produce seemingly credible but actually incorrect results. Trust in AI systems can enable deception by way of those very AI systems, especially when stakeholders provide features that purport to offer explainability but do not. Engineers also worry that explainability could create greater opportunities for exploitation by malicious actors. Simply put, if it is easier to understand how a model converts input into output, it is likely also easier to craft adversarial inputs that are designed to achieve specific outputs.

Current Landscape

DARPA has been one of the earliest and most prominent voices on the topic of XAI, and published the following graphic depicting their view of the paradigm shift:

Source: DARPA

The hope is that this pursuit will result in a shift toward more user- and society-friendly AI deployments without compromising efficacy.

Source: DARPA

A litany of approaches has been proposed for tackling Explainable AI, a topic which has gained substantial attention in both academic and professional circles. Generally, they can be grouped into two categories: 1) developing inherently interpretable models; 2) creating tools to understand how black boxes work.

Developing Inherently Interpretable Models

Many AI leaders have argued that it will be prudent to develop models with embedded explainability features, even if that causes a drop in performance. However, recent research has shown that it is possible to pursue explainability without compromising on predictive capabilities.

Decision trees and regression models typically offer great explainability, but trail competing architectures in performance. Deep Neural Networks (DNNs), on the other hand, are powerful predictors but lack interpretability. Combining approaches, it turns out, can offer the best of both worlds.

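For example, a technique known as Deep k-Nearest Neighbors (DkNN) combines a deep neural network with the k-nearest neighbors algorithm. It validates each prediction by looking up the training examples whose representations are closest to the input's at every layer of the network, which builds the inference check into the structure of the classifier. The resulting hybrid model is not only interpretable but also robust against adversarial input.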

Source: Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning
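The core idea can be sketched in a few lines. The code below is a heavily simplified illustration, not the published algorithm: the "layers" are hypothetical stand-in functions rather than a trained network, the data is a toy example, and the support score is a plain vote share, whereas the published method calibrates its confidence estimates on held-out data.

```python
import math
from collections import Counter

def nearest_labels(query, points, labels, k):
    """Labels of the k points closest to `query` by Euclidean distance."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return [labels[i] for i in order[:k]]

def dknn_style_predict(layer_fns, train_x, train_y, x, k=3):
    """Gather k-NN labels in every layer's representation space, then vote."""
    votes = []
    for layer in layer_fns:
        train_repr = [layer(p) for p in train_x]   # training data as seen by this layer
        votes += nearest_labels(layer(x), train_repr, train_y, k)
    label, hits = Counter(votes).most_common(1)[0]
    support = hits / len(votes)   # share of neighbors that agree with the prediction
    return label, support

# Hypothetical "layers": each maps a raw input to that layer's representation.
layers = [
    lambda p: (p[0], p[1]),    # stand-in for an early layer
    lambda p: (p[0] + p[1],),  # stand-in for a later, more compressed layer
]

train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(dknn_style_predict(layers, train_x, train_y, (0.5, 0.5), k=3))
print(dknn_style_predict(layers, train_x, train_y, (2, 2), k=4))
```

The neighbors themselves double as the explanation: a practitioner can inspect the training examples that supported a prediction, and low support across layers is a signal that the input may be unusual or adversarial.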

Creating Tools to Understand How Black Boxes Work

Data visualization techniques often reveal insights that summary statistics would not have. The “Datasaurus Dozen” is a group of 13 datasets which appear completely different from one another despite sharing the same means, standard deviations, and Pearson’s correlations to two decimal places. Effective visualization techniques allow practitioners to recognize patterns, such as multicollinearity, that can easily degrade model performance in an undetected fashion.

Source: Datasaurus Dozen
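The same effect is easy to reproduce on a small scale. The sketch below does not use the Datasaurus data, nor the method used to create it. Instead it takes two hand-made shapes (a circle and a cross) and applies a simple affine rescaling so that both end up with the same mean, standard deviation, and Pearson correlation; the target values are arbitrary.

```python
import math
from statistics import fmean, pstdev

def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def pearson(xs, ys):
    """Pearson correlation as the mean product of standardized values."""
    return fmean([a * b for a, b in zip(standardize(xs), standardize(ys))])

def match_stats(xs, ys, mean, std, r):
    """Affinely remap a 2-D point cloud to the requested mean, std and correlation."""
    zx, zy = standardize(xs), standardize(ys)
    r0 = pearson(xs, ys)
    # Part of y that is uncorrelated with x, rescaled to unit variance.
    resid = standardize([b - r0 * a for a, b in zip(zx, zy)])
    new_y = [r * a + math.sqrt(1 - r * r) * e for a, e in zip(zx, resid)]
    return [mean + std * a for a in zx], [mean + std * b for b in new_y]

# Two visibly different shapes.
angles = [2 * math.pi * i / 60 for i in range(60)]
circle = ([math.cos(t) for t in angles], [math.sin(t) for t in angles])
ticks = [i / 15 - 1 for i in range(31)]                 # evenly spaced from -1 to 1
cross = (ticks + ticks, ticks + [-t for t in ticks])    # the two diagonals of a square

results = {}
for name, (xs, ys) in {"circle": circle, "cross": cross}.items():
    new_x, new_y = match_stats(xs, ys, mean=50.0, std=10.0, r=0.5)
    results[name] = (new_x, new_y)
    print(name, round(fmean(new_x), 2), round(fmean(new_y), 2),
          round(pstdev(new_x), 2), round(pstdev(new_y), 2),
          round(pearson(new_x, new_y), 2))
```

Both rows of output report the same statistics, yet a scatter plot would show an ellipse in one case and two crossing lines in the other.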

Recent research has also shown value in visualizing the interactions between neurons in Artificial Neural Networks. A recent collaboration between OpenAI and Google researchers led to the 2019 introduction of “Activation Atlases”, which represent a new technique for pursuing exactly such an endeavor.

According to the release, activation atlases can help humans “discover unanticipated issues in neural networks — for example, places where the network is relying on spurious correlations to classify images, or where re-using a feature between two classes leads to strange bugs.”

For example, the technique was applied to an image classifier designed to differentiate frying pans from woks, producing a human-interpretable visualization of what the classifier had learned.

Source: OpenAI - Introducing Activation Atlases

A quick look at the image above shows that this particular classifier considers the presence of noodles to be an essential attribute of a wok but not a frying pan. This would be a very useful piece of information in quickly understanding why an image of a frying pan filled with spaghetti was being classified as a wok.

Post-Hoc Model Analysis

Post-hoc model analysis is one of the most common paths to explaining AI in production today.

One popular approach is Local Interpretable Model-agnostic Explanations (LIME). When LIME receives input, such as an image to classify, it first generates an entirely new dataset composed of perturbed samples and then populates the corresponding predictions that a black-box architecture would have produced, had those samples been the input. An inherently interpretable model (e.g. linear or logistic regression, decision trees, or k-Nearest Neighbors) is then trained on the new dataset, which is weighted by the proximity of each respective sample to the input of interest.

Source: Local Interpretable Model-Agnostic Explanations (LIME): An Introduction

In the above image, LIME can be seen identifying the head of a frog as the most important identifying feature in the classification, which we can then verify against our own intuitions.
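The procedure described above can be sketched for numeric features using only the standard library. This is a minimal illustration under stated assumptions, not the `lime` package itself: samples are drawn by adding Gaussian noise to the input, proximity is scored with an exponential kernel, and the surrogate is a weighted linear regression. The function names, the noise scale, and the kernel width are all choices made for this example.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a.w = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def lime_style_explanation(black_box, x, n_samples=500, scale=0.1,
                           kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around x; return its slopes."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, scale) for v in x]                # perturbed sample near x
        rows.append([1.0] + [zi - xi for zi, xi in zip(z, x)])    # intercept + offsets from x
        targets.append(black_box(z))                              # what the black box predicts
        weights.append(math.exp(-math.dist(z, x) ** 2 / kernel_width ** 2))  # closer = heavier
    d = len(x) + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(d)]
            for i in range(d)]
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets)) for i in range(d)]
    return solve(xtwx, xtwy)[1:]   # drop the intercept, keep one slope per feature

def black_box(z):
    """Stand-in for an opaque model; any callable on a feature list would do."""
    return z[0] ** 2 + 3 * z[1]

slopes = lime_style_explanation(black_box, [1.0, 0.0])
print([round(s, 2) for s in slopes])   # should land near the true local gradient [2, 3]
```

The slopes are only valid near the chosen input. Explaining a different input means drawing new samples and fitting a new surrogate, which is what makes the explanation local.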

Shapley Values represent an alternative importance score framework that takes a game-theoretic approach to post-hoc feature analysis, attempting to explain the degree to which a particular prediction deviates from the average. In essence, the framework takes a prediction model and establishes a “game” wherein each feature’s value is assumed to be a “player” competing for a “payout” that is defined by the prediction. Using iterated random sampling, along with information about our model and data, we can quickly assess each feature’s contribution toward pushing the prediction away from its expected value.


The markets have begun to awaken to the importance of developing explainable AI capabilities. Many adoption trends in the AI space have been driven by enterprise offerings such as Microsoft Azure and Google Cloud Platform. IBM published the graphic below along with their announcement of AI Explainability 360, “a comprehensive open source toolkit of state-of-the-art algorithms that support the interpretability and explainability of machine learning models.”  

Source: IBM

As companies continue to find themselves amidst controversy arising from unexpected and unexplainable AI results, the need will continue to grow for adequate technical solutions that balance all of the competing interests involved in high-impact projects. As such, research and development efforts occurring in both the private and public sector today will inexorably change the AI business landscape of tomorrow.

About the Author

Lloyd Danzig is the Chairman & Founder of the International Consortium for the Ethical Development of Artificial Intelligence, a 501(c)(3) non-profit NGO dedicated to ensuring that rapid developments in AI are made with a keen eye toward the long-term interests of humanity. He is also Founder & CEO of Sharp Alpha Advisors, a sports gaming advisory firm with a focus on companies deploying cutting-edge tech. Danzig is the Co-Host of The AI Experience, a podcast providing an accessible analysis of relevant AI news and topics. He also serves as Co-Chairman of the CompTIA AI Advisory Council, a committee of preeminent thought leaders focused on establishing industry best practices that benefit businesses while protecting consumers.

