AI Researchers' Open-Source Model Explanation Toolkit AllenNLP Interpret

Researchers from the Allen Institute for AI and the University of California, Irvine, have released AllenNLP Interpret, a toolkit for explaining the results from natural-language processing (NLP) models. The extensible toolkit includes several built-in interpretation methods and visualization components, as well as examples using AllenNLP Interpret to explain the results of state-of-the-art NLP models including BERT and RoBERTa.

In a paper published on arXiv, the research team described the toolkit in more detail. AllenNLP Interpret provides two kinds of gradient-based interpretation methods: saliency maps, which determine how much each word or "token" in the input sentence contributes to the model's prediction, and adversarial attacks, which change or remove words in the input to find out which edits flip the model's prediction and which leave it intact. These techniques are implemented for a variety of NLP tasks and model architectures. The implementations use a generic set of APIs and visualization components, providing a framework for future development of additional techniques and model support.

As companies increase their use of AI to automatically provide answers to questions, users want to know why the AI produced a given answer; for example, in the case of credit-card transaction fraud detection, what in particular about the transaction signaled fraud? The explanation of how the model produced its answer is also important for model developers to understand how well their systems will be able to generalize when confronted with new data; AllenNLP Interpret researcher Sameer Singh often cites the model that appeared to distinguish between wolves and dogs, but really had just learned to detect snow.

For some machine-learning algorithms, the explanation is straightforward: a decision tree, for example, is simply a series of if/then rules. But the outputs of deep-learning models can be more difficult to explain. Singh's previous work includes LIME, which uses linear approximations to explain the predictions of a more complex model. AllenNLP Interpret uses gradient-based methods, which measure the effect of input features on the output. Because computing this gradient is a key component of training in deep learning, these methods can be applied to any deep-learning model.
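The gradient idea can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration: the toy sentiment scorer, its hand-picked two-dimensional embeddings, and its weights are hypothetical and are not taken from AllenNLP Interpret. The model averages token embeddings and applies a sigmoid, so its gradient can be written out by hand; a real deep-learning framework would obtain it through automatic differentiation instead.

```python
import math

# Hypothetical 2-dimensional token embeddings, chosen by hand for illustration.
EMBEDDINGS = {
    "the": [0.1, 0.0],
    "movie": [0.2, 0.1],
    "was": [0.0, 0.1],
    "wonderful": [1.5, -0.5],
    "boring": [-1.4, 0.6],
}
WEIGHTS = [2.0, -1.0]  # hypothetical classifier weights


def predict(tokens):
    """Return P(positive) = sigmoid(w . mean of the token embeddings)."""
    n = len(tokens)
    pooled = [sum(EMBEDDINGS[t][d] for t in tokens) / n for d in range(len(WEIGHTS))]
    logit = sum(w * x for w, x in zip(WEIGHTS, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


def saliency(tokens):
    """Score each token by |gradient . embedding|, normalised to sum to 1."""
    p = predict(tokens)
    n = len(tokens)
    # Analytic gradient of the output w.r.t. one token embedding:
    # d p / d emb_i = p * (1 - p) * w / n (the same vector for every token here).
    grad = [p * (1.0 - p) * w / n for w in WEIGHTS]
    raw = [abs(sum(g * x for g, x in zip(grad, EMBEDDINGS[t]))) for t in tokens]
    total = sum(raw) or 1.0
    return [r / total for r in raw]


if __name__ == "__main__":
    sentence = ["the", "movie", "was", "wonderful"]
    for token, score in zip(sentence, saliency(sentence)):
        print(f"{token:10s} {score:.3f}")
```

In this toy setting the sentiment-bearing word receives by far the largest share of the score, which is the kind of per-token attribution a saliency map visualizes.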

Although the techniques are generic, AllenNLP Interpret is intended for use in NLP. Inputs to NLP systems are strings of text, usually sentences or whole documents, and the text is parsed into its constituent words or tokens. AllenNLP Interpret includes saliency maps that show each token's contribution to the model prediction; a use case for this might be explaining which words in a sentence caused its sentiment to be classified as positive or negative. The toolkit also includes two adversarial methods that show how changing the tokens in the input could affect the output. The first, HotFlip, replaces the input word that has the highest gradient with other words until the model output changes. The other attack, input reduction, iteratively removes the word with the smallest gradient without changing the output; this results in input texts that are "usually nonsensical but cause high confidence predictions."
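The two adversarial methods can be sketched on a toy model as well. The code below is a simplified illustration under stated assumptions, not the toolkit's implementation: the linear sentiment scorer, its embeddings, and the helper names (`input_reduction`, `hotflip`, `importance`) are hypothetical, the reduction is greedy, and the flip picks its replacement with a first-order estimate of the change in the model's score.

```python
# Toy linear sentiment model; all embeddings and weights are hypothetical.
EMBEDDINGS = {
    "the": [0.1, 0.0],
    "movie": [0.2, 0.1],
    "was": [0.0, 0.1],
    "wonderful": [1.5, -0.5],
    "boring": [-1.4, 0.6],
}
WEIGHTS = [2.0, -1.0]


def logit(tokens):
    """Score = w . mean of the token embeddings."""
    total = sum(w * EMBEDDINGS[t][d] for t in tokens for d, w in enumerate(WEIGHTS))
    return total / len(tokens)


def label(tokens):
    return "positive" if logit(tokens) > 0 else "negative"


def gradient(tokens):
    """d logit / d embedding; identical for every token in this mean-pooled model."""
    return [w / len(tokens) for w in WEIGHTS]


def importance(tokens):
    """Per-token importance as |gradient . embedding|."""
    grad = gradient(tokens)
    return [abs(sum(g * x for g, x in zip(grad, EMBEDDINGS[t]))) for t in tokens]


def input_reduction(tokens):
    """Greedily drop the least important token while the label stays the same."""
    original = label(tokens)
    current = list(tokens)
    while len(current) > 1:
        scores = importance(current)
        idx = min(range(len(current)), key=scores.__getitem__)
        candidate = current[:idx] + current[idx + 1:]
        if label(candidate) != original:
            break  # removing this token would change the prediction
        current = candidate
    return current


def hotflip(tokens, max_flips=3):
    """Replace the most important tokens until the label changes."""
    original = label(tokens)
    sign = 1.0 if original == "positive" else -1.0
    current = list(tokens)
    flipped = set()
    for _ in range(max_flips):
        if label(current) != original:
            break
        scores = importance(current)
        candidates = [i for i in range(len(current)) if i not in flipped]
        if not candidates:
            break
        pos = max(candidates, key=scores.__getitem__)
        grad = gradient(current)
        old = EMBEDDINGS[current[pos]]

        def estimated_change(word):
            # First-order estimate of how the logit moves if `word` replaces the token.
            return sum(g * (e - o) for g, e, o in zip(grad, EMBEDDINGS[word], old))

        # Pick the vocabulary word that pushes the score hardest against the label.
        current[pos] = min(EMBEDDINGS, key=lambda w: sign * estimated_change(w))
        flipped.add(pos)
    return current
```

On the toy sentence "the movie was wonderful", the reduction in this sketch shrinks the input to the single token "wonderful" without changing the positive label, while the flip swaps "wonderful" for "boring" and turns the label negative.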

The toolkit currently provides explanation code for several models for various NLP tasks, including:

Reading comprehension models NAQANet and BiDAF
Masked language modeling transformers BERT and RoBERTa
Text classification with BiLSTM and self-attention classifiers
Named entity recognition (NER) and coreference resolution

AllenNLP Interpret is implemented using the PyTorch deep-learning framework. The source code is available on GitHub, and the Allen Institute website contains several interactive visualization demos and tutorials.
