
DeepMind Open-Sources AI Interpretability Research Tool Tracr

Researchers at DeepMind have open-sourced TRAnsformer Compiler for RASP (Tracr), a compiler that translates programs into neural network models. Tracr is intended for research in mechanistic interpretability of Transformer AI models such as GPT-3.

Tracr is a compiler for the Restricted Access Sequence Processing (RASP) language, which was developed as a way to reason about how Transformer-based neural networks operate; in particular, to explain why they produce the results that they do. Tracr allows researchers to develop programs in RASP which are then compiled into runnable neural network models. The goal is to provide "ground truth" models for evaluating AI interpretability tools. According to DeepMind:

We outlined our vision for the use of compiled models in interpretability, and there may be other potential applications of Tracr within and beyond interpretability research. We are looking forward to seeing other researchers use it, and we hope studying compiled models will help to increase our understanding of neural networks.

As deep learning models become larger and more complex, it becomes harder to explain how and why they produce a particular output. Research into AI interpretability is ongoing and follows multiple avenues. The DeepMind team developed Tracr to help with mechanistic interpretability, which attempts to "reverse engineer" deep learning models. An analogy in programming would be reverse-engineering high-level source code from a binary executable file.

In this analogy, Tracr provides a way to evaluate a mechanistic interpreter by starting with the high-level source code and compiling it to the binary. If the interpreter can recover the original source code, that is evidence that its results can be trusted.


Tracr Source Code Analogy. Image Source: https://arxiv.org/abs/2301.05062

The analogy is quite apt, since in fact Tracr is a compiler for a high-level programming language, RASP. RASP is a language for specifying a computational graph of a Transformer model; the RASP primitives map to Transformer components such as embeddings and attention. RASP was developed to allow researchers to "think like a transformer" by abstracting away the details of the computation.
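
To make the primitives more concrete, here is a minimal sketch, assuming simple list-based semantics, of RASP's core operations in plain Python. The function names (`tokens`, `select`, `aggregate`, `selector_width`, and so on) are hypothetical stand-ins chosen for illustration and are not Tracr's actual API. Roughly, `select` and `aggregate` play the role of an attention layer and `rasp_map` that of an MLP.

```python
# A hypothetical, minimal re-implementation of RASP semantics on Python lists.
# These names are illustrative only; they are NOT Tracr's API.

def tokens(xs):
    # The input sequence itself (roughly: the token embedding).
    return list(xs)

def indices(xs):
    # Position of every token (roughly: the positional embedding).
    return list(range(len(xs)))

def rasp_map(f, sop):
    # Elementwise function, the kind of computation an MLP layer performs.
    return [f(x) for x in sop]

def sequence_map(f, a, b):
    # Elementwise function of two sequences.
    return [f(x, y) for x, y in zip(a, b)]

def select(keys, queries, pred):
    # Boolean attention pattern: entry [q][k] is True when pred(key, query) holds.
    return [[pred(k, q) for k in keys] for q in queries]

def aggregate(selector, values):
    # Mean of the selected values at each query position (an attention head).
    out = []
    for row in selector:
        chosen = [v for v, keep in zip(values, row) if keep]
        out.append(sum(chosen) / len(chosen) if chosen else 0)
    return out

def selector_width(selector):
    # Number of keys each query position selects.
    return [sum(row) for row in selector]

def reverse(xs):
    # Sequence reversal, a standard RASP example.
    toks, idx = tokens(xs), indices(xs)
    length = selector_width(select(toks, toks, lambda k, q: True))
    opposite = sequence_map(lambda n, i: n - i - 1, length, idx)
    flip = select(idx, opposite, lambda k, q: k == q)
    return aggregate(flip, toks)

print(reverse([1, 2, 3]))  # [3.0, 2.0, 1.0]
```

Because this sketch uses mean aggregation throughout, numeric outputs come back as floats even when exactly one value is selected.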

Continuing with the programming analogy, the DeepMind team created an "assembly language" for Transformers called craft. Models are specified in craft at the circuit level; that is, as subgraphs of a full neural network. The Tracr compiler converts the RASP specification of a model to craft, then from craft into a final model with concrete weights.
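
The step from a boolean selector to "concrete weights" can be illustrated with a toy attention head. The sketch below assumes one-hot encodings of query and key values and a hypothetical `coldness` constant that sharpens the softmax so it approximates a hard, averaging attention pattern. It is not craft's actual representation, and `qk_matrix` and `attention_head` are made-up names.

```python
import math

# Toy sketch (hypothetical names, NOT craft's API): turning a RASP-style
# selector predicate into a numeric attention head.

def qk_matrix(vocab, pred):
    # Bilinear weights over one-hot encodings: W[q][k] = 1.0 if pred(k, q).
    return [[1.0 if pred(k, q) else 0.0 for k in vocab] for q in vocab]

def attention_head(keys, queries, values, vocab, pred, coldness=100.0):
    w = qk_matrix(vocab, pred)
    pos = {v: i for i, v in enumerate(vocab)}
    out = []
    for q in queries:
        # With one-hot queries and keys, q^T W k reduces to one table lookup.
        logits = [coldness * w[pos[q]][pos[k]] for k in keys]
        # Numerically stable softmax; a large coldness pushes the weight onto
        # the selected keys and makes it nearly uniform among them.
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]
        total = sum(exps)
        out.append(sum(e * v for e, v in zip(exps, values)) / total)
    return out

positions = [0, 1, 2]
# Each position attends to itself and all earlier positions: a prefix mean.
print(attention_head(positions, positions, [1.0, 2.0, 3.0], positions,
                     lambda k, q: k <= q))  # approximately [1.0, 1.5, 2.0]
```

One caveat of this simplification: if a query selects no keys, all logits are equal and the head averages over every position, so a real compiler needs some default for that case.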

To demonstrate the use of Tracr, the team implemented several models. While decoder-only Transformer models are usually used for natural language processing (NLP) tasks such as text summarization or question answering, the DeepMind researchers used Tracr to create models for simpler tasks: counting the number of tokens in an input sequence, sorting a sequence of numbers, and checking for balanced parentheses.
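
The three example tasks can be sketched as RASP-style programs built from the same kind of hypothetical list-based primitives. This is a simplified illustration under stated assumptions, not the implementations in the Tracr repository: `sort_unique` assumes distinct values, and `balanced_prefixes` expects a string of `(` and `)` characters and reports, per position, whether the prefix so far is balanced.

```python
# Hypothetical RASP-style primitives on Python lists (NOT Tracr's API).

def select(keys, queries, pred):
    # Boolean attention pattern: entry [q][k] is True when pred(key, query) holds.
    return [[pred(k, q) for k in keys] for q in queries]

def aggregate(sel, values):
    # Mean of the selected values at each query position.
    out = []
    for row in sel:
        chosen = [v for v, keep in zip(values, row) if keep]
        out.append(sum(chosen) / len(chosen) if chosen else 0)
    return out

def selector_width(sel):
    # Number of selected keys per query position.
    return [sum(row) for row in sel]

def length(xs):
    # Every query attends to every key, so the width is the sequence length.
    return selector_width(select(xs, xs, lambda k, q: True))

def sort_unique(xs):
    # Target position = number of strictly smaller tokens (assumes distinct values).
    target = selector_width(select(xs, xs, lambda k, q: k < q))
    idx = list(range(len(xs)))
    # Output position i reads the token whose target position equals i.
    return aggregate(select(target, idx, lambda k, q: k == q), xs)

def balanced_prefixes(xs):
    idx = list(range(len(xs)))
    prev = select(idx, idx, lambda k, q: k <= q)
    # Mean of +1 for "(" and -1 for ")" over each prefix, scaled back to a count.
    delta = [1 if t == "(" else -1 if t == ")" else 0 for t in xs]
    frac = aggregate(prev, delta)
    balance = [round(f * (i + 1)) for i, f in enumerate(frac)]
    # Balanced prefix: balance is 0 and no earlier prefix dipped below 0.
    dipped = aggregate(prev, [1 if b < 0 else 0 for b in balance])
    return [b == 0 and d == 0 for b, d in zip(balance, dipped)]

print(length("abc"))               # [3, 3, 3]
print(sort_unique([3, 1, 2]))      # [1.0, 2.0, 3.0]
print(balanced_prefixes("(())"))   # [False, False, False, True]
```

Each program uses only a fixed number of select/aggregate and elementwise steps, which is what makes it possible, in principle, to map such a program onto a fixed stack of attention and MLP layers.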

Besides testing interpretability tools, the researchers also point out other potential applications of Tracr. For example, hand-coded implementations of parts of a model could be compiled and used to replace the corresponding parts of a model produced by traditional training methods, which could potentially improve overall model performance.

In a Twitter thread discussing the work, one user asked how complex the programs that Tracr and RASP can handle could be. DeepMind research engineer Tom Lieberum pointed out:

RASP is not turing complete, since it doesn't have an equivalent to WHILE. But [neither do] transformers. If you want something turing complete you'd need to write a RASP for RNNs or something like that.

The Tracr code and several examples are available on GitHub.
