
Microsoft Open-Sources TensorWatch AI Debugging Tool

Microsoft Research has open-sourced TensorWatch, its debugging tool for AI and deep learning. TensorWatch supports PyTorch as well as TensorFlow eager tensors, and allows developers to interactively debug training jobs in real time via Jupyter notebooks, or to build their own custom UIs in Python.

In a recent blog post, a research team led by Shital Shah announced the open-source release of TensorWatch. TensorWatch is a Python library for visualizing data from all phases of the deep-learning model-development cycle: from model structure, to training metrics, to explanation of model predictions. TensorWatch is designed to be used as an interactive tool in Jupyter notebooks or JupyterLab dashboards, but as a Python library it can also be included in custom tools and UIs. According to the development team:

"We like to think of TensorWatch as the Swiss Army knife of debugging tools with many advanced capabilities researchers and engineers will find helpful in their work."

The key concept for using TensorWatch during model training is the stream, which is a sequence of events, each containing data values observed at a point in time. Deep-learning training proceeds in batches (subsets of the training data) and epochs (complete passes through the entire dataset). After each batch and epoch, the training framework outputs several metrics showing training progress, such as a model's accuracy on the test dataset. To use TensorWatch, the training code is modified (or instrumented) to send these metrics to a stream. TensorWatch supports a "lazy-logging" mode that has low overhead if the stream data is not used; the idea is that developers can observe every metric that might be useful in debugging without paying for the ones they never look at.
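To make the stream and lazy-logging ideas concrete, here is a minimal sketch in plain Python. It does not use TensorWatch: the `Event` and `Stream` classes, the `train` function, and the made-up "loss" formula are all hypothetical names invented for illustration, and the real library may implement these ideas quite differently. The sketch assumes that "lazy" means a metric is only computed when something is actually listening to the stream.

```python
import time
from typing import Any, Callable, List, NamedTuple


class Event(NamedTuple):
    timestamp: float  # when the value was observed
    value: Any        # the observed data, e.g. (batch_index, loss)


class Stream:
    """Hypothetical stand-in for a stream; not TensorWatch's actual class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._subscribers.append(callback)

    def write(self, make_value: Callable[[], Any]) -> None:
        # Lazy logging (assumed behavior): with no subscribers,
        # the value is never computed, so instrumentation stays cheap.
        if not self._subscribers:
            return
        event = Event(time.time(), make_value())
        for callback in self._subscribers:
            callback(event)


def train(loss_stream: Stream, num_batches: int) -> int:
    """Simulated training loop; returns how often the metric was computed."""
    computed = 0

    def metric(batch: int) -> tuple:
        nonlocal computed
        computed += 1
        return (batch, 1.0 / (batch + 1))  # made-up "loss" for illustration

    for batch in range(num_batches):
        # Instrumentation: pass a callable rather than a finished value.
        loss_stream.write(lambda b=batch: metric(b))
    return computed


# Nobody is watching this stream, so the metric is never evaluated.
unused = Stream("loss")
unused_count = train(unused, 5)

# A subscriber (standing in for a notebook UI) receives every event.
watched = Stream("loss")
received: List[Event] = []
watched.subscribe(received.append)
watched_count = train(watched, 5)
```

In this sketch the instrumented training loop is identical in both runs; only the presence of a subscriber determines whether any metric work is done.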

The TensorWatch visualization UI is also designed to work with streams. As new events arrive, the UI updates to include their data. TensorWatch allows users to transform stream data by creating new streams from existing streams; the new stream applies a user-defined Python lambda function to each incoming event. This ability to observe all training metrics with low overhead, to apply arbitrary transforms, and to view the results in real time in a Jupyter notebook is the core value proposition of TensorWatch.
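The idea of deriving one stream from another can be sketched in a few lines of plain Python. Again, this is not TensorWatch code: the `Stream` class and its `map` method are hypothetical, and the list named `displayed` merely stands in for a UI that redraws whenever an event arrives.

```python
from typing import Any, Callable, List


class Stream:
    """Hypothetical minimal stream; not TensorWatch's actual API."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Any], None]] = []

    def subscribe(self, callback: Callable[[Any], None]) -> None:
        self._subscribers.append(callback)

    def write(self, value: Any) -> None:
        # Push the new event to every subscriber as soon as it arrives.
        for callback in list(self._subscribers):
            callback(value)

    def map(self, fn: Callable[[Any], Any]) -> "Stream":
        # Create a new stream that applies fn to each incoming event.
        derived = Stream()
        self.subscribe(lambda value: derived.write(fn(value)))
        return derived


losses = Stream()

# Derived stream: a user-defined lambda turns (batch, loss) events
# into (batch, loss as a whole-number percentage).
percent = losses.map(lambda e: (e[0], round(e[1] * 100)))

displayed: List[Any] = []           # stands in for a live chart
percent.subscribe(displayed.append)

for batch, loss in enumerate([0.9, 0.5, 0.25]):
    losses.write((batch, loss))     # each write updates the "UI" immediately
```

Because the transform lives on the derived stream rather than in the training code, the same raw events could feed several different views without further instrumentation.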

The library also contains visualization functions for other stages in the development process; for example, viewing a graph of a neural-network model, visualizing a dataset in a lower-dimensional space, or explaining a model's output. These functions are not actually implemented by TensorWatch; instead, TensorWatch provides a wrapper interface around them.

TensorWatch is designed to work with PyTorch, which currently lacks a native visualization and debugging tool, whereas Google's rival TensorFlow framework ships with a visualization tool called TensorBoard. Consequently, both TensorFlow and Keras, the high-level API wrapper for TensorFlow, have convenience methods for instrumenting the training process for visualization. PyTorch does not have similar convenience methods, and the TensorWatch examples for visualizing training actually rely on a separate Python library (maintained by the TensorWatch team lead Shah) for handling the instrumentation. It is not clear why that code has not been included with TensorWatch.

Although it is intended primarily for PyTorch, the team says it "should also work with TensorFlow eager tensors." In a thread on Reddit, a user asked how TensorWatch compared with TensorBoard, and whether TensorWatch would work with Keras. Shah replied:

"It supports Keras but currently it will need [a] bit of work. To make this easy, we are planning to add [a] Keras callback so you can simply use something like callbacks=TensorWatch(....) in .fit()"
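For readers unfamiliar with the callback pattern Shah refers to, the following is a rough sketch of how such a hook might forward per-epoch metrics to a stream. It is not the planned TensorWatch callback and does not import Keras: `StreamLoggingCallback` and `fake_fit` are hypothetical names, and only the `on_epoch_end(epoch, logs)` method signature is borrowed from Keras's `Callback` class. A real version would presumably subclass `tf.keras.callbacks.Callback` and be passed to `.fit()`.

```python
from typing import Any, Dict, List, Optional, Tuple


class StreamLoggingCallback:
    """Hypothetical callback; not part of TensorWatch or Keras.

    The method name and signature mirror Keras's
    Callback.on_epoch_end(epoch, logs=None).
    """

    def __init__(self, sink: List[Tuple[int, str, float]]) -> None:
        self.sink = sink  # stands in for a metrics stream

    def on_epoch_end(self, epoch: int,
                     logs: Optional[Dict[str, Any]] = None) -> None:
        # Forward every reported metric as one event per (epoch, name).
        for name, value in (logs or {}).items():
            self.sink.append((epoch, name, float(value)))


def fake_fit(epochs: int, callbacks: list) -> None:
    """Stand-in for a training loop that reports metrics once per epoch."""
    for epoch in range(epochs):
        # Made-up metric values for illustration only.
        logs = {"loss": 1.0 / (epoch + 1), "accuracy": epoch / epochs}
        for callback in callbacks:
            callback.on_epoch_end(epoch, logs)


events: List[Tuple[int, str, float]] = []
fake_fit(epochs=2, callbacks=[StreamLoggingCallback(events)])
```

The appeal of this design is that the training loop needs no knowledge of the visualization tool; it only needs to call the hook at the end of each epoch.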

TensorWatch is available on GitHub.
