MIT Researchers Open-Source AutoML Visualization Tool ATMSeer

A research team from MIT, Hong Kong University, and Zhejiang University has open-sourced ATMSeer, a tool for visualizing and controlling automated machine-learning processes.

Solving a problem with machine learning (ML) requires more than just a dataset and training. For any given ML task, a variety of algorithms could be used, and each algorithm can have many hyperparameters to tweak. Because different hyperparameter values produce models with different accuracies, ML practitioners usually try several sets of hyperparameter values on a given dataset to find the ones that produce the best model. This can be time-consuming, as a separate training job and model evaluation must be conducted for each set. The jobs can be run in parallel, of course, but each one must still be set up and triggered, and its results recorded. Furthermore, choosing particular hyperparameter values can involve some guesswork, especially for hyperparameters that can take on any numeric value: if 2.5 and 2.6 produce good results, maybe 2.55 would be even better? What about 2.56 or 2.54?
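To make that guesswork concrete, here is a minimal Python sketch of a manual search over a single numeric hyperparameter, called `c` here for illustration. It assumes a hypothetical `train_and_evaluate` function whose made-up score happens to peak at 2.57; a real version would launch a training job and score the model on held-out data. The search evaluates a coarse grid, then repeatedly zooms in around the best value found so far using a step ten times finer.

```python
from concurrent.futures import ThreadPoolExecutor


def train_and_evaluate(c: float) -> float:
    """Hypothetical stand-in for one training job plus model evaluation.

    Returns a made-up "accuracy" that peaks at c = 2.57. A real job would
    train a model with hyperparameter c and score it on held-out data.
    """
    return 0.9 - (c - 2.57) ** 2


def evaluate_all(values):
    # Each candidate value is an independent job, so the jobs can run in
    # parallel. A thread pool only illustrates that independence; real
    # training jobs would typically run on separate machines.
    with ThreadPoolExecutor() as pool:
        return list(zip(values, pool.map(train_and_evaluate, values)))


def coarse_to_fine(low: float, high: float, step: float, rounds: int):
    """Grid-search [low, high], then refine around the best value each round."""
    best_c, best_score = None, float("-inf")
    for _ in range(rounds):
        n = int(round((high - low) / step))
        # Round grid points to avoid accumulating floating-point noise.
        grid = [round(low + i * step, 10) for i in range(n + 1)]
        for c, score in evaluate_all(grid):
            if score > best_score:
                best_c, best_score = c, score
        # Zoom in around the current best value with a 10x finer step.
        low, high = best_c - step, best_c + step
        step /= 10
    return best_c, best_score


if __name__ == "__main__":
    best_c, best_score = coarse_to_fine(1.0, 4.0, 0.1, rounds=3)
    print(best_c, best_score)
```

Each refinement round adds another batch of training jobs to set up, run, and record, which is the kind of bookkeeping that AutoML tools aim to automate.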

Enter automated machine learning, or AutoML. These are techniques and tools for automating the selection and evaluation of hyperparameters (as well as other common ML tasks such as data cleanup and feature engineering). Both Google Cloud Platform and Microsoft Azure provide commercial AutoML solutions, and there are several open-source packages such as auto-sklearn and Auto-Keras.

MIT's Human Data Interaction Project (HDI) recently open-sourced an AutoML library called Auto Tune Models (ATM). ATM automates the choice of algorithm and hyperparameters; this allows practitioners to focus on the upstream tasks of data cleanup and feature engineering. ATM has a Python interface for running a search for the best model and getting back a description of the results.
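Automating the choice of both algorithm and hyperparameters means searching a joint space in which each algorithm brings its own hyperparameters. The Python sketch below shows the simplest version of that idea, plain random search. It is not ATM's API or its search strategy: the search space, the hyperparameter ranges, and the `fake_score` function (a stand-in for training and evaluating each candidate model) are all hypothetical.

```python
import math
import random

# Hypothetical joint search space: each algorithm maps its hyperparameters
# to a sampling function. Names and ranges are illustrative only.
SEARCH_SPACE = {
    "knn": {"n_neighbors": lambda rng: rng.randint(1, 30)},
    "svm": {"C": lambda rng: 10 ** rng.uniform(-3, 3)},
    "decision_tree": {"max_depth": lambda rng: rng.randint(1, 20)},
}


def fake_score(algorithm, params):
    """Hypothetical stand-in for training a model and measuring accuracy."""
    if algorithm == "knn":
        return 0.80 - abs(params["n_neighbors"] - 7) * 0.01
    if algorithm == "svm":
        return 0.85 - abs(math.log10(params["C"]) - 1) * 0.02
    return 0.75 - abs(params["max_depth"] - 5) * 0.01


def random_search(n_trials, seed=0):
    """Sample (algorithm, hyperparameters) pairs and keep the best-scoring one."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        # First pick an algorithm, then sample only that algorithm's hyperparameters.
        algorithm = rng.choice(sorted(SEARCH_SPACE))
        params = {name: sample(rng) for name, sample in SEARCH_SPACE[algorithm].items()}
        history.append((algorithm, params, fake_score(algorithm, params)))
    best = max(history, key=lambda trial: trial[2])
    return best, history


if __name__ == "__main__":
    (algorithm, params, score), _ = random_search(n_trials=50)
    print(algorithm, params, round(score, 3))
```

Returning the full `history` alongside the best trial matters: it is the raw material a tool needs if it wants to show users how thoroughly the space was explored.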

In a recent paper, researchers from MIT, Hong Kong University, and Zhejiang University described ATMSeer, a graphical UI that runs on top of ATM and visualizes the search process and its results. Users can also control and guide the model search process in real time. The goal is to "increase the transparency of AutoML." In particular, the authors hope to increase users' confidence that the AutoML process fully explored the space of hyperparameters and did not overlook models that might have performed better.
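As a rough, text-only illustration of those two ideas, summarizing what a search has explored and letting a user narrow it, the Python sketch below groups a trial history by algorithm and restricts a search space to the algorithms a user chooses to keep. It is not ATMSeer's interface or code; the trial history, search-space contents, and function names are made up for the example.

```python
from collections import defaultdict


def summarize(history):
    """Group (algorithm, params, score) trials by algorithm.

    Returns {algorithm: (number_of_trials, best_score)}: a per-algorithm
    overview that helps a user judge whether each algorithm was explored enough.
    """
    counts = defaultdict(int)
    best = {}
    for algorithm, _params, score in history:
        counts[algorithm] += 1
        best[algorithm] = max(score, best.get(algorithm, float("-inf")))
    return {alg: (counts[alg], best[alg]) for alg in counts}


def restrict(search_space, keep):
    """Return a copy of the search space limited to the algorithms a user keeps."""
    return {alg: space for alg, space in search_space.items() if alg in keep}


# Made-up trial history and search space, for illustration only.
history = [
    ("knn", {"n_neighbors": 5}, 0.78),
    ("svm", {"C": 1.0}, 0.83),
    ("svm", {"C": 10.0}, 0.86),
    ("decision_tree", {"max_depth": 3}, 0.71),
]
search_space = {
    "knn": {"n_neighbors": (1, 30)},
    "svm": {"C": (0.001, 1000.0)},
    "decision_tree": {"max_depth": (1, 20)},
}

if __name__ == "__main__":
    for alg, (n, best_score) in sorted(summarize(history).items()):
        print(f"{alg:14s} trials={n:2d} best={best_score:.2f}")
    # A user who decides SVMs look most promising can narrow further search.
    print(restrict(search_space, keep={"svm"}))
```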

ATM and ATMSeer do have some limitations compared to commercial solutions such as Azure. For one thing, ATM and ATMSeer only support classification models, whereas Azure supports classification, regression, and time-series forecasting. Azure can also perform feature engineering and data cleanup tasks such as normalization and missing-value imputation.

Although MIT touts the work as "cracking open the black box of automated machine learning," a commenter on Hacker News noted:

[T]his just allows a bit of insight into the black box of AutoML. It won't provide much insight into the ML black boxes that AutoML searches over. What is really needed are effective ML algorithms that are more transparent and predictable to a wider class of practitioners.

Both ATM and ATMSeer are available on GitHub.

Rate this Article