University of Washington Open-Sources AI Fine-Tuning Algorithm WiSE-FT

A team of researchers from the University of Washington (UW), Google Brain, and Columbia University has open-sourced weight-space ensembles for fine-tuning (WiSE-FT), an algorithm for fine-tuning AI models that improves robustness under distribution shift. Experiments on several computer vision (CV) benchmarks show that WiSE-FT improves accuracy by up to 6 percentage points.

The algorithm and several experiments were described in a paper accepted at the upcoming Conference on Computer Vision and Pattern Recognition (CVPR). WiSE-FT combines the weights of a fine-tuned model with the weights of the original model. The resulting weight-space ensemble shows better accuracy under distribution shift (that is, when the patterns of the input data differ from those of the training data) while still maintaining high accuracy on in-distribution data. In a set of experiments using shifted versions of the ImageNet benchmark dataset, a CLIP-based image classifier fine-tuned using WiSE-FT outperformed other strong models. According to the researchers,

We view WiSE-FT as a first step towards more sophisticated fine-tuning schemes and anticipate that future work will continue to leverage the robustness of zero-shot models for building more reliable neural networks.

Because training deep learning models from scratch requires large datasets and considerable compute resources, many developers have begun to use pre-trained models such as CLIP or GPT-3 as a starting point. While these models can be used in a zero-shot or few-shot setting, which requires no updates to the model weights, they are often fine-tuned by applying additional training updates to the weights using a task-specific dataset. However, fine-tuning can produce a model that performs quite well on in-distribution data but poorly on out-of-distribution data, that is, data whose statistics do not match those of the training data.

Because distribution shift occurs frequently in production settings, the UW team investigated ways to improve the robustness of fine-tuned models. The resulting algorithm, which can be implemented "in a few lines of PyTorch," is a linear interpolation of the weights of the original model and the fine-tuned one. A mixing coefficient can give one of the two models a stronger influence on the final result, but the researchers determined in a wide range of experiments that a neutral mixture, which weights the two models equally, "yields close to optimal performance." In addition to the robustness benefits, WiSE-FT requires no additional computation during the fine-tuning process or during inference.
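The interpolation can be illustrated with a minimal sketch. This is not the researchers' released code: the function name `wise_ft`, the parameter name `alpha`, and the toy parameter names are hypothetical, and plain Python lists stand in for tensors so the example runs without any dependencies. In PyTorch, the same loop would run over the entries of each model's `state_dict()`. The sketch assumes that `alpha` is the weight given to the fine-tuned model, so `alpha=0.5` is the neutral mixture.

```python
def wise_ft(zero_shot, fine_tuned, alpha=0.5):
    """Linearly interpolate two weight dictionaries, parameter by parameter.

    alpha = 0.0 returns the original (zero-shot) weights,
    alpha = 1.0 returns the fine-tuned weights,
    alpha = 0.5 weights both models equally.
    """
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("both models must have the same parameter names")
    return {
        name: [
            (1 - alpha) * w0 + alpha * w1
            for w0, w1 in zip(zero_shot[name], fine_tuned[name])
        ]
        for name in zero_shot
    }


# Toy weights for illustration only; real models have millions of parameters.
zero_shot = {"linear.weight": [0.0, 2.0], "linear.bias": [1.0]}
fine_tuned = {"linear.weight": [1.0, 4.0], "linear.bias": [3.0]}

merged = wise_ft(zero_shot, fine_tuned, alpha=0.5)
print(merged)
```

Because the result is a single set of weights with the same shape as either input, the merged model costs the same to run as the original, which is consistent with the claim that no extra computation is needed at inference.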

To test the algorithm, the team built an image classifier model based on CLIP, with a final linear layer added to produce the output. The model was fine-tuned using the ImageNet dataset, then evaluated on five different distribution-shifted datasets derived from ImageNet: ImageNet-V2, ImageNet-R, ImageNet Sketch, ObjectNet, and ImageNet-A. Using WiSE-FT, the resulting model outperformed previous fine-tuned CLIP classifiers on both the reference ImageNet test data and the shifted datasets.

Co-author Gabriel Ilharco, a PhD student at UW, answered several questions about the work on Twitter. One commenter asked about using ensembles of several fine-tuned models, instead of including the original model. Ilharco replied,

We...find that you can substantially improve the robustness of standard models if you ensemble them (in output-space) with a robust model. If you ensemble two non-robust models, you get no gains in effective robustness.
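For contrast with weight-space interpolation, an output-space ensemble averages the predictions of two models rather than their weights. The following is a rough sketch under stated assumptions: the function name `output_space_ensemble`, the parameter `alpha`, and the probability values are all made up for illustration and are not taken from the paper or its code.

```python
def output_space_ensemble(probs_robust, probs_standard, alpha=0.5):
    """Mix two models' class probabilities for the same input."""
    return [
        (1 - alpha) * p_r + alpha * p_s
        for p_r, p_s in zip(probs_robust, probs_standard)
    ]


# Hypothetical class probabilities from two models for one image.
robust_probs = [0.5, 0.5, 0.0]
standard_probs = [0.0, 0.5, 0.5]

mixed = output_space_ensemble(robust_probs, standard_probs)
predicted_class = max(range(len(mixed)), key=mixed.__getitem__)
print(mixed, predicted_class)
```

Unlike a weight-space ensemble, this approach has to run both models on every input, so its inference cost is roughly double.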

The code for WiSE-FT and the paper's experiments are available on GitHub.

About the Author

Anthony Alford
