
BigScience Research Workshop Releases AI Language Model T0

BigScience Research Workshop released T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. T0 can often outperform models 6x larger on the BIG-bench benchmark, and can outperform the 16x larger GPT-3 on several other NLP benchmarks.

The Workshop team described the model and its training datasets in a paper published on arXiv. To investigate the zero-shot performance of large NLP models on completely "unseen" tasks, the researchers converted a large set of supervised-learning NLP datasets into a templated prompt format. The goal of the research was to determine if training data in this format improved T0's ability to generalize to unseen tasks. When evaluated on 11 held-out datasets, T0 outperformed GPT-3 on 8 of the datasets. T0 also outperformed other baseline models in 13 of the 14 tasks in the BIG-bench benchmark.

Large language models are often able to perform reasonably well on unseen tasks---that is, tasks that they have not been trained to perform. For example, although GPT-3 was only explicitly trained to predict the next word in a sequence of text, the model performed well at a variety of other tasks, including translation, question answering, and even 3-digit arithmetic. According to the BigScience team, one hypothesis to explain this is that the models encounter a "mixture of implicit tasks" in the training data. On the other hand, they point out that the training data is often scraped from the web and could contain such tasks explicitly; for example, web pages with trivia questions and answers effectively constitute a training dataset for a question-answering task.

BigScience Research Workshop is a year-long collaboration of "600 researchers from 50 countries and more than 250 institutions," with the goal of creating and investigating a very large multilingual dataset and deep-learning NLP model. The team chose to build T0 to "focus on intentionally and explicitly training large language models in a supervised and massively multitask fashion." The key feature of the training data was to specify the language tasks using natural language prompts; the researchers hypothesized that this format of training data would result in a model that could better generalize to unseen tasks while requiring fewer model parameters.

To create their datasets, the team collected several existing supervised-learning datasets for a variety of NLP tasks. The datasets were then converted to prompted form using a set of templates; for example, a template for a natural language inference task might be "Suppose X. Can we infer Y?" where X and Y are phrases such as "the banker contacted the professors and the athlete" and "the banker contacted the professors." The expected output of the model for such an input would be a classification of either true or false. Ultimately the researchers collected 62 datasets organized into 12 tasks.
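The templating step described above can be illustrated with a minimal sketch. This is a simplified stand-in: the team's actual templates were authored with their PromptSource toolkit and support richer logic than plain field substitution, and the function and field names here are illustrative assumptions.

```python
def apply_template(template: str, example: dict) -> str:
    """Fill a prompt template with fields from a supervised-learning example.

    Illustrative only: BigScience's real templates (built with PromptSource)
    go beyond simple string substitution.
    """
    return template.format(**example)


# A natural language inference example in the "Suppose X. Can we infer Y?" form.
nli_template = "Suppose {premise}. Can we infer that {hypothesis}?"
example = {
    "premise": "the banker contacted the professors and the athlete",
    "hypothesis": "the banker contacted the professors",
    "label": "true",  # the model is trained to emit the answer as text
}

prompt = apply_template(nli_template, example)
# → "Suppose the banker contacted the professors and the athlete.
#    Can we infer that the banker contacted the professors?"
```

Because both the input and the expected output are plain text, every converted dataset can be mixed into a single text-to-text training stream, which is what makes the massively multitask fine-tuning possible.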

The T0 model is based on Google's pre-trained Text-To-Text Transfer Transformer (T5), fine-tuned on a mixture of the prompted-form multitask datasets. The datasets for four tasks were held out completely to evaluate the model's zero-shot generalization performance. The model, which contains 11B parameters, outperformed a 175B-parameter GPT-3 model on 8 of the 11 held-out datasets.

Several members of the T0 research team joined a Hacker News discussion about the work. One researcher pointed out that both Google and EleutherAI had recently investigated "instruction tuning" language models to improve their generalization ability. When asked whether the model size made inference a "hassle," another researcher replied:

Regarding whether the size is a hassle: It's possible to run inference on a single Google Cloud TPU v3-8 device or on a server with 4x 32GB v100 GPUs. Hugging Face also has an inference API...

The T0 model files are available on the Hugging Face site.
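For readers who want to try the released checkpoints, they can be loaded with the Hugging Face transformers library. The sketch below assumes transformers and PyTorch are installed; note that the 11B-parameter checkpoint is tens of gigabytes, so this is illustrative rather than something to run casually on a laptop.

```python
# Illustrative usage of a released T0 checkpoint via Hugging Face transformers.
# Requires: pip install transformers torch (and sufficient memory for 11B params).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0pp")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp")

# Tasks are specified as natural-language prompts, matching the training format.
prompt = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model treats every task as text-to-text generation, switching tasks means changing only the prompt, not the model or any task-specific head.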
