BigScience Research Workshop Releases AI Language Model T0

BigScience Research Workshop released T0, a series of natural language processing (NLP) AI models trained specifically to study zero-shot multitask learning. T0 can often outperform models 6x larger on the BIG-bench benchmark, and can outperform the 16x larger GPT-3 on several other NLP benchmarks.

The Workshop team described the model and its training datasets in a paper published on arXiv. To investigate the zero-shot performance of large NLP models on completely "unseen" tasks, the researchers converted a large set of supervised-learning NLP datasets into a templated prompt format. The goal of the research was to determine if training data in this format improved T0's ability to generalize to unseen tasks. When evaluated on 11 held-out datasets, T0 outperformed GPT-3 on 8 of the datasets. T0 also outperformed other baseline models in 13 of the 14 tasks in the BIG-bench benchmark.

Large language models often perform reasonably well on unseen tasks, that is, tasks that they have not been explicitly trained to perform. For example, although GPT-3 was only explicitly trained to predict the next word in a sequence of text, the model performed well at a variety of other tasks, including translation, question answering, and even 3-digit arithmetic. According to the BigScience team, one hypothesis to explain this is that the models encounter a "mixture of implicit tasks" in the training data. On the other hand, they point out that the training data is often scraped from the web and could contain such tasks explicitly; for example, web pages with trivia questions and answers effectively constitute a training dataset for a question-answering task.

BigScience Research Workshop is a year-long collaboration of "600 researchers from 50 countries and more than 250 institutions," with the goal of creating and investigating a very large multilingual dataset and deep-learning NLP model. The team chose to build T0 to "focus on intentionally and explicitly training large language models in a supervised and massively multitask fashion." The key feature of the training data is that it specifies each language task using natural language prompts; the researchers hypothesized that training data in this format would produce a model that generalizes better to unseen tasks while requiring fewer model parameters.

To create their datasets, the team collected several existing supervised-learning datasets for a variety of NLP tasks. The datasets were then converted to prompted form using a set of templates; for example, a template for a natural language inference task might be "Suppose X. Can we infer Y?" where X and Y are phrases such as "the banker contacted the professors and the athlete" and "the banker contacted the professors." The expected output of the model for such an input would be a classification of either true or false. Ultimately the researchers collected 62 datasets organized into 12 tasks.
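The conversion described above can be illustrated with a minimal sketch. It is not the team's actual tooling: the `PromptedExample` class, the `to_prompted` function, and the exact template string are assumptions made for illustration, modeled on the "Suppose X. Can we infer Y?" example.

```python
from dataclasses import dataclass


@dataclass
class PromptedExample:
    """One supervised example rendered as natural-language input and target text."""
    input_text: str
    target_text: str


# Hypothetical template, modeled on the "Suppose X. Can we infer Y?" example.
NLI_TEMPLATE = "Suppose {premise}. Can we infer {hypothesis}?"


def to_prompted(premise: str, hypothesis: str, entailed: bool) -> PromptedExample:
    """Render one natural language inference record as a prompted example."""
    prompt = NLI_TEMPLATE.format(premise=premise, hypothesis=hypothesis)
    # The label becomes plain text, so the model learns to generate the answer.
    target = "true" if entailed else "false"
    return PromptedExample(input_text=prompt, target_text=target)


example = to_prompted(
    premise="the banker contacted the professors and the athlete",
    hypothesis="the banker contacted the professors",
    entailed=True,
)
print(example.input_text)
print(example.target_text)
```

Because both the input and the label are plain text, the same text-to-text model can be trained on many different tasks without task-specific output layers.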

The T0 model is based on Google's pre-trained Text-To-Text Transfer Transformer (T5) model, which the team fine-tuned on a multitask mixture of the prompt-form datasets. The datasets for four of the tasks were held out completely to evaluate the model's zero-shot generalization performance. The model, which contains 11B parameters, outperformed a 175B-parameter GPT-3 model on 8 of the 11 held-out datasets.
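Holding out entire tasks, rather than individual examples, is what makes the evaluation zero-shot. The sketch below shows the idea under stated assumptions: the task and dataset names are placeholders, not the actual T0 mixture, and `split_by_task` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple

# Placeholder task-to-dataset mapping; these are NOT the real T0 tasks or datasets.
TASKS: Dict[str, List[str]] = {
    "task_a": ["dataset_1", "dataset_2"],
    "task_b": ["dataset_3"],
    "task_c": ["dataset_4", "dataset_5"],
}

# Every dataset belonging to a held-out task is excluded from training.
HELD_OUT_TASKS: Set[str] = {"task_c"}


def split_by_task(
    tasks: Dict[str, List[str]], held_out: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (training datasets, evaluation datasets), split at the task level."""
    train = [d for task, names in tasks.items() if task not in held_out for d in names]
    evaluation = [d for task, names in tasks.items() if task in held_out for d in names]
    return train, evaluation


train_sets, eval_sets = split_by_task(TASKS, HELD_OUT_TASKS)
print("train:", train_sets)
print("eval:", eval_sets)
```

Splitting at the task level means the model never sees any dataset from an evaluation task during fine-tuning, so good scores reflect generalization to a new kind of task rather than to new examples of a familiar one.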

Several members of the T0 research team joined a Hacker News discussion about the work. One researcher pointed out that both Google and EleutherAI had recently investigated "instruction tuning" language models to improve their generalization ability. When asked whether the model size made inference a "hassle," another researcher replied:

Regarding whether the size is a hassle: It's possible to run inference on a single Google Cloud TPU v3-8 device or on a server with 4x 32GB v100 GPUs. Hugging Face also has an inference API...
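A rough back-of-the-envelope calculation shows why hardware of this size is relevant. The sketch below estimates memory for the weights only, assuming 4 bytes per parameter for 32-bit floats and 2 bytes for 16-bit floats; it ignores activations and other runtime overhead, and the function name `weight_memory_gb` is illustrative.

```python
N_PARAMS = 11e9  # 11B parameters, as stated for T0

# Assumed storage cost per parameter for two common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed to hold the weights alone, in gigabytes (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: {weight_memory_gb(N_PARAMS, nbytes):.0f} GB")

# Combined memory of the 4x 32GB GPU server mentioned in the quote.
gpu_total_gb = 4 * 32
print(f"4x 32GB GPUs: {gpu_total_gb} GB total")
```

Under these assumptions the 32-bit weights alone exceed the 32GB of a single such GPU, which is consistent with the researcher's suggestion of a multi-GPU server.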

The T0 model files are available on the Hugging Face site.
