
Google Trains 280 Billion Parameter AI Language Model Gopher


Google subsidiary DeepMind announced Gopher, a 280-billion-parameter AI natural language processing (NLP) model. Based on the Transformer architecture and trained on a 10.5TB corpus called MassiveText, Gopher outperformed the current state of the art on 100 of 124 evaluation tasks.

The model and several experiments were described in a paper published on arXiv. As part of their research effort in general AI, the DeepMind team trained Gopher and several smaller models to explore the strengths and weaknesses of large language models (LLMs). In particular, the researchers identified tasks where increased model scale led to improved accuracy, such as reading comprehension and fact-checking, as well as those where it did not, such as logical and mathematical reasoning. The team evaluated Gopher on a large number of NLP benchmarks, including Massive Multitask Language Understanding (MMLU) and BIG-bench, and compared its performance to several baseline models such as GPT-3, noting a general trend: Gopher showed consistent improvement on knowledge-intensive tasks, but less on reasoning-intensive ones. According to the DeepMind team, Gopher is part of

a foundation for DeepMind’s language research going forward, particularly in areas that will have a bearing on how these models are evaluated and deployed... This approach is key to creating large language models that serve society, furthering our mission of solving intelligence to advance science and benefit humanity.

Language models predict the next item or token in a sequence of text, given the previous tokens; when such a model is used iteratively, with the predicted output fed back as the input, the model is termed autoregressive. Autoregressive language models based on the Transformer deep-learning architecture have set state-of-the-art performance records on many NLP tasks, and many researchers have developed very large-scale models. Although the 175B parameter GPT-3 may be the best known, models with more parameters have been trained, including the 178B parameter Jurassic-1 and the 530B parameter Megatron-Turing NLG.
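To make the autoregressive loop concrete, here is a minimal Python sketch of greedy next-token decoding. Everything in it, including the toy vocabulary and the toy_model stand-in for a trained Transformer, is a hypothetical illustration, not Gopher's actual implementation:

```python
import numpy as np

# Hypothetical toy vocabulary and "model" -- a real LLM such as Gopher uses a
# Transformer that maps a token sequence to a probability distribution over
# tens of thousands of subword tokens.
VOCAB = ["<eos>", "the", "model", "predicts", "next", "token"]

def toy_model(token_ids: list[int]) -> np.ndarray:
    """Stand-in for a trained Transformer: returns logits for the next token."""
    rng = np.random.default_rng(seed=sum(token_ids))  # deterministic toy logits
    return rng.normal(size=len(VOCAB))

def generate(prompt_ids: list[int], max_new_tokens: int = 10) -> list[int]:
    """Greedy autoregressive decoding: feed each prediction back as input."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = toy_model(ids)           # score all candidate next tokens
        next_id = int(np.argmax(logits))  # greedy: pick the most likely token
        ids.append(next_id)               # feed the prediction back in
        if VOCAB[next_id] == "<eos>":     # stop at end-of-sequence
            break
    return ids

print([VOCAB[i] for i in generate([1, 2])])
```

Production systems typically sample from the predicted distribution (or use beam search) rather than always taking the argmax, but the feed-the-output-back-in structure is the same.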

Collecting a large dataset for training such models is a challenge. Several such datasets have been open-sourced, such as the Pile and C4, and contain documents scraped from websites such as Wikipedia. The DeepMind team was concerned that simply crawling the web indiscriminately might taint their training dataset with test datasets for their benchmark evaluations, as those are available on the web. To prevent this, DeepMind developed a data-preparation pipeline and a custom training dataset called MassiveText. Starting with the contents of C4, Wikipedia, GitHub, and other sources, the pipeline filters out explicit content, performs document deduplication, and filters out test data.
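The paper describes the actual MassiveText pipeline in detail; as a rough illustration of the deduplication and test-set-filtering steps mentioned above, the following Python sketch uses exact hashing to drop duplicate documents and long n-gram overlap to flag likely benchmark contamination. All function and variable names here are hypothetical:

```python
import hashlib

def ngrams(text: str, n: int = 13):
    """Yield word n-grams; long n-grams rarely match across documents by chance."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i : i + n])

def build_test_index(test_documents, n: int = 13) -> set:
    """Index n-grams from benchmark test sets so training docs can be screened."""
    index = set()
    for doc in test_documents:
        index.update(ngrams(doc, n))
    return index

def clean_corpus(documents, test_index: set, n: int = 13):
    """Drop exact duplicates and any document overlapping the test sets."""
    seen_hashes = set()
    for doc in documents:
        digest = hashlib.sha256(doc.encode()).hexdigest()
        if digest in seen_hashes:
            continue  # exact-duplicate document
        if any(g in test_index for g in ngrams(doc, n)):
            continue  # likely contaminated with benchmark test data
        seen_hashes.add(digest)
        yield doc
```

A production pipeline would typically use near-duplicate detection such as MinHash rather than exact hashes, since scraped documents often differ only in boilerplate.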

DeepMind trained six models of varying size, from 44M parameters to the 280B parameter Gopher model. They evaluated the models on a battery of 152 tasks, including 62 from BIG-bench and 57 from MMLU, as well as benchmark tasks for language modeling, reading comprehension, fact-checking, question answering, and common sense. For 124 of these tasks, they compared the models against published state-of-the-art results, with Gopher beating the record on 100. The team also investigated how their model performed at different scales, concluding that "[m]any academic subjects, along with general knowledge, see large improvements come from scale alone," but scale has a "reduced benefit" for logical reasoning, common sense, and mathematics tasks.

In a Hacker News discussion about Gopher, some commenters wondered if its ability to "dig up" information inspired its creators to name it after the pre-web Gopher search system. Others discussed whether language models should be considered "true" AI:

The closer we get to artificial intelligence, the more we raise the bar for what qualifies as AI (as we should). Gopher/GPT-3 are already much more accurate than the average human at technical information retrieval.

Gopher's rankings on several NLP benchmarks can be found on the Papers with Code website.
