Galactica: Large Language Model for Scientific Knowledge

Meta AI and Papers with Code recently released Galactica, a 120-billion-parameter scientific-language model which can search and summarize academic literature, solve math problems, and write scientific code.

Galactica's architecture is based on the transformer, which uses an attention mechanism to draw global dependencies between input and output. The model uses a decoder-only setup with several changes compared with the original transformer: GeLU as the activation function, learned position embeddings, a vocabulary built with byte pair encoding, and no bias parameters in the dense kernels or layer norms.
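Two of these changes, the GeLU activation and the bias-free dense layer, can be illustrated with a minimal pure-Python sketch. The function names `gelu`, `dense_no_bias`, and `feed_forward` are hypothetical and chosen only for illustration; this is not Galactica's implementation, which operates on tensors rather than Python lists.

```python
import math


def gelu(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def dense_no_bias(weights: list[list[float]], inputs: list[float]) -> list[float]:
    """Dense layer computing W @ x only; no bias vector is added."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def feed_forward(weights: list[list[float]], inputs: list[float]) -> list[float]:
    """Bias-free dense layer followed by an element-wise GeLU."""
    return [gelu(value) for value in dense_no_bias(weights, inputs)]


if __name__ == "__main__":
    # Illustrative weights only, not taken from any trained model.
    example_weights = [[1.0, 2.0], [3.0, 4.0]]
    print(feed_forward(example_weights, [1.0, -1.0]))
```

Because there is no bias term, an all-zero input always maps to an all-zero output in this sketch.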

The researchers trained the model on a dataset spanning several modalities (natural language, math formulas, molecular sequences, and so on). They used specialized tokenization that, for example, identifies mathematical operation characters and marks the start and end of different types of sequences. The source material for the dataset included 48 million papers, textbooks, reference materials, compounds, proteins, and other sources of scientific knowledge. They also implemented a special token to mark sections of step-by-step reasoning, which encourages Galactica to use a kind of internal working memory that it would otherwise lack.
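The idea of marking sequence boundaries and isolating math characters can be sketched in a few lines of Python. This is only an illustration under assumptions: the `[START_...]`/`[END_...]` tag format, the splitting of digits into single tokens, and the helper names `wrap_sequence`, `split_math`, and `reasoning_prompt` are hypothetical here and do not reproduce Galactica's actual tokenizer.

```python
import re

# Reasoning marker, as used in the prompt example later in this article.
WORK_TOKEN = "<work>"

# Assumed pattern: single digits, runs of letters, or one math operator.
_MATH_PATTERN = re.compile(r"\d|[A-Za-z]+|[+\-*/=()^]")


def wrap_sequence(kind: str, sequence: str) -> str:
    """Mark the start and end of a typed sequence (assumed tag format)."""
    tag = kind.upper()
    return f"[START_{tag}]{sequence}[END_{tag}]"


def split_math(expression: str) -> list[str]:
    """Split an expression so each digit and operator is its own token."""
    return _MATH_PATTERN.findall(expression)


def reasoning_prompt(question: str) -> str:
    """Append the reasoning token to request step-by-step working."""
    return f"{question} {WORK_TOKEN}"


if __name__ == "__main__":
    print(wrap_sequence("smiles", "C(=O)O"))
    print(split_math("x^2+31=y"))
    print(reasoning_prompt("What is the derivative of x^2?"))
```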

Several large language models (LLMs) with billions of parameters have been released in the last year, though they are not specialized in the science domain per se. Models benchmarked in the Galactica paper include OPT, BLOOM, GPT-3, Chinchilla, and PaLM. Galactica performs well on reasoning, outperforming Chinchilla on the mathematical MMLU dataset by 41.3% to 35.7% and PaLM-540B on MATH with a score of 20.4% versus 8.8%. Despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on the BIG-bench benchmark. The Gradient offers a detailed look at evaluating natural-language-processing models.
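To put the reported scores side by side, the following small sketch computes the margin in percentage points for the two reasoning benchmarks quoted above. The dictionary layout and the `margin` helper are hypothetical; only the four scores come from the text.

```python
# Scores in percent, as quoted in the paragraph above.
results = {
    "MMLU (mathematics)": {"Galactica": 41.3, "Chinchilla": 35.7},
    "MATH": {"Galactica": 20.4, "PaLM-540B": 8.8},
}


def margin(scores: dict[str, float], model: str = "Galactica") -> float:
    """Return the lead of `model` over the best other entry, in points."""
    best_other = max(value for name, value in scores.items() if name != model)
    return round(scores[model] - best_other, 1)


if __name__ == "__main__":
    for benchmark, scores in results.items():
        print(f"{benchmark}: +{margin(scores)} points")
```

On these figures the lead is 5.6 points on mathematical MMLU and 11.6 points on MATH.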

Galactica is available as a Python package or through a web interface for entering prompts. The Python package can be installed as follows:

pip install galai

A small script for using the standard model (6.7 billion parameters) is:

import galai as gal

model = gal.load_model("standard")
model.generate("Scaled dot product attention:\n\n\\[")
# Scaled dot product attention:\n\n\\[ \\displaystyle\\text{Attention}(Q,K,V)=\\text{softmax}(\\frac{QK^{T}}{\\sqrt{d_{k}}}%\n)V \\]

To test the model's reasoning performance, append the <work> token to the prompt:

model.generate("A force of 0.6N is applied to an object, which accelerates at 3m/s. What is its mass? <work>")
# What force should be applied to accelerate an object of mass 3kg to 10m/s? <work>\nWe can use Newton's second law: F = ma. We can substitute variables to get:\n\n\\[ F = \\left(66kg

Source: Galactica
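For reference when judging the model's answer, the expected result for the prompt above follows from Newton's second law, F = ma, rearranged to m = F / a. The sketch below assumes the prompt's "3m/s" is meant as an acceleration of 3 m/s²; the helper name `mass_from_force` is hypothetical and not part of galai.

```python
def mass_from_force(force_newtons: float, acceleration_ms2: float) -> float:
    """Rearrange F = m * a to m = F / a (SI units: N, m/s^2, kg)."""
    if acceleration_ms2 == 0:
        raise ValueError("acceleration must be non-zero")
    return force_newtons / acceleration_ms2


# Values from the prompt: F = 0.6 N, a = 3 m/s^2 (assumed unit).
reference_mass = mass_from_force(0.6, 3.0)
print(f"{reference_mass:.2f} kg")  # prints 0.20 kg
```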

Like many other LLMs, Galactica has a few limitations. It can hallucinate, that is, generate plausible-sounding but incorrect content, and it can trend toward using toxic language, although the team responsible mentioned that it is less toxic than other LLMs like OPT. Other limitations are frequency bias and overconfidence, especially about highly specialized scientific content. Frequency bias means the model tends to recommend highly cited scientific papers.

On social media, there has been quite a buzz around the topic:

For further insight into exploring the Galactica demo, see for instance this tweet as well as this one by Patrick Mineault, a former AI researcher at Google. There is also a good Twitter thread by Michael Black discussing Galactica's performance. Finally, there is a YouTube review by Yannic Kilcher and a discussion about Galactica's language toxicity here.
