
OpenAI Introduces InstructGPT Language Model to Follow Human Instructions


OpenAI has updated the GPT-3 language model, introducing InstructGPT as the new default model family in its API, to address complaints about toxic language and misinformation.

GPT-3, like other large language models, was created in part to generate human-like text in a convincing way. To make the models safer, more helpful, and better aligned with user instructions, OpenAI used reinforcement learning from human feedback (RLHF) to fine-tune GPT-3.

Compared to GPT-3, the new InstructGPT models are better at following instructions in English, less inclined to produce misinformation, and show at least a small reduction in toxic output.

To transform GPT-3 models into InstructGPT models, OpenAI designed a three-step procedure. First, the base model is fine-tuned on human-written demonstrations of the desired behavior (supervised fine-tuning, or SFT). Second, a reward model (RM) is trained on human rankings of sampled model outputs. Third, the SFT model is further fine-tuned with reinforcement learning, using the reward model's score as the training signal.
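
To make the pipeline concrete, the following is a minimal Python sketch of the three stages. It is an illustration under loose assumptions, not OpenAI's training code: every function name and the toy "training" logic are hypothetical stand-ins, and the real procedure fine-tunes large transformer models with PPO rather than manipulating the small stubs shown here.

def supervised_finetune(base_model, demonstrations):
    """Step 1: fine-tune the base model on human-written demonstrations."""
    # Toy stand-in: "training" just records that SFT happened.
    return {"params": base_model["params"] + ["sft"], "data": list(demonstrations)}

def train_reward_model(sft_model, ranked_comparisons):
    """Step 2: fit a reward model (RM) on human rankings of model outputs."""
    # Toy stand-in: score an output by how highly labelers ranked it.
    scores = dict(ranked_comparisons)
    return lambda output: scores.get(output, 0.0)

def rl_finetune(sft_model, reward_model, prompts, steps=3):
    """Step 3: further fine-tune the SFT model with reinforcement learning
    (PPO in OpenAI's description), using RM scores as the training signal."""
    policy = {"params": list(sft_model["params"]), "data": sft_model["data"]}
    for _ in range(steps):
        for prompt in prompts:
            output = f"response to {prompt!r}"   # sample from the policy
            reward = reward_model(output)        # score the sample with the RM
            policy["params"].append(f"ppo-update(reward={reward})")  # pretend update
    return policy

base = {"params": ["gpt-3"], "data": []}
sft = supervised_finetune(base, ["labeler-written demonstration"])
rm = train_reward_model(sft, [("preferred answer", 1.0), ("rejected answer", -1.0)])
instruct = rl_finetune(sft, rm, ["Explain the moon landing to a six-year-old"])
print(instruct["params"])

The structural point the sketch preserves is that the reward model, fit on human comparisons in step two, becomes the sole training signal in step three.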

One positive aspect is that InstructGPT outperforms GPT-3: not necessarily on standard NLP benchmarks, where GPT-3 often still scores higher, but on human preference, which is ultimately a better predictor of real-world performance. The reason is that InstructGPT is more closely aligned with human intent, thanks to a reinforcement-learning paradigm that lets it learn from human feedback.

On the other hand, InstructGPT's improved ability to follow instructions has a dark side: a malicious user could exploit it to make the model less truthful and helpful, and more harmful. Given that the model is also more powerful than GPT-3, the damage could be greater.

However, InstructGPT isn't just better than GPT-3 at following instructions; it is also better aligned with human intent. The AI alignment problem is a well-known challenge in the field: the difficulty of designing AI systems that understand our values, beliefs, and desires, and that behave in ways that do not conflict with them.

According to OpenAI, this is the first time its alignment research has been applied to its product, and the results show that these techniques are effective at significantly improving the alignment of general-purpose AI systems with human intentions. The InstructGPT models are now deployed as the default language models on the OpenAI API.
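
For readers who want to try the deployed defaults, the snippet below is a minimal sketch of a call against the OpenAI API of that period, assuming the openai Python package (0.x series) and an API key in the OPENAI_API_KEY environment variable; text-davinci-001 was one of the InstructGPT-series model names of that era, but consult the API documentation for the models actually deployed as defaults.

# Minimal sketch: querying an InstructGPT-series model through the
# OpenAI API (0.x-era Python client). The model name below is an
# assumption based on the naming of that period; check the API docs
# for the current default models.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    engine="text-davinci-001",  # InstructGPT-series model (assumed name)
    prompt="Explain in one sentence why the sky is blue.",
    max_tokens=64,
)
print(response["choices"][0]["text"].strip())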
