OpenAI Announces Question-Answering AI WebGPT

OpenAI has developed WebGPT, an AI model for long-form question-answering based on GPT-3. WebGPT can use web search queries to collect supporting references for its response, and on Reddit questions its answers were preferred by human judges over the highest-voted answer 69% of the time.

The announcement was made on the OpenAI blog. WebGPT is a version of OpenAI's pre-trained GPT-3 natural language processing (NLP) model that has been fine-tuned to use a web browser to perform search engine queries, follow links, and quote sources. The model is trained on a dataset collected from the Explain Like I'm 5 (ELI5) subreddit using a combination of supervised learning and reinforcement learning (RL) incorporating human feedback, and can generate paragraph-length answers to open-ended questions on a wide range of topics. According to OpenAI:

"Human feedback and tools such as web browsers offer a promising path towards robustly truthful, general-purpose AI systems. Our current system struggles with challenging or unfamiliar circumstances, but still represents significant progress in this direction."

Although question-answering (QA) has long been a subject of research in AI, most datasets have focused on simple "trivia-type" questions with short answers. In 2019, with the goal of creating smarter digital assistants, a team of researchers from Facebook and Google proposed a long-form question-answering (LFQA) task, which requires an AI to produce richer answers to more complex, open-ended questions. The team also collected a large dataset scraped from the ELI5 subreddit for training and benchmarking LFQA models, consisting of questions (and associated answers) ranging from the mundane (Why are items always priced ending in ".99" instead of ".00"?) to the imponderable (Why do people give Reddit Gold to admins?).

OpenAI's GPT-3 model proved to be quite good when evaluated on QA benchmarks, scoring up to 71.2% on the TriviaQA benchmark with no fine-tuning. However, like many language models, GPT-3 often hallucinates; that is, it generates answers that seem reasonable but are factually incorrect. To address this problem, many researchers have augmented deep-learning QA models with an information-retrieval mechanism which can query a knowledge base to provide additional context to the model's decoder mechanism that generates responses.
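The retrieval-augmented idea can be illustrated with a minimal sketch. It is not any particular system's implementation: the knowledge base, the word-overlap scoring, and the names `retrieve` and `build_prompt` are assumptions made for illustration, and the retrieved passage is simply prepended to the prompt rather than fed into a model's decoder.

```python
import re

# Tiny illustrative knowledge base; these passages are placeholders.
KNOWLEDGE_BASE = [
    "The Eiffel Tower is located in Paris.",
    "Mount Everest is the highest mountain above sea level.",
    "The Pacific is the largest ocean on Earth.",
]


def tokens(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, k=1):
    """Rank passages by word overlap with the question; return the top k."""
    q = tokens(question)
    ranked = sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=1):
    """Prepend the retrieved passages so a model can ground its answer."""
    context = "\n".join(retrieve(question, passages, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("Where is the Eiffel Tower?", KNOWLEDGE_BASE))
```

Production systems typically replace the word-overlap score with a learned dense retriever, but the flow is the same: retrieve first, then generate with the retrieved text as context.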

OpenAI used a similar approach, but instead of including information retrieval in the model, they trained their model to interact directly with a web search engine: a task "that humans can do well, and that a language model can mimic." The team first developed a web browsing environment which can be controlled via text commands produced by a pre-trained GPT-3 model. The model is then operated as an RL agent: given an environment state consisting of a question and the web browser's current page, the agent generates a command, such as issuing a search query, following a link, extracting context from a page, or generating a final result. This agent is fine-tuned using a combination of supervised learning on human-generated examples and RL using a reward model.
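The observe-and-command loop described above can be sketched as a toy text-based environment. Everything here is hypothetical: the command names (`search`, `click`, `quote`, `answer`), the `PAGES` dictionary standing in for the web, and the scripted policy standing in for the fine-tuned model are illustrative assumptions, not OpenAI's actual interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Toy stand-in for the web: page id -> (page text, outgoing links).
PAGES = {
    "results:pricing": ("Search results for 'pricing'", ["page:charm"]),
    "page:charm": ("Prices ending in .99 tend to be perceived as lower.", []),
}


@dataclass
class BrowserEnv:
    question: str
    page: Optional[str] = None  # id of the current page, if any
    quotes: List[str] = field(default_factory=list)
    answer: Optional[str] = None

    def observe(self) -> str:
        """Render the state as text, which is all the agent sees."""
        text, links = PAGES.get(self.page, ("(no page loaded)", []))
        return (f"Question: {self.question}\nPage: {text}\n"
                f"Links: {links}\nQuotes collected: {len(self.quotes)}")

    def step(self, command: str) -> bool:
        """Apply one text command; return True once an answer is given."""
        verb, _, arg = command.partition(" ")
        if verb == "search":
            self.page = f"results:{arg}"
        elif verb == "click":
            self.page = PAGES[self.page][1][int(arg)]
        elif verb == "quote":
            self.quotes.append(PAGES[self.page][0])
        elif verb == "answer":
            self.answer = arg
            return True
        else:
            raise ValueError(f"unknown command: {command!r}")
        return False


def scripted_policy(observation: str) -> str:
    """Hard-coded placeholder for the fine-tuned language model."""
    if "(no page loaded)" in observation:
        return "search pricing"
    if "Search results" in observation:
        return "click 0"
    if "Quotes collected: 0" in observation:
        return "quote"
    return "answer Prices ending in .99 look cheaper to shoppers."


def run_episode(env: BrowserEnv, policy, max_steps: int = 10) -> BrowserEnv:
    """Alternate observation and command until the agent answers."""
    for _ in range(max_steps):
        if env.step(policy(env.observe())):
            break
    return env


if __name__ == "__main__":
    done = run_episode(BrowserEnv("Why do prices end in .99?"), scripted_policy)
    print(done.answer, done.quotes)
```

In a trained system the policy would be the language model itself, and the quotes collected along the way would serve as the references supporting the final answer.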

The team evaluated WebGPT on both the ELI5 dataset and the TruthfulQA benchmark. For the ELI5 evaluation, OpenAI collected the top-voted answer from Reddit and also had human demonstrators generate answers using the same web browsing environment as the model. The researchers hired contractors to compare WebGPT's answers to these human-created answers; WebGPT's answers were preferred over the Reddit answer 69% of the time and over the demonstrators' answers 56% of the time. On the TruthfulQA benchmark, WebGPT outperformed GPT-3, producing answers that were true 75% of the time, and "both true and informative" 54% of the time.

InfoQ has previously covered other efforts to improve AI language model performance using external knowledge bases, including Baidu's ERNIE 3.0, which is trained on a knowledge graph, and Facebook's BlenderBot 2.0 Chatbot, which can use internet searches for supplemental conversational context. More recently, DeepMind developed Retrieval Enhanced TRansfOrmers (RETRO), a method to augment a pre-trained Transformer model by incorporating information retrieval into the model's attention mechanism.

About the Author

Anthony Alford
