
Baidu's ERNIE 3.0 AI Model Exceeds Human Performance on Language Understanding Benchmark

A research team from Baidu published a paper on the 3.0 version of Enhanced Language RepresentatioN with Informative Entities (ERNIE), a natural language processing (NLP) deep-learning model. The model contains 10B parameters and achieved a new state-of-the-art score on the SuperGLUE benchmark, outperforming the human baseline score.

The model and several experiments were described in a post on Baidu's blog. Unlike most other deep-learning NLP models, which are trained only on unstructured text, ERNIE's training data includes structured knowledge graph data, which helps the model produce more coherent responses. The model consists of a Transformer-XL "backbone" that encodes the input into a latent representation, along with two separate decoder networks: one for natural language understanding (NLU) and another for natural language generation (NLG). In addition to setting a new top score on SuperGLUE, displacing Microsoft and Google, ERNIE also set new state-of-the-art scores on 54 Chinese-language NLP tasks.
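
The shared-backbone design can be pictured as a single encoder whose output feeds two task-specific networks. The PyTorch sketch below is purely illustrative: the module names, tiny dimensions, and the use of a standard Transformer encoder in place of Transformer-XL are assumptions for the example, not Baidu's implementation.

```python
# Illustrative sketch of a shared "backbone" feeding separate NLU and NLG heads.
# All module names and sizes are assumptions; ERNIE 3.0 itself uses a
# Transformer-XL backbone and far larger task-specific networks.
import torch
import torch.nn as nn

class SharedBackboneModel(nn.Module):
    def __init__(self, vocab_size=30000, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stand-in for the shared universal-representation module.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        # Task-specific networks: one for understanding, one for generation.
        self.nlu_head = nn.Linear(d_model, 2)            # e.g. sentence classification
        self.nlg_head = nn.Linear(d_model, vocab_size)   # e.g. next-token prediction

    def forward(self, token_ids, task="nlu"):
        hidden = self.backbone(self.embed(token_ids))    # shared latent representation
        if task == "nlu":
            return self.nlu_head(hidden[:, 0])           # pool the first token
        return self.nlg_head(hidden)                     # per-token vocabulary logits

model = SharedBackboneModel()
tokens = torch.randint(0, 30000, (1, 16))
print(model(tokens, task="nlu").shape)   # torch.Size([1, 2])
print(model(tokens, task="nlg").shape)   # torch.Size([1, 16, 30000])
```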

Although large deep-learning models trained only on text, such as OpenAI's GPT-3 or Google's T5, perform well on a wide variety of problems, researchers have found these models often struggle with some NLU tasks that require world knowledge not present in the input text. To address this, in early 2019 researchers at Tsinghua University open-sourced the first version of ERNIE, a model combining text and knowledge graph data; later that year, Baidu released the 2.0 version, which was the first model to score higher than 90 on the GLUE benchmark.

Like GPT-3 and other models, ERNIE 3.0 is pre-trained on text using several unsupervised-learning tasks, including masked language modeling. To incorporate knowledge graph data into the training process, the Baidu team created a new pre-training task called universal knowledge-text prediction (UKTP). In this task, the model is given a sentence from an encyclopedia along with a knowledge graph triple representing the sentence, with part of the data randomly masked; the model must then predict the correct value for the masked data. The full training dataset totaled 4TB, which Baidu says is the largest Chinese text corpus to date.
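
Conceptually, UKTP resembles masked-token prediction applied to a triple-plus-sentence pair: the model sees a knowledge graph triple concatenated with the corresponding encyclopedia sentence and must recover the randomly masked positions. The sketch below shows how such a training example might be assembled; the token format, 15% mask rate, and helper name are assumptions for illustration, not the paper's exact procedure.

```python
# Simplified illustration of building a UKTP-style training example:
# a knowledge graph triple is concatenated with the sentence it came from,
# and random positions are masked for the model to predict.
# The [SEP]/[MASK] formatting and 15% mask rate are assumptions for this sketch.
import random

def build_uktp_example(triple, sentence, mask_prob=0.15, seed=0):
    random.seed(seed)
    tokens = list(triple) + ["[SEP]"] + sentence.split()
    labels = ["-"] * len(tokens)          # "-" marks positions with no prediction target
    for i, tok in enumerate(tokens):
        if tok != "[SEP]" and random.random() < mask_prob:
            labels[i] = tok               # the model must recover this token
            tokens[i] = "[MASK]"
    return tokens, labels

triple = ("Andersen", "Write", "Nightingale")   # (head entity, relation, tail entity)
sentence = "Nightingale is written by Danish author Andersen"
inputs, targets = build_uktp_example(triple, sentence)
print(inputs)
print(targets)
```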

The researchers evaluated ERNIE's performance on several downstream tasks. For NLU, the team fine-tuned the model on 45 different datasets spanning 14 tasks, including sentiment analysis, news classification, named-entity recognition, and document retrieval; for NLG, on 9 datasets spanning 7 tasks, including text summarization, closed-book question answering, machine translation, and dialogue generation. ERNIE set new state-of-the-art scores on all of these tasks. To measure zero-shot NLG performance, human annotators were asked to score the output from ERNIE and three other models; according to these results, ERNIE generated "the most coherent, fluent and accurate texts on average."
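
Fine-tuning for a task such as sentiment analysis typically amounts to attaching a small classification head to the pre-trained encoder and training on labeled examples. The minimal sketch below uses a randomly initialized stand-in encoder and toy data; it only illustrates the general fine-tuning loop, not Baidu's actual setup.

```python
# Minimal sketch of fine-tuning an encoder for sentiment analysis, one of the
# 14 NLU task types. The encoder here is a stand-in; in practice the pre-trained
# ERNIE backbone would be loaded and a task head trained on top of it.
import torch
import torch.nn as nn

embed = nn.Embedding(1000, 128)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True), num_layers=2
)
classifier = nn.Linear(128, 2)            # positive / negative
optimizer = torch.optim.AdamW(
    list(embed.parameters()) + list(encoder.parameters()) + list(classifier.parameters()),
    lr=1e-4,
)
loss_fn = nn.CrossEntropyLoss()

# Toy batch: random token ids standing in for tokenized review text.
token_ids = torch.randint(0, 1000, (8, 32))
labels = torch.randint(0, 2, (8,))

for step in range(3):                     # a real run would iterate over a dataset
    hidden = encoder(embed(token_ids))    # (batch, seq_len, d_model)
    logits = classifier(hidden[:, 0])     # pool the first token for classification
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.3f}")
```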

Neural-symbolic computing, the combination of deep-learning neural network models with "good old-fashioned AI" techniques, is an active research area. In 2020, a team from Tsinghua worked with researchers in Canada to produce KEPLER, which was trained on the text content of Wikipedia combined with the structured Wikidata knowledge base. More recently, a team at MIT combined a GPT-3 deep-learning model with a symbolic world state model to improve the coherence of GPT-3's text generation, and researchers from Berkeley have combined a neural question-answering system with a "classic AI" crossword-puzzle solver called Dr. Fill.

Although Baidu has not released the code and models for ERNIE 3.0, version 2.0 is available on GitHub. There is also an interactive demo of ERNIE 3.0 on Baidu's website.
