Google AI Launches NLU-Powered Tool to Help Explore COVID-19 Literature

Google AI launched COVID-19 Research Explorer, which provides a semantic search interface on top of the COVID-19 Open Research Dataset to help scientists and researchers efficiently analyze all of the dataset’s journal articles and preprints.

It has become extremely challenging for the scientific community to keep up with the literature surrounding COVID-19, due to the pace at which scientists around the world are releasing new research. With 50,000 journal articles and preprints already in the COVID-19 Open Research Dataset, researchers need tools to help them quickly analyze this overwhelming amount of text data.

COVID-19 Research Explorer was released to respond quickly to research-driven queries. When a user asks a question, the tool returns a set of papers ranked by relevance, with snippets highlighting potential answers to the question. The user can then ask follow-up questions against the set returned by the previous question.

The tool's semantic search is powered by BERT, Google's language model that also plays a role in Google's main search engine. Because neural semantic search models require large amounts of training data, Google first built a synthetic corpus of questions and relevant documents from the biomedical domain. To generate it, Google's engineers trained a model to produce questions about a passage of text from the passage itself, using an encoder-decoder architecture of the kind commonly used for tasks like machine translation. After finding that the neural model alone did not perform as well as a keyword-based model, they built a hybrid term-neural retrieval model. The two approaches combine easily because both can be cast as vector space models: concatenating a document's term-based vector with its neural vector yields a single representation, and the dot product between that vector and a similarly concatenated query vector is simply the sum of the term-based score and the neural score.
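The hybrid scoring idea can be illustrated with a small toy sketch. It is not Google's implementation: it assumes plain word counts for the term-based part (a stand-in for whatever term weighting the real system uses) and hand-picked 3-dimensional vectors standing in for BERT embeddings. The corpus, the vectors, and the helper names (`term_vector`, `concat`, `rank`) are all hypothetical. Because the dot product of two concatenated vectors equals the term-based dot product plus the neural dot product, a single search over the concatenated vectors scores both signals at once.

```python
from collections import Counter
from typing import Dict, List

Vector = List[float]


def term_vector(text: str, vocab: List[str]) -> Vector:
    """Term-based vector: raw word counts over a fixed vocabulary.
    (Hypothetical stand-in for real term weights such as TF-IDF or BM25.)"""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def concat(term_vec: Vector, neural_vec: Vector, neural_weight: float = 1.0) -> Vector:
    """Concatenate the term-based part and the (optionally weighted) neural part."""
    return term_vec + [neural_weight * x for x in neural_vec]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank(query_vec: Vector, doc_vecs: Dict[str, Vector]) -> List[str]:
    """Return document ids ordered by descending dot-product score."""
    scores = {doc_id: dot(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy corpus and vocabulary.
VOCAB = ["coronavirus", "transmission", "vaccine", "ace2", "receptor"]
DOCS = {
    "d1": "coronavirus transmission in households",
    "d2": "ace2 receptor binding of coronavirus spike",
    "d3": "vaccine trial design",
}
# Hypothetical dense embeddings; a real system would get these from a neural encoder.
NEURAL = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.2, 0.8, 0.1],
    "d3": [0.0, 0.1, 0.9],
}

QUERY = "how does coronavirus enter cells"
QUERY_NEURAL = [0.1, 0.9, 0.0]  # hypothetical embedding of the query

# Documents use an unweighted concatenation.
doc_vecs = {d: concat(term_vector(t, VOCAB), NEURAL[d]) for d, t in DOCS.items()}
# Weight only the query side, so the neural score scales linearly with neural_weight.
query_vec = concat(term_vector(QUERY, VOCAB), QUERY_NEURAL, neural_weight=1.0)

print(rank(query_vec, doc_vecs))  # ['d2', 'd1', 'd3']
```

On terms alone, `d1` and `d2` tie, since each shares only "coronavirus" with the query; the neural component breaks the tie in favor of `d2`. Applying `neural_weight` to the query vector only is one simple way to tune the balance between the two signals without re-encoding the documents.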

The tool is freely available on a short-term basis, with usability enhancements coming over the next few months. Google is actively looking for feedback from researchers using the tool.