Key Takeaways
- Intelligent software agents must use common sense in order to reason.
- Common-sense knowledge is required before intelligent software agents can anticipate how people and the physical world will behave.
- Deep learning models do not currently understand what they produce, and have no common-sense knowledge.
- The Commonsense Transformers (COMET) project attempts to train models with information about the world in ways similar to how a human would acquire such knowledge.
- The COMET project and other similar efforts are still in the research phase.
Artificial intelligence researchers have not been successful in giving intelligent agents the common-sense knowledge they need to reason about the world. Without this knowledge, it is impossible for intelligent agents to truly interact with the world. Traditionally, there have been two unsuccessful approaches to getting computers to reason about the world—symbolic logic and deep learning. A new project, called COMET, tries to bring these two approaches together. Although it has not yet succeeded, it offers the possibility of progress.
What is Common Sense?
Ask yourself: how would an automated vehicle know that a snowman standing at the edge of the street is not going to run into the road? Humans rely on common-sense knowledge to realize that this is not going to happen.
Why is it so difficult for us to give intelligent agents common-sense knowledge? As illustrated in the previous example, we use this knowledge intuitively, without thinking about it. Often, we do not even realize we are doing it.
From the very outset of artificial intelligence, it was acknowledged that this problem had to be solved. One of the first papers written in the new field was on the topic of programs with common sense.
Common sense is all the background knowledge about the physical and social world that we have absorbed over our lives. It includes such things as our understanding of physics (causality, hot and cold), as well as our expectations about how humans behave. Leora Morgenstern describes it this way: “What you learn when you’re two or four years old, you don’t really ever put down in a book.”
For example, someone can figure out how to drive on the left-hand side of the road in England even if they have only ever driven in countries that drive on the right. They can infer what is the same and what is different. Morgenstern maintains a collection of problems that require common sense to solve.
Symbolic Reasoning
The first attempt to give computers common sense was to program the rules in explicitly. Today this approach is referred to as Good Old Fashioned Artificial Intelligence (GOFAI). Although it led to some success with rules-based expert systems, in general it has not succeeded in providing agents with common sense. “The amount of knowledge that can be conveniently represented in the formalisms of logic is kind of limited in principle,” said Michael Witbrock, AI researcher at the University of Auckland in New Zealand. “It turned out to be a truly overwhelming task.”
Another attempt was Cyc. Started in 1984, it was originally a project to capture common-sense knowledge in a knowledge base, along with the relationships among that knowledge. Today, it appears to be confined to a handful of private-sector applications. Rodney A. Brooks said of Cyc, “While it has been a heroic effort, it has not led to an AI system being able to master even a simple understanding of the world.”
The basic problem is that language is fuzzy. To begin with, millions of rules would be needed, and every rule has exceptions. Ellie Pavlick gives the following example: if I go outside in the rain, I will get wet, unless I am underneath something. Even that statement is insufficient, because it depends on the angle of the rain, how hard it is raining, and the width of the thing I am underneath.
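A toy sketch (mine, not the article’s) shows how quickly even Pavlick’s single rule sprouts parameters and exceptions once you try to write it down explicitly:

```python
# Toy sketch: hand-coding the "rain makes you wet" rule. Every attempt to
# capture the rule forces in more parameters and special cases, illustrating
# why explicit rules for common sense do not scale.

def will_get_wet(raining: bool,
                 under_cover: bool = False,
                 cover_width_m: float = 0.0,
                 rain_angle_deg: float = 0.0,
                 rain_intensity: float = 0.0) -> bool:
    """Naive rule: you get wet in the rain... unless an exception applies."""
    if not raining:
        return False
    if not under_cover:
        return True
    # Exception to the exception: wind-blown rain reaches under a narrow cover.
    if rain_angle_deg > 30 and cover_width_m < 1.0:
        return True
    # Very heavy rain splashes in regardless of cover.
    if rain_intensity > 0.9:
        return True
    return False  # ...and we still have not handled umbrellas, gusts, puddles, ...
```

Every new situation demands another parameter or another branch, and the rule never becomes complete.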
Besides the number of rules and exceptions, the very symbols used are ambiguous. For example, the word bass can mean a type of fish, a low-frequency tone, a type of instrument, or the name of a person or place.
Semantic Networks
Semantic networks try to tackle the fuzziness problem. ConceptNet, an example of such a network, has used crowd-sourced knowledge: people can enter what they consider to be common-sense knowledge. Here is an example of such a ConceptNet network revolving around the word “cake.”
The problem is that the information needed to interpret the semantic network is not in the network itself. Some relationships always hold: eating, for example, always involves swallowing. Others do not: a cake can be a snack as well as a dessert, and it may or may not satisfy hunger. You might eat the cake because you want something sweet. It is unlikely, although theoretically possible, for a person to eat a cake in the location of an oven, especially if the oven is hot. And cook seems to be used as both a noun and a verb.
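For readers who want to poke at these relations themselves, ConceptNet exposes a public web API. The sketch below queries it for the edges around “cake”; the endpoint and JSON field names follow the api.conceptnet.io documentation as I understand it, so verify them against the current API before relying on them.

```python
# Sketch: listing ConceptNet edges for the concept "cake" via the public API.
# Assumes the documented endpoint http://api.conceptnet.io/c/en/<term>;
# field names ("edges", "rel", "start", "end", "weight") may change over time.
import requests

def print_edges(term: str = "cake", limit: int = 10) -> None:
    response = requests.get(f"http://api.conceptnet.io/c/en/{term}",
                            params={"limit": limit})
    response.raise_for_status()
    for edge in response.json().get("edges", []):
        print(edge["start"]["label"], "-", edge["rel"]["label"], "->",
              edge["end"]["label"], f'(weight {edge["weight"]:.2f})')

if __name__ == "__main__":
    print_edges()
```

The output is exactly the kind of flat relation list described above; nothing in it says which relations always hold and which are merely plausible.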
Deep Learning
Neural networks have achieved more success than either of these symbolic approaches. Nonetheless, they do not appear to have achieved common-sense reasoning.
AlphaGo
AlphaGo combines a state-of-the-art tree search with two deep neural networks, each of which has millions of connections. The policy network predicts the next move and is used to narrow the search so that only the moves most likely to lead to a win are considered. The value network reduces the depth of the search tree by estimating the winner in each position instead of searching to the end of the game.
AlphaGo is much closer to human reasoning because it uses Monte Carlo tree search to simulate the remainder of the game, much as a human would play it out in their imagination. Since the policy network suggests intelligent possible moves, and the value network evaluates the current position, AlphaGo can choose the move that leads to the most successful simulations. This is different from the Deep Blue chess algorithm, which used massively parallel hardware to do a brute-force search.
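To make the interplay of the two networks and the search concrete, here is a deliberately simplified sketch. It is not AlphaGo’s implementation: the policy and value “networks” are stand-in functions, and the game is abstracted behind `legal_moves` and `apply_move` callbacks supplied by the caller.

```python
# Simplified sketch of policy/value-guided Monte Carlo tree search.
import math
import random

def policy(state, moves):
    """Stand-in for the policy network: a prior probability for each legal move."""
    return {m: 1.0 / len(moves) for m in moves}   # uniform prior for the sketch

def value(state):
    """Stand-in for the value network: estimated chance of winning from state."""
    return random.random()                        # random estimate for the sketch

class Node:
    def __init__(self, state):
        self.state, self.children = state, {}
        self.visits, self.total_value = 0, 0.0

def select_move(node, prior, c_puct=1.0):
    """Pick the child maximizing value estimate plus an exploration bonus."""
    def score(move):
        child = node.children[move]
        q = child.total_value / child.visits if child.visits else 0.0
        u = c_puct * prior[move] * math.sqrt(node.visits + 1) / (1 + child.visits)
        return q + u
    return max(node.children, key=score)

def search(root_state, legal_moves, apply_move, simulations=100):
    root = Node(root_state)
    for _ in range(simulations):
        node, path = root, []
        # Selection/expansion: walk the tree until reaching an unvisited node.
        while True:
            moves = legal_moves(node.state)
            if not moves:
                break
            prior = policy(node.state, moves)
            for m in moves:
                node.children.setdefault(m, Node(apply_move(node.state, m)))
            path.append(node)
            node = node.children[select_move(node, prior)]
            if node.visits == 0:
                break
        # Evaluation: the value network replaces a full rollout to the game's end.
        v = value(node.state)
        # Backup: propagate the evaluation up the visited path.
        for n in path + [node]:
            n.visits += 1
            n.total_value += v
    # Choose the most-visited move, i.e. the most successful simulation line.
    return max(root.children, key=lambda m: root.children[m].visits)
```

A real system would replace the stand-in policy and value functions with trained networks, alternate perspectives between the two players, and play the chosen move before searching again; the sketch only shows the control flow of selection, evaluation, and backup.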
Nonetheless, AlphaGo and similar approaches do not require common-sense reasoning because there are no ambiguities in the game, and success is well defined. They are incapable of dealing with unforeseen events, such as the self-driving Uber car that killed a pedestrian because it did not understand that a pedestrian can jaywalk.
Generative Pre-Trained Transformer
Analyzing language using deep learning is an attempt to deal with this ambiguity. These models are pre-trained and use a statistical model of language expressed in millions or billions of parameters in a neural network. If they are fine-tuned for a specific task, such as answering questions or paraphrasing text, they can give the impression that they understand what they are reading.
Generative Pre-trained Transformer 3 (GPT-3) is the largest trained language model in existence today. The basic model generates text responses to input text. You might ask it to answer a question or write an essay. It must be given examples before you can get it to work in a given context.
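As a minimal sketch of how such a model is prompted, the following uses the Hugging Face transformers library. GPT-3 itself is available only through OpenAI’s hosted API, so the sketch substitutes the openly downloadable GPT-2 as a stand-in for the same idea; the priming example in the prompt is invented for illustration.

```python
# Sketch: priming a generative language model with an example before asking it
# to continue text. GPT-2 stands in here for the much larger GPT-3.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# A small "priming" example in the prompt, followed by the real question.
prompt = (
    "Q: What is the capital of France?\n"
    "A: Paris.\n"
    "Q: What is the capital of Italy?\n"
    "A:"
)
print(generator(prompt, max_new_tokens=10, do_sample=False)[0]["generated_text"])
```

The model continues the pattern it has been shown; whether that continuation reflects any understanding is exactly the question raised below.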
Bidirectional Encoder Representations from Transformers
Bidirectional Encoder Representations from Transformers (BERT) is a neural network that tries to understand written language. BERT is a natural language processing (NLP) algorithm that uses a neural net to create pre-trained models. Pre-trained models are general-purpose models that can be refined for specific NLP tasks. Unlike other algorithms, BERT is bidirectional: the context the algorithm uses is based on the words both before and after the word in question. For example, in the sentence, “I sat by the bank of the river Thames,” the software uses both fragments “I sat by the” and “of the river Thames” to determine the meaning of the word “bank.” A unidirectional algorithm would have to guess whether the subject of the sentence was sitting before a financial institution or a body of water based only on the first part of the sentence. Google claims that, based on its ability to pass tests such as the Stanford Question Answering Dataset, BERT can provide state-of-the-art results on NLP tasks.
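One way to see BERT’s bidirectional training in action is its masked-word task: mask a word and let the model use the context on both sides to predict it. The short sketch below uses the Hugging Face transformers fill-mask pipeline with the publicly available bert-base-uncased checkpoint.

```python
# Sketch: BERT's masked-language-model objective in action. Because the model
# attends to context on both sides of [MASK], the words after the mask
# ("of the river Thames") steer the prediction toward the river sense.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill("I sat by the [MASK] of the river Thames."):
    print(f'{candidate["token_str"]:>10}  score={candidate["score"]:.3f}')
```

A unidirectional model given only “I sat by the [MASK]” would have no such right-hand context to draw on.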
Sam Bowman explains that BERT is not a fully trained neural network, but an open-source recipe for fine-tuning neural networks to perform many natural language processing tasks.
Problems with Deep Learning Approaches
The ultimate question is: do they understand what they read or write, or are they just sophisticated computational versions of Clever Hans? To put it another way, would you trust a program that was trained to pass a professional licensing exam to actually work as an engineer, lawyer, or doctor?
A new set of benchmark tests, called SuperGLUE, has been created by Bowman and several collaborators to measure how much progress machines have made in language understanding. So far, no machine has surpassed human performance on the benchmarks. Nonetheless, the benchmarks do not indicate whether any understanding took place.
How Do You Put Common Sense into the Model?
Putting common sense into the model is the goal of COMET (Commonsense Transformers). The project attempts to combine the symbolic-reasoning approach with neural network language models.
The key idea is to introduce common-sense knowledge when fine-tuning a model. Like other deep learning models, COMET tries to generate plausible responses rather than make deductions from an encyclopedic knowledge base.
When Yejin Choi started working at the Allen Institute in 2019, she thought that neural networks could make progress where the symbolic approach had failed. The idea was to give the language model additional training from a common-sense knowledge base. The language model could then generate inferences based on common sense, just as a generative model learns to generate text.
Choi and her colleagues fine-tuned a neural language model with the common-sense knowledge from a knowledge base called Atomic in order to create COMET. Anybody can use COMET. Leora Morgenstern thinks COMET can move the field forward by connecting deep learning and common sense.
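As a rough illustration of how such a fine-tuned model is used, here is a sketch based on the Hugging Face transformers library. The checkpoint name is a placeholder and the prompt format is only my understanding of the released COMET models; consult the project’s documentation for the exact model identifiers and input conventions.

```python
# Sketch: generating common-sense inferences with a COMET-style model.
# The checkpoint name below is a placeholder; substitute the identifier of a
# released COMET checkpoint (the AI2 releases are seq2seq models fine-tuned on
# the Atomic knowledge base).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

COMET_CHECKPOINT = "path-or-hub-id-of-a-comet-checkpoint"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(COMET_CHECKPOINT)
model = AutoModelForSeq2SeqLM.from_pretrained(COMET_CHECKPOINT)

# Ask the model what a person probably wants after an everyday event.
# COMET releases typically expect "<event> <relation> [GEN]"-style prompts;
# check the project documentation for the exact format.
prompt = "PersonX makes a birthday cake xIntent [GEN]"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, num_beams=5,
                         num_return_sequences=3, max_new_tokens=16)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))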
Will the COMET Approach Work?
COMET relies on surface patterns in its training data rather than understanding concepts. The key idea would be to supplement those surface patterns with information from outside language, such as visual perception or embodied sensation. First-person representations, not language, would then be the basis for common sense.
Ellie Pavlick is attempting to teach intelligent agents common sense by having them interact with virtual reality. Pavlick notes that common sense would still exist even without the ability to talk to other people. Presumably, humans were using common sense to understand the world before they were communicating.
The idea is to teach intelligent agents to interact with the world the way a child does. Instead of associating the idea of eating with a textual description, an intelligent agent would be told, “We are now going to eat,” and then it would see the associated actions, such as gathering food from the refrigerator, preparing the meal, and then the meal being eaten. Concept and action would be associated with each other. The agent could then generate similar words when seeing similar actions.
Nazneen Rajani is investigating whether language models can reason using basic physics. For example, if a ball is inside a jar, and the jar is tipped over, the ball will fall out.
Choi and her colleagues are trying to augment COMET with labeled pictures. The idea is to generate common-sense inferences about what could happen before and after an event, as well as what people’s present intents are.
Choi’s hope is to have a neural network that could learn from knowledge bases without human supervision. COMET may not be ultimately successful, but it is an example of an approach that could eventually work.
About the Author
Michael Stiefel, principal of Reliable Software, Inc., is a consultant on software architecture and development and on the alignment of information technology with business goals. As a member of an OASIS Technical Committee, he helped develop a core SOA Reference Model and related Reference Architectures. He was a Lecturer in the Aeronautics and Astronautics Department at the Massachusetts Institute of Technology, where his research and teaching focus was understanding how people build mental models in order to solve problems. As adjunct faculty, Stiefel has taught graduate and undergraduate software engineering courses at Northeastern University and Framingham State University. He explores his interest in the intersection between technology and art in the blog Art and Software.