Symbolic AI algorithms have played an important role in AI’s history, but they face challenges in learning on their own. After IBM Watson used symbolic reasoning to beat Brad Rutter and Ken Jennings at Jeopardy! in 2011, the technology was eclipsed by neural networks trained through deep learning.
The power of neural networks is that they help automate the process of generating models of the world. This has led to several significant milestones in artificial intelligence, giving rise to deep learning models that, for example, could beat humans in progressively more complex games, including Go and StarCraft. But it can be challenging to reuse these deep learning models or extend them to new domains.
Now researchers and enterprises are looking for ways to bring neural networks and symbolic AI techniques together.
“Neuro-symbolic modeling is one of the most exciting areas in AI right now,” said Brenden Lake, assistant professor of psychology and data science at New York University. His team has been exploring different ways to bridge the gap between the two AI approaches.
Companies like IBM are also exploring how to extend these concepts to solve business problems, said David Cox, IBM director of the MIT-IBM Watson AI Lab.
“I would argue that symbolic AI is still waiting, not for data or compute, but for deep learning,” Cox said.
His team is working with researchers from MIT CSAIL, Harvard University and Google DeepMind to develop a new, large-scale video reasoning data set called “CLEVRER: CoLlision Events for Video REpresentation and Reasoning.” The data set allows AI systems to recognize objects and reason about their behaviors in physical events from videos, with only a fraction of the data required for traditional deep learning systems.
AI antagonists that have complementary strengths
Deep learning is incredibly adept at large-scale pattern recognition and at capturing complex correlations in massive data sets, NYU’s Lake said. In contrast, deep learning struggles at capturing compositional and causal structure from data, such as understanding how to construct new concepts by composing old ones or understanding the process for generating new data.
Symbolic models have a complementary strength: They are good at capturing compositional and causal structure. But they struggle to capture complex correlations. The unification of the two approaches would address the shortcomings of each.
“Neuro-symbolic [AI] models will allow us to build AI systems that capture compositionality, causality, and complex correlations,” Lake said.
Limits to learning by correlation
Hadayat Seddiqi, director of machine learning at InCloudCounsel, a legal technology company, said the time is right for developing a neuro-symbolic learning approach. “Deep learning in its present state cannot learn logical rules, since its strength comes from analyzing correlations in the data,” he said.
This attribute makes it effective at tackling problems where logical rules are exceptionally complex, numerous, and ultimately impractical to code, like deciding how a single pixel in an image should be labeled. However, correlation algorithms come with numerous weaknesses.
“This is a prime reason why language is not wholly solved by current deep learning systems,” Seddiqi said.
Almost any type of programming outside of statistical learning algorithms relies on symbolic processing, which makes it in some way a necessary part of every AI system. Indeed, Seddiqi said he finds it’s often easier to program a few logical rules to implement a function than to deduce them with machine learning. It is also common that the data needed to train a machine learning model either doesn’t exist or is insufficient. In those cases, rules derived from domain knowledge can help generate training data.
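As a concrete sketch of that last point, hand-written domain rules can label synthetic examples when real labeled data is scarce. Everything below is illustrative: the keyword rule and templates are hypothetical stand-ins for real domain knowledge, not any particular system's logic.

```python
def label_by_rule(doc: str) -> str:
    """Toy domain rule: a document mentioning both a party and a
    signature clause is treated as a contract."""
    text = doc.lower()
    if "party" in text and "signature" in text:
        return "contract"
    return "other"

# Synthetic documents built from templates, labeled by the rule.
# These (doc, label) pairs could then train a statistical model.
templates = [
    "This agreement between Party A and Party B requires a signature.",
    "Quarterly newsletter: product updates and announcements.",
    "The undersigned party affixes a signature to this lease.",
]

training_data = [(doc, label_by_rule(doc)) for doc in templates]
```

The rule is cheap to write and audit, and the resulting labeled pairs bootstrap a learning system that can later generalize beyond the rule's keywords.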
This is important because all AI systems in the real world deal with messy data. Symbolic processing can help filter out irrelevant data. For example, in an application that uses AI to answer questions about legal contracts, simple business logic can filter out data from documents that are not contracts or that are contracts in a different domain such as financial services versus real estate.
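A minimal sketch of that kind of pre-filter, assuming a contract-QA pipeline: the keyword sets and domain names below are hypothetical, not a real legal taxonomy, and a production system would use richer rules or metadata.

```python
# Illustrative keyword sets for routing documents by domain.
REAL_ESTATE_TERMS = {"lease", "tenant", "premises", "landlord"}
FINANCE_TERMS = {"loan", "interest rate", "collateral", "borrower"}

def classify_domain(text: str) -> str:
    """Assign a document to a domain with simple business logic."""
    t = text.lower()
    if any(term in t for term in REAL_ESTATE_TERMS):
        return "real_estate"
    if any(term in t for term in FINANCE_TERMS):
        return "financial"
    return "out_of_scope"

def filter_for_domain(docs, domain):
    """Keep only in-domain documents; everything else never
    reaches the downstream neural QA model."""
    return [d for d in docs if classify_domain(d) == domain]
```

The symbolic filter does the cheap, explainable triage so the expensive neural model only sees relevant inputs.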
System 1 vs. System 2 thinking
Deep learning is better suited for System 1 reasoning, said Debu Chatterjee, head of AI, ML and analytics engineering at ServiceNow, referring to the paradigm developed by the psychologist Daniel Kahneman in his book Thinking, Fast and Slow. System 1 thinking is fast, associative, intuitive and automatic.
Deep learning, in its present state, interprets inputs from the messy, approximate, probabilistic real world, Chatterjee said, and it is very powerful: “If you do this on a large enough data set, this can exceed human-level perception.”
When handling a complex input, deep learning can deal with perception problems that attempt to determine whether something is true: for example, whether a picture contains a cat versus a dog. But it is hard for humans to ascertain the properties of these deep learning systems, and difficult to test under what conditions they work or fail. They are opaque to human analysis.
Symbolic AI’s strength lies in its knowledge representation and reasoning through logic, making it more akin to Kahneman’s “System 2” mode of thinking, which is slow, takes work and demands attention. Symbolic reasoning is modular and easier to extend. That is because it is based on relatively simple underlying logic that relies on things being true, and on rules providing a means of inferring new things from things already known to be true. Humans understand how it reached its conclusions.
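The inference step described above, deriving new facts from rules and known facts, can be sketched as forward chaining. This is a minimal illustration, not any vendor's engine; the rules and fact names are made up.

```python
def forward_chain(facts, rules):
    """Apply rules of the form (premises -> conclusion) repeatedly
    until no new facts can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire the rule if all premises are known and the
            # conclusion is new.
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Illustrative rule base: each rule is (set of premises, conclusion).
rules = [
    ({"is_mammal"}, "is_animal"),
    ({"is_animal", "has_fur"}, "is_warm_blooded"),
]

derived = forward_chain({"is_mammal", "has_fur"}, rules)
```

Every derived fact can be traced back to the specific rules that produced it, which is exactly the modularity and explainability the quote describes.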
The weakness of symbolic reasoning is that it does not tolerate the ambiguity seen in the real world. One false assumption can make everything derivable, effectively rendering the system meaningless.
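This failure mode is the classical principle of explosion (ex falso quodlibet): from a contradiction, any proposition $Q$ whatsoever follows. A standard derivation:

$$
\begin{aligned}
&1.\ P &&\text{(assumption)}\\
&2.\ \neg P &&\text{(contradictory assumption)}\\
&3.\ P \lor Q &&\text{(disjunction introduction from 1)}\\
&4.\ Q &&\text{(disjunctive syllogism from 2, 3)}
\end{aligned}
$$

Since $Q$ is arbitrary, a single contradiction in the knowledge base proves everything, which is why classical logic is so brittle in the face of noisy, ambiguous data.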
“There have been many attempts to extend logic to deal with this, which have not been successful,” Chatterjee said. Alternatively, in complex perception problems, the set of rules needed may be too large for the AI system to handle.
“Any realistic AI system needs to have both deep learning and symbolic properties,” Chatterjee said. This means it needs to be good at both perception and being able to infer new things from existing facts.
Practical benefits of combining symbolic AI and deep learning
There are many practical benefits to developing neuro-symbolic AI. One of the biggest is the ability to automatically encode better rules for symbolic AI.
“With symbolic AI there was always a question mark about how to get the symbols,” IBM’s Cox said. The world is presented to applications that use symbolic AI as images, video and natural language, which is not the same as symbols.
“We are finding that neural networks can get you to the symbolic domain and then you can use a wealth of ideas from symbolic AI to understand the world,” Cox said.
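One way to picture this pipeline: a neural perception model emits symbolic labels, and a symbolic layer reasons over them. Everything below is a hypothetical sketch; `detect_objects` is a stand-in for a real vision model, not an actual API, and the knowledge table is invented.

```python
def detect_objects(image):
    """Placeholder for a neural object detector that maps raw pixels
    to symbolic labels with confidence scores."""
    return [
        {"label": "cat", "confidence": 0.92},
        {"label": "sofa", "confidence": 0.88},
    ]

# Symbolic knowledge keyed on combinations of detected labels.
KNOWLEDGE = {
    ("cat", "sofa"): "a cat is resting indoors",
}

def interpret(image):
    """Bridge step: convert neural detections into symbols, then
    look them up in the symbolic knowledge base."""
    labels = tuple(sorted(
        d["label"] for d in detect_objects(image)
        if d["confidence"] > 0.5
    ))
    return KNOWLEDGE.get(labels, "unknown scene")
```

The neural network handles the messy pixels-to-symbols step; once symbols exist, ordinary symbolic machinery takes over.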
Another way the two AI paradigms can be combined is by using neural networks to help prioritize how symbolic programs organize and search through multiple facts related to a question. For example, if an AI is trying to decide if a given statement is true, a symbolic algorithm needs to consider whether thousands of combinations of facts are relevant.
Humans have an intuition about which facts might be relevant to a query. For example, a medical diagnostic expert system would have to weigh a patient’s records and new complaints in making a medical suggestion, whereas an experienced human doctor could see the gestalt of the patient’s state and quickly understand how to investigate the new complaints or what tests to order.
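The prioritization idea above can be sketched as a learned relevance scorer that shortlists facts before symbolic search. The `relevance` function below is a toy word-overlap stand-in for a trained neural scorer, and the facts are invented for illustration.

```python
def relevance(query: str, fact: str) -> float:
    """Toy proxy for a neural model scoring (query, fact) pairs:
    fraction of the fact's words shared with the query."""
    q = set(query.lower().split())
    f = set(fact.lower().split())
    return len(q & f) / max(len(f), 1)

def prioritize(query, facts, top_k=2):
    """Return the top_k facts most relevant to the query, so a
    symbolic reasoner explores promising combinations first."""
    return sorted(facts, key=lambda f: relevance(query, f),
                  reverse=True)[:top_k]

facts = [
    "the patient reports chest pain",
    "the clinic parking lot was repaved",
    "chest pain can indicate cardiac issues",
]
shortlist = prioritize("new complaint of chest pain", facts)
```

In a combined system, a real scorer would prune thousands of candidate facts down to the handful worth feeding into exact symbolic inference, much as a doctor's intuition narrows the search.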
Another benefit of combining the techniques lies in making the AI model easier to understand. Humans reason about the world in symbols, whereas neural networks encode their models using pattern activations. Humans don’t think in terms of patterns of weights in neural networks.
“Our vision is to use neural networks as a bridge to get us to the symbolic domain,” Cox said, referring to work that IBM is exploring with its partners.
One thing symbolic processing can do is provide formal guarantees that a hypothesis is correct. This could prove important when the revenue of the business is on the line and companies need a way of proving the model will behave in a way that can be predicted by humans. In contrast, a neural network may be right most of the time, but when it’s wrong, it’s not always apparent what factors caused it to generate a bad answer.
Pushing the limits of NLP
The deep learning community has made great progress in using new techniques like transformers for natural language understanding tasks. But this is not true understanding — not in the way that symbolic processing works, argued Cox. Transformer models like Google’s BERT and OpenAI’s GPT are really about discovering statistical regularities, he said. While this can be powerful, it is not the same thing as understanding.
AI researchers like Gary Marcus have argued that these systems struggle with answering questions like, “Which direction is a nail going into the floor pointing?” This is not the kind of question that is likely to be written down, since it is common sense.
“As impressive as things like transformers are on our path to natural language understanding, they are not sufficient,” Cox said.
Seddiqi expects many advancements to come from natural language processing. Language is a type of data that relies on statistical pattern matching at the lowest levels but quickly requires logical reasoning at higher levels. Pushing NLP performance further will likely mean augmenting deep neural networks with logical reasoning capabilities.
The greatest promise here is analogous to experimental particle physics, where large particle accelerators are built to smash particles together and monitor their behavior. In natural language processing, researchers have built large models with massive amounts of data using deep neural networks that cost millions of dollars to train. The next step lies in studying these networks to see how they can improve the construction of the symbolic representations required for higher-order language tasks.
Indeed, a lot of work in explainable AI — the effort to highlight the inner workings of AI models relevant to a particular use case — seems to be focused on inferring the underlying concepts and rules, because rules are easier to explain than the weights of a neural network, Chatterjee said.
A key factor in the evolution of AI will be a common programming framework that allows simple integration of both deep learning and symbolic logic. “Without this, these approaches won’t mix, like oil and water,” he said.