Deep learning pioneer Yoshua Bengio has provocative ideas about the future of AI. (Photo: Maryse Boyce, IEEE Spectrum)
For the first part of this article series, see here.
The field of artificial intelligence moves fast. It has only been eight years since the modern era of deep learning began at the 2012 ImageNet competition. Progress in the field since then has been breathtaking and relentless.
If anything, this breakneck pace is only accelerating. Five years from now, the field of AI will look very different than it does today. Methods that are currently considered cutting-edge will have become outdated; methods that today are nascent or on the fringes will be mainstream.
What will the next generation of artificial intelligence look like? Which novel AI approaches will unlock currently unimaginable possibilities in technology and business?
My previous column covered three emerging areas within AI that are poised to redefine the field—and society—in the years ahead. This article will cover three more.
4. Neural Network Compression
AI is moving to the edge.
There are tremendous advantages to being able to run AI algorithms directly on devices at the edge—e.g., phones, smart speakers, cameras, vehicles—without sending data back and forth from the cloud.
Perhaps most importantly, edge AI enhances data privacy because data need not be moved from its source to a remote server. Edge AI is also lower latency since all processing happens locally; this makes a critical difference for time-sensitive applications like autonomous vehicles or voice assistants. It is more energy- and cost-efficient, an increasingly important consideration as the computational and economic costs of machine learning balloon. And it enables AI algorithms to run autonomously without the need for an Internet connection.
Nvidia CEO Jensen Huang, one of the titans of the AI business world, sees edge AI as the future of computing: “AI is moving from the cloud to the edge, where smart sensors connected to AI computers can speed checkouts, direct forklifts, orchestrate traffic, save power. In time, there will be trillions of these small autonomous computers, powered by AI.”
But in order for this lofty vision of ubiquitous intelligence at the edge to become a reality, a key technology breakthrough is required: AI models need to get smaller. A lot smaller. Developing and commercializing techniques to shrink neural networks without compromising their performance has thus become one of the most important pursuits in the field of AI.
The typical deep learning model today is massive, requiring significant computational and storage resources in order to run. OpenAI’s new language model GPT-3, which made headlines this summer, has a whopping 175 billion model parameters, requiring more than 350 GB just to store the model. Even models that don’t approach GPT-3 in size are still extremely computationally intensive: ResNet-50, a widely used computer vision model developed a few years ago, requires roughly 3.8 billion floating-point operations to process a single image.
These models cannot run at the edge. The hardware processors in edge devices (think of the chips in your phone, your Fitbit, or your Roomba) are simply not powerful enough to support them.
Developing methods to make deep learning models more lightweight therefore represents a critical unlock: it will unleash a wave of product and business opportunities built around decentralized artificial intelligence.
How would such model compression work?
Researchers and entrepreneurs have made tremendous strides in this field in recent years, developing a series of techniques to miniaturize neural networks. These techniques can be grouped into five major categories: pruning, quantization, low-rank factorization, compact convolutional filters, and knowledge distillation.
Pruning entails identifying and eliminating the redundant or unimportant connections in a neural network in order to slim it down. Quantization compresses models by using fewer bits—say, 8-bit integers instead of 32-bit floating-point numbers—to represent weights and activations. In low-rank factorization, a model’s weight tensors are decomposed into products of smaller matrices that approximate the originals with far fewer parameters. Compact convolutional filters are specially designed filters that reduce the number of parameters required to carry out convolution. Finally, knowledge distillation involves using the full-sized version of a model to “teach” a smaller model to mimic its outputs.
These techniques are mostly independent from one another, meaning they can be deployed in tandem for improved results. Some of them (pruning, quantization) can be applied after the fact to models that already exist, while others (compact filters, knowledge distillation) require developing models from scratch.
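To make the “after the fact” techniques concrete, here is a minimal sketch—in plain NumPy, deliberately avoiding any particular framework’s API—of magnitude pruning and symmetric 8-bit quantization applied to a single stand-in weight matrix. The specific numbers (90% sparsity, int8) are illustrative choices, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)  # a stand-in "layer"

# --- Pruning: zero out the 90% of weights with the smallest magnitude ---
threshold = np.quantile(np.abs(weights), 0.9)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)
sparsity = np.mean(pruned == 0.0)

# --- Quantization: store weights as 8-bit integers plus a single scale ---
scale = np.abs(weights).max() / 127.0            # symmetric linear quantization
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequantized = q.astype(np.float32) * scale       # what the runtime computes with

max_error = np.abs(weights - dequantized).max()
print(f"sparsity after pruning: {sparsity:.2f}")
print(f"int8 storage is {weights.nbytes / q.nbytes:.0f}x smaller; "
      f"max round-trip error: {max_error:.4f}")
```

In practice, pruned networks are fine-tuned afterward to recover any lost accuracy, and quantization is usually applied per-layer or per-channel; frameworks such as PyTorch and TensorFlow ship tooling for both.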
A handful of startups have emerged to bring neural network compression technology from research to market. Among the more promising are Pilot AI, Latent AI, Edge Impulse and Deeplite. As one example, Deeplite claims that its technology can make neural networks 100x smaller, 10x faster, and 20x more power efficient without sacrificing performance.
“The number of devices in the world that have some computational capability has skyrocketed in the last decade,” explained Pilot AI CEO Jon Su. “Pilot AI’s core IP enables a significant reduction in the size of the AI models used for tasks like object detection and tracking, making it possible for AI/ML workloads to be run directly on edge IoT devices. This will enable device manufacturers to transform the billions of sensors sold every year—things like push button doorbells, thermostats, or garage door openers—into rich tools that will power the next generation of IoT applications.”
Large technology companies are actively acquiring startups in this category, underscoring the technology’s long-term strategic importance. Earlier this year Apple acquired Seattle-based Xnor.ai for a reported $200 million; Xnor’s technology will help Apple deploy edge AI capabilities on its iPhones and other devices. In 2019 Tesla snapped up DeepScale, one of the early pioneers in this field, to support inference on its vehicles.
And one of the most important technology deals in years—Nvidia’s pending $40 billion acquisition of Arm, announced last month—was motivated in large part by the accelerating shift to efficient computing as AI moves to the edge.
Emphasizing this point, Nvidia CEO Jensen Huang said of the deal: “Energy efficiency is the single most important thing when it comes to computing going forward….together, Nvidia and Arm are going to create the world’s premier computing company for the age of AI.”
In the years ahead, artificial intelligence will become untethered, decentralized and ambient, operating on trillions of devices at the edge. Model compression is an essential enabling technology that will help make this vision a reality.
5. Generative AI
Today’s machine learning models mostly interpret and classify existing data: for instance, recognizing faces or identifying fraud. Generative AI is a fast-growing new field that focuses instead on building AI that can generate its own novel content. To put it simply, generative AI takes artificial intelligence beyond perceiving to creating.
Two key technologies are at the heart of generative AI: generative adversarial networks (GANs) and variational autoencoders (VAEs).
The more attention-grabbing of the two methods, GANs were invented by Ian Goodfellow in 2014 while he was pursuing his PhD at the University of Montreal under AI pioneer Yoshua Bengio.
Goodfellow’s conceptual breakthrough was to architect GANs with two separate neural networks—and then pit them against one another.
Starting with a given dataset (say, a collection of photos of human faces), the first neural network (called the “generator”) begins generating new images that, in terms of pixels, are mathematically similar to the existing images. Meanwhile, the second neural network (the “discriminator”) is fed photos without being told whether they are from the original dataset or from the generator’s output; its task is to identify which photos have been synthetically generated.
As the two networks iteratively work against one another—the generator trying to fool the discriminator, the discriminator trying to suss out the generator’s creations—they hone one another’s capabilities. Eventually the discriminator’s classification success rate falls to 50%, no better than random guessing, meaning that the synthetically generated photos have become indistinguishable from the originals.
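The adversarial loop described above can be sketched end to end with a deliberately tiny example: a “GAN” in which both networks are single linear units, the data is one-dimensional, and the gradients are worked out by hand. Real GANs use deep networks and automatic differentiation; every detail here (the target distribution, learning rate, step count) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(42)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

DATA_MEAN, DATA_STD = 4.0, 0.5        # the "real" distribution
w_g, b_g = 1.0, 0.0                   # generator: x = w_g*z + b_g
w_d, b_d = 0.0, 0.0                   # discriminator: D(x) = sigmoid(w_d*x + b_d)
lr, batch = 0.05, 64

for step in range(3000):
    real = rng.normal(DATA_MEAN, DATA_STD, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = w_g * z + b_g

    # Discriminator: ascend on log D(real) + log(1 - D(fake))
    d_real, d_fake = sigmoid(w_d * real + b_d), sigmoid(w_d * fake + b_d)
    w_d += lr * np.mean((1 - d_real) * real - d_fake * fake)
    b_d += lr * np.mean((1 - d_real) - d_fake)

    # Generator: ascend on log D(fake) (the "non-saturating" loss)
    d_fake = sigmoid(w_d * fake + b_d)
    grad_x = (1 - d_fake) * w_d          # d log D(fake) / d fake
    w_g += lr * np.mean(grad_x * z)
    b_g += lr * np.mean(grad_x)

fake_mean = b_g  # E[w_g*z + b_g] with z ~ N(0, 1)
print(f"generator now samples around {fake_mean:.2f} (target {DATA_MEAN})")
```

The point of the sketch is the alternating updates: the discriminator’s parameters move to separate real from fake, and the generator’s parameters move in whatever direction most increases the discriminator’s error—exactly the dynamic that eventually drives the discriminator toward 50% accuracy.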
In 2016, AI great Yann LeCun called GANs “the most interesting idea in the last ten years in machine learning.”
VAEs, introduced around the same time as GANs, are a conceptually similar technique that can be used as an alternative to GANs.
Like GANs, VAEs consist of two neural networks that work in tandem to produce an output. The first network (the “encoder”) takes a piece of input data and compresses it into a lower-dimensional representation. The second network (the “decoder”) takes this compressed representation and, based on a probability distribution of the original data’s attributes and a randomness function, generates novel outputs that “riff” on the original input.
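Here is a minimal sketch of that encode-sample-decode forward pass, with untrained random matrices standing in for the two networks (a real VAE learns these weights by maximizing a likelihood bound; the dimensions here are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(7)
D_IN, D_LATENT = 16, 2                         # data dim, compressed dim

W_enc = rng.normal(scale=0.1, size=(D_IN, 2 * D_LATENT))  # -> (mean, log-variance)
W_dec = rng.normal(scale=0.1, size=(D_LATENT, D_IN))

x = rng.normal(size=D_IN)                      # one input example

h = x @ W_enc                                  # encoder: compress the input
mu, log_var = h[:D_LATENT], h[D_LATENT:]       # ...into a distribution's parameters

eps = rng.normal(size=D_LATENT)                # the randomness function
z = mu + np.exp(0.5 * log_var) * eps           # sample a latent code ("reparameterization")

x_out = z @ W_dec                              # decoder: generate a novel output
print(f"{D_IN}-d input compressed to {D_LATENT}-d code, decoded to {x_out.shape[0]}-d output")
```

Because the latent code is sampled rather than fixed, running the decoder with fresh draws of `eps` yields different outputs that all “riff” on the same input—the property that makes VAEs generative.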
In general, GANs generate higher-quality output than do VAEs but are more difficult and more expensive to build.
Like artificial intelligence more broadly, generative AI has inspired both widely beneficial and frighteningly dangerous real-world applications. Only time will tell which will predominate.
On the positive side, one of the most promising use cases for generative AI is synthetic data. Synthetic data is a potentially game-changing technology that enables practitioners to digitally fabricate the exact datasets they need to train AI models.
Getting access to the right data is both the most important and the most challenging part of AI today. Generally, in order to train a deep learning model, researchers must collect thousands or millions of data points from the real world. They must then attach labels to each data point before the model can learn from the data. This is at best an expensive and time-consuming process; at worst, the data one needs is simply impossible to get one’s hands on.
Synthetic data upends this paradigm by enabling practitioners to artificially create high-fidelity datasets on demand, tailored to their precise needs. For instance, using synthetic data methods, autonomous vehicle companies can generate billions of different driving scenes for their vehicles to learn from without needing to actually encounter each of these scenes on real-world streets.
As synthetic data approaches real-world data in accuracy, it will democratize AI, undercutting the competitive advantage of proprietary data assets. In a world in which data can be inexpensively generated on demand, the competitive dynamics across industries will be upended.
A crop of promising startups has emerged to pursue this opportunity, including Applied Intuition, Parallel Domain, AI.Reverie, Synthesis AI and Unlearn.AI. Large technology companies—among them Nvidia, Google and Amazon—are also investing heavily in synthetic data. The first major commercial use case for synthetic data was autonomous vehicles, but the technology is quickly spreading across industries, from healthcare to retail and beyond.
Counterbalancing the enormous positive potential of synthetic data, a different generative AI application threatens to have a widely destructive impact on society: deepfakes.
We covered deepfakes in detail in this column earlier this year. In essence, deepfake technology enables anyone with a computer and an Internet connection to create realistic-looking photos and videos of people saying and doing things that they did not actually say or do.
The first use case to which deepfake technology has been widely applied is pornography. According to a July 2019 report from startup Sensity, 96% of deepfake videos online are pornographic. Deepfake pornography is almost always non-consensual, involving the artificial synthesis of explicit videos that feature famous celebrities or personal contacts.
From these dark corners of the Internet, the use of deepfakes has begun to spread to the political sphere, where the potential for harm is even greater. Recent deepfake-related political incidents in Gabon, Malaysia and Brazil may be early examples of what is to come.
In a recent report, The Brookings Institution grimly summed up the range of political and social dangers that deepfakes pose: “distorting democratic discourse; manipulating elections; eroding trust in institutions; weakening journalism; exacerbating social divisions; undermining public safety; and inflicting hard-to-repair damage on the reputation of prominent individuals, including elected officials and candidates for office.”
The core technologies underlying synthetic data and deepfakes are the same. Yet the use cases and potential real-world impacts are diametrically opposed.
It is a great truth in technology that any given innovation can either confer tremendous benefits or inflict grave harm on society, depending on how humans choose to employ it. It is true of nuclear energy; it is true of the Internet. It is no less true of artificial intelligence. Generative AI is a powerful case in point.
6. “System 2” Reasoning
In his landmark book Thinking, Fast and Slow, Nobel-winning psychologist Daniel Kahneman popularized the concepts of “System 1” thinking and “System 2” thinking.
System 1 thinking is intuitive, fast, effortless and automatic. Examples of System 1 activities include recognizing a friend’s face, reading the words on a passing billboard, or completing the phrase “War and _______”. System 1 requires little conscious processing.
System 2 thinking is slower, more analytical and more deliberative. Humans use System 2 thinking when effortful reasoning is required to solve abstract problems or handle novel situations. Examples of System 2 activities include solving a complex brain teaser or determining the appropriateness of a particular behavior in a social setting.
Though the System 1/System 2 framework was developed to analyze human cognition, it maps remarkably well to the world of artificial intelligence today. In short, today’s cutting-edge AI systems excel at System 1 tasks but struggle mightily with System 2 tasks.
AI leader Andrew Ng summarized this well: “If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future.”
Yoshua Bengio’s 2019 keynote address at NeurIPS explored this exact theme. In his talk, Bengio called on the AI community to pursue new methods to enable AI systems to go beyond System 1 tasks to System 2 capabilities like planning, abstract reasoning, causal understanding, and open-ended generalization.
“We want to have machines that understand the world, that build good world models, that understand cause and effect, and can act in the world to acquire knowledge,” Bengio said.
There are many different ways to frame the AI discipline’s agenda, trajectory and aspirations. But perhaps the most powerful and compact way is this: in order to progress, AI needs to get better at System 2 thinking.
No one yet knows with certainty the best way to move toward System 2 AI. The debate over how to do so has coursed through the field in recent years, often contentiously. It is a debate that evokes basic philosophical questions about the concept of intelligence.
Bengio is convinced that System 2 reasoning can be achieved within the current deep learning paradigm, albeit with further innovations to today’s neural networks.
“Some people think we need to invent something completely new to face these challenges, and maybe go back to classical AI to deal with things like high-level cognition,” Bengio said in his NeurIPS keynote. “[But] there is a path from where we are now, extending the abilities of deep learning, to approach these kinds of high-level questions of cognitive system 2.”
Bengio pointed to attention mechanisms, continuous learning and meta-learning as existing techniques within deep learning that hold particular promise for the pursuit of System 2 AI.
Others, though, believe that the field of AI needs a more fundamental reset.
Professor and entrepreneur Gary Marcus has been a particularly vocal advocate of non-deep-learning approaches to System 2 intelligence. Marcus has called for a hybrid solution that combines neural networks with symbolic methods, which were popular in the earliest years of AI research but have fallen out of favor more recently.
“Deep learning is only part of the larger challenge of building intelligent machines,” Marcus wrote in the New Yorker in 2012, at the dawn of the modern deep learning era. “Such techniques lack ways of representing causal relationships and are likely to face challenges in acquiring abstract ideas….They have no obvious ways of performing logical inferences, and they are also still a long way from integrating abstract knowledge, such as information about what objects are, what they are for, and how they are typically used.”
Marcus co-founded robotics startup Robust.AI to pursue this alternative path toward AI that can reason. Just yesterday, Robust announced its $15 million Series A fundraise.
Computer scientist Judea Pearl is another leading thinker who believes the road to System 2 reasoning lies beyond deep learning. Pearl has for years championed causal inference—the ability to understand cause and effect, not just statistical association—as the key to building truly intelligent machines. As Pearl put it recently: “All the impressive achievements of deep learning amount to just curve fitting.”
Of the six AI areas explored in this article series, this final one is, purposely, the most open-ended and abstract. There are many potential paths to System 2 AI; the road ahead remains shrouded. It is likely to be a circuitous and perplexing journey. But within our lifetimes, it will transform the economy and the world.