This is the second column I am writing about the limitations of ‘foundation models’. A few months ago, research around Artificial Intelligence (AI) took a new turn with the long-anticipated release of foundation models such as GPT-3, DALL-E and BERT. These models attempt to map and ingest almost every document available on the internet, in what can only be described as a brute-force attempt to provide as global a repository as possible for future AI programs to base their own ‘training’ data on. The idea behind such foundation models, which are funded by the world’s largest technology companies such as Microsoft and Google, is that they could form a base for all sorts of AI applications. Most importantly, they differ from other cognitive models that use smaller data sets to train AI systems: foundation models are built by scouring almost every shred of information available on the web, a data store that is already gigantic and doubling in size every two years or so.
In AI models that do not use such foundation models, the program is trained on data that is specific to the task at hand, such as analysing electrocardiograms (ECGs) for evidence of a heart attack. In these cases, pattern recognition is key. Training an AI program to look for specific patterns in a data-set that surely contains instances of such patterns is an easier task than training it on a global data-set that holds all manner of information. Simply put, if I am to write a computer program that reads ECGs to identify potential heart attacks, I am not going to feed its training model every piece of data available on the internet. I will instead feed it as many ECG readings as I possibly can, and only these. I would not add, say, stock-market charts or Shakespeare’s sonnets to the feed. As the number of ECG reports in my training model increases, I would in theory be able to improve the accuracy of my pattern-recognition model to the point where it becomes more efficient than an experienced doctor looking at the same ECG data and drawing conclusions. Using a widely-fed foundation model for this crucial (but still limited) cognitive task would be a clear case of overkill. I wrote in the last instalment of IT Matters that these foundation models for AI will have to conquer new peaks, and the ECG scenario above provides one example of where foundation models, based on essentially all knowledge available on the internet, will struggle to provide pinpoint accuracy in the repetitive but crucial task of getting ECG readings right. Contextual awareness embodies all the subtle nuances of human learning: it is the ‘who’, ‘why’, ‘when’ and ‘what’ that inform human decisions and behaviour.
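The contrast can be sketched in a few lines of code. What follows is a deliberately toy illustration, not a real diagnostic system: a tiny nearest-centroid classifier trained on nothing but synthetic ECG-like feature vectors (the feature names and all numbers are invented for the example).

```python
# Toy sketch of task-specific training: a nearest-centroid classifier
# fed only ECG-like feature vectors. All data is synthetic and
# hypothetical -- for illustration only, not medical use.

def centroid(samples):
    """Mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]

def train(normal, abnormal):
    """'Training' here is just computing one centroid per class."""
    return {"normal": centroid(normal), "abnormal": centroid(abnormal)}

def classify(model, reading):
    """Assign a reading to the class with the nearest centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: dist(model[label], reading))

# Synthetic training data: (resting heart rate, ST-segment deviation in mm)
normal = [[72, 0.1], [68, 0.0], [75, 0.2]]
abnormal = [[110, 2.1], [120, 2.5], [105, 1.9]]

model = train(normal, abnormal)
print(classify(model, [115, 2.3]))  # prints "abnormal"
```

The point of the sketch is what is absent: no sonnets, no stock charts, just ECG-shaped data, which is exactly why a narrow model like this can be tuned to high accuracy on its one task.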
For instance, if an ECG were taken soon after a patient walked into a hospital complaining of acute chest pain, the context would be very different from that of a reading taken during a routine health check-up of a perfectly well person. Foundation models face a challenge getting this context right. In addition to context, these large models also lack ‘world views’. The key to acquiring a world view is a decoupling between the construction of the building blocks of the world model and their subsequent use in the simulation of possible outcomes. In simple terms, a foundation model will have access to all sorts of graphs and charts, from stock market charts and scientific and engineering graphs to sine-wave amplitude models, but all those graphs and charts can’t just be thrown at a specific simulation that is looking for potential outcomes such as a crash in the stock market.
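That decoupling, building blocks first, task-specific selection second, can be made concrete with a small sketch. Everything below is hypothetical (the tags, the records and the toy ‘simulation’ are invented for illustration): a mixed store of charts is first down-scaled to the one kind the task needs, and only then reasoned over.

```python
# Sketch of 'down-scaling' a world model: from a mixed store of charts,
# keep only those relevant to the task (a stock-market simulation)
# before reasoning over them. All tags and data are hypothetical.

world_model = [
    {"kind": "stock_chart", "series": [100, 98, 91, 80]},
    {"kind": "ecg", "series": [0.1, 0.9, 0.2]},
    {"kind": "sine_wave", "series": [0, 1, 0, -1]},
    {"kind": "stock_chart", "series": [50, 52, 55, 61]},
]

def down_scale(model, task_kind):
    """Componentize: select only the building blocks the task needs."""
    return [item for item in model if item["kind"] == task_kind]

def crash_risk(charts, drop_threshold=0.15):
    """Toy simulation: flag charts whose series falls more than 15%."""
    flagged = []
    for chart in charts:
        first, last = chart["series"][0], chart["series"][-1]
        if (first - last) / first > drop_threshold:
            flagged.append(chart)
    return flagged

relevant = down_scale(world_model, "stock_chart")
print(len(relevant), len(crash_risk(relevant)))  # prints "2 1"
```

Note that the ECG and sine-wave entries never reach the simulation at all; the filtering step is the ‘world view’ doing its job.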
Human beings and truly intelligent machines need to use models of the world, or ‘world views’, to make sense of observations and assess potential futures in order to select the best course of action. In the example above, we would throw out all charts that aren’t related to the stock market. In moving from a generic large-scale setting (like replying to web search queries or a dictionary search for the meaning of a word) to direct interaction in a particular setting that includes multiple actors, the world model must be effectively down-scaled and customized to the task at hand. A decoupled, modular (meaning small-scale) and customizable approach is a logically different and far less complex architecture than one that tries to simulate and reason everything out in a single ‘input-output function’ step, which is where foundation models would have us go. As human beings, you and I do not respond to queries without reference to this world view; in both our cases, our responses are informed by years of particular types of schooling as well as real-world experience, and we respond to a situation by taking small, specific components from this trove of learning and applying them with precision to the task at hand. We do not go through every single bit of information we have stuffed into our brains; we instead use our world views to hone down (or, to use computer programming speak, down-scale and componentize) what we know in response to a query or a situation. An AI program designed for a specific task on top of a foundation model will be useless without this ability to use a world view to hone down or componentize the program’s ‘thinking’, just as we humans do. Throwing zettabytes of training data at it will not work.

Siddharth Pai is co-founder of Siana Capital, and the author of ‘Techproof Me’