Cat got your tongue? A beginner’s guide to speaking intelligently on AI, ML, and the science of data

  • Hannah
  • January 31, 2022

Why Facebook’s new algorithm might transform AI

When a child sees a cat for the first time, a parent or supervising adult utters the word ‘cat’ (or sometimes, cutely, ‘meow!’), and after this process is repeated a few times the child understands that the word ‘cat’ (or ‘meow’, which millions of children still use instead of the more adult version) is the name of that furry little four-legged creature. Similarly, they learn the names of dogs, elephants and other animals and objects. And finally come the assessments, where parents gleefully show their friends how smart the child is by holding up pictures of animals while the child manages to mumble the word ‘cat’ or ‘dog’ or the name of another animal. If the child is correct, he or she is applauded and hugged; if not, the child is corrected (and still hugged, one hopes), till a new set of friends comes by to be dazzled by the child’s genius. It is a wondrous process which can engross the most cynical of people for hours and years!

This, dear reader, is the process of supervised learning, and many of the algorithms in everyday use learn in much the same way as the 9-month-old of the previous paragraph. They are provided with data (pictures, in this case) which are clearly labelled as cats, and after seeing, say, 10,000 of these pictures the algorithm learns what a cat looks like and how to identify one in a new picture or video. If it makes a mistake, it is corrected. And it learns some more till it is near perfect. (And yes, the best algorithms need 10,000 pictures where the 9-month-old could do with 3…why? Well, that is to be discussed in my next article).
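
For readers who like to see the idea in code, here is a minimal sketch of that "label, guess, get corrected" loop. It is a toy perceptron, not what any production image model does, and everything in it is an assumption made for illustration: the two made-up features per animal (think "ear pointiness" and "snout length"), the six hand-written examples, and the function names.

```python
# Toy supervised learning: the learner sees labelled examples and is
# corrected every time it guesses the wrong label.
# The features and examples below are invented for illustration.
LABELLED_EXAMPLES = [
    ((0.9, 0.2), "cat"),
    ((0.8, 0.3), "cat"),
    ((0.7, 0.1), "cat"),
    ((0.2, 0.9), "dog"),
    ((0.3, 0.8), "dog"),
    ((0.1, 0.7), "dog"),
]


def score(weights, bias, features):
    """Weighted sum of the features; positive means 'looks like a cat'."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict(weights, bias, features):
    return "cat" if score(weights, bias, features) > 0 else "dog"


def train(examples, max_passes=100):
    """Perceptron rule: nudge the weights only when a guess is wrong."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_passes):
        mistakes = 0
        for features, label in examples:
            if predict(weights, bias, features) != label:
                # The "parent" corrects the learner: move towards the right answer.
                direction = 1.0 if label == "cat" else -1.0
                weights = [w + direction * x for w, x in zip(weights, features)]
                bias += direction
                mistakes += 1
        if mistakes == 0:  # a full pass with no corrections: stop learning
            break
    return weights, bias


weights, bias = train(LABELLED_EXAMPLES)
print(predict(weights, bias, (0.85, 0.15)))  # an animal it has never seen
```

The point of the sketch is only the shape of the process: the labels come from a human, and learning happens through correction.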

The child’s learning does not end there, though. In a few years, the child moves on from the cats and the dogs to the birds and the bees and, amazingly enough, wants to discuss matters that leave the adults wondering how the child ever got to them! “Maybe he/she is too young to discuss this”…anyone heard this line before? “How did she even think about this?” Well, simply because the child does not only learn under adult supervision. Children have complex human brains which can connect the dots, observe the patterns, move from one thing to another and, before you know it, voila! They have managed to ask an extremely embarrassing question about the birds and the bees in front of these same friends who were oh so impressed with their animal-identifying skills.

But can a machine do that? How can a machine identify patterns that it has not been taught? In a manner of speaking, an algorithm can figure out patterns which are not spelt out to it through a process called unsupervised or self-supervised learning (the difference between the two is subtle, and the names are often used interchangeably; in both, nobody hands the algorithm labelled answers). Simply put, it can look at reams of data points (let’s say lots of cat and dog pictures) and, by measuring how far the dog-like features of a dog sit from the cat-like features of a cat, it can tell that cats and dogs are different, even when no programmer has told the algorithm the names of the animals. It knows these are different beasts.
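
The distance idea in the paragraph above can be sketched in a few lines. This is a bare-bones version of k-means clustering with two groups, chosen here as one simple illustration of unsupervised learning; the points are the same invented two-feature animals as before, with the labels deliberately thrown away, and the function names are hypothetical.

```python
import math

# Unlabelled data: the algorithm is never told which points are cats or dogs.
# The feature values are invented for illustration.
POINTS = [(0.9, 0.2), (0.8, 0.3), (0.7, 0.1), (0.2, 0.9), (0.3, 0.8), (0.1, 0.7)]


def nearest(point, centres):
    """Index of the centre closest to the point (straight-line distance)."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))


def mean(points):
    """Average position of a group of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def two_means(points, rounds=10):
    """Split the points into two groups using distance alone."""
    # Start from the first point and the point farthest away from it.
    first = points[0]
    second = max(points, key=lambda p: math.dist(p, first))
    centres = [first, second]
    for _ in range(rounds):
        groups = [[], []]
        for p in points:
            groups[nearest(p, centres)].append(p)
        # Move each centre to the middle of its group (keep it if the group is empty).
        centres = [mean(g) if g else c for g, c in zip(groups, centres)]
    return [nearest(p, centres) for p in points]


assignments = two_means(POINTS)
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

The algorithm ends up with a "group 0" and a "group 1" but has no idea that one of them is called cat; naming the groups is still a human's job.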

I am sure you will have guessed by now that the potential of these unsupervised learning models, if done right, is enormous. You do not have to painstakingly teach a machine that the phrase is “black cat” but never “cat black”; it can work that out for itself and generalize to the rule that in most cases the qualifying adjective in English comes before the qualified noun. Reading reams of text, the algorithm can auto-deduce such rules, which so many of us learnt in those mindless mornings with M/s Wren and Martin. Most people who write pristine English do not really know these rules, do they? Do you still remember your subjects and predicates, and clauses and nouns? Arguably not, but you manage to communicate rather well. And that is what makes self-supervised learning so human-like: the ability to generate rules from facts without being told the rules. A classic case was when (the now billionaire hedge fund manager and noted Trump ally) Bob Mercer, in a past role working on translation in natural language processing at IBM, moved away from trying to encode the rules of grammar. In the Canadian parliament, all speeches need to be filed in both English and French. Mercer and his colleagues got both sets of documents and asked their algorithm to literally ‘figure it out’ instead of trying to teach it the rules of language. And so it did, opening frontiers of self-supervised learning.
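
To make the “black cat” versus “cat black” point concrete, here is a small sketch that simply counts which words follow which in raw text. It is a deliberately crude stand-in for what real language models do, and the three-sentence corpus and the function names are invented for illustration.

```python
from collections import Counter

# A tiny, made-up corpus. Nobody tells the program any grammar rules.
CORPUS = [
    "the black cat sat on the warm mat",
    "a black cat chased the small dog",
    "the small dog barked at the black cat",
]


def count_adjacent_pairs(sentences):
    """Count every (word, next word) pair that appears in the text."""
    pairs = Counter()
    for sentence in sentences:
        words = sentence.split()
        pairs.update(zip(words, words[1:]))
    return pairs


pair_counts = count_adjacent_pairs(CORPUS)


def preferred_order(first, second):
    """Return the word order seen more often, or None if the text gives no hint."""
    forward = pair_counts[(first, second)]
    backward = pair_counts[(second, first)]
    if forward == backward:
        return None
    return (first, second) if forward > backward else (second, first)


print(pair_counts[("black", "cat")], pair_counts[("cat", "black")])  # 3 0
print(preferred_order("cat", "black"))  # ('black', 'cat')
```

The text itself supplies the "answers": the program only has to notice what actually gets written, which is the spirit of learning rules from facts.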

Meta’s (Facebook) new algorithm data2vec was launched on 20th January 2022. The new frontier that this model is trying to open is the ability to move self-supervised learning beyond specific use cases. Till now, self-supervised models have each been focused on solving a specific type of problem: language models cannot be used to learn from visuals, while models for either of these cannot solve problems related to audio. So, a model that works behind voice recognition for an audio Assistant will be very different from a model that can support the same company’s Translate engine. We know that the reasoning process behind the child’s identification of rules across these dimensions is similar, but for computers, till now, it has been extremely case-specific. By generalizing the self-learning model, Facebook is trying to solve very different problems with the same algorithm, and thus take another step towards generalized artificial intelligence. The same model will listen, see, read, and thus infer rules and relationships from across these different inputs! The model has (according to Facebook) outperformed the best performing models for these individual types of problems: it reads better than BERT, listens better than wav2vec 2.0 and sees better than the best vision-only models.

Facebook is now calling its friends to impress them with its child’s genius…we hope it can live up to its promise.


Views expressed above are the author’s own.