Top Machine Learning Research Papers Released In 2020 – Analytics India Magazine

It has been only two weeks into the last month of the year, and the popular repository for ML research papers has already seen close to 600 uploads. That gives an idea of the pace at which machine learning research is proceeding, and keeping track of all of it is almost impossible. Every year, the research that attracts the most attention usually comes from companies like Google and Facebook, from top universities like MIT, from research labs and, most importantly, from conferences like NeurIPS or ACL.

CVPR 2020: 1,470 research papers on computer vision accepted from 6,656 valid submissions (a 22.1% acceptance rate).
ICLR 2020: 687 out of 2,594 papers accepted (a 26.5% acceptance rate).
ICML 2020: 1,088 papers accepted from 4,990 submissions (a 21.8% acceptance rate).

In this article, we have compiled a list of interesting machine learning research papers that attracted attention this year.

Natural Language Processing

Language Models are Few-Shot Learners

This is the seminal paper that introduced the most popular ML model of the year, GPT-3. In the paper, titled “Language Models are Few-Shot Learners”, the OpenAI team used the same model and architecture as GPT-2, including its modified initialisation, pre-normalisation and reversible tokenisation, with one exception: the layers of the transformer use alternating dense and locally banded sparse attention patterns, similar to the Sparse Transformer. GPT-3 achieved promising results in the zero-shot and one-shot settings, and in the few-shot setting it occasionally surpassed state-of-the-art fine-tuned models.
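
The zero-, one- and few-shot settings differ only in how many solved examples are placed in the prompt before the query; the model's weights are never updated. Here is a minimal sketch of that prompt construction, using a hypothetical `build_prompt` helper and an English-to-French format in the style of the paper's illustrations (the exact formatting is an assumption):

```python
def build_prompt(task_description, demonstrations, query):
    """Assemble a zero-, one- or few-shot prompt. No weights are updated:
    the demonstrations are simply placed in the model's context window."""
    lines = [task_description]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is asked to continue from here
    return "\n".join(lines)


task = "Translate English to French:"
demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]

zero_shot = build_prompt(task, [], "peppermint")
one_shot = build_prompt(task, demos[:1], "peppermint")
few_shot = build_prompt(task, demos, "peppermint")
print(few_shot)
```

Feeding `few_shot` to the language model and reading its continuation is the entire "learning" step; the number of demonstrations is bounded only by the context window.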

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Increasing model size when pretraining natural language representations often improves performance on downstream tasks, but at some point further increases become harder because of GPU/TPU memory limits and longer training times. To address these problems, the authors presented two parameter-reduction techniques, factorised embedding parameterisation and cross-layer parameter sharing, to lower memory consumption and increase the training speed of BERT. They also used a self-supervised loss (sentence-order prediction) that focuses on modelling inter-sentence coherence and consistently helped downstream tasks with multi-sentence inputs. According to the results, the model established new state-of-the-art results on the GLUE, RACE and SQuAD benchmarks while having fewer parameters than BERT-large.
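
To see why factorising the embeddings saves memory, the sketch below counts parameters for a V × H embedding matrix versus a V × E lookup followed by an E × H projection, and shows what cross-layer sharing does to the encoder's parameter count. The sizes (a 30,000-token vocabulary, H = 4096, E = 128) are illustrative assumptions, and both helpers are hypothetical, not part of any ALBERT codebase:

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Parameters in the token-embedding block.

    Without factorisation a V x H matrix is learned directly; with
    factorisation it becomes a V x E lookup followed by an E x H projection."""
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(params_per_layer, num_layers, share_across_layers):
    """With cross-layer sharing, all layers reuse a single set of weights."""
    return params_per_layer if share_across_layers else params_per_layer * num_layers


V, H, E = 30_000, 4096, 128  # illustrative sizes
print(embedding_params(V, H))     # 122880000
print(embedding_params(V, H, E))  # 4364288
```

The saving grows with H, which is why decoupling E from H matters most for the largest configurations. Cross-layer sharing reduces stored parameters, not compute: a shared 12-layer encoder keeps one layer's weights but still runs 12 layers.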

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList

Microsoft Research, along with the University of Washington and the University of California, Irvine, introduced CheckList, a model-agnostic and task-agnostic methodology for testing NLP models. The paper won the best paper award at this year's ACL conference. CheckList includes a matrix of general linguistic capabilities and test types that facilitates comprehensive test ideation, as well as a software tool for quickly generating a large and diverse set of test cases.
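
CheckList's test types include Minimum Functionality Tests (MFT), Invariance tests (INV) and Directional Expectation tests (DIR), typically generated from templates. The sketch below reproduces the first two in plain Python against a deliberately weak, hypothetical keyword-based sentiment model; `fill_template` and `toy_sentiment` are illustrative helpers, not the API of the released CheckList tool:

```python
from itertools import product


def fill_template(template, **slots):
    """Expand a template like 'I {verb} the {thing}.' over every slot combination."""
    keys = list(slots)
    return [template.format(**dict(zip(keys, values)))
            for values in product(*(slots[key] for key in keys))]


def toy_sentiment(text):
    """Hypothetical model under test: a keyword matcher that ignores negation."""
    return "positive" if any(word in text for word in ("love", "like", "great")) else "negative"


# Minimum Functionality Test (MFT): negated positive verbs should be negative.
mft_cases = fill_template("I don't {verb} the {thing}.",
                          verb=["love", "like"], thing=["food", "service", "staff"])
mft_failures = [case for case in mft_cases if toy_sentiment(case) != "negative"]
print(f"MFT failures: {len(mft_failures)}/{len(mft_cases)}")  # MFT failures: 6/6

# Invariance test (INV): changing a person's name should not change the prediction.
inv_cases = fill_template("{name} thinks the service was great.", name=["Anna", "Omar", "Wei"])
inv_predictions = {toy_sentiment(case) for case in inv_cases}
print("INV passed:", len(inv_predictions) == 1)  # INV passed: True
```

The MFT exposes a negation bug that a single aggregate accuracy number can hide, which is the behavioural-testing point the paper makes. A DIR test would similarly check that appending a clearly negative phrase never makes a prediction more positive.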

Linformer: Self-Attention with Linear Complexity

Linformer is a Transformer architecture that tackles the self-attention bottleneck: standard self-attention costs O(n²) time and memory in the sequence length n. Observing that the self-attention matrix is approximately low-rank, Linformer uses learned linear projections to compress the keys and values along the sequence axis to a fixed length k. This lets it compute the contextual mapping in linear time and memory with respect to the sequence length.
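
A minimal NumPy sketch of the idea, compared with standard attention. The projection matrices `E` and `F` are learned in the paper. Here they are random stand-ins, and all sizes are illustrative assumptions:

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def standard_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (n x n): quadratic in n
    return softmax(scores) @ V


def linformer_attention(Q, K, V, E, F):
    """E and F are (k x n) projections along the sequence axis, so the
    score matrix is only (n x k) and cost grows linearly with n."""
    K_proj, V_proj = E @ K, F @ V             # (k x d) each
    scores = Q @ K_proj.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V_proj


rng = np.random.default_rng(0)
n, d, k = 512, 64, 32
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
# Random projections stand in for the learned ones.
E = rng.standard_normal((k, n)) / np.sqrt(k)
F = rng.standard_normal((k, n)) / np.sqrt(k)

print(standard_attention(Q, K, V).shape)         # (512, 64)
print(linformer_attention(Q, K, V, E, F).shape)  # (512, 64)
```

Both functions return one d-dimensional output per position, but the Linformer version only ever builds an n × k score matrix, so its cost is O(nk), which is linear in n for a fixed k.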

Plug and Play Language Models

Plug and Play Language Models (PPLM) combine a pretrained language model with one or more simple attribute classifiers that guide text generation without any further training of the language model. During sampling, gradients from the attribute model push the language model's hidden activations towards the desired attribute. According to the authors, model samples demonstrated control over a range of topics and sentiment styles, and extensive automated and human-annotated evaluations showed attribute alignment and fluency.
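
The core mechanism, nudging a hidden state along the gradient of an attribute classifier while all weights stay frozen, can be sketched with toy NumPy components. This is a heavy simplification under stated assumptions: the real PPLM perturbs the Transformer's key-value history and adds terms that keep generation fluent, whereas here `W_lm`, `w_attr` and `h` are hypothetical random stand-ins and the classifier is a single logistic unit:

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 16, 10
W_lm = rng.standard_normal((vocab, d))  # frozen toy LM output head
w_attr = rng.standard_normal(d)         # frozen toy attribute classifier (logistic)
h = rng.standard_normal(d)              # hidden state at the current decoding step


def attr_prob(hidden):
    """p(attribute | hidden) under the toy logistic classifier."""
    return 1.0 / (1.0 + np.exp(-(w_attr @ hidden)))


def next_token_probs(hidden):
    logits = W_lm @ hidden
    e = np.exp(logits - logits.max())
    return e / e.sum()


def steer(hidden, step_size=0.1, steps=10):
    """Gradient ascent on log p(attribute | hidden + delta); only delta changes,
    the LM head and classifier weights stay frozen."""
    delta = np.zeros_like(hidden)
    for _ in range(steps):
        p = attr_prob(hidden + delta)
        grad = (1.0 - p) * w_attr  # gradient of log sigmoid(w . (h + delta)) w.r.t. delta
        delta += step_size * grad / (np.linalg.norm(grad) + 1e-12)
    return hidden + delta


h_steered = steer(h)
print(attr_prob(h), attr_prob(h_steered))
print(next_token_probs(h).round(3))
print(next_token_probs(h_steered).round(3))
```

The steered state has a higher attribute probability, and the next-token distribution shifts accordingly, without any parameter being trained.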

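Reformer: The Efficient Transformer

Researchers at Google introduced Reformer, showing that a Transformer can run efficiently on long sequences with a small memory footprint. It does so mainly by replacing dot-product attention with locality-sensitive hashing (LSH) attention, which reduces the cost in sequence length L from O(L²) to O(L log L), and by using reversible residual layers, so that activations are stored once rather than once per layer. The authors believe that the ability to handle long sequences opens the way for using Reformer on many generative tasks. In addition to generating very long coherent text, Reformer can bring the power of Transformer models to other domains such as time-series forecasting, music, and image and video generation.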
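
LSH attention hashes vectors so that each position attends only within its bucket. The sketch below shows the angular hashing scheme (random projection, then argmax over [xR, −xR]) on three vectors. `lsh_buckets` is a hypothetical helper, and the bucket count is arbitrary. The real model also shares queries and keys, sorts and chunks positions by bucket, and uses several hash rounds, none of which is shown here:

```python
import numpy as np


def lsh_buckets(vectors, n_buckets, rng):
    """Angular LSH: project onto random directions R and take the argmax over
    [xR, -xR]. Vectors pointing in similar directions tend to share a bucket."""
    R = rng.standard_normal((vectors.shape[-1], n_buckets // 2))
    projected = vectors @ R
    return np.argmax(np.concatenate([projected, -projected], axis=-1), axis=-1)


rng = np.random.default_rng(0)
base = rng.standard_normal(64)
near = base + 1e-4 * rng.standard_normal(64)  # almost the same direction as base
far = -base                                   # the opposite direction

buckets = lsh_buckets(np.stack([base, near, far]), n_buckets=8, rng=rng)
print(buckets)

# Attention is then restricted to positions that share a bucket.
same_bucket = buckets[:, None] == buckets[None, :]
print(same_bucket.astype(int))
```

Because negating a vector negates all its projections, `far` always lands exactly half the buckets away from `base`, while `near` almost surely shares its bucket.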

Rethinking Attention with Performers

To overcome the limitations of sparse transformers, Google, in another paper, introduced the Performer, which approximates standard softmax attention in linear time and memory using an efficient generalised attention framework called FAVOR+ (Fast Attention Via positive Orthogonal Random features). The approach has the potential to directly impact research on biological sequence analysis and more. The authors stated that modern bioinformatics could benefit immensely from faster, more accurate language models, for example in the development of new nanoparticle vaccines.
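
The trick behind linear-time attention is to replace exp(q · k) with a dot product of positive random features, φ(q) · φ(k), so that the products can be reordered to avoid the n × n matrix. A minimal NumPy sketch follows, with one simplifying assumption: it uses plain i.i.d. Gaussian features, while FAVOR+ additionally orthogonalises them. Sizes and input scales are illustrative:

```python
import numpy as np


def positive_random_features(X, W):
    """phi(x) = exp(W x - |x|^2 / 2) / sqrt(m), so that E[phi(q) . phi(k)] = exp(q . k)."""
    m = W.shape[0]
    return np.exp(X @ W.T - 0.5 * np.sum(X**2, axis=-1, keepdims=True)) / np.sqrt(m)


def linear_attention(Q, K, V, W):
    Qf, Kf = positive_random_features(Q, W), positive_random_features(K, W)
    numerator = Qf @ (Kf.T @ V)        # (n x m) @ (m x d): the n x n matrix is never formed
    denominator = Qf @ Kf.sum(axis=0)  # per-row normalisers
    return numerator / denominator[:, None]


def softmax_attention(Q, K, V):
    scores = np.exp(Q @ K.T)           # explicit n x n matrix, for comparison
    return (scores / scores.sum(axis=1, keepdims=True)) @ V


rng = np.random.default_rng(0)
n, d, m = 64, 16, 2048
Q, K = (0.1 * rng.standard_normal((n, d)) for _ in range(2))  # small norms keep the estimate tight
V = rng.standard_normal((n, d))
W = rng.standard_normal((m, d))        # i.i.d. Gaussian features (FAVOR+ orthogonalises them)

approx = linear_attention(Q, K, V, W)
exact = softmax_attention(Q, K, V)
print(np.max(np.abs(approx - exact)))  # small approximation error
```

Keeping the features positive matters because the denominator is a sum of estimated kernel values. With signed features it could come close to zero and make the normalisation unstable.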

Computer Vision

An Image is Worth 16x16 Words

The twist here is that the Transformer, the architecture behind today's popular language models, has been made to do computer vision tasks. In this paper, the authors claimed that the Vision Transformer (ViT) can go toe-to-toe with state-of-the-art models on image recognition benchmarks, reaching accuracies as high as 88.36% on ImageNet and 94.55% on CIFAR-100. A standard Transformer receives its input as a one-dimensional sequence of token embeddings. To handle 2D images, the image is therefore reshaped into a sequence of flattened 2D patches, which are linearly projected into embeddings. The Transformer in this work uses a constant width through all of its layers.
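
The patching step can be sketched in a few lines of NumPy. A 224 × 224 RGB image with 16 × 16 patches gives 14 × 14 = 196 "words", each a vector of 16 · 16 · 3 = 768 pixel values. `image_to_patches` is a hypothetical helper, the random projection stands in for the learned one, and the embedding width of 64 is arbitrary. ViT also prepends a learnable class token and adds position embeddings, which are omitted here:

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping patches, each flattened to a
    vector of length patch_size * patch_size * C, in raster order."""
    H, W, C = image.shape
    p = patch_size
    assert H % p == 0 and W % p == 0, "image size must be divisible by the patch size"
    patches = image.reshape(H // p, p, W // p, p, C)  # split both spatial axes
    patches = patches.transpose(0, 2, 1, 3, 4)         # (patch rows, patch cols, p, p, C)
    return patches.reshape(-1, p * p * C)


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
patches = image_to_patches(image, 16)
print(patches.shape)  # (196, 768)

# A linear projection (random here, learned in ViT) turns each flattened
# patch into a token embedding for the Transformer.
W_embed = rng.standard_normal((16 * 16 * 3, 64))
tokens = patches @ W_embed
print(tokens.shape)   # (196, 64)
```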

Unsupervised Learning of Probably Symmetric Deformable 3D Objects

This CVPR 2020 best paper award winner proposed a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method uses an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. The authors showed that reasoning about illumination makes it possible to exploit the underlying object symmetry even when the appearance is not symmetric because of shading.
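
Why illumination helps can be seen with a small rendering experiment. The sketch below renders a left-right symmetric depth map and albedo under a simple Lambertian model (ambient plus diffuse shading), with light coming from one side. The image comes out asymmetric, yet re-rendering the flipped depth and albedo reproduces it. A model that constrains depth and albedo to be symmetric can therefore still explain an asymmetric image through lighting. The surface, light direction and shading coefficients are illustrative assumptions. The paper's network predicts these components (and the viewpoint) from the image rather than being given them:

```python
import numpy as np


def normals_from_depth(depth):
    """Unit surface normals from finite-difference depth gradients."""
    dz_dy, dz_dx = np.gradient(depth)  # derivatives along rows (y) and columns (x)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)


def shade(albedo, depth, light_dir, ambient=0.3, diffuse=0.7):
    """Lambertian rendering: albedo * (ambient + diffuse * max(0, <normal, light>))."""
    light = np.asarray(light_dir, dtype=float)
    light = light / np.linalg.norm(light)
    shading = ambient + diffuse * np.clip(normals_from_depth(depth) @ light, 0.0, None)
    return albedo * shading


x = np.linspace(-1.0, 1.0, 32)
X, Y = np.meshgrid(x, x)
depth = np.exp(-3.0 * (X**2 + Y**2))    # a bump, symmetric left-to-right
albedo = 0.5 + 0.5 * np.cos(np.pi * Y)  # albedo, also symmetric left-to-right
light = [1.0, 0.0, 1.0]                 # light from one side
image = shade(albedo, depth, light)

print(np.allclose(depth, depth[:, ::-1]), np.allclose(albedo, albedo[:, ::-1]))  # True True
print(np.allclose(image, image[:, ::-1]))  # False: shading breaks the symmetry
# Re-rendering the flipped depth and albedo under the same light reproduces the image.
print(np.allclose(shade(albedo[:, ::-1], depth[:, ::-1], light), image))  # True
```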

Generative Pretraining from Pixels

Inspired by progress in unsupervised representation learning for natural language, OpenAI researchers examined whether similar models can learn useful representations for images. They trained a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, they found that a GPT-2 scale model learns strong image representations, as measured by linear probing, fine-tuning and low-data classification. On CIFAR-10, it achieved 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. An even larger model, trained on a mixture of ImageNet and web images, is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of its features.
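
Treating an image as a language-model input comes down to two steps: map each pixel to a discrete token, then flatten the grid into a 1D sequence and predict each token from the ones before it. The paper builds its palette by clustering colours. The fixed 4-colour palette and all helper names below are hypothetical simplifications:

```python
def quantise(pixel, palette):
    """Map an RGB pixel to the index of its nearest palette colour (squared distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, palette[i])))


def to_sequence(image, palette):
    """Quantise every pixel and flatten the 2D grid in raster order."""
    return [quantise(pixel, palette) for row in image for pixel in row]


def next_pixel_pairs(sequence):
    """Autoregressive training pairs: predict token t from all tokens before it."""
    return [(sequence[:t], sequence[t]) for t in range(1, len(sequence))]


palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (255, 255, 255)]  # hypothetical 4-colour palette
image = [[(250, 10, 5), (3, 2, 1)],
         [(240, 250, 245), (10, 240, 20)]]  # a 2 x 2 RGB image

seq = to_sequence(image, palette)
print(seq)                       # [1, 0, 3, 2]
print(next_pixel_pairs(seq)[1])  # ([1, 0], 3)
```

Nothing in the sequence tells the model which tokens were vertical neighbours, which is what "without incorporating knowledge of the 2D input structure" means in practice.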

Reinforcement Learning

Deep Reinforcement Learning and its Neuroscientific Implications

In this paper, the authors provided a high-level introduction to deep reinforcement learning (deep RL), discussed some of its initial applications to neuroscience, surveyed its wider implications for research on brain and behaviour, and concluded with a list of opportunities for next-stage research. Although deep RL seems promising, the authors wrote that it is still a work in progress and that its implications for neuroscience should be seen as a great opportunity. For instance, deep RL provides an agent-based framework for studying how reward shapes representation and how representation, in turn, shapes learning and decision making. Together, these two issues span a large swath of what is most central to neuroscience.

Dopamine-based Reinforcement Learning

Why humans do certain things is often linked to dopamine, a neurotransmitter central to the brain's reward system (think: the likes on your Instagram page). With this in mind, DeepMind, with the help of Harvard labs, analysed recordings of dopamine cells in mice as the animals learned a task and received rewards. They then checked whether the activity of the dopamine neurons was consistent with standard temporal difference (TD) algorithms. The paper proposed an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning. The authors hypothesised that the brain represents possible future rewards not as a single mean but as a probability distribution, effectively representing multiple future outcomes in parallel.
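
In classic TD learning, every value estimate is updated by the reward prediction error δ = r − V with a single learning rate, so estimates converge to the mean reward. In the distributional account, different dopamine cells scale positive and negative errors differently. The sketch below is a simplified, single-step illustration (no next-state bootstrapping). The learning rates and the alternating 0-or-10 reward schedule are hypothetical choices made for a reproducible example:

```python
def train_cells(rewards, rates):
    """Each hypothetical 'dopamine cell' keeps its own value estimate and scales
    positive and negative reward prediction errors by different learning rates."""
    values = [0.0] * len(rates)
    for r in rewards:
        for i, (alpha_pos, alpha_neg) in enumerate(rates):
            delta = r - values[i]  # reward prediction error (single-step, no bootstrapping)
            values[i] += (alpha_pos if delta > 0 else alpha_neg) * delta
    return values


rates = [(0.01, 0.09), (0.05, 0.05), (0.09, 0.01)]  # pessimistic, classic TD, optimistic
rewards = [10.0, 0.0] * 2000  # reward is 10 or 0 with equal frequency
values = train_cells(rewards, rates)
print([round(value, 1) for value in values])  # [0.9, 4.9, 9.0]
```

Each cell settles near the expectile τ = α⁺ / (α⁺ + α⁻) of the reward distribution, which for this 0-or-10 reward is 10τ (1, 5 and 9), with a small wobble because the last reward seen was 0. Together, the cells encode the spread of possible outcomes rather than just their mean.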

Lottery Tickets In Reinforcement Learning & NLP

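In this paper, the authors tested whether the lottery ticket hypothesis holds beyond supervised image classification, in natural language processing (NLP) and reinforcement learning (RL). The hypothesis states that a dense network contains sparse subnetworks ("winning tickets") that can be trained in isolation to match the full network's performance. The authors examined both recurrent LSTM models and large-scale Transformer models for NLP, and discrete-action space tasks for RL. The results suggested that the lottery ticket hypothesis is not restricted to supervised learning of natural images, but rather represents a broader phenomenon in deep neural networks.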
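
Winning tickets are usually found by iterative magnitude pruning: train, remove the smallest-magnitude surviving weights, rewind the survivors and repeat. The sketch below shows that loop on a flat list of weights. `fake_train` is a hypothetical stand-in for real training. The sketch rewinds to the initial values, as in the original lottery-ticket procedure, and does not show variants such as rewinding to an early training iteration:

```python
def magnitude_prune(weights, mask, fraction):
    """Zero out `fraction` of the still-unpruned weights with the smallest magnitude."""
    alive = [i for i, keep in enumerate(mask) if keep]
    n_prune = int(len(alive) * fraction)
    to_prune = set(sorted(alive, key=lambda i: abs(weights[i]))[:n_prune])
    return [0 if i in to_prune else keep for i, keep in enumerate(mask)]


def find_ticket(init_weights, train, rounds, fraction):
    """Iterative magnitude pruning: train, prune, rewind survivors to their initial values."""
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        trained = train([w * keep for w, keep in zip(init_weights, mask)])
        mask = magnitude_prune(trained, mask, fraction)
    return [w * keep for w, keep in zip(init_weights, mask)], mask


def fake_train(weights):
    """Hypothetical stand-in for real training: just rescales the weights."""
    return [1.5 * w for w in weights]


init = [0.5, -0.1, 0.8, 0.05, -0.7, 0.3, 0.02, -0.4]
ticket, mask = find_ticket(init, fake_train, rounds=2, fraction=0.5)
print(mask)  # [0, 0, 1, 0, 1, 0, 0, 0]
```

Pruning half of the remaining weights per round leaves (1 − 0.5)² = 25% of them after two rounds. The ticket is the surviving subnetwork together with its original initial values, which is then retrained on its own.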

What Can Learned Intrinsic Rewards Capture?

In this paper, the authors explored whether the reward function itself can be a good locus of learned knowledge. They proposed a scalable framework for learning useful intrinsic reward functions across multiple lifetimes of experience, and showed that it is feasible to learn and capture knowledge about long-term exploration and exploitation in a reward function.

AutoML-Zero

Progress in AutoML has largely focused on the architecture of neural networks, relying on sophisticated expert-designed layers as building blocks or on similarly restrictive search spaces. In this paper, the authors showed that AutoML can go further. AutoML-Zero automatically discovers complete machine learning algorithms using only basic mathematical operations as building blocks. The researchers demonstrated this by introducing a novel framework that significantly reduces human bias through a generic search space.
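
In AutoML-Zero, a candidate algorithm is a program of simple instructions over a small memory, and the search mutates these programs. The paper's programs have separate Setup, Predict and Learn functions. The toy below keeps only a Predict function over four scalar registers and uses a simple mutate-and-keep-if-not-worse loop instead of the paper's evolutionary search. The register layout, instruction set and helper names are all hypothetical:

```python
import random

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
NUM_REGISTERS = 4  # hypothetical layout: r0 = input x, r1 = prediction, r2 = 1.0, r3 = 2.0


def run(program, x):
    """Execute a list of (op, dest, src_a, src_b) instructions on scalar registers."""
    regs = [float(x), 0.0, 1.0, 2.0]
    for op, dest, a, b in program:
        regs[dest] = OPS[op](regs[a], regs[b])
    return regs[1]


def loss(program, data):
    return sum((run(program, x) - y) ** 2 for x, y in data) / len(data)


def random_instruction(rng):
    # Never write to r0, so the input stays available to later instructions.
    return (rng.choice(sorted(OPS)), rng.randrange(1, NUM_REGISTERS),
            rng.randrange(NUM_REGISTERS), rng.randrange(NUM_REGISTERS))


def search(data, length=3, steps=2000, seed=0):
    """Mutate one instruction at a time and keep the child if it is no worse."""
    rng = random.Random(seed)
    best = [random_instruction(rng) for _ in range(length)]
    start_loss = best_loss = loss(best, data)
    for _ in range(steps):
        child = list(best)
        child[rng.randrange(length)] = random_instruction(rng)
        child_loss = loss(child, data)
        if child_loss <= best_loss:
            best, best_loss = child, child_loss
    return best, start_loss, best_loss


data = [(x, 2 * x + 1) for x in (-2, -1, 0, 1, 2)]
hand_written = [("mul", 1, 0, 3), ("add", 1, 1, 2)]  # r1 = x * 2 + 1
print(loss(hand_written, data))  # 0.0

program, start, end = search(data)
print(start, "->", end)
```

No layer, activation or optimiser is built in: anything the search finds has to be assembled from the basic arithmetic operations.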

Rethinking Batch Normalization for Meta-Learning

Batch normalization is an essential component of meta-learning pipelines. However, using it in meta-learning raises several challenges, for example because each task provides only a small set of examples from which to estimate normalization statistics. In this paper, the authors evaluated a range of approaches to batch normalization in meta-learning scenarios and developed a novel approach, TaskNorm. Experiments demonstrated that the choice of batch normalization has a dramatic effect on both classification accuracy and training time for both gradient-based and gradient-free meta-learning approaches. TaskNorm was found to improve performance consistently.
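
The idea behind TaskNorm can be sketched as normalising with moments from the task's context (support) set, blended with per-example moments for robustness when the context set is small. This is a simplified sketch for a single feature channel. It treats the mixing weight as a fixed, hypothetical `alpha` (in the paper it is learned as a function of the context-set size) and computes the blended variance as that of a two-component mixture:

```python
import numpy as np


def blended_moments(context, x, alpha):
    """Mix moments from the task's context set with moments of the example
    itself. `alpha` in [0, 1] is a hypothetical fixed mixing weight."""
    mu_task, var_task = context.mean(), context.var()
    mu_inst, var_inst = x.mean(), x.var()
    mu = alpha * mu_task + (1 - alpha) * mu_inst
    # Variance of the two-component mixture (law of total variance).
    var = (alpha * (var_task + (mu_task - mu) ** 2)
           + (1 - alpha) * (var_inst + (mu_inst - mu) ** 2))
    return mu, var


def task_normalise(x, context, alpha, eps=1e-5):
    mu, var = blended_moments(context, x, alpha)
    return (x - mu) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
context = rng.normal(loc=2.0, scale=3.0, size=20)  # small support set of one task
x = rng.normal(loc=2.0, scale=3.0, size=8)         # features of a single target example
for alpha in (0.0, 0.5, 1.0):
    print(alpha, task_normalise(x, context, alpha).round(2))
```

With `alpha = 1` this reduces to normalising with context-set statistics only, and with `alpha = 0` to instance normalisation. In a real network, the moments are computed per channel across examples and spatial positions.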

Meta-Learning without Memorisation

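Meta-learning algorithms typically need the meta-training tasks to be mutually exclusive, so that no single model can solve all of the tasks at once. Otherwise, the model can simply memorise every meta-training task and ignore the task-specific training data, leaving it unable to adapt to new tasks. In this paper, the authors designed an information-theoretic meta-regularisation objective that lets meta-learners use data from non-mutually-exclusive tasks and still adapt efficiently to novel tasks.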

Understanding the Effectiveness of MAML

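Model-Agnostic Meta-Learning (MAML) consists of two nested optimisation loops: an outer loop that learns a shared initialisation and an inner loop that adapts it to each new task. The paper asked whether MAML's effectiveness comes from rapid learning (large changes to the representation in the inner loop) or from feature reuse (the initialisation already containing high-quality features). The authors demonstrated that feature reuse is the dominant factor. This led to the ANIL (Almost No Inner Loop) algorithm, a simplification of MAML in which the inner loop is removed for all but the (task-specific) head of the underlying neural network.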
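
The practical difference between MAML and ANIL lies in which parameters the inner loop touches. The sketch below shows an ANIL-style inner loop on a toy regression task: the shared feature extractor ("body") is reused unchanged, and only a linear head takes a few gradient steps on the support set. The body, the head initialisation and the data are hypothetical random stand-ins, and the outer meta-training loop is omitted:

```python
import numpy as np


def features(body, X):
    """Shared feature extractor ("body"), reused unchanged during adaptation."""
    return np.tanh(X @ body)


def mse(body, head, X, y):
    return np.mean((features(body, X) @ head - y) ** 2)


def anil_inner_loop(body, head, X_support, y_support, lr=0.1, steps=5):
    """ANIL-style adaptation: gradient descent on the head only."""
    Phi = features(body, X_support)
    for _ in range(steps):
        residual = Phi @ head - y_support
        grad = 2 * Phi.T @ residual / len(y_support)  # d(MSE)/d(head)
        head = head - lr * grad
    return head


rng = np.random.default_rng(0)
body = rng.standard_normal((4, 8))  # hypothetical meta-learned body
head = np.zeros(8)                  # hypothetical meta-learned head initialisation
X_support, y_support = rng.standard_normal((10, 4)), rng.standard_normal(10)

adapted_head = anil_inner_loop(body, head, X_support, y_support)
print(mse(body, head, X_support, y_support), "->", mse(body, adapted_head, X_support, y_support))
```

In full MAML, the inner loop would also update `body`, and the outer loop would differentiate through those updates. ANIL skips this, which makes it much cheaper.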

Your Classifier is Secretly an Energy-Based Model

This paper proposed reinterpreting a standard discriminative classifier of p(y|x) as an energy-based model for the joint distribution p(x, y). In this setting, wrote the authors, the standard class probabilities can still be easily computed, as can unnormalised values of p(x) and p(x|y). They demonstrated that energy-based training of the joint distribution improves calibration, robustness and out-of-distribution detection, while also enabling the proposed model to generate samples rivalling the quality of recent GAN approaches. The work improves upon recently proposed techniques for scaling up the training of energy-based models. It is also the first to achieve performance rivalling the state of the art in both generative and discriminative learning within one hybrid model.
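
The reinterpretation fits in a few lines. Given classifier logits f(x), define E(x, y) = −f(x)[y] and E(x) = −logsumexp over y of f(x)[y]. The softmax over classes is unchanged, but the logits now also carry an unnormalised density over inputs. A minimal sketch with hypothetical logits:

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def class_probs(logits):
    """p(y | x): the usual softmax over the classifier's logits f(x)."""
    z = logsumexp(logits)
    return [math.exp(v - z) for v in logits]


def joint_energy(logits, y):
    """E(x, y) = -f(x)[y], i.e. p(x, y) is proportional to exp(f(x)[y])."""
    return -logits[y]


def marginal_energy(logits):
    """E(x) = -logsumexp_y f(x)[y], i.e. p(x) is proportional to exp(-E(x))."""
    return -logsumexp(logits)


logits = [2.0, 0.5, -1.0]
shifted = [v + 3.0 for v in logits]  # add the same constant to every logit

print(class_probs(logits))
print(class_probs(shifted))  # identical class probabilities...
print(marginal_energy(logits), marginal_energy(shifted))  # ...but a different E(x)
```

Shifting all logits by a constant is invisible to a softmax classifier but changes E(x). The paper trains this otherwise unused degree of freedom to model p(x), using Stochastic Gradient Langevin Dynamics to sample from the model, alongside the usual cross-entropy loss for p(y|x).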

Reverse-Engineering Deep ReLU Networks

This paper investigated the commonly held assumption that neural networks cannot be recovered from their outputs, because the outputs depend on the parameters in a highly nonlinear way. The authors claimed that by observing only its outputs, one can identify the architecture, weights and biases of an unknown deep ReLU network. By dissecting the set of region boundaries into components associated with particular neurons, the researchers showed that it is possible to recover the weights of neurons and their arrangement within the network.
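
A ReLU network is piecewise linear, so its output changes slope exactly where some neuron switches on or off. This is why region boundaries can be detected from outputs alone. The sketch below queries a hypothetical one-hidden-layer network along a line and locates those slope jumps. It illustrates only this first observation. The paper's method goes much further, handling deep networks (whose boundaries bend where earlier neurons switch) and recovering the weights themselves:

```python
def relu(z):
    return max(0.0, z)


# Hypothetical "unknown" network: 2 inputs, 3 hidden ReLU neurons, scalar output.
W = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
b = [-0.5, 0.25, -1.0]
v = [1.0, -2.0, 1.5]


def network(x):
    return sum(vi * relu(w[0] * x[0] + w[1] * x[1] + bi) for w, bi, vi in zip(W, b, v))


def find_kinks(f, start, direction, t_max=3.0, n=3000, tol=1e-6):
    """Query f along start + t * direction and locate where its slope jumps.
    Each jump marks a point where one hidden neuron switches on or off."""
    h = t_max / n

    def point(t):
        return (start[0] + t * direction[0], start[1] + t * direction[1])

    slopes = [(f(point((i + 1) * h)) - f(point(i * h))) / h for i in range(n)]
    changes = [i for i in range(1, n) if abs(slopes[i] - slopes[i - 1]) > tol]
    # A kink inside a step shows up as two adjacent changes; merge such runs.
    kinks, run = [], []
    for i in changes:
        if run and i != run[-1] + 1:
            kinks.append(h * sum(run) / len(run))
            run = []
        run.append(i)
    if run:
        kinks.append(h * sum(run) / len(run))
    return kinks


kinks = find_kinks(network, start=(-1.0, -1.0), direction=(1.0, 2.0))
print([round(t, 3) for t in kinks])
```

Along this line, the three neurons switch on at t = 0.375, 1.0 and 1.5. The size of each slope jump equals v_i (w_i · d) for the neuron that switches, which is the kind of signal a recovery method can build on.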

(Note: The list is in no particular order and is a compilation based on the reputation of the publishers, the reception of this research in popular forums and feedback from experts on social media. If you think we have missed any exceptional research work, please comment below.)