Algorithms similar to those used by Netflix, Amazon, and Facebook may lead to better comprehension of the biological language of cancer and neurodegenerative diseases such as Parkinson disease (PD) and Alzheimer disease (AD), according to a research article published in Proceedings of the National Academy of Sciences.
The researchers highlight that intracellular phase separation, in which proteins form liquid-like droplets called biomolecular condensates, has become increasingly recognized as a key factor in organizing and regulating the proteins that may contribute to cancer and neurodegenerative diseases. However, the best way to examine and then predict how proteins will behave remains unsettled, with several hypotheses proposed.
For conditions like PD, whose clinical manifestation is characterized by the abnormal clumping of proteins and loss of dopaminergic cells, understanding how protein condensates form could prove significant in developing targeted therapies that may slow or prevent progression.
To analyze whether protein sequence can be used to predict phase behavior, the researchers took an approach similar to that used by Netflix when it recommends a series to watch, Facebook when it suggests someone to befriend, or Amazon with voice assistants such as Alexa: machine-learning algorithms make highly educated predictions about what people will do next or, in this case, how proteins will behave next.
The researchers used machine-learning technology to train a large-scale language model to examine and predict protein abnormalities in the body that are associated with disease onset.
“We specifically asked the program to learn the language of shapeshifting biomolecular condensates—droplets of proteins found in cells—that scientists really need to understand to crack the language of biological function and malfunction that cause cancer and neurodegenerative diseases like AD,” said study author Kadi Liis Saar, PhD, research fellow at St John’s College, in a statement.
The researchers found that the technology could distinguish, with high accuracy, between structured proteins and the unstructured proteins linked to disease in humans.
Speaking further on disordered proteins, lead study author Tuomas Knowles, PhD, professor in the Yusuf Hamied Department of Chemistry at the University of Cambridge, said in a statement that protein condensates have garnered substantial attention because they can control key events in the cell, such as gene expression and protein synthesis.
“Any defects connected with these protein droplets can lead to diseases such as cancer. This is why bringing natural language processing technology into research into the molecular origins of protein malfunction is vital if we want to be able to correct the grammatical mistakes inside cells that cause disease,” said Knowles.
Knowles said that the ultimate aim is to use artificial intelligence to develop targeted drugs, and that the approach could also be leveraged to expand current knowledge of cancers and neurodegenerative diseases and potentially even to prevent dementia from developing at all.
“Machine-learning can be free of the limitations of what researchers think are the targets for scientific exploration and it will mean new connections will be found that we have not even conceived of yet. It is really very exciting indeed,” said Saar.
Saar KL, Morgunov AS, Qi R, et al. Learning the molecular grammar of protein condensates from sequence determinants and embeddings. Proc Natl Acad Sci U S A. Published online April 13, 2021. doi:10.1073/pnas.2019053118