By: Dr. Basheer Hawwash, Principal Data Scientist
Amanda Coogan, Risk-Based Monitoring Senior Product Manager
Rhonda Roberts, Senior Data Scientist
Remarque Systems Inc.
Everyone knows the terms “machine learning” and “artificial intelligence.” Few can define them, much less explain their inestimable value to clinical trials. So, it’s not surprising that, despite their ability to minimize risk, improve safety, condense timelines, and save costs, these technology tools are not widely used by the clinical trial industry.
There are lots of reasons for resistance: It seems complicated. Those who are not statistically savvy may find the thought of algorithms overwhelming. Adopting new technology requires a change in the status quo.
Yet, there are more compelling reasons for adoption – especially as the global pandemic has accelerated a trend toward patient-centricity and decentralized trials, and an accompanying need for remote monitoring.
Machine learning vs. artificial intelligence. What’s the difference?
Let’s start by understanding what the two terms mean. While many people use them interchangeably, they are distinct: machine learning can be used on its own or to power artificial intelligence, while artificial intelligence, as applied today, depends on machine learning.
Machine learning is a family of algorithms that analyze data in various ways, searching for patterns and trends that can then be used to make more informed decisions. Supervised machine learning starts with labeled data – for instance, records of patients who experienced a particular adverse event. By training on those records, the algorithm learns to predict whether a new patient is also likely to experience that event. Conversely, unsupervised machine learning applies techniques such as clustering to unlabeled data; the algorithm sorts the data into groups, which researchers can then examine more closely to discern similarities they may not have considered previously.
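The distinction can be illustrated with a toy sketch. Everything below (the dose values, labels, and one-dimensional clustering rule) is hypothetical and purely illustrative: supervised learning predicts a known label from past labeled records, while unsupervised learning groups unlabeled values with no labels at all.

```python
# Toy illustration with made-up dose data; not a production method.

# --- Supervised: learn from labeled records ------------------------------
# Each record pairs a feature (dose in mg) with a known label
# (whether the patient had a particular adverse event).
labeled = [(10, False), (20, False), (40, True), (50, True)]

def predict_adverse_event(dose_mg, training=labeled):
    """1-nearest-neighbor: predict the label of the closest past record."""
    nearest = min(training, key=lambda rec: abs(rec[0] - dose_mg))
    return nearest[1]

# --- Unsupervised: group unlabeled values --------------------------------
def cluster_1d(values, gap=15):
    """Split sorted values into groups wherever two consecutive values
    are separated by more than `gap`."""
    ordered = sorted(values)
    groups, current = [], [ordered[0]]
    for v in ordered[1:]:
        if v - current[-1] > gap:
            groups.append(current)
            current = [v]
        else:
            current.append(v)
    groups.append(current)
    return groups
```

Here `predict_adverse_event(45)` returns `True` because the nearest labeled records are high-dose patients who had the event, while `cluster_1d([10, 12, 50, 52])` separates the low and high doses into two groups for researchers to examine.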
In either case, artificial intelligence applies those data insights to mimic human problem-solving behavior. Speech recognition, self-driving cars, even forms that auto-populate all exist because of artificial intelligence. In each case, it is the vast amounts of data that have been ingested and analyzed by machine learning that make the artificial intelligence application possible.
Physicians, for instance, can use a combination of machine learning and artificial intelligence to enhance their diagnostic abilities. Given a set of imaging data, machine learning tools can analyze the images to find patterns of chronic obstructive pulmonary disease (COPD); artificial intelligence may then identify that some patients have idiopathic pulmonary fibrosis (IPF) as well as COPD, something their physicians may neither have thought to look for nor found unaided.
Now, researchers are harnessing both machine learning and artificial intelligence in their clinical trial work, introducing new efficiencies while enhancing patient safety and trial outcomes.
The case of the missing data
Data are at the core of every clinical trial. If those data are not complete, then researchers are proceeding on false assumptions, which can jeopardize patient safety – and even the entire trial.
Traditionally, researchers have guarded against this possibility by doing painstaking manual verification, examining every data point in the electronic data capture system to ensure that it is both accurate and complete. More automated systems may provide reports that researchers can look through – but that still requires a lot of human involvement. The reports are static and must be reviewed on an ongoing basis – and every review has the potential for human error.
Using machine learning, this process happens continually in the background throughout the trial, automatically notifying researchers when data are missing. This can make a material difference in a trial’s management and outcomes.
Consider a study in which patients are tested for a specific metric every two weeks. Six weeks into the study, 95 percent of the patients show a value for that metric; 5 percent don’t. Those values are missing. The system alerts researchers, enabling them to act promptly: they may be able to contact the patients in the 5 percent and obtain their values, or they may need to withdraw those patients from the study. The choice is left to the research team – but because they have the information in near-real time, they have a choice.
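A missing-data check of this kind can be sketched in a few lines. The records, visit schedule, and function name below are hypothetical; a production system would pull records continually from the EDC system rather than from a hard-coded dictionary.

```python
# Hypothetical example: patients are tested every two weeks, so by week
# six each patient should have three recorded values for the metric.
expected_visits = 3

# In practice these records would be pulled continually from the EDC
# system; here they are hard-coded for illustration.
records = {
    "P001": [98.1, 97.9, 98.3],
    "P002": [99.0, 98.7],        # week-six value is missing
    "P003": [97.5, 97.8, 97.6],
}

def flag_missing(records, expected_visits):
    """Return the ids of patients with fewer recorded values than expected."""
    return sorted(pid for pid, values in records.items()
                  if len(values) < expected_visits)

alerts = flag_missing(records, expected_visits)  # ["P002"]
```

In a live trial this check would run in the background on a schedule, raising an alert for each flagged patient rather than returning a one-off list.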
As clinical trials move to new models, with greater decentralization and greater reliance on patient-reported data, missing data may become a larger issue. To counteract that possibility, researchers will need to move away from manual methods and embrace both the ease and accuracy of machine-learning-based systems.
The importance of the outlier
In research studies, not every patient – nor even every site – reacts the same way. There are patients whose vital signs are off the charts. Sites with results that are too perfect. Outliers.
Often researchers discover these anomalies deep into the trial, during the process of cleaning the data in preparation for regulatory submission. That may be too late for a patient who is having a serious reaction to a study drug. It also may mean that the patient’s data are not valid and cannot be included in the end analysis. Caught earlier, there would be the possibility of a course correction. The patient might have been able to stay in the study, to continue to provide data; alternatively, they could be removed promptly along with their associated data.
Again, machine learning simplifies the process. By running an algorithm that continually searches for outliers, those irregularities are instantly identified. Researchers can then quickly drill down to ascertain whether there is an issue and, if so, determine an appropriate response.
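One simple way to search for such outliers is a z-score screen: flag any value more than a chosen number of standard deviations from its group's mean. This is a minimal sketch with made-up blood-pressure readings and an illustrative threshold; real monitoring platforms use more robust, study-specific methods.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical systolic blood-pressure readings from one site:
readings = [120, 118, 122, 119, 121, 180]
flagged = zscore_outliers(readings)  # [180] - the aberrant reading
```

In practice the algorithm would run continually across sites and variables, surfacing each flagged value for researchers to drill into rather than acting on it automatically.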
Of course, an anomaly doesn’t necessarily flag a safety issue. In a recent case, one of the primary endpoints involved a six-minute walk test. One site showed strikingly different results; as it happened, that site was using a different measurement gauge, an inconsistency that would have skewed the study results but, once flagged, was easily corrected.
In another case, all the patients at a site were rated with maximum quality of life scores – and all their blood pressure readings were whole numbers. Machine learning algorithms flagged these results because they varied dramatically from the readings at the other sites. On examination, researchers found that the site was submitting fraudulent reports. While that was disturbing to learn, the knowledge gave the trial team power to act, before the entire study was rendered invalid.
A changing landscape demands a changing approach
As quality management is increasingly focusing on risk-based strategies, harnessing machine learning algorithms simplifies and strengthens the process. Setting parameters based on study endpoints and study-specific risks, machine learning systems can run in the background throughout a study, providing alerts and triggers to help researchers avoid risks.
The need for such risk-based monitoring has accelerated in response to the COVID-19 pandemic. With both researchers and patients unable or unwilling to visit sites, studies have rapidly become decentralized. This has coincided with the emergence and growing importance of patient-centricity and further propelled the rise of remote monitoring. Processes are being forced online. Manual methods are increasingly insufficient – and automated methods that incorporate machine learning and artificial intelligence are gaining primacy.
Marrying in-depth statistical thinking with critical analysis
The trend towards electronic systems does not replace either the need for or the value of clinical trial monitors and other research personnel; they are simply able to do their jobs more effectively. A machine-learning-based system runs unique algorithms, each analyzing data in a different way to produce visualizations, alerts, or workflows, which CROs and sponsors can use to improve patient safety and trial efficiency. Each algorithm is tailored to the specific trial, keyed to endpoints, known risks, or other relevant factors. While the algorithms offer guidance, the platform does not make any changes to the data or the trial process; it merely alerts researchers to examine the data and determine whether a flagged value is clinically significant. Trial personnel are relieved of much tedious, reproducible, manual work, and are able to use their qualifications to advance the trial in other meaningful ways.
The imperative to embrace change
Machine learning and artificial intelligence have long been buzzwords in the clinical trial industry – yet these technologies have only haltingly been put to use. It’s time for that pendulum to swing. We can move more quickly and more precisely than manual data verification and data cleaning allow. We can work more efficiently if we harness data to drive trial performance rather than simply to prove that the study endpoints were achieved. We can operate more safely if risk management is built in from the outset. All this can be achieved easily, with the application of machine learning and artificial intelligence. Now is the time to move forward.