In a study published in PLoS One, researchers used electronic health record (EHR) data to determine whether machine learning and knowledge engineering may facilitate the diagnosis of acute hepatic porphyria (AHP). Results strongly suggest the method can be an effective approach to identify patients who should receive a diagnostic biochemical test to screen for AHP.
AHP is a family of rare, genetic diseases that manifest as debilitating, potentially life-threatening attacks. Specifically, the disease results in a buildup of toxic porphyrin molecules, which are formed during the production of heme, the molecule that helps bind oxygen in the blood.
Because rare diseases are encountered so infrequently, clinicians often diagnose them late or not at all. “They also often have general presentations with diffuse symptoms, as well as genetic components which may require specialized testing,” the researchers wrote, while “a lack of accurate diagnoses increases economic burden to healthcare systems as patients continue to receive inadequate and/or inappropriate treatment.”
The current prevalence of diagnosed symptomatic patients with AHP is roughly 1 per 100,000, and a US study revealed diagnosis is delayed an average of 15 years. According to the researchers, “the preferred diagnostic procedure for AHP is biochemical testing of random/spot urine for aminolevulinic acid (ALA), porphobilinogen (PBG), and porphyrin.”
In the current study, investigators used a dataset of approximately 200,000 patient records from the Oregon Health & Science University (OHSU) Research Data Warehouse, containing patient information from January 2009 to March 2019. OHSU is the only academic medical center in the state and is thus a referral center for rare diseases. Using machine learning, the researchers sought to identify patients who have a documented clinical history of AHP or a clinical history indicating that AHP diagnostic testing may be appropriate.
“To ensure an adequate sample size to make predictive models robust, we enriched the data set for possible AHP by adding records from an additional 5571 patients,” who met certain case-insensitive criteria, such as a diagnosis including the search term “porph” in the diagnosis name, the authors explained.
Manual chart reviews of patients with a confirmed diagnosis of AHP (based on International Classification of Diseases, Tenth Revision, Clinical Modification code E80.21 for acute intermittent [hepatic] porphyria) (n = 47) were carried out to develop a gold standard for the data. Thirty cases were deemed positive cases for the machine learning models, while the remaining cases were used as negative cases.
“We parsed the record into features, which were scored by frequency of appearance and filtered using univariate feature analysis,” the researchers said. “We trained on the full dataset, with the best cross-validation performance coming from support vector machine (SVM) algorithm using a radial basis function (RBF) kernel.” The trained model was then applied back to the full dataset, and patients were ranked by margin distance.
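The steps described above (frequency-scored features, univariate filtering, an RBF-kernel SVM evaluated by cross-validation, and ranking by margin distance) might look roughly like the following scikit-learn sketch. The data here is synthetic, and the chi-squared filter, the number of retained features, the five-fold split, and all variable names are assumptions chosen for illustration, not the authors' actual settings or code.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for EHR data (assumption): rows are patients, columns
# are counts of how often a feature (code, medication, term) appears.
rng = np.random.default_rng(0)
n_patients, n_features, n_positive = 600, 50, 30
X = rng.poisson(1.0, size=(n_patients, n_features)).astype(float)
y = np.zeros(n_patients, dtype=int)
y[:n_positive] = 1
# Make the first five features more frequent in positive cases (assumption).
X[:n_positive, :5] += rng.poisson(3.0, size=(n_positive, 5))

# Univariate feature filtering followed by an RBF-kernel SVM.
# chi2 and k=10 are illustrative choices, not taken from the study.
model = make_pipeline(
    SelectKBest(chi2, k=10),
    SVC(kernel="rbf", gamma="scale"),
)

# Cross-validated area under the ROC curve (five folds is an assumption).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
mean_auc = auc_scores.mean()

# Fit on the full data set, then rank patients by signed distance
# from the SVM decision boundary, largest margin first.
model.fit(X, y)
margin = model.decision_function(X)
ranking = np.argsort(-margin)

print(f"Mean cross-validated AUC: {mean_auc:.3f}")
print("Top 10 ranked patient indices:", ranking[:10])
```

In a setting like the one the article describes, the highest-ranked patients without a known diagnosis would be the candidates passed on for manual chart review.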
The investigators found:
The top 100 ranked negative cases were manually reviewed for symptom complexes similar to AHP, finding 4 patients where AHP diagnostic testing was likely indicated and 18 patients where AHP diagnostic testing was possibly indicated
Among the top 100 ranked patients with a mention of porphyria in their record, reviewers identified 4 for whom AHP diagnostic testing was possibly indicated and had not previously been performed
Based solely on the reported prevalence of AHP, the researchers would have expected only 0.002 cases out of the 200 patients manually reviewed
The overall cross-validation scores of the model on the data set using the known 30 AHP cases as the positive set and the rest of the data as negative training samples yielded an average area under the curve (AUC) of 0.775
The strongest positive predictors in the model included unexplained abdominal pain, pelvic and perineal pain, nausea and vomiting, and several pain and nausea medications
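The chance-expectation figure in the findings above follows from simple arithmetic: the number of patients reviewed multiplied by the reported prevalence. The short sketch below reproduces it; the variable names are illustrative, and the general-population prevalence is only a rough baseline for a referral center such as OHSU.

```python
# Prevalence of diagnosed symptomatic AHP cited in the article.
prevalence = 1 / 100_000

# 100 top-ranked negative cases plus 100 top-ranked records mentioning porphyria.
patients_reviewed = 200

# Expected number of AHP cases if patients had been picked at random.
expected_cases = patients_reviewed * prevalence
print(f"Expected AHP cases by chance: {expected_cases:.3f}")  # prints 0.002
```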
“This number of patients with indications for AHP diagnostic testing and possibly to-be confirmed diagnosis vastly exceeds that due to chance and surpassed our expectations,” the authors wrote. “It will require clinical follow-up to determine whether these patients’ symptoms are truly due to AHP or not, but the manual record review clearly demonstrates that our methodology has found patients for whom a spot urine porphobilinogen test is indicated.”
The use of more advanced features that represent time, duration, and intervals; explicit coding of symptom separation and overlap; and more sophisticated machine learning algorithms tailored to situations where the positive cases are extremely rare could all help improve the machine learning approach.
The investigators plan to continue this work into clinical settings, refine their methods using random forests and deep learning, and extend the methodology to other rare diseases.
“Analyzing the EHR with advanced techniques such as demonstrated here points to the potential of the future of digital medicine on a population scale,” they concluded.
“Advanced approaches enabled by the wide deployment of the EHR can now be used to improve medicine and medical care in areas that have been underserved or inaccessible. Health care can be made more proactive, not simply in terms of common conditions and age- or gender-related screening, but for rarer conditions as well.”
Cohen AM, Chamberlin S, Deloughery T, et al. Detecting rare disease in electronic health records using machine learning and knowledge engineering: case study of acute hepatic porphyria. PLoS One. 2020;15(7):e0235574. doi:10.1371/journal.pone.0235574