Artificial intelligence for the diagnosis of heart failure

Development of cardiovascular AI-CDSS

Using the training dataset of 600 patients with and without HF, the AI-CDSS was created using predefined steps including expert-driven knowledge acquisition, machine learning (ML)-driven rule generation, and hybridization of both types of knowledge.

Expert-driven knowledge acquisition

In the knowledge modeling phase, the clinical diagnostic recommendations were first transformed into mind maps and then into a decision tree. The physicians evaluated and modified the decision tree until a consensus was reached. The final decision tree was termed R-CKM (Supplementary Fig. 1) and included 14 contributing factors (Supplementary Table 1) and 4 possible outcomes: HFrEF, HFmrEF, HFpEF, and no-HF.
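To make the four outcomes concrete, the final branch of such a tree can be sketched as below. This is only an illustration, not the authors' decision tree: the LVEF cut-offs (< 40%, 40 to 49%, ≥ 50%) are taken from the 2016 ESC heart failure guideline and are an assumption here, since the text does not state them. The function name and the boolean `hf_criteria_met`, which stands in for all the other contributing factors, are hypothetical.

```python
def classify_hf_subtype(hf_criteria_met: bool, lvef_percent: float) -> str:
    """Toy decision path ending in one of the four outcomes.

    hf_criteria_met is a hypothetical placeholder for the non-LVEF
    contributing factors (signs and symptoms, natriuretic peptides, etc.).
    The LVEF thresholds are assumed from the 2016 ESC guideline.
    """
    if not hf_criteria_met:
        return "no-HF"
    if lvef_percent < 40:
        return "HFrEF"   # reduced ejection fraction
    if lvef_percent < 50:
        return "HFmrEF"  # mid-range ejection fraction
    return "HFpEF"       # preserved ejection fraction


if __name__ == "__main__":
    for lvef in (30, 45, 60):
        print(lvef, classify_hf_subtype(True, lvef))
```

A real tree with 14 contributing factors would replace the single boolean with further branches, but the outcome labels would be reached in the same way.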

ML-driven rule generation

We used five machine learning algorithms, namely Decision Tree (DT), Random Forest, Chi-squared Automatic Interaction Detection (CHAID), J48, and Classification and Regression Tree (CART). All algorithms selected only a few features, such as left ventricular ejection fraction (LVEF), left atrial volume index (LAVI), and left ventricular mass index (LVMI), as highly contributing factors (Supplementary Table 2). To boost model performance, the auto-feature selection method was used, and LVEF, electrocardiography, LVMI, and LAVI were selected as the most significant features (Supplementary Fig. 2).

The five algorithms differed in accuracy (Supplementary Table 3). We also ranked each algorithm by accuracy, number of rules extracted, and number of attributes involved, using the rank formula developed in our previous work¹⁰. Finally, the CART algorithm was selected to create the ML-driven knowledge because it showed the highest accuracy (88.5%) and the highest rank (0.5736). The CART algorithm mainly focused on LVEF, LAVI, and tricuspid regurgitation velocity (Supplementary Fig. 3). The algorithm correctly predicted HFmrEF and HFrEF with 100% accuracy, whereas HFpEF and no-HF were predicted with 78.9% and 80.5% accuracy, respectively.

Hybrid knowledge

Merging the CKM from the expert-driven approach with the PM from the ML-driven approach produced the final hybrid knowledge in the form of the R-CKM (Fig. 1, Supplementary Materials). Physicians may occasionally miss attributes or paths of attributes when developing the CKM, and the ML-generated PM can recover them. For instance, the CKM starts with “Sign & Symptoms” (Supplementary Fig. 3), whereas the PM starts by checking “LVEF” (Supplementary Fig. 4). The hybridization algorithm therefore recognizes that the CKM is missing a path of “Not Available” values between the “Sign & Symptoms” and “LVEF” attributes. When we added this new path to the CKM, the number of knowledge base rules increased drastically. Adding the new path to the R-CKM increases the coverage of patient cases for which correct recommendations can be generated and thereby increases accuracy.
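The idea of adding a missing “Not Available” path can be illustrated with a minimal sketch. It is not the authors' hybridization algorithm, whose details are not given here. It assumes that each rule is a set of attribute-value conditions with an outcome, and it grafts the PM rules under a new “Sign & Symptoms = Not Available” branch. The rule contents, the `"<40"` value encoding, and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Conditions = Dict[str, str]
Rule = Tuple[Conditions, str]  # (conditions, outcome)


def first_match(rules: List[Rule], case: Dict[str, str]) -> Optional[str]:
    """Return the outcome of the first rule whose conditions all hold."""
    for conditions, outcome in rules:
        if all(case.get(attr) == value for attr, value in conditions.items()):
            return outcome
    return None  # the knowledge base does not cover this case


def hybridize(ckm: List[Rule], pm: List[Rule],
              root_attr: str = "Sign & Symptoms",
              missing_value: str = "Not Available") -> List[Rule]:
    """Append the PM rules as a new branch under root_attr == missing_value."""
    grafted = [({root_attr: missing_value, **cond}, outcome)
               for cond, outcome in pm]
    return ckm + grafted


# Hypothetical toy rule sets
ckm = [
    ({"Sign & Symptoms": "Present", "LVEF": "<40"}, "HFrEF"),
    ({"Sign & Symptoms": "Absent"}, "no-HF"),
]
pm = [({"LVEF": "<40"}, "HFrEF")]

case = {"Sign & Symptoms": "Not Available", "LVEF": "<40"}
print(first_match(ckm, case))                 # None: CKM alone has no path
print(first_match(hybridize(ckm, pm), case))  # HFrEF: covered after merging
```

In this toy setting every PM rule becomes one additional rule in the merged knowledge base, which is consistent with the rule count growing after the new path is added.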

Fig. 1: Comparison of existing CDSSs and our proposed artificial intelligence-CDSS.

CDSS Clinical Decision Support System, CKM clinical knowledge model, I-KAT Intelligent Knowledge Authoring Tool, NCCN National Comprehensive Cancer Network, NICE National Institute for Health and Care Excellence, PM prediction model.

Validation of AI-CDSS

Study population

The test dataset included 598 patients (490 patients with HF, 108 patients without HF). Patients with HF were older (73.1 ± 13.8 years vs. 64.8 ± 13.8 years, P < 0.001), more likely to be male (52% vs. 37%, P = 0.005), and had higher N-terminal pro-brain natriuretic peptide levels (10,075 ± 11,778 pg/L vs. 82 ± 68 pg/L, P < 0.001). Concerning the echocardiographic parameters, patients with HF had lower LVEF (45.5 ± 17.4% vs. 64.1 ± 6.5%, P < 0.001), higher LAVI (53.9 ± 21.1 ml/m² vs. 31.2 ± 8.5 ml/m², P < 0.001), and higher E/e′ (18.6 ± 9.8 vs. 9.8 ± 3.5, P < 0.001) (Table 1). Among patients with HF, 199 (40.6%), 63 (12.9%), and 228 (46.5%) were classified as having HFrEF, HFmrEF, and HFpEF, respectively.

Table 1: Characteristics of the retrospective study population (n = 598).

Diagnostic accuracy

The results of the comparative analysis are shown in Fig. 2. The concordance rate was 100% in HFrEF and HFmrEF for all three approaches. With respect to HFpEF, the concordance rate was 82%, 79%, and 99.5% for expert-driven, ML-driven, and hybrid CDSS, respectively. Similar findings were observed for no-HF. The overall diagnostic accuracy was 90%, 88.5%, and 98.3% for expert-driven, ML-driven, and hybrid CDSS, respectively; the hybrid approach, i.e., AI-CDSS, thus improved accuracy by about 8 percentage points over the expert-driven approach.
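The relation between per-class concordance and overall accuracy can be checked with a short calculation. The sketch below assumes that overall accuracy is the case-weighted mean of the per-class concordance rates, and that the per-class figures quoted for the CART model (100%, 100%, 78.9%, and 80.5%) apply to the 598-patient test set with the class sizes given under Study population. Neither assumption is stated explicitly in the text, and the variable and function names are illustrative.

```python
# Class sizes from the test dataset (Study population)
class_counts = {"HFrEF": 199, "HFmrEF": 63, "HFpEF": 228, "no-HF": 108}

# Per-class concordance quoted for the ML-driven (CART) model;
# assumed here to refer to the same test set
ml_concordance = {"HFrEF": 1.0, "HFmrEF": 1.0, "HFpEF": 0.789, "no-HF": 0.805}


def overall_accuracy(counts, concordance):
    """Case-weighted mean of per-class concordance rates."""
    total = sum(counts.values())
    correct = sum(counts[label] * concordance[label] for label in counts)
    return correct / total


ml_overall = overall_accuracy(class_counts, ml_concordance)
print(f"ML-driven overall accuracy: {ml_overall:.1%}")  # 88.4%
```

Under these assumptions the weighted mean comes to about 88.4%, which agrees with the reported 88.5% up to rounding of the per-class values.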

The expert-driven approach had a sensitivity and a specificity of 0.96 and 0.71, respectively (Supplementary Table 4), whereas the ML-driven approach had a sensitivity and a specificity of 0.72 and 0.94, respectively (Supplementary Table 5). Strikingly, the hybrid approach had a sensitivity and specificity of 0.94 and 0.99, respectively (Supplementary Table 6).
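For reference, sensitivity and specificity follow from the counts of a two-class (HF vs. no-HF) confusion matrix, as in the sketch below. The counts in the example are invented for illustration and are not taken from the study; the text also does not say whether the reported values were computed on a binary basis or averaged across the four classes, so the binary form is an assumption.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)  # share of true HF cases that were detected
    specificity = tn / (tn + fp)  # share of no-HF cases correctly ruled out
    return sensitivity, specificity


# Hypothetical counts, NOT study data
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```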

Subgroup analysis

We divided the patients according to echocardiographic parameters. Set A included all echocardiography parameters, whereas set B included only LVEF, LAVI, and LVMI. The concordance rate was lower in set B than in set A (Supplementary Fig. 4). In our study, the age of the included patients ranged from 20 to 92 years. Age did not affect the accuracy of the system (Supplementary Table 7).

Accuracy of AI-CDSS in a prospective cohort of patients with dyspnea

A total of 100 consecutive patients who presented with dyspnea to the outpatient clinic were enrolled. The data of three patients were incomplete; thus, the data of 97 patients were used in the final analysis. Of the 97 patients, 43 (44%) had HF. In this prospective cohort, the concordance rate of the non-HF specialists was 76%, whereas that of AI-CDSS was 98% (Fig. 3). In particular, the concordance for HFmrEF and HFpEF was low among the non-HF specialists, whereas that for no-HF was comparably high.