Patients and data collection
We retrospectively reviewed the medical records and endoscopic images of all patients diagnosed with HSV or CMV esophagitis between April 2008 and December 2016 at Asan Medical Center (Seoul, Korea). The diagnosis of HSV or CMV esophagitis was confirmed based on clinical symptoms, endoscopic findings, and histopathologic review with immunohistochemistry (IHC) and/or polymerase chain reaction (PCR). Patients were excluded according to the following criteria: co-infection with HSV and CMV, a final pathologic diagnosis of malignancy, recurrent infection, or missing information on endoscopic findings. The institutional review board (IRB) of Asan Medical Center approved the study (IRB No. 2020-0495). Because of the retrospective design, the IRB waived the requirement for written informed consent based on the non-invasive and anonymized nature of the study. This study was conducted in accordance with institutional ethical guidelines and the Declaration of Helsinki.
Lesion segmentation and feature extraction
To extract imaging features for differentiating the two types of esophagitis, one board-certified expert (with more than 15 years of experience in endoscopy) reviewed the quality of the collected endoscopic images and manually annotated the regions of interest (ROIs). Blurred images and images in which the lesion was far from the endoscope light source were excluded because the lesion shapes were not clearly visible. ROIs were drawn as close to the lesion margins as possible so as not to include normal esophageal mucosa (Fig. 1).
The hue–saturation–brightness (HSB) color model was employed to extract image features from the endoscopic color images. In color image processing, various color models have been designed for specific purposes, such as red–green–blue (RGB), cyan–magenta–yellow–black (CMYK), and HSB. The HSB color model, which was designed to approximate the way humans perceive and interpret color, is often used in computer vision for feature detection and image segmentation because it is a device-independent color representation22. Our esophagitis classifier was compared with one based on the RGB color model, the most widely used model. Because the characteristics of individual ROIs within an image are expected to differ, classifiers were designed at the ROI level rather than the image level, and image-level accuracy was then obtained by averaging the results over the ROIs of each image. We collected 1082 endoscopic images from 150 patients, obtaining a total of 3444 ROIs (HSV: 87 patients, 666 endoscopic images, 2628 ROIs; CMV: 63 patients, 416 endoscopic images, 816 ROIs).
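As an illustration of the RGB-to-HSB conversion underlying this representation, the sketch below uses Python's standard colorsys module (where HSV is the same model as HSB). The per-pixel loop and the toy two-pixel image are purely illustrative and are not the conversion code actually used in this study.

```python
import colorsys
import numpy as np

def rgb_to_hsb(image_rgb):
    """Convert an RGB image array (H x W x 3, values in [0, 1]) to HSB.

    colorsys.rgb_to_hsv implements the same hue/saturation/value model
    that is commonly labeled HSB.
    """
    height, width, _ = image_rgb.shape
    hsb = np.empty_like(image_rgb)
    for i in range(height):
        for j in range(width):
            r, g, b = image_rgb[i, j]
            hsb[i, j] = colorsys.rgb_to_hsv(r, g, b)
    return hsb

# Toy 1 x 2 "image": a pure-red pixel and a mid-gray pixel
img = np.array([[[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]])
hsb = rgb_to_hsb(img)
```

Pure red maps to full saturation and brightness, while gray has zero saturation, which is why the HSB channels separate chromatic from intensity information in a way RGB channels do not.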
From each channel of the HSB and RGB color models, 520 image features were extracted: first-order (N = 17), texture (N = 87), and wavelet (N = 416) features, yielding a total of 1560 image features per color model for each ROI (Supplementary Appendix I). The first-order features were derived from intensity histograms using first-order statistics, including intensity range, energy, entropy, kurtosis/skewness, maximum/minimum, mean, median, uniformity, and variance. Texture features were obtained with a gray-level co-occurrence matrix (GLCM) and a gray-level run-length matrix (GLRLM) in four directions in two-dimensional (2D) space23; GLCM texture features were computed at distances of 1, 2, and 3 pixels in each of the four directions. The wavelet transformation was applied as a single-level directional discrete wavelet transformation with high-pass and low-pass filters24. In total, four wavelet-decomposition images were generated from each ROI: LL, LH, HL, and HH images, where ‘L’ denotes the low-pass filter and ‘H’ the high-pass filter. The first-order and texture features were then computed on the wavelet-transformed images, yielding 416 wavelet features (17 first-order and 87 texture features per wavelet-transformed image). All image features were standardized by z-transformation before classification.
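The first-order statistics described above can be sketched as follows. The function name, the 32-bin histogram used for entropy and uniformity, and the toy 2 x 2 ROI are assumptions for illustration only, not the settings of the original feature-extraction software; the texture (GLCM/GLRLM) and wavelet steps are omitted here.

```python
import numpy as np
from scipy import stats

def first_order_features(roi):
    """Illustrative subset of histogram-based first-order intensity features.

    Entropy and uniformity are computed from a 32-bin intensity histogram
    (an assumed bin count); the remaining statistics come directly from
    the raw pixel values.
    """
    x = np.asarray(roi, dtype=float).ravel()
    counts, _ = np.histogram(x, bins=32)
    p = counts / counts.sum()   # discrete intensity distribution
    p_nz = p[p > 0]             # drop empty bins before taking logs
    return {
        "mean": x.mean(),
        "median": np.median(x),
        "variance": x.var(),
        "range": x.max() - x.min(),
        "energy": np.sum(x ** 2),
        "skewness": stats.skew(x),
        "kurtosis": stats.kurtosis(x),
        "entropy": -np.sum(p_nz * np.log2(p_nz)),
        "uniformity": np.sum(p ** 2),
    }

feats = first_order_features(np.array([[0, 1], [2, 3]]))
```

For the 416 wavelet features, the same statistics would be recomputed on each of the LL, LH, HL, and HH sub-images produced by a single-level 2D discrete wavelet transform (available, e.g., via the PyWavelets `dwt2` function).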
Effective feature selection is a crucial step because image features comprise many collinear and correlated predictors, which can produce unstable estimates and overfit predictions. Feature selection methods can be categorized by how they are coupled to the classification or learning algorithm: (1) filter methods, (2) wrapper methods, and (3) embedded methods25. Filter methods reduce the number of features independently of the classifier. Wrapper methods wrap feature selection around the classification method and use the prediction accuracy of the model to iteratively select or eliminate sets of features. In embedded methods, feature selection is an integral part of the classification model. We made feature selection more efficient by combining a filter method (univariate feature selection) and an embedded method (the least absolute shrinkage and selection operator, LASSO). First, we filtered the extracted features using univariate feature selection within each channel of the HSB and RGB color models, removing features with ANOVA p values ≥ 0.05. For the HSB color model, 124 features were filtered out, leaving 478 H-channel, 481 S-channel, and 477 B-channel features; for the RGB color model, 420 features were filtered out, leaving 341 R-channel, 410 G-channel, and 389 B-channel features. After channel-wise filtering, the remaining features were combined per color model (HSB: 1436 features; RGB: 1140 features). LASSO was then applied to the combined features for feature selection. In total, 25 LASSO runs were performed over five repetitions of five-fold cross-validation, selecting 11–18 features from the HSB color model and 11–20 features from the RGB color model (Supplementary Appendix II). Using the selected image features, two machine learning classifiers were trained: logistic regression and random forest.
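The two-stage selection (ANOVA filtering followed by LASSO) can be sketched with Scikit-learn. The synthetic feature matrix, sample sizes, and variable names below are hypothetical stand-ins for the radiomic features, and the single five-fold LASSO run stands in for the 25 repeated runs described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the ROI feature matrix (n ROIs x p features)
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=8, random_state=0)

# Stage 1 (filter): keep features whose ANOVA F-test p value is < 0.05
F, p = f_classif(X, y)
X_filt = X[:, p < 0.05]

# Stage 2 (embedded): z-transform, then LASSO shrinks the coefficients
# of uninformative features to exactly zero
X_std = StandardScaler().fit_transform(X_filt)
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)
selected = np.flatnonzero(lasso.coef_)  # indices of retained features
```

The filter stage cheaply discards clearly uninformative features before the more expensive LASSO fit, which is the efficiency gain the combination is meant to provide.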
The random forest is an ensemble classifier that fits multiple decision-tree classifiers on various sub-samples of the dataset to improve predictive accuracy and control overfitting; as such, it does not strictly require additional feature selection. Nevertheless, we attempted to improve its performance by combining it with LASSO, because our dataset has many features relative to the number of samples. During the five repeated five-fold cross-validations, the hyperparameters of logistic regression and random forest were tuned by nested cross-validation within each fold. To maximize the probability of correct decisions, we determined an optimal cutoff value from the true-positive and false-positive rates forming the receiver operating characteristic (ROC) curve26. Univariate feature selection, LASSO, logistic regression, and random forest classification were implemented using the Scikit-learn package (https://github.com/scikit-learn/scikit-learn)27.
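A minimal sketch of the nested cross-validation and ROC-based cutoff is shown below. The hyperparameter grid and synthetic data are arbitrary, and the cutoff criterion is assumed to be the Youden index (maximizing TPR − FPR), one common way of choosing a threshold from the ROC curve; the original study may have used a different criterion.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Nested CV: the inner loop tunes hyperparameters on each training fold,
# the outer loop produces out-of-fold probability estimates
inner = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
probas = cross_val_predict(inner, X, y, cv=outer, method="predict_proba")[:, 1]

# Optimal cutoff: the ROC threshold maximizing (TPR - FPR)
fpr, tpr, thresholds = roc_curve(y, probas)
cutoff = thresholds[np.argmax(tpr - fpr)]
```

Tuning only inside the inner folds keeps the outer-fold performance estimate unbiased, which is the point of nesting the cross-validations.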
Categorical data were analyzed using the chi-squared test or Fisher’s exact test as appropriate. Numerical data were analyzed using Student’s t-test. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and area under the curve (AUC) were calculated by standard definitions to evaluate the performance of the developed AI system. To evaluate the differences in performance between models, we performed the Wilcoxon signed-rank test19. All statistical analyses were performed using SPSS Statistics for Windows, version 18.0 (IBM; Armonk, NY). p values < 0.05 were considered statistically significant.
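Although the analyses above were run in SPSS, the same tests can be reproduced in outline with SciPy; the contingency table and samples below are synthetic and purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Chi-squared test on a 2 x 2 contingency table (categorical data)
table = np.array([[30, 10],
                  [15, 25]])
chi2, p_cat, dof, expected = stats.chi2_contingency(table)

# Student's t-test on two independent numerical samples
a = rng.normal(50, 5, size=80)
b = rng.normal(55, 5, size=80)
t, p_num = stats.ttest_ind(a, b)

# Wilcoxon signed-rank test on paired model performances
# (e.g., per-fold AUCs of two models across the 25 CV folds)
auc_model1 = rng.uniform(0.85, 0.95, size=25)
auc_model2 = auc_model1 - rng.uniform(0.0, 0.05, size=25)
w, p_paired = stats.wilcoxon(auc_model1, auc_model2)
```

Each call returns the test statistic together with a two-sided p value, which would then be compared against the 0.05 significance level as in the text.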