In the study period, there were 1,485,880 hospitalizations for 708,089 unique patients, 439,696 (62%) of whom had only 1 hospitalization recorded. The median number of hospitalizations per patient was 1 (first and third quartile (QI) [1.0, 2.0]). There were 211,022 30-day readmissions for an overall readmission rate of 14%. Among patients aged ≥65 years, the 30-day readmission rate was 16%. The median LOS, including patients in observation status and labor and delivery patients, was 2.94 days (QI [1.67, 5.34]), or if these patients are excluded, 3.71 days (QI [2.15, 6.51]). The demographic and clinical characteristics of the patient cohort are summarized in Table 1. Higher rates of 30-day readmissions were observed in patients who were older (median age 62 vs. 59 years), African American (rate of 17% vs. 13% in whites), divorced/separated or widowed (17% vs. 13% in married/partnered or single patients), on Medicare insurance (rate of 17% vs. 10% for private insurance), and had one or multiple chronic conditions such as cancer, renal disease, congestive heart failure, chronic obstructive pulmonary disease, etc. (Table 1).
Prediction of inpatient outcomes
Thirty-day readmissions were predicted with an area under the receiver operator characteristic curve (ROC AUC, here abbreviated as simply “AUC”) of 0.76 (Supplementary Fig. 1a). The Brier score loss (BSL) was 0.11, calibration curve shown in Supplementary Fig. 1b. Average precision was 0.38 (see Supplementary Fig. 2c). Other off-the-shelf ML models, including a deep neural network, were trained on the same task, with performance generally inferior to the Gradient Boosting Machine (GBM), or in the case of the deep neural network, similar (see Supplementary Fig. 2 and Supplementary Table 1). When trained and evaluated on a smaller cohort of 300,000 hospitalizations, performance metrics were similar: AUC 0.75, BSL 0.11. The most impactful features included (ranked from the most to the least important): primary diagnosis, days between the current admission and the previous discharge, number of past admissions, LOS, total emergency department visits in the past 6 months, number of reported comorbidities, admission source, discharge disposition, and Body Mass Index (BMI) on admission and discharge, as well as others (Fig. 1a, b, see also Supplementary Fig. 3). Including more than the top ten variables in the model did not improve predictive power for the cohort overall but does allow for more specific rationale for prediction for certain patients, as well as examination of feature interactions for further exploration. Sample individualized predictions with their explanations are shown in Fig. 1c, d, and further examples are shown in Supplementary Fig. 4. The examples in Supplementary Fig. 4 show patients with comparable predicted probabilities but different compositions of features leading to these predictions.
In order to examine possible changes in causes of readmission risk as a function of time from discharge, we predicted readmission risk for several readmission thresholds and calculated SHAP (SHapley Additive exPlanation) for each. SHAP values for 3- and 7-day readmission are shown in Supplementary Fig. 5a, b, respectively. For example, 7-day readmission risk prediction achieved AUC of 0.70 with a BSL of 0.05 (Table 2). The most impactful feature remained primary diagnosis, but other features played more important roles—e.g., BlockGroup rose to second most important variable (from ninth), number of emergency department visits in the past 6 months rose to third importance from fourth, admission blood counts increased in importance, and insurance provider rose to eighth from twelfth. BMI on admission fell several places, and BMI on discharge no longer features in the top variables. The BMI variables are unique in that missing values tend to be important, in addition to extreme values, perhaps correlating with disease burden and/or hospital practices that could be further investigated.
LOS was predicted in terms of the number of days and was binarized at various thresholds. LOS in days was predicted poorly, within 3.97 days measured by root mean square error (RMSE; average LOS 2.94–3.71 days). LOS over 5 days was predicted with an AUC of 0.84 (Fig. 2a) and a BSL of 0.15 (calibration curve shown in Supplementary Fig. 1d). Average precision was 0.70 (see Supplementary Fig. 2d). When trained and evaluated on a cohort of 300,000 patients, performance was similar: AUC 0.81 and BSL 0.17. Other ML models, including a deep neural network, were trained on the same task, with performance generally inferior to the GBM (see Supplementary Fig. 2 and Supplementary Table 1). The most impactful features included the type of admission, primary diagnosis code, patient age, admission source, LOS of the most recent prior admission, medications administered in the hospital in the first 24 h, insurance, and early admission to the intensive care unit, among others shown in Fig. 2c, d. Impactful features for LOS at thresholds of 3 and 7 days are shown in Supplementary Fig. 5c, d, respectively. The AUC did not differ in these time points compared to 5 days (Table 2). Given that primary diagnosis is often assigned late in the hospital encounter or even after discharge, we trained the LOS models with and without this feature for comparison. Results are shown in Supplementary Table 1d. Overall, predictive performance was decreased, as expected. AUC for LOS > 5 days was 0.781, BSL was 0.173, and average precision was 0.640.
Prediction of death within 48–72 h of admission was predicted with an AUC of 0.91 and BSL of 0.001 (Table 2). However, owing to extreme class imbalance (e.g., in the testing set there were 260,518 non-deaths and 390 deaths), this was achieved by predicting non-death in every case. Strategies to produce a reliable model by addressing class imbalance, such as data oversampling, were unsuccessful. AUC and BSL do not reliably indicate model performance and applicability in this clinical setting.
SHAP analysis also allows examination of interactions between variables. Key variable interactions are shown in Supplementary Figs 6 and 7. For example, high and low values of heart rate were shown to affect probability of readmission differently for patients at different ages. With older patients, there is a clearer trend toward lower heart rates on discharge contributing to lower readmission risk and higher heart rates contributing to higher readmission risk, though modestly (SHAP values from −0.1 to +0.1–0.2). With younger patients, higher discharge heart rates overall are observed, and the positive trend is more modest. This may highlight the importance of considering a variable such as heart rate in a more complete clinical setting, such as one that includes patient age and clinical reasoning (e.g., an adult is unlikely to be discharged with marked tachycardia) (Supplementary Fig. 6c). A similar finding is observed in Supplementary Fig. 7c for LOS prediction, though clinical reasoning is less likely to play a role compared with more purely physiologic phenomena: higher heart rates overall are observed for pediatric patients, and the relationship between heart rate and LOS is not observed to be as linear for pediatric patients (high and low SHAP values are observed more uniformly for given levels of tachycardia in pediatric patients).