Learning From Past Respiratory Infections to Predict COVID-19 Outcomes: Retrospective Study

Background: For the clinical care of patients with well-established diseases, randomized trials, literature, and research are supplemented with clinical judgment to understand disease prognosis and inform treatment choices. In the void created by a lack of clinical experience with COVID-19, artificial intelligence (AI) may be an important tool to bolster clinical judgment and decision making. However, a lack of clinical data restricts the design and development of such AI tools, particularly in preparation for an impending crisis or pandemic.
Objective: This study aimed to develop and test the feasibility of a “patients-like-me” framework to predict the deterioration of patients with COVID-19 using a retrospective cohort of patients with similar respiratory diseases.
Methods: Our framework used COVID-19–like cohorts to design and train AI models that were then validated on the COVID-19 population. The COVID-19–like cohorts included patients diagnosed with bacterial pneumonia, viral pneumonia, unspecified pneumonia, influenza, and acute respiratory distress syndrome (ARDS) at an academic medical center from 2008 to 2019. In total, 15 training cohorts were created using different combinations of the COVID-19–like cohorts with the ARDS cohort for exploratory purposes. In this study, two machine learning models were developed: one to predict invasive mechanical ventilation (IMV) within 48 hours for each hospitalized day, and one to predict all-cause mortality at the time of admission. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value, and negative predictive value. We established model interpretability by calculating SHapley Additive exPlanations (SHAP) scores to identify important features.
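The five performance measures named in the Methods can be illustrated with a minimal, standard-library Python sketch. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made only for illustration. AUROC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Dict, Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def confusion_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV at a fixed (assumed) threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical toy data: 6 patients without the outcome, 2 with it.
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.5, 0.9]

toy_auroc = auroc(toy_labels, toy_scores)
toy_metrics = confusion_metrics(toy_labels, toy_scores, threshold=0.5)
print(round(toy_auroc, 3), toy_metrics)
```

AUROC does not depend on a threshold, whereas the other four measures do, so the two functions are kept separate.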
Results: Compared to the COVID-19–like cohorts (n=16,509), the patients hospitalized with COVID-19 (n=159) were significantly younger, with a higher proportion of patients of Hispanic ethnicity, a lower proportion of patients with smoking history, and fewer patients with comorbidities (P<.001). Patients with COVID-19 had a lower IMV rate (15.1% versus 23.2%, P=.02) and shorter time to IMV (2.9 versus 4.1 days, P<.001) compared to the COVID-19–like patients. In the COVID-19–like training data, the top models achieved excellent performance (AUROC>0.90). When validated in the COVID-19 cohort, the top-performing model for predicting IMV was the XGBoost model trained on the viral pneumonia cohort (AUROC=0.826). Similarly, the XGBoost model trained on all 4 COVID-19–like cohorts without ARDS achieved the best performance in predicting mortality (AUROC=0.928). Important predictors included demographic information (age), vital signs (oxygen saturation), and laboratory values (white blood cell count, cardiac troponin, albumin, etc). The outcome classes were imbalanced, which resulted in high negative predictive values and low positive predictive values.
Conclusions: We provided a feasible framework for modeling patient deterioration using existing data and AI technology to address data limitations during the onset of a novel, rapidly changing pandemic.
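The link between class imbalance and the predictive values reported in the Results can be shown with a short worked example. The sketch below applies Bayes' rule to a fixed sensitivity and specificity. The value 0.80 for both is a hypothetical choice, not a figure from the study, and the 15.1% IMV rate from the Results is used only as an illustrative prevalence.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test with the given operating point and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical operating point (assumption, not reported in the abstract).
SENSITIVITY = 0.80
SPECIFICITY = 0.80

# Imbalanced case: prevalence set to the 15.1% IMV rate, for illustration only.
ppv_imbalanced, npv_imbalanced = predictive_values(SENSITIVITY, SPECIFICITY, 0.151)

# Balanced reference case: equal numbers of positives and negatives.
ppv_balanced, npv_balanced = predictive_values(SENSITIVITY, SPECIFICITY, 0.5)

print(f"prevalence 15.1%: PPV={ppv_imbalanced:.3f}, NPV={npv_imbalanced:.3f}")
print(f"prevalence 50.0%: PPV={ppv_balanced:.3f}, NPV={npv_balanced:.3f}")
```

With the same sensitivity and specificity, lowering the prevalence reduces PPV and raises NPV, which matches the pattern described for the imbalanced outcomes.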

All Keywords
[Author keywords] COVID-19, feasibility, artificial intelligence, machine learning, Infection, outcome, respiratory, data, Invasive mechanical ventilation, framework, all-cause mortality
[Abstract keywords] Treatment, ARDS, pandemic, Diseases, Hospitalized, Mortality, Prognosis, Pneumonia, Influenza, smoking, respiratory diseases, Laboratory, cardiac troponin, Viral pneumonia, oxygen saturation, sensitivity, specificity, Cohort, Deterioration, Positive predictive value, Features, White blood cell, Research, Patient, Model, albumin, age, vital sign, predictor, clinical care, exploratory, information, characteristic, disease, Admission, patients, predict, Bacterial, acute respiratory distress, Combination, Negative predictive value, randomized trials, Clinical data, syndrome, limitation, AUROC, clinical experience, retrospective cohort, positive predictive values, Hispanic ethnicity, training data, COVID-19 cohort, objective, IMV, Additive, Result, identify, lack, develop, significantly, proportion, the patient, diagnosed, provided, restrict, feasible, 48 hour, machine learning model, patients with comorbidity, patients with COVID-19, with COVID-19
[Title keywords] learning