Summary
Deep learning (DL) models in healthcare typically require large-scale, balanced training data to be robust, generalizable, and effective. This requirement has been a major obstacle to developing DL models for the coronavirus disease 2019 (COVID-19) pandemic, where data are highly class imbalanced. Conventional DL approaches use cross-entropy loss (CEL), which often yields poor classification margins. We show that contrastive loss (CL) improves performance over CEL, especially on imbalanced electronic health records (EHR) data for COVID-19 analyses. We use a diverse EHR dataset to predict three outcomes in hospitalized COVID-19 patients over multiple time windows: mortality, intubation, and intensive care unit (ICU) transfer. To compare CEL and CL, we test models on both the full dataset and a restricted dataset. CL models consistently outperform CEL models, with differences ranging from 0.04 to 0.15 in area under the precision-recall curve (AUPRC) and from 0.05 to 0.1 in area under the receiver operating characteristic curve (AUROC).
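The summary does not state which contrastive formulation the models use. Purely as an illustration, the sketch below contrasts a standard binary cross-entropy loss, which scores each patient's logit independently, with a supervised contrastive (SupCon-style) loss, which scores pairs of patient embeddings so that same-outcome patients are pulled together and different-outcome patients are pushed apart. The choice of SupCon, the function names, and the default temperature of 0.1 are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch only; the loss formulation and names are assumptions,
# not the method described in the paper.


def cross_entropy_loss(logits, labels):
    """Mean binary cross-entropy computed on raw logits (numerically stable form)."""
    z = np.asarray(logits, dtype=float)
    y = np.asarray(labels, dtype=float)
    # max(z, 0) - z*y + log(1 + exp(-|z|)) equals -[y log s(z) + (1-y) log(1-s(z))]
    return float(np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))))


def _logsumexp(x, axis):
    """Row-wise log-sum-exp; entries equal to -inf contribute zero."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss over a batch of embeddings with class labels."""
    labels = np.asarray(labels)
    # L2-normalize so that dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Exclude each anchor's similarity to itself from the softmax denominator
    sim = np.where(self_mask, -np.inf, sim)
    log_prob = sim - _logsumexp(sim, axis=1)
    # Positives: other samples that share the anchor's label
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # anchors with no positive in the batch are skipped
    summed = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    mean_log_prob_pos = summed[valid] / pos_counts[valid]
    return float(-mean_log_prob_pos.mean())


if __name__ == "__main__":
    labels = np.array([0, 0, 1, 1])
    # Same-class embeddings close together vs. same-class embeddings orthogonal
    separated = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
    mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])
    print(supervised_contrastive_loss(separated, labels))
    print(supervised_contrastive_loss(mixed, labels))
    print(cross_entropy_loss(np.zeros(4), labels))
```

In this sketch the contrastive loss is small when embeddings cluster by outcome and large when they do not, so it acts on the geometry of the learned representation rather than on per-sample logits alone; one plausible intuition for a benefit under class imbalance is that this geometric objective encourages wider separation between the outcome classes. Under the same assumptions, a classifier head would typically be trained on top of such embeddings to produce the final predictions.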
Author keywords: DSML 3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems