Machine learning of COVID-19 clinical data identifies population structures with therapeutic potential

Summary
Clinical outcomes for patients with COVID-19 are heterogeneous, and there is interest in defining subgroups for prognostic modeling and the development of treatment algorithms. We obtained 28 demographic and laboratory variables from patients admitted to hospital with COVID-19. These comprised a training cohort (n = 6099) and two validation cohorts from the first and second waves of the pandemic (n = 996; n = 1011). Uniform manifold approximation and projection (UMAP) dimension reduction and Gaussian mixture model (GMM) analysis were used to define patient clusters. Twenty-nine clusters were defined in the training cohort and were associated with markedly different mortality rates, which were predictive within the validation cohorts. Deconvolution of clinical features within clusters identified unexpected relationships between variables. Integration of large datasets using UMAP-assisted clustering can therefore identify patient subgroups with prognostic information and uncover unexpected interactions between clinical variables. This application of machine learning represents a powerful approach for delineating disease pathogenesis and potential therapeutic interventions.
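
The step in which cluster-level mortality from the training cohort is checked against a validation cohort can be illustrated with a minimal sketch. It assumes each patient has already been assigned a cluster label (for example by a GMM fitted on a UMAP embedding, which is not reproduced here). The function names `cluster_mortality` and `compare_cohorts` and all data values are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def cluster_mortality(labels, died):
    """Return {cluster: (n_patients, mortality_rate)} for one cohort."""
    counts = defaultdict(lambda: [0, 0])  # cluster -> [n_patients, n_deaths]
    for cluster, outcome in zip(labels, died):
        counts[cluster][0] += 1
        counts[cluster][1] += int(outcome)
    return {c: (n, deaths / n) for c, (n, deaths) in counts.items()}


def compare_cohorts(train, valid):
    """Pair training and validation mortality for clusters seen in both."""
    return {c: (train[c][1], valid[c][1]) for c in sorted(train) if c in valid}


# Toy, made-up data (assumption): cluster labels and death indicators (1 = died).
train_labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
train_died = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
valid_labels = [0, 0, 1, 1, 2, 2]
valid_died = [0, 0, 1, 1, 0, 1]

train_rates = cluster_mortality(train_labels, train_died)
valid_rates = cluster_mortality(valid_labels, valid_died)
paired = compare_cohorts(train_rates, valid_rates)

for cluster, (train_rate, valid_rate) in paired.items():
    print(f"cluster {cluster}: train {train_rate:.2f}, validation {valid_rate:.2f}")
```

In practice, agreement between the paired rates across clusters would be assessed with a formal measure such as a rank correlation or calibration plot; the sketch only shows how the per-cluster rates are formed and paired.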

All Keywords
【Author keywords】 bioinformatics, Medical informatics, Viral microbiology
【Abstract keywords】 Treatment, pandemic, hospital, outcome, Laboratory, Cohort, clinical, Patient, Clustering, Clusters, Cluster, second wave, dataset, prognostic, information, mortality rates, clinical feature, Interaction, Analysis, integration, UMAP, Predictive, reduction, subgroup, disease pathogenesis, GMM, Algorithms, clinical variables, therapeutic interventions, validation cohort, heterogeneous, datasets, variable, approach, patient subgroup, defined, identify, was used, to define, patients with COVID-19, Uniform, variables, with COVID-19
【Title keywords】 COVID-19, Structure, Clinical data, therapeutic potential, machine, identify