The SARS-CoV-2 pandemic has raised concerns about identifying the hosts of the virus since the early stages of the outbreak. To address this problem, we proposed a deep learning method, DeepHoF, which automatically extracts viral genomic features to predict host likelihood scores for novel viruses across five host types: plant, germ, invertebrate, non-human vertebrate and human. DeepHoF compensated for the lack of an accurate tool, reaching an AUC of 0.975 in five-class classification, and could make reliable predictions for novel viruses without close phylogenetic neighbors. Additionally, because existing tools cannot efficiently infer the host species of SARS-CoV-2, we conducted an in-depth analysis of the host likelihood profile calculated by DeepHoF. Using isolates sequenced in the earliest stage of the COVID-19 pandemic, we inferred that minks, bats, dogs and cats were potential hosts of SARS-CoV-2, with minks possibly among the most noteworthy. Several SARS-CoV-2 genes proved significant in determining the host range. Furthermore, a large-scale genome analysis, based on DeepHoF's computations for the later stage of the pandemic in 2020, revealed a uniform host range among SARS-CoV-2 samples and a strong association of SARS-CoV-2 between humans and minks.
【Author keywords】 Virology, Computational biology and bioinformatics, Computational models 【Abstract keywords】 SARS-CoV-2, pandemic, SARS-CoV-2 pandemic, COVID-19 pandemic, Human, Genome, virus, outbreak, Phylogeny, host range, genomic, early stage, predict, bats, association, Novel viruses, Analysis, AUC, computation, hosts, Deep Learning Method, Host, isolate, FIVE, feature, likelihood, lack, novel virus, sequenced, raised, conducted, calculated, demonstrated, automatically 【Title keywords】 SARS-CoV-2, pandemic, Host