Abstract

Background: The current coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become the most devastating public health emergency of the 21st century and one of the most influential plagues in history. Studies on the origin of SARS-CoV-2 generally agree that the virus probably originated in bats and is closely related to a bat CoV, BCoV-RaTG13, sampled from a horseshoe bat (Rhinolophus affinis), with the Malayan pangolin (Manis javanica) being a plausible intermediate host. However, because relatively few SARS-CoV-2-related strains are available in the public domain, the evolutionary history remains unclear.

Methodology: Nine hundred ninety-five coronavirus sequences were obtained from NCBI GenBank and GISAID, and multiple sequence alignment was carried out to categorize SARS-CoV-2-related groups. Spike sequences were analyzed using similarity and conservation analyses. Mutation analysis was used to identify variations within the receptor-binding domain (RBD) of spike in SARS-CoV-2-related strains.

Results: We identified a family of SARS-CoV-2-related strains, including the closest relatives, bat CoV RaTG13 and pangolin CoV strains. Sequence similarity and conservation analyses of the spike sequence showed that the N-terminal domain, the RBD and the S2 subunit display different degrees of conservation with several coronavirus strains. Mutation analysis of contact sites in the SARS-CoV-2 RBD reveals that human susceptibility probably emerged in pangolins.

Conclusion and implication: We conclude that the spike sequence of SARS-CoV-2 is the result of multiple recombination events during its transmission from bat to human, and we propose a framework of evolutionary history that resolves the relationship of BCoV-RaTG13 and pangolin coronaviruses with SARS-CoV-2.
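As an illustration of the kind of similarity and conservation analysis named in the Methodology, the following is a minimal sketch only. It assumes the sequences are already aligned to equal length with `-` as the gap character. The function names `pairwise_identity` and `column_conservation`, the scoring definitions, and the toy sequences are hypothetical choices made for this example and are not taken from the study, which does not specify its tools or scoring scheme here.

```python
from collections import Counter


def pairwise_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def column_conservation(alignment, gap="-"):
    """Per-column fraction of sequences that carry the most common residue.

    Gaps never count toward the most common residue, so a gapped
    column scores lower than a fully conserved one.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    n = len(alignment)
    scores = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] != gap)
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / n)
    return scores


# Toy, made-up aligned fragments (NOT real spike sequences).
toy_alignment = {
    "strain_A": "NLYFQ-GT",
    "strain_B": "NLYLQSGT",
    "strain_C": "NKYFQSGS",
}

if __name__ == "__main__":
    identity = pairwise_identity(toy_alignment["strain_A"], toy_alignment["strain_B"])
    print(f"strain_A vs strain_B identity: {identity:.1f}%")
    print("column conservation:", column_conservation(list(toy_alignment.values())))
```

In this sketch, regions such as the N-terminal domain, RBD or S2 subunit could be compared by averaging the per-column scores over the alignment columns that cover each region. The region boundaries would have to come from an annotated reference sequence.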
Lay Summary: This study analyses whole-genome and spike sequences of coronaviruses from NCBI using phylogenetic and conservation analyses to reconstruct the evolutionary history of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It proposes an evolutionary history of spike in the progenitors of SARS-CoV-2 as they passed from bat to human through mammalian hosts before recombining into the current form.
【Author keywords】 SARS-CoV-2, Receptor-binding domain, bat coronavirus RaTG13, pangolin coronavirus, conservation analysis
【Abstract keywords】 COVID-19, coronavirus disease, coronavirus, pandemic, spike, Variation, Transmission, virus, RBD, RaTG13, CoV, public health emergency, GISAID, Strains, S2 subunit, N-terminal domain, Multiple sequence alignment, Analysis, strain, Contact, similarity, Phylogenetic, SARS-CoV-2 RBD, plague, acute respiratory syndrome, domain, sequence, NCBI, coronavirus strains, recombination event, pangolin CoV, whole-genome, Host, Result, analyzed, identify, was used, caused, carried, reveal, groups, analyses, recombine
【Title keywords】 conservation, SARS-CoV-2 spike, Analysis, viral adaptation