Abstract
Explosively emerging SARS-CoV-2 variants challenge current nomenclature schemes based on genetic diversity and biological significance. Genomic composition-based machine learning methods have recently performed well in identifying phenotype-genotype relationships. We introduced a framework that uses dinucleotide (DNT) composition representation (DCR) to parse the general human adaptation of RNA viruses, and applied a three-dimensional convolutional neural network (3D CNN) to learn the human adaptation of other existing coronaviruses (CoVs) and predict the adaptation of SARS-CoV-2 variants of concern (VOCs). A markedly separable, linear DCR distribution was observed in two major genes, those encoding the receptor-binding glycoprotein and the RNA-dependent RNA polymerase (RdRp), of six families of single-stranded RNA (ssRNA) viruses. Additionally, there was a general host-specific distribution of both the spike proteins and RdRps of CoVs. The 3D CNN based on spike DCR predicted a dominant type II adaptation, with high transmissibility and low pathogenicity, for most Beta, Delta and Omicron VOCs. Type I adaptation, with the opposite transmissibility and pathogenicity profile, was predicted for SARS-CoV-2 Alpha VOCs (77%) and Kappa variants of interest (58%). The identified adaptive determinants included the D1118H and A570D mutations and local DNTs. Thus, the 3D CNN model based on DCR features predicts a major type II human adaptation of SARS-CoV-2 and is qualified to predict variant adaptation in real time, facilitating the risk assessment of emerging SARS-CoV-2 variants and COVID-19 control.
Keywords: 3D convolutional neural networks; SARS-CoV-2; dinucleotide composition representation; human adaptation; variants of concern.
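
As an illustration of the kind of feature the abstract refers to, the sketch below computes plain overlapping dinucleotide frequencies for a nucleotide sequence. This is only an assumption about the simplest possible composition feature: the abstract does not define how DCR is constructed, so the function name `dinucleotide_frequencies`, the T-to-U conversion, and the handling of ambiguous bases are hypothetical choices and not the authors' method.

```python
from collections import Counter
from itertools import product
from typing import Dict

# The 16 possible dinucleotides over the RNA alphabet.
BASES = "ACGU"
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]


def dinucleotide_frequencies(sequence: str) -> Dict[str, float]:
    """Return the relative frequency of each of the 16 dinucleotides.

    Hypothetical helper: counts overlapping pairs, treats T as U, and
    ignores any pair that contains an ambiguous base such as N.
    """
    seq = sequence.upper().replace("T", "U")
    # Count every overlapping window of length 2.
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Keep only the 16 unambiguous dinucleotides.
    valid = {dnt: counts.get(dnt, 0) for dnt in DINUCLEOTIDES}
    total = sum(valid.values())
    if total == 0:
        return {dnt: 0.0 for dnt in DINUCLEOTIDES}
    return {dnt: n / total for dnt, n in valid.items()}


if __name__ == "__main__":
    freqs = dinucleotide_frequencies("AUGAUG")
    # Print only the dinucleotides that actually occur.
    for dnt, value in freqs.items():
        if value > 0:
            print(dnt, round(value, 3))
```

A vector like this, computed per gene (for example, for a spike or RdRp coding sequence), is the sort of input a downstream classifier could consume; how such values are arranged into the three-dimensional input of the 3D CNN is not specified in the abstract.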