The unprecedented pace of SARS-CoV-2 genome sequencing provides unique information about the genetic changes in a single pathogen during an ongoing pandemic. By analyzing close to 200,000 genomes, we show that the patterns of SARS-CoV-2 mutations along the genome are closely correlated with the structural and functional features of the encoded proteins. The requirement that proteins’ 3D structures remain foldable and the conservation of their key functional regions, such as protein-protein interaction interfaces, are the dominant factors driving evolutionary selection in protein-coding genes. At the same time, avoidance of host immunity leads to an abundance of mutations in other regions, resulting in high variability of the missense mutation rate along the genome. “Unexplained” peaks and valleys in the mutation rate provide hints about the function of as yet uncharacterized genomic regions and about the specific protein structural and functional features they encode. Some of these observations have immediate practical implications for the selection of target regions for PCR-based COVID-19 tests and for evaluating the risk of mutations in epitopes targeted by specific antibodies and vaccine design strategies.

Author summary

RNA viruses, such as SARS-CoV-2, have high mutation rates, and their genomes accumulate mutations at a pace much faster than those of larger organisms. While a lot of attention is focused on mutations that change the behavior of the virus, making it more or less infectious or virulent, most mutations appear to be neutral. The interplay between different types of natural selection and genetic drift is intensively studied in viral genetics, which has produced many detailed models of viral evolution. Here we show, using the example of SARS-CoV-2, that the patterns of mutations in viral genomes are tightly coupled with the three-dimensional structure and detailed functional features of the proteins encoded by the viral genome.
Highly mutated regions of the genome correspond to structural regions that can easily accept amino acid changes, such as disordered regions or protein surfaces, while the reverse is true for regions corresponding to protein cores or functionally important features. While many patterns can be explained by what we already know about SARS-CoV-2 proteins, others provide hints about still undiscovered functions or still unknown structural features. Taking these patterns into account may be important when developing tools such as antibodies, PCR probes, vaccines or drugs, to make sure they target genomic regions that are conserved because of natural negative selection.
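The idea of "peaks and valleys" in the mutation rate along the genome can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes mutations are already available as a list of 1-based genome coordinates, and the function names, the window size, and the toy positions are all hypothetical choices made for illustration.

```python
from collections import Counter


def windowed_mutation_density(positions, genome_length, window=100):
    """Count mutation events in consecutive, non-overlapping windows.

    positions: 1-based genome coordinates of observed mutations
               (hypothetical input; repeated coordinates count repeatedly).
    Returns a list of (start, end, mutations_per_site) tuples.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Map each in-range coordinate to the index of the window that contains it.
    counts = Counter(
        (p - 1) // window for p in positions if 1 <= p <= genome_length
    )
    n_windows = (genome_length + window - 1) // window  # ceiling division
    densities = []
    for i in range(n_windows):
        start = i * window + 1
        end = min((i + 1) * window, genome_length)  # last window may be short
        densities.append((start, end, counts.get(i, 0) / (end - start + 1)))
    return densities


def most_conserved(densities, k=1):
    """Return the k windows with the lowest mutation density (the 'valleys')."""
    return sorted(densities, key=lambda w: w[2])[:k]


# Toy data, invented for illustration only.
toy_positions = [5, 17, 42, 250, 251, 260, 275, 290]
toy_densities = windowed_mutation_density(toy_positions, genome_length=300)
toy_valley = most_conserved(toy_densities, k=1)
```

On real data, a window that ranks low here would only be a candidate for conservation. Sequencing coverage, window size, and sampling bias would all need to be considered before treating it as a target region for a probe or antibody.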
Abstract keywords: antibodies, SARS-CoV-2, Vaccine, pandemic, Mutation, Vaccine design, Sequencing, Genome, Genetic, drugs, risk, virus, Protein, Region, PCR, Features, pathogen, viral evolution, RNA viruses, Viral genetics, natural selection, Missense mutation, information, epitope, 3D structure, protein-protein interaction, function, Analysis, amino acid changes, Neutral, SARS-CoV-2 proteins, viral genome, COVID-19 test, observation, Factor, Organisms, genomic region, three-dimensional structure, host immunity, avoidance, high mutation rate, probes, high variability, protein-coding genes, virulent, encoded proteins, genetic change, while, driving, dominant, regions, implication, feature, resulting, develop, conserved, example, specific antibody, functional, provide, less, explained, in viral, unique, faster, correlated, mutated, accumulate, coded, Taking, target region, Requirement, the SARS-CoV-2 virus
Title keywords: Protein, SARS-CoV-2 evolution