Plain Language Summary
Some viral mutations may confer only a transient advantage: they are favourable for replication within hosts (e.g. by evading host immune responses) but deleterious to transmission between hosts (e.g. through reduced cell binding). To identify such mutations, which we call transmission fitness polymorphisms (TFPs), we developed a clustering algorithm, mlscluster. It computes clade-level statistics (number of descendants, persistence time, and growth rate) for clades carrying a specific mutation and compares them with those of their immediate sister clades lacking the mutation. In the presence of a TFP, these statistics are expected to differ between the two. We then applied mlscluster to a representative time-scaled SARS-CoV-2 tree built from more than 1 million whole-genome sequences from England. Our statistical analysis suggested approximately constant levels of transient selection across epidemic waves driven by very different variants. It also showed that genomic regions of known functional significance, such as spike, nucleocapsid, and ORF3a, were enriched for TFPs. This is one of the first studies to characterise recurrent SARS-CoV-2 mutations potentially under multilevel selection, and it provides empirical evidence for important tradeoffs in selection between intrahost replication and inter-host transmission. It also identifies candidate mutations for realistic coalescent-based modelling and for laboratory investigations of their effects and of their mechanisms of interaction with human cells.
Keywords: SARS-CoV-2, mutation, molecular evolution, phylogenetic analysis, natural selection, transmission fitness, genetic clustering, within-host evolution
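The clade-versus-sister comparison summarised above can be illustrated with a minimal Python sketch. This is not mlscluster's implementation. The `Clade` class, the descendants-per-year growth proxy, and the "median ratio below 1" flagging rule are hypothetical simplifications chosen only for illustration. The sketch assumes that a mutation deleterious to transmission would leave the clades carrying it smaller, shorter-lived, and slower-growing than their sister clades. Under that assumption, each mutant/sister ratio would tend to fall below 1 across the mutation's independent (recurrent) emergences.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Clade:
    """Hypothetical summary of one clade in a time-scaled phylogeny."""
    mrca_time: float        # date of the clade's most recent common ancestor (decimal years)
    tip_times: list[float]  # sampling dates of the clade's descendant tips

    @property
    def n_descendants(self) -> int:
        return len(self.tip_times)

    @property
    def persistence_time(self) -> float:
        # Time from the clade's MRCA to its latest sampled descendant.
        return max(self.tip_times) - self.mrca_time

    @property
    def growth_rate(self) -> float:
        # Crude proxy: descendants per unit of persistence time.
        # Illustrative stand-in only, not the estimator used by mlscluster.
        return self.n_descendants / self.persistence_time


def clade_ratios(mutant: Clade, sister: Clade) -> dict[str, float]:
    """Ratio of each statistic: clade with the mutation / sister clade without it."""
    return {
        "descendants": mutant.n_descendants / sister.n_descendants,
        "persistence": mutant.persistence_time / sister.persistence_time,
        "growth": mutant.growth_rate / sister.growth_rate,
    }


def is_candidate_tfp(pairs: list[tuple[Clade, Clade]], threshold: float = 1.0) -> bool:
    """Hypothetical rule: flag a recurrent mutation when, across its independent
    emergences (mutant clade, sister clade), the median of every ratio is below `threshold`."""
    per_pair = [clade_ratios(mutant, sister) for mutant, sister in pairs]
    return all(
        median(r[stat] for r in per_pair) < threshold
        for stat in ("descendants", "persistence", "growth")
    )


# Toy data: two independent emergences of the same mutation, each paired with its sister clade.
pairs = [
    (Clade(0.0, [0.25, 0.5]), Clade(0.0, [0.25, 0.25, 0.5, 0.5, 0.75, 1.0, 1.0, 1.0])),
    (Clade(1.0, [1.5]), Clade(1.0, [1.25, 1.5, 1.75])),
]
print(clade_ratios(*pairs[0]))  # {'descendants': 0.25, 'persistence': 0.5, 'growth': 0.5}
print(is_candidate_tfp(pairs))  # True
```

Taking medians over several emergences, rather than judging a single clade pair, reflects the focus on recurrent mutations. A real analysis would also need a statistical test against a null distribution rather than a fixed threshold.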