Identification of genomic regions that are identical by descent (IBD) has proven useful in human genetic studies, where such analyses have led to the discovery of familial relatedness and the fine-mapping of disease critical regions. However, IBD analyses have been underutilized in other organisms, including human pathogens. This is due in part to the lack of statistical methodologies for non-diploid genomes and in part to the added complexity of multiclonal infections. We have therefore developed an IBD methodology, called isoRelate, for the analysis of haploid recombining microorganisms in the presence of multiclonal infections. Using the inferred IBD status at genomic locations, we have also developed a novel statistic for identifying loci under positive selection, and we propose relatedness networks as a means of exploring shared haplotypes within populations. We evaluate the performance of our methodologies for detecting IBD and selection, including comparisons with existing tools, and then perform an exploratory analysis of whole genome sequencing data from a global Plasmodium falciparum dataset of more than 2500 genomes. This analysis identifies Southeast Asia as having many highly related isolates, possibly as a result of both reduced transmission from intensified control efforts and population bottlenecks following the emergence of antimalarial drug resistance. Many signals of selection are also identified, most of which overlap genes known to be associated with drug resistance, in addition to two novel signals observed in multiple countries that have yet to be explored in detail. Finally, we investigate relatedness networks over the selected loci and determine that one of these sweeps has spread between continents while the other has arisen independently in different countries.
IBD analysis of microorganisms using isoRelate can be used for exploring population structure, positive selection and haplotype distributions, and will be a valuable tool for monitoring control and elimination efforts for many diseases.

Author summary

There are growing concerns over the emergence of antimicrobial drug resistance, which threatens the efficacy of treatments for infectious diseases such as malaria. It is therefore important to understand the dynamics of resistance by investigating population structure, natural selection and disease transmission in microorganisms. The study of disease dynamics has been hampered by the lack of suitable statistical models for the analysis of isolates containing multiple infections. We introduce a statistical model that uses population genomic data to identify genomic regions (loci) that are inherited from a common ancestor, in the presence of multiple infections. We demonstrate its potential for biological discovery using a global Plasmodium falciparum dataset. We identify low genetic diversity in isolates from Southeast Asia, possibly from clonal expansion following intensified control efforts after the emergence of artemisinin resistance. We also identify loci under positive selection, most of which contain genes that have been associated with antimalarial drug resistance. We discover two loci under strong selection in multiple countries throughout Southeast Asia and Africa, for which the selection pressure is currently unknown. We find that the selected haplotype at one of these loci has spread through gene flow, while that at the other locus has arisen from multiple independent events.
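The idea of a relatedness network over a selected locus can be illustrated with a minimal sketch. This is not the isoRelate implementation; it assumes that pairwise IBD segments have already been inferred by some tool and are available as plain tuples. The isolate names, coordinates, and both function names are hypothetical. Isolates are nodes, an edge joins two isolates that are IBD over the locus, and connected components approximate groups that share a haplotype there.

```python
from collections import defaultdict


def pairs_ibd_over_locus(segments, chrom, start, end):
    """Return the isolate pairs that have an IBD segment overlapping the locus.

    Each segment is (isolate_a, isolate_b, chromosome, seg_start, seg_end).
    """
    pairs = set()
    for iso_a, iso_b, seg_chrom, seg_start, seg_end in segments:
        # Two intervals overlap when each one starts before the other ends.
        if seg_chrom == chrom and seg_start <= end and seg_end >= start:
            pairs.add(tuple(sorted((iso_a, iso_b))))
    return pairs


def relatedness_clusters(isolates, pairs):
    """Group isolates into connected components of the relatedness network."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    clusters = []
    for isolate in isolates:
        if isolate in seen:
            continue
        # Depth-first traversal collects everything reachable from this isolate.
        stack = [isolate]
        seen.add(isolate)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))

    # Largest clusters first; ties broken alphabetically for stable output.
    clusters.sort(key=lambda c: (-len(c), c))
    return clusters


# Hypothetical isolates and IBD segments, for illustration only.
isolates = ["KH01", "KH02", "TH01", "GH01"]
segments = [
    ("KH01", "KH02", "chr13", 1_700_000, 1_760_000),
    ("KH02", "TH01", "chr13", 1_720_000, 1_800_000),
    ("KH01", "GH01", "chr07", 400_000, 450_000),
]

pairs = pairs_ibd_over_locus(segments, "chr13", 1_725_000, 1_730_000)
clusters = relatedness_clusters(isolates, pairs)
print(clusters)
```

In this toy input, a cluster that spans isolates from several countries would be consistent with a haplotype spreading by gene flow, whereas separate country-specific clusters at the same locus would be more consistent with independent origins. Real data would need the country labels and a proper IBD inference step, neither of which is modelled here.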
[Abstract keywords] Treatment, Efficacy, Infectious diseases, Diseases, Positive selection, Genome, Transmission, Infectious disease, antimalarial, Artemisinin, malaria, Asia, Spread, Whole genome sequencing, infections, disease control, comparison, natural selection, selection pressure, statistical model, microorganisms, Pathogens, dataset, genetic diversity, genomes, drug resistance, methodology, genomic, disease, Critical, IBD, Haplotype, disease transmission, microorganism, Analysis, isolates, Plasmodium falciparum, identification, gene flow, antimalarial drug, population bottleneck, population bottlenecks, multiple infections, clonal expansion, critical regions, Organisms, genomic region, overlap, Exploratory analysis, genomic data, Comparisons, effort, sequencing data, loci, Plasmodium, whole genome, genetic study, descent, statistical methodologies, distributions, regions, isolate, populations, country, independent, selected, identify, lack, evaluate, addition, events, reduced, can be used, added, determine, statistical methodology
[Title keywords] pathogen, analysis