Trio exome sequencing has been successful in identifying genes with de novo mutations (DNMs) causing epileptic encephalopathy (EE) and other neurodevelopmental disorders. Here, we evaluate how well a case-control collapsing analysis recovers genes causing dominant forms of EE originally implicated by DNM analysis. We performed a genome-wide search for an enrichment of “qualifying variants” in protein-coding genes in 488 unrelated cases compared to 12,151 unrelated controls. To enrich for likely pathogenic variants, these “qualifying variants” were restricted to extremely rare variants predicted to functionally impact the protein. Despite the modest sample size, three known EE genes (KCNT1, SCN2A, and STXBP1) achieved genome-wide significance (p < 2.68×10⁻⁶). In addition, six of the 10 most significantly associated genes are known EE genes, and the majority of the known EE genes originally implicated in trio sequencing (17 out of 25) are nominally significant (p < 0.05), a proportion significantly higher than expected (Fisher’s exact p = 2.33×10⁻¹⁷). Our results indicate that a case-control collapsing analysis can identify several of the EE genes originally implicated in trio sequencing studies, and clearly show that additional genes would be implicated with larger sample sizes. The case-control analysis not only makes discovery easier and more economical in early-onset disorders, particularly when large cohorts are available, but also supports the use of this approach to identify genes in diseases that present later in life, when parents are not readily available.

Author summary

Trio exome sequencing and de novo mutation (DNM) analysis have been the main approach to discovering genes responsible for severe sporadic disorders, including a range of neurodevelopmental disorders. This approach requires sequencing parents, identifying DNMs from trio sequence data, and comparing the observed rate of DNMs to the expected rate.
In this study, we adopted a case-control design, performed a gene-based collapsing analysis, and rediscovered several of the epileptic encephalopathy (EE) genes originally implicated by DNM analysis of EE trios. Our collapsing analysis focused on ultra-rare, highly impactful variants (“qualifying variants”) by filtering against large-scale population datasets. This approach revealed that most of the standing variation can be filtered out and that DNMs are enriched among the “qualifying variants”. Our study suggests that, in place of trio-based analysis methods, a case-control analysis can be used to identify disease genes whose causal mutations are predominantly de novo. This offers an efficient and cost-effective alternative when large-scale trio sequencing is not possible.
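
To make the idea of a gene-based collapsing analysis concrete, the following is a minimal Python sketch and not the pipeline used in the study. It assumes that each sample is reduced, per gene, to a carrier or non-carrier of at least one qualifying variant, and that carrier counts in cases and controls are compared with a one-sided Fisher's exact test; the text above does not specify the per-gene test, so that choice is an assumption. The function names, the `(gene, sample_id)` input format, and the toy data are hypothetical. Only the significance threshold is taken from the text.

```python
from math import comb

# Genome-wide significance threshold quoted in the abstract.
GENOME_WIDE_ALPHA = 2.68e-6


def fisher_exact_greater(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact p-value for carrier enrichment in cases.

    Under the null, the number of carriers who are cases follows a
    hypergeometric distribution; the p-value is the upper tail
    P(X >= case_carriers). Assumed test, not confirmed by the source.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / comb(total, carriers)


def collapse(qualifying_calls, case_ids, control_ids):
    """Collapse variant calls to one carrier indicator per sample per gene.

    qualifying_calls: iterable of (gene, sample_id) pairs, one per
    qualifying variant call (hypothetical input format).
    Returns {gene: p_value}.
    """
    carriers_by_gene = {}
    for gene, sample in qualifying_calls:
        # A sample counts once per gene, however many variants it carries.
        carriers_by_gene.setdefault(gene, set()).add(sample)

    p_values = {}
    for gene, samples in carriers_by_gene.items():
        p_values[gene] = fisher_exact_greater(
            len(samples & case_ids), len(case_ids),
            len(samples & control_ids), len(control_ids),
        )
    return p_values


if __name__ == "__main__":
    # Toy data for illustration only; not from the study.
    cases = {"case1", "case2"}
    controls = {"ctrl1", "ctrl2"}
    calls = [("GENE_A", "case1"), ("GENE_A", "case1"),
             ("GENE_A", "case2"), ("GENE_B", "ctrl1")]
    for gene, p in sorted(collapse(calls, cases, controls).items()):
        print(gene, p, p < GENOME_WIDE_ALPHA)
```

Collapsing to a per-sample indicator keeps a sample with several qualifying variants in the same gene from being counted more than once, which is why the counts entering the test are carriers rather than variant calls.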