Summary

Standard transcriptomic analyses alone have limited power to capture the molecular mechanisms driving disease pathophysiology and outcomes. To overcome this, unsupervised network analyses are used to identify clusters of genes that can be associated with distinct molecular mechanisms and outcomes for a disease. In this study, we developed a network analysis framework that integrates transcriptional signatures from multiple model systems with protein-protein interaction data to find gene modules. Through a meta-analysis of different enriched features from these gene modules, we extract communities of highly interconnected features. These clusters of higher-order features, working as a multifeatured machine, enable collective assessment of their contribution to disease or phenotype characterization. We show the utility of this workflow using transcriptomics data from three different models of SARS-CoV-2 infection and identify several pathways and biological processes that could help in understanding, or forming hypotheses about, the molecular signatures that induce pathophysiological changes, risks, or sequelae of COVID-19.

Highlights

• Defined a consensus gene signature across three models of SARS-CoV-2 infection
• Characterized subnetworks of host proteins interacting with the SARS-CoV-2 proteome
• Integrated a wide range of COVID-19 and related data to build functional modules
• Identified gene functional modules that can further the understanding of COVID-19

The bigger picture

This study is based on the premise that combining information from multiple layers of data can yield new, biologically interpretable associations. Its underlying and unifying theme is data integration, data mining, and meta-analysis for pattern detection that supports knowledge discovery and the generation of hypotheses.
The methods and the workflow used are disease agnostic and can be applied to any disease or phenotype that has multiple models and heterogeneous data elements. By integrating and jointly analyzing several heterogeneous data types (multiple disease models, viral-host protein interaction data, single-cell RNA-sequencing data, protein-protein interactions, and genome-wide association study data), we identify gene functional modules that bear directly on furthering the understanding of COVID-19.

We report a data-driven, network-based workflow to identify gene and functional modules in COVID-19 through joint analysis of gene expression data from three model systems of SARS-CoV-2 infection. By bringing together a consensus gene expression signature from these model systems and analyzing it jointly with other omics data, we build clusters of higher-order multifeature machines that provide a basis for addressing several basic and translational research questions and for generating hypotheses.
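As a rough illustration of the first two steps described above (a consensus signature across model systems, then module detection on a protein-protein interaction network), the following is a minimal sketch and not the pipeline used in this study. The gene names (`G1` to `G6`), fold-change values, and interaction edges are toy placeholders, the function names are hypothetical, and connected components stand in for whatever community-detection method is actually applied.

```python
from collections import defaultdict


def consensus_signature(signatures, min_models=None):
    """Return genes regulated in the same direction across model systems.

    signatures: list of dicts mapping gene -> log2 fold change, one per model.
    min_models: how many models must agree (assumption: all of them by default).
    """
    if min_models is None:
        min_models = len(signatures)
    up, down = defaultdict(int), defaultdict(int)
    for sig in signatures:
        for gene, lfc in sig.items():
            if lfc > 0:
                up[gene] += 1
            elif lfc < 0:
                down[gene] += 1
    consensus = {}
    for gene in set(up) | set(down):
        # Keep a gene only if enough models agree and none disagrees.
        if up[gene] >= min_models and down[gene] == 0:
            consensus[gene] = "up"
        elif down[gene] >= min_models and up[gene] == 0:
            consensus[gene] = "down"
    return consensus


def induced_modules(genes, ppi_edges):
    """Group consensus genes into modules via the PPI subgraph they induce.

    Connected components are a deliberately simple stand-in for a real
    community-detection algorithm; singleton genes are dropped.
    """
    adjacency = {g: set() for g in genes}
    for a, b in ppi_edges:
        if a in adjacency and b in adjacency and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, modules = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) > 1:
            modules.append(sorted(component))
    return modules


# Toy inputs (placeholders, not data from the study).
model_a = {"G1": 2.1, "G2": 1.4, "G3": -1.2, "G4": 0.9, "G5": 1.1}
model_b = {"G1": 1.8, "G2": 0.7, "G3": -0.8, "G4": -0.5, "G5": 2.0}
model_c = {"G1": 1.2, "G2": 2.2, "G3": -1.9, "G5": 0.6, "G6": 1.0}
ppi = [("G1", "G2"), ("G2", "G3"), ("G4", "G5"), ("G5", "G6")]

consensus = consensus_signature([model_a, model_b, model_c])
modules = induced_modules(consensus, ppi)
print(sorted(consensus.items()))
print(modules)
```

In this toy run, `G4` is excluded because its direction differs between models, `G6` because it appears in only one model, and `G5` forms no module because none of its interaction partners is in the consensus set. In practice, each resulting module would then be passed to enrichment analysis to obtain the higher-order features the text refers to.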
Keywords: COVID-19, meta-analysis, SARS-CoV-2, coronavirus, data integration, data mining, network analysis, pattern search, module detection