Summary
Integrated, up-to-date data about SARS-CoV-2 and COVID-19 are crucial to the biomedical research community's ongoing response to the COVID-19 pandemic. Rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), but integrating it is difficult and time-consuming because much of it resides in siloed databases or in textual form. Furthermore, the data the research community needs vary drastically from task to task. The optimal data for a machine learning task, for example, differ substantially from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates heterogeneous biomedical data to produce knowledge graphs (KGs), and applied it to create a KG for the COVID-19 response. The framework can also be applied to other problems in which siloed biomedical data must be integrated quickly for different research applications, including future pandemics.

Highlights
• KG-COVID-19 is a framework for producing customized COVID-19 knowledge graphs
• Our knowledge graph and framework are free, open-source, and FAIR
• KG-COVID-19 integrates a wide range of COVID-19-related data in an ontology-aware way
• Our KG has been applied to use cases including machine learning tasks and hypothesis-based querying

The Bigger Picture
An effective response to the COVID-19 pandemic relies on integrating many different types of data available about SARS-CoV-2 and related viruses. KG-COVID-19 is a framework for producing knowledge graphs that can be customized for downstream applications, including machine learning tasks, hypothesis-based querying, and a browsable user interface that enables researchers to explore COVID-19 data and discover relationships.
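The Summary describes KG-COVID-19 as ingesting heterogeneous sources, integrating them into a single graph, and tailoring that graph to a downstream task. The Python sketch below illustrates that idea on toy data. It is not the framework's code. The source names (`source_a`, `source_b`), the record fields, the Biolink-style category and predicate labels, and the helpers `merge_sources` and `subgraph` are all hypothetical. The sketch assumes that every source identifies entities with shared compact identifiers (CURIEs), so the same entity reported by two sources collapses into one node, and both sources are recorded as provenance.

```python
# Toy per-source outputs (hypothetical): each ingest yields nodes and edges.
# Nodes are keyed by CURIE so the same entity from two sources collides on merge.
SOURCES = {
    "source_a": {
        "nodes": [
            {"id": "NCBITaxon:2697049", "name": "SARS-CoV-2", "category": "biolink:OrganismTaxon"},
            {"id": "UniProtKB:P0DTC2", "name": "Spike glycoprotein", "category": "biolink:Protein"},
        ],
        "edges": [
            {"subject": "UniProtKB:P0DTC2", "predicate": "biolink:in_taxon", "object": "NCBITaxon:2697049"},
        ],
    },
    "source_b": {
        "nodes": [
            {"id": "UniProtKB:P0DTC2", "name": "S protein", "category": "biolink:Protein"},
            {"id": "UniProtKB:Q9BYF1", "name": "ACE2", "category": "biolink:Protein"},
        ],
        "edges": [
            {"subject": "UniProtKB:P0DTC2", "predicate": "biolink:interacts_with", "object": "UniProtKB:Q9BYF1"},
        ],
    },
}


def merge_sources(sources):
    """Merge per-source node/edge lists into one graph.

    Nodes sharing an id collapse into one record that keeps the first-seen
    attributes; every contributing source is appended to "provided_by".
    Edges with the same (subject, predicate, object) collapse the same way.
    """
    nodes, edges = {}, {}
    for source_name, data in sources.items():
        for node in data["nodes"]:
            merged = nodes.setdefault(node["id"], {**node, "provided_by": []})
            merged["provided_by"].append(source_name)
        for edge in data["edges"]:
            key = (edge["subject"], edge["predicate"], edge["object"])
            merged = edges.setdefault(key, {**edge, "provided_by": []})
            merged["provided_by"].append(source_name)
    return nodes, edges


def subgraph(nodes, edges, predicates):
    """Keep only edges with a requested predicate, plus the nodes they touch."""
    kept_edges = {k: e for k, e in edges.items() if e["predicate"] in predicates}
    touched = {e["subject"] for e in kept_edges.values()} | {e["object"] for e in kept_edges.values()}
    kept_nodes = {nid: n for nid, n in nodes.items() if nid in touched}
    return kept_nodes, kept_edges


if __name__ == "__main__":
    nodes, edges = merge_sources(SOURCES)
    print(len(nodes), len(edges))  # 3 2
    print(nodes["UniProtKB:P0DTC2"]["provided_by"])  # ['source_a', 'source_b']
    # A task-specific view, e.g. protein interactions only for an ML task.
    ppi_nodes, ppi_edges = subgraph(nodes, edges, {"biolink:interacts_with"})
    print(sorted(ppi_nodes))  # ['UniProtKB:P0DTC2', 'UniProtKB:Q9BYF1']
```

Keeping the merge step separate from the task-specific `subgraph` filter mirrors the point made in the Summary: one integrated graph can serve different downstream uses, each of which selects only the slice it needs.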
Author keywords: DSML 3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems