Expanding Our Understanding of COVID-19 from Biomedical Literature Using Word Embedding

A better understanding of the clinical characteristics of coronavirus disease 2019 (COVID-19) is urgently required to address this health crisis. Numerous researchers and pharmaceutical companies are working to develop vaccines and treatments; however, a clear solution has yet to be found. This study proposes using artificial intelligence methods to capture biomedical knowledge and infer the characteristics of COVID-19. A biomedical knowledge base was first built with FastText, a word embedding technique, from PubMed literature published over the past decade. A new knowledge base was then created using recently published COVID-19 articles. In this newly constructed knowledge base, a list of anti-infective drugs and proteins of either human or coronavirus origin was inferred to be related to COVID-19, because their embeddings lie close to that of COVID-19. The study thus aims to provide a method for quickly inferring COVID-19-related information from an existing knowledge base, before sufficient knowledge about the disease has accumulated. Because COVID-19 has not yet been fully overcome, machine learning-based analysis of the PubMed literature will provide a broad guideline for researchers and pharmaceutical companies working on treatments for COVID-19.
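The inference step described above, treating terms whose embeddings lie close to COVID-19 as related, can be illustrated with a minimal sketch. It assumes cosine similarity as the closeness measure, which the abstract does not specify. The term names (`drug_a`, `protein_b`, `unrelated_c`) and the three-dimensional vectors are hypothetical placeholders invented for illustration; they are not outputs of the study's FastText model, and the helper names are likewise illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_u * norm_v)


def nearest_terms(query, vectors, top_k=3):
    """Rank every other term by cosine similarity to the query term."""
    query_vec = vectors[query]
    scored = [
        (term, cosine_similarity(query_vec, vec))
        for term, vec in vectors.items()
        if term != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; real vectors would come from a trained model.
toy_vectors = {
    "covid-19": [0.9, 0.1, 0.3],
    "drug_a": [0.8, 0.2, 0.4],
    "protein_b": [0.7, 0.0, 0.2],
    "unrelated_c": [-0.5, 0.9, -0.1],
}

ranked = nearest_terms("covid-19", toy_vectors, top_k=2)
for term, score in ranked:
    print(f"{term}: {score:.3f}")
```

In practice the dictionary would be replaced by vectors looked up from an embedding model trained on the literature, and the ranked list would be filtered to the vocabulary of interest, such as drug or protein names.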

All Keywords
Author keywords: COVID-19, Drug repurposing, machine learning, Medical Subject Headings, word embedding, PubMed literature, substance name
Abstract keywords: coronavirus disease, Vaccine, coronavirus, Clinical characteristics, knowledge, Protein, Health, Characteristics, Research, information, researcher, articles, required, overcome, anti-infective drug, Numerous, accumulated, treatments for COVID-19
Title keywords: understanding, literature, embedding