Genomic mutations and changes in protein secondary structure and solvent accessibility of SARS-CoV-2 (COVID-19 virus)

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly pathogenic virus that has caused the global COVID-19 pandemic. Tracing the evolution and transmission of the virus is crucial for responding to and controlling the pandemic through appropriate intervention strategies. This paper reports and analyses genomic mutations in the coding regions of SARS-CoV-2 and the changes in protein secondary structure and solvent accessibility that they probably cause, as predicted by deep learning models. Prediction results suggest that the mutation D614G in the virus spike protein, which has attracted much attention from researchers, is unlikely to change protein secondary structure or relative solvent accessibility. Based on 6324 viral genome sequences, we create a spreadsheet dataset of point mutations that can facilitate the investigation of SARS-CoV-2 from many perspectives, especially in tracing the evolution and worldwide spread of the virus. Our analysis also shows that the coding genes E, M, ORF6, ORF7a, ORF7b and ORF10 are the most stable, making them potentially suitable targets for vaccine and drug development.
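To make the point-mutation notation concrete, here is one way to call substitutions from a reference coding sequence and an aligned query, labelling each at the nucleotide level (e.g. A23403G) and the amino-acid level (e.g. D614G). This is a minimal sketch, not the authors' pipeline: it assumes the query is already aligned to the reference with no insertions or deletions, and the helper `call_point_mutations` and the synthetic sequences are hypothetical. For orientation, D614G arises from the A-to-G change at genome position 23,403 of the Wuhan-Hu-1 reference (NC_045512.2), where the spike (S) coding sequence begins at position 21,563.

```python
# Minimal sketch (hypothetical helper, not the authors' method): call point
# mutations in one coding sequence (CDS), assuming the query is already
# aligned to the reference with no insertions or deletions.

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; "*" marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def call_point_mutations(ref_cds: str, query_cds: str, gene_start: int) -> list[tuple[str, str]]:
    """Return (nucleotide_label, amino_acid_label) pairs, one per substitution.

    gene_start is the 1-based genome coordinate of the first base of the CDS.
    """
    if len(ref_cds) != len(query_cds) or len(ref_cds) % 3 != 0:
        raise ValueError("expected aligned, equal-length sequences of whole codons")
    mutations = []
    for i, (ref_base, alt_base) in enumerate(zip(ref_cds, query_cds)):
        # Skip matching bases and ambiguous calls such as "N".
        if ref_base == alt_base or alt_base not in "ACGT":
            continue
        codon_index, offset = divmod(i, 3)
        ref_codon = ref_cds[3 * codon_index : 3 * codon_index + 3]
        # Apply only this substitution; other changes in the same codon are ignored.
        alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1 :]
        nt_label = f"{ref_base}{gene_start + i}{alt_base}"
        aa_label = f"{CODON_TABLE[ref_codon]}{codon_index + 1}{CODON_TABLE[alt_codon]}"
        mutations.append((nt_label, aa_label))
    return mutations


# Synthetic 615-codon CDS (NOT the real spike sequence) with GAT (Asp) at codon 614.
ref = "ATG" + "GCT" * 612 + "GAT" + "TAA"
query = ref[:1840] + "G" + ref[1841:]  # GAT -> GGT at codon 614
print(call_point_mutations(ref, query, gene_start=21563))
# [('A23403G', 'D614G')]
```

Labelling each substitution independently keeps the logic simple, but it mislabels codons that carry two or more changes; a fuller pipeline would translate the whole query codon and would also need to handle indels.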

Keywords: Evolution, Data mining
