Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the COVID-19 pandemic. It was first detected in China and spread rapidly to other countries. Several thousand whole genome sequences of SARS-CoV-2 have been reported, and it is important to compare them and identify distinctive evolutionary or mutation markers. Using chaos game representation (CGR) together with recurrence quantification analysis (RQA), a powerful nonlinear analysis technique, we propose an effective process for extracting several valuable features from SARS-CoV-2 genomic sequences. The extracted features make it possible to compare genomic sequences of different lengths. The provided dataset comprises a total of 18 RQA-based features for 4496 instances of SARS-CoV-2.
【Author keywords】 SARS-CoV-2, Nonlinear analysis, Coordinate series, Chaos game representation, Recurrence quantification analysis 【Abstract keywords】 coronavirus, COVID-19 pandemic, China, dataset, quantification, Analysis, acute respiratory syndrome, whole genome sequence, genomic sequence, feature, effective, responsible, identify, spread to, reported, provided, instance 【Title keywords】 feature
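The general idea of turning a genomic sequence into CGR coordinate series and then summarizing them with RQA measures can be illustrated with a minimal sketch. This is not the authors' pipeline: the corner assignment of the four nucleotides, the use of a plain one-dimensional series without embedding, the `radius` threshold, and all function names are assumptions made for illustration, and only two common RQA measures (recurrence rate and determinism) are shown rather than the 18 features in the dataset.

```python
import numpy as np

# Assumed corner assignment on the unit square (a common convention;
# the abstract does not state which one was used).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(seq):
    """Return an (n, 2) array of CGR points; non-ACGT symbols are skipped."""
    point = np.array([0.5, 0.5])  # start at the centre of the square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # ambiguous symbols such as N are ignored (assumption)
        # Move halfway from the current point towards the base's corner.
        point = (point + np.array(corner)) / 2.0
        points.append(point)
    return np.array(points).reshape(-1, 2)


def recurrence_matrix(series, radius):
    """Binary recurrence matrix: 1 where |x_i - x_j| <= radius."""
    x = np.asarray(series, dtype=float)
    return (np.abs(x[:, None] - x[None, :]) <= radius).astype(int)


def recurrence_rate(rm):
    """Fraction of recurrent points, excluding the main diagonal."""
    n = rm.shape[0]
    if n < 2:
        return 0.0
    return (rm.sum() - np.trace(rm)) / (n * (n - 1))


def determinism(rm, min_len=2):
    """Share of off-diagonal recurrent points on diagonal lines >= min_len.

    The matrix is symmetric, so only the upper triangle is scanned.
    """
    n = rm.shape[0]
    in_lines = 0
    total = 0
    for k in range(1, n):
        diag = np.diagonal(rm, offset=k)
        total += int(diag.sum())
        run = 0
        for v in list(diag) + [0]:  # trailing 0 flushes the last run
            if v:
                run += 1
            else:
                if run >= min_len:
                    in_lines += run
                run = 0
    return in_lines / total if total else 0.0


if __name__ == "__main__":
    pts = cgr_coordinates("ACGTTGCAACGT")  # toy sequence, not real data
    rm = recurrence_matrix(pts[:, 0], radius=0.1)  # x-coordinate series
    print(recurrence_rate(rm), determinism(rm))
```

Because measures such as these are scalars computed per coordinate series, sequences of different lengths map to feature vectors of the same size, which is what makes a length-independent comparison possible.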