ANTi-Vax: a novel Twitter dataset for COVID-19 vaccine misinformation detection

Abstract
Objectives: The COVID-19 (SARS-CoV-2) pandemic has infected hundreds of millions of people and caused millions of deaths around the globe. The introduction of COVID-19 vaccines provided hope and a pathway to recovery. However, owing to misinformation spread on social media and other platforms, vaccine hesitancy has risen, which can reduce vaccine uptake in the population. The goal of this research is to introduce a novel machine learning-based framework for detecting COVID-19 vaccine misinformation.
Study design: We collected and annotated COVID-19 vaccine tweets and trained machine learning algorithms to classify vaccine misinformation.
Methods: More than 15,000 tweets were annotated as either misinformation or general vaccine tweets using reliable sources, and the annotations were validated by medical experts. The classification models explored were XGBoost, LSTM, and a BERT transformer model.
Results: The best classification performance was obtained using BERT, which achieved an F1-score of 0.98 on the test set. The precision and recall scores were 0.97 and 0.98, respectively.
Conclusion: Machine learning-based models are effective in detecting misinformation regarding COVID-19 vaccines on social media platforms.
Keywords: COVID-19; Deep learning; Misinformation detection; Natural language processing; Text classification; Vaccines.
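
The F1-score quoted in the Results is, by its standard definition, the harmonic mean of precision and recall. The short Python sketch below shows that relationship using the rounded figures from the abstract. The function name `f1_score` and the variable names are illustrative assumptions, not taken from the authors' code, and the abstract does not say how the scores were averaged. Because the inputs are already rounded to two decimals, the result only approximates the reported value.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract for the BERT model (assumed inputs).
reported_precision = 0.97
reported_recall = 0.98

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from rounded precision and recall: {f1:.3f}")
```

With these rounded inputs the sketch yields roughly 0.975, which agrees with the reported 0.98 to within the rounding of the published precision and recall.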

