Artificial intelligence for imaging-based COVID-19 detection: Systematic review comparing added value of AI versus human readers

Highlights
• AI-model-supported imaging-based COVID-19 detection may offer added value.
• Studies reported comparable or better performance for AI or AI-supported readings than for human readers.
• Diagnostic performance varied less for AI than for human readers.
• Our systematic review shows heterogeneous data characteristics and risks of bias.
• The studies applied a variety of methodologies and had limitations in their statistical analyses.

Purpose
A growing number of studies have examined whether artificial intelligence (AI) systems can support imaging-based diagnosis of pneumonia caused by COVID-19, including gains in both diagnostic performance and speed. What is currently missing, however, is a combined appraisal of studies comparing human readers with AI.

Methods
We followed the PRISMA-DTA guidelines for our systematic review, searching the EMBASE, PubMed and Scopus databases. To gain insight into the potential value of AI methods, we focused on studies comparing the performance of human readers with that of AI models or AI-supported human readings.

Results
Our search identified 1270 studies, of which 12 fulfilled the selection criteria. For diagnostic performance in the testing datasets, reported sensitivity was 42–100% (human readers, n = 9 studies), 60–95% (AI systems, n = 10) and 81–98% (AI-supported readings, n = 3), whereas reported specificity was 26–100% (human readers, n = 8), 61–96% (AI systems, n = 10) and 78–99% (AI-supported readings, n = 2). One study highlighted the potential of AI-supported readings for assessing changes in lung lesion burden, and two studies indicated potential time savings with AI-based detection.

Conclusions
Our review indicates that AI systems and AI-supported human readings generally show less performance variability (interquartile range) and may support the differentiation of COVID-19 pneumonia from other forms of pneumonia when used in high-prevalence, symptomatic populations.
However, inconsistencies in study design and data reporting, risk-of-bias concerns, and limitations of the statistical analyses complicate clear conclusions. We therefore support efforts to develop critical elements of study design for assessing the value of AI in diagnostic imaging.
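The sensitivity, specificity and interquartile-range figures reported in this abstract rest on standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and IQR = Q3 − Q1 of the per-study values. The minimal Python sketch below shows these calculations. The function names and all example counts are hypothetical and illustrative only, and the "inclusive" quartile method is an assumption, since the abstract does not state which quartile convention was used.

```python
import statistics


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly COVID-19-positive cases that a reader (or model) calls positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly COVID-19-negative cases that a reader (or model) calls negative."""
    return tn / (tn + fp)


def iqr(values: list[float]) -> float:
    """Interquartile range (Q3 - Q1) across per-study values.

    Assumption: the 'inclusive' quartile method; other conventions can give
    slightly different numbers on small samples.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


if __name__ == "__main__":
    # Hypothetical 2x2 counts for one reader on one test set (illustrative only).
    print(sensitivity(tp=90, fn=10))  # 0.9
    print(specificity(tn=80, fp=20))  # 0.8
    # Hypothetical per-study sensitivities for one reader group (illustrative only).
    print(iqr([0.6, 0.7, 0.8, 0.9, 0.95]))
```

Summarizing variability with the IQR of per-study values, rather than the minimum-to-maximum range, limits the influence of a single outlying study. This makes it a reasonable way to compare reader groups across a small number of heterogeneous studies.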

