Freeman K, et al. Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy. BMJ 2021;374:n1872. doi:10.1136/bmj.n1872
This review was commissioned by the UK National Screening Committee to determine whether there is sufficient evidence to use artificial intelligence (AI) for mammographic image analysis in breast screening practice. The research team's aim was to assess the accuracy of AI in detecting breast cancer when integrated into breast screening programmes, with a focus on the type of cancer detected. They identified 12 studies that evaluated commercially available or in-house convolutional neural network AI systems, nine of which included a comparison with radiologists. The reviewers' findings contradict the publicity some studies have received and opinions published in various journals claiming that AI systems outperform humans and might soon be used instead of experienced radiologists (Source: Freeman et al, 2021).
Abstract
Objective
To examine the accuracy of artificial intelligence (AI) for the detection of breast cancer in mammography screening practice.
Design
Systematic review of test accuracy studies.
Data sources
Medline, Embase, Web of Science, and Cochrane Database of Systematic Reviews from 1 January 2010 to 17 May 2021.
Eligibility criteria
Studies reporting test accuracy of AI algorithms, alone or in combination with radiologists, to detect cancer in women's digital mammograms in screening practice, or in test sets. The reference standard was biopsy with histology or follow-up (for screen negative women). Outcomes included test accuracy and cancer type detected.
Study selection and synthesis
Two reviewers independently assessed articles for inclusion and assessed the methodological quality of included studies using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool. A single reviewer extracted data, which were checked by a second reviewer. Narrative data synthesis was performed.
Results
Twelve studies totalling 131 822 screened women were included. No prospective studies measuring test accuracy of AI in screening practice were found. Studies were of poor methodological quality. Three retrospective studies compared AI systems with the clinical decisions of the original radiologist, including 79 910 women, of whom 1878 had screen detected cancer or interval cancer within 12 months of screening. Thirty four (94 per cent) of 36 AI systems evaluated in these studies were less accurate than a single radiologist, and all were less accurate than the consensus of two or more radiologists. Five smaller studies (1086 women, 520 cancers) at high risk of bias and low generalisability to the clinical context reported that all five evaluated AI systems (as standalone to replace the radiologist or as a reader aid) were more accurate than a single radiologist reading a test set in the laboratory. In three studies, AI used for triage screened out 53 per cent, 45 per cent, and 50 per cent of women at low risk but also 10 per cent, 4 per cent, and 0 per cent of cancers detected by radiologists.
Conclusions
Current evidence for AI does not yet allow judgement of its accuracy in breast cancer screening programmes, and it is unclear where on the clinical pathway AI might be of most benefit. AI systems are not sufficiently specific to replace radiologist double reading in screening programmes. Promising results in smaller studies are not replicated in larger studies. Prospective studies are required to measure the effect of AI in clinical practice. Such studies will require clear stopping rules to ensure that AI does not reduce programme specificity.