Abstract
Objectives
The aim of this study was to evaluate the diagnostic performance of nonspecialist readers with and without the use of an artificial intelligence (AI) support tool to detect traumatic fractures on radiographs of the appendicular skeleton.
Methods
The design was a retrospective, fully crossed, multi-reader, multi-case study on a balanced dataset of patients (≥2 years of age) with an AI tool as a diagnostic intervention. Fifteen readers assessed 340 radiographic exams in 2 different sessions, with and without the AI tool, and the time spent was automatically recorded. The reference standard was established by 3 consultant radiologists. Sensitivity, specificity, and false positives per patient were calculated.
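For illustration only, the sketch below shows how the patient-wise metrics used in this study (sensitivity, specificity, and false positives per patient) can be computed from binary reader calls against a reference standard. This is not the study's actual analysis code; all function and variable names are hypothetical.

```python
# Minimal, illustrative sketch of the patient-wise metrics reported below.
# `reference` holds the consultant-established ground truth (True = fracture present),
# `reader_calls` holds one reader's binary call per exam, and `fp_counts` holds the
# number of false-positive findings that reader marked on each exam.
from typing import Sequence


def patient_wise_metrics(reference: Sequence[bool],
                         reader_calls: Sequence[bool],
                         fp_counts: Sequence[int]) -> dict:
    tp = sum(r and c for r, c in zip(reference, reader_calls))          # true positives
    fn = sum(r and not c for r, c in zip(reference, reader_calls))      # missed fractures
    tn = sum((not r) and (not c) for r, c in zip(reference, reader_calls))
    fp = sum((not r) and c for r, c in zip(reference, reader_calls))
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "false_positives_per_patient": sum(fp_counts) / len(fp_counts),
    }
```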
AI tool
The AI tool used in the study (RBfracture™; Radiobotics, Copenhagen, Denmark) is CE-marked as a class IIa medical device and is intended to be used in a clinical setting as a support tool for detecting fractures in the appendicular skeleton in all patients ≥2 years of age. The AI tool takes a Digital Imaging and Communications in Medicine (DICOM) radiograph as input and, as output, creates a secondary capture in which the detected fractures are highlighted with bounding boxes. The AI tool implements a deep convolutional neural network based on the Detectron2 framework. The framework is designed for generic object detection and was further engineered for the specific task of detecting fractures in radiographic images. The AI tool was developed using a dataset of more than 320,000 radiographs collected from more than 100 radiology departments across multiple continents. Prior to development, the dataset was split into 3 subsets: a training subset consisting of 80%, a tuning subset of 10%, and an internal verification subset consisting of 10% of the data. No image or patient included in the study cohort was used to train or tune the AI tool.
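The vendor's model, weights, and configuration are proprietary; the following is a minimal sketch of how a Detectron2-based detector could, in principle, be run on a DICOM radiograph to produce bounding-box findings. The config choice, weights file, score threshold, and file names are illustrative assumptions, not details of RBfracture™.

```python
# Hypothetical sketch of Detectron2-style inference on a DICOM radiograph.
# The weights file, threshold, and input path below are placeholders.
import numpy as np
import pydicom
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor


def load_radiograph(dicom_path: str) -> np.ndarray:
    """Read a DICOM radiograph and convert it to an 8-bit, 3-channel array."""
    ds = pydicom.dcmread(dicom_path)
    img = ds.pixel_array.astype(np.float32)
    img = (img - img.min()) / max(img.max() - img.min(), 1e-6) * 255.0
    return np.stack([img.astype(np.uint8)] * 3, axis=-1)  # grayscale -> 3 channels


cfg = get_cfg()
# Generic object-detection config used as a stand-in for the fracture-specific model.
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "fracture_detector_weights.pth"   # hypothetical fine-tuned weights
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5           # illustrative confidence threshold
cfg.MODEL.DEVICE = "cpu"                              # run on CPU for this sketch
predictor = DefaultPredictor(cfg)

image = load_radiograph("wrist_ap.dcm")               # hypothetical exam
instances = predictor(image)["instances"].to("cpu")
for box, score in zip(instances.pred_boxes, instances.scores):
    print(f"suspected fracture at {box.tolist()} (confidence {score.item():.2f})")
```

In a deployment such as the one described above, the boxes would be burned into a secondary capture image and returned to the PACS rather than printed.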
Results
Patient-wise sensitivity increased from 72% to 80% (P < .05) and patient-wise specificity increased from 81% to 85% (P < .05) in exams aided by the AI tool compared to the unaided exams. The increase in sensitivity corresponded to a relative reduction in missed fractures of 29%. The average rate of false positives per patient decreased from 0.16 to 0.14, corresponding to a relative reduction of 21%. There was no significant difference in average reading time per exam. The largest gain in fracture detection performance with AI support, across all readers, was on nonobvious fractures, with a significant increase in sensitivity of 11 percentage points (pp) (60%-71%).
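For context, the 29% figure follows from the patient-wise miss rate (1 − sensitivity), assuming it is computed from the sensitivities reported above:

$$\frac{(1 - 0.72) - (1 - 0.80)}{1 - 0.72} = \frac{0.28 - 0.20}{0.28} \approx 0.29$$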
Conclusion
The diagnostic performance for detection of traumatic fractures on radiographs of the appendicular skeleton improved among nonspecialist readers when supported by the tested AI fracture detection tool. Improvement was seen in both sensitivity and specificity without negatively affecting the interpretation time.