Abstract
Humans have been shown to have biases when reading medical images, raising questions about whether humans are uniform in their disease gradings. Artificial intelligence (AI) tools trained on human-labeled data may inherit this human non-uniformity. In this study, we used a radiographic knee osteoarthritis external validation dataset of 50 patients and a six-year retrospective consecutive clinical cohort of 8,273 patients. An FDA-approved and CE-marked AI tool was tested for potential non-uniformity in Kellgren-Lawrence grades between the right and left sides of the images. To test this, we flipped the images horizontally so that a left knee looked like a right knee and vice versa. According to human review, the AI tool showed non-uniformity, with 20–22% disagreements on the external validation dataset and 13.6% on the cohort. However, we found no evidence of a significant difference in accuracy compared with senior radiologists on the external validation dataset, and no evidence of age bias or sex bias on the cohort. AI non-uniformity can boost the evaluated performance against humans, but image areas with inferior performance should be investigated.
Materials and methods
Study design and clinical cohort
To explore uniformity in a commercial knee OA AI tool, we created a cohort using bilateral weight-bearing, frontal knee radiographs, their KL grades, and their minimal medial and lateral JSW from a consecutive, clinical retrospective knee OA database [8]. Informed consent was waived by local ethics committee #2101001. The study was performed in accordance with relevant regulations and guidelines. We used relevant items from the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) to describe the study [6]. The database included patients who were 35 to 79 years old (both years included) with a clinical knee radiographic examination from January 1, 2016, to December 31, 2021 (both dates included), from the Copenhagen University Hospital, Bispebjerg-Frederiksberg Hospital (BFH), Denmark. A commercial knee OA AI tool provided bilateral KL grades, reported to be on par with senior MSK radiologists [3], and JSW measurements (RBknee™ v2.1 for KL grading and FDA-cleared v1.0.1 for JSW measurements, Radiobotics ApS, Copenhagen, Denmark). The flow of cases can be found in the original paper [8]. The output from the AI tool was divided into AI-R for the right knee and AI-L for the left knee (see Fig. 1).
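As an illustration only, the flip-based uniformity check can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used in the study: `grade_fn` is a hypothetical stand-in for the AI tool that returns an (AI-R, AI-L) pair of KL grades for one bilateral image, and the images are assumed to be 2-D NumPy arrays. After a horizontal flip, the patient's right knee sits where the tool expects a left knee, so the original AI-R grade is compared with the flipped AI-L grade, and vice versa.

```python
import numpy as np


def flip_horizontally(image):
    """Mirror a 2-D image left-to-right, so each knee swaps sides."""
    return np.fliplr(image)


def disagreement_rate(grades_a, grades_b):
    """Fraction of paired grades that differ (0.0 means full agreement)."""
    a = np.asarray(grades_a)
    b = np.asarray(grades_b)
    if a.shape != b.shape:
        raise ValueError("grade lists must have the same length")
    return float(np.mean(a != b))


def side_uniformity(images, grade_fn):
    """Compare grades for the same anatomical knee before and after flipping.

    grade_fn is a hypothetical callable: image -> (ai_r, ai_l).
    """
    orig_r, orig_l, flip_r, flip_l = [], [], [], []
    for image in images:
        r, l = grade_fn(image)
        fr, fl = grade_fn(flip_horizontally(image))
        orig_r.append(r)
        orig_l.append(l)
        flip_r.append(fr)
        flip_l.append(fl)
    return {
        # The patient's right knee is read as AI-R originally, AI-L when flipped.
        "right_knee": disagreement_rate(orig_r, flip_l),
        # The patient's left knee is read as AI-L originally, AI-R when flipped.
        "left_knee": disagreement_rate(orig_l, flip_r),
    }
```

A perfectly side-uniform grader would return a disagreement rate of 0.0 for both knees. Any nonzero rate indicates that the grade depends on which side of the image the knee appears.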
Open access
This research is open access.
Nature Scientific Reports is an open access journal publishing original research from across all areas of the natural sciences, psychology, medicine and engineering.