Artificial intelligence tools trained on human-labeled data reflect human biases: a case study in a large clinical consecutive knee osteoarthritis cohort

Authors:

Anders Lenskjold, Mathias W Brejnebøl, Martin H Rose, Henrik Gudbergsen, Akshay Chaudhari, Anders Troelsen, Anne Moller, Janus U Nybing & Mikael Boesen

Abstract

Humans have been shown to have biases when reading medical images, raising questions about whether humans are uniform in their disease gradings. Artificial intelligence (AI) tools trained on human-labeled data may have inherent human non-uniformity. In this study, we used a radiographic knee osteoarthritis external validation dataset of 50 patients and a six-year retrospective consecutive clinical cohort of 8,273 patients. An FDA-approved and CE-marked AI tool was tested for potential non-uniformity in Kellgren-Lawrence grades between the right and left sides of the images. We flipped the images horizontally so that a left knee looked like a right knee and vice versa. According to human review, the AI tool showed non-uniformity with 20–22% disagreements on the external validation dataset and 13.6% on the cohort. However, we found no evidence of a significant difference in the accuracy compared to senior radiologists on the external validation dataset, or age bias or sex bias on the cohort. AI non-uniformity can boost the evaluated performance against humans, but image areas with inferior performance should be investigated.
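The flip test described in the abstract can be illustrated with a minimal sketch. It assumes that a disagreement is counted whenever the same knee receives a different Kellgren-Lawrence grade in the original and in the horizontally flipped image; the paper's exact definition may differ. The function names and the toy grades below are hypothetical and are not taken from the study or from the AI tool.

```python
from typing import List, Sequence


def flip_horizontally(image: Sequence[Sequence[int]]) -> List[List[int]]:
    """Mirror an image left-to-right so each knee appears on the opposite side.

    The image is represented as rows of pixel values (a hypothetical,
    simplified stand-in for a radiograph).
    """
    return [list(reversed(row)) for row in image]


def disagreement_rate(grades_original: Sequence[int],
                      grades_flipped: Sequence[int]) -> float:
    """Fraction of knees whose grade changed after flipping the image.

    Both sequences must list the same knees in the same order.
    """
    if len(grades_original) != len(grades_flipped):
        raise ValueError("grade lists must have the same length")
    if not grades_original:
        raise ValueError("grade lists must not be empty")
    changed = sum(a != b for a, b in zip(grades_original, grades_flipped))
    return changed / len(grades_original)


# Toy example: a 2x3 "image" and four knees with made-up KL grades (0-4).
toy_image = [[1, 2, 3],
             [4, 5, 6]]
flipped_image = flip_horizontally(toy_image)

kl_original = [0, 1, 2, 3]
kl_flipped = [0, 1, 3, 3]  # one of four knees changes grade
rate = disagreement_rate(kl_original, kl_flipped)
```

A perfectly uniform tool would give a rate of 0 under this definition, since flipping changes only the side on which a knee appears, not the knee itself.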

Materials and methods

Study design and clinical cohort

To explore uniformity in a commercial knee OA AI tool, we created a cohort using bilateral weight-bearing, frontal knee radiographs and their KL grades and minimal medial and lateral JSW from a consecutive, clinical retrospective knee OA database [8]. Informed consent was waived by local ethics committee #2101001. The study was performed following regulations and guidelines. We used relevant items from the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) to describe the study [6]. The database included patients who were 35 to 79 years old (both years included) with a clinical knee radiographic examination from January 1, 2016, to December 31, 2021 (both dates included), from the Copenhagen University Hospital, Bispebjerg-Frederiksberg Hospital (BFH), Denmark. A commercial knee OA AI tool provided bilateral KL grades on par with senior MSK radiologists [3] and JSW (RBknee™ v2.1 for KL grading and FDA-cleared v1.0.1 for JSW measurements, Radiobotics ApS, Copenhagen, Denmark). The flow of cases can be found in the original paper [8]. The output from the AI tool was divided into AI-R for the right knee and AI-L for the left knee (see Fig. 1).
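The age and date inclusion limits stated above can be written as a small eligibility check. This is only a sketch: the `Exam` record and its field names are hypothetical, the actual database schema is not described here, and other criteria (such as requiring a bilateral weight-bearing frontal radiograph) are not modeled.

```python
from dataclasses import dataclass
from datetime import date

# Limits as stated in the text; both ends are inclusive.
AGE_MIN, AGE_MAX = 35, 79
EXAM_START, EXAM_END = date(2016, 1, 1), date(2021, 12, 31)


@dataclass(frozen=True)
class Exam:
    """Hypothetical record for one knee radiographic examination."""
    patient_age: int
    exam_date: date


def is_eligible(exam: Exam) -> bool:
    """Return True if the exam falls within the stated age and date ranges."""
    age_ok = AGE_MIN <= exam.patient_age <= AGE_MAX
    date_ok = EXAM_START <= exam.exam_date <= EXAM_END
    return age_ok and date_ok


# Boundary cases: the first two lie on the inclusive limits.
youngest_first_day = Exam(patient_age=35, exam_date=date(2016, 1, 1))
oldest_last_day = Exam(patient_age=79, exam_date=date(2021, 12, 31))
too_old = Exam(patient_age=80, exam_date=date(2018, 6, 15))
too_late = Exam(patient_age=50, exam_date=date(2022, 1, 1))
```

Using inclusive comparisons on both ends mirrors the "both years included" and "both dates included" wording.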

Open access

This research is open access.

Nature Scientific Reports is an open access journal publishing original research from across all areas of the natural sciences, psychology, medicine and engineering.
