Artificial intelligence tools trained on human-labeled data reflect human biases: a case study in a large clinical consecutive knee osteoarthritis cohort

RBknee did not grade the left and right knees identically: around 15–20% of cases received different grades between sides. Its overall grading accuracy, however, remained comparable to that of experienced radiologists, with no clear evidence of systematic side-to-side bias in either disease grading or joint space width.
Constructing a clinical radiographic knee osteoarthritis database using artificial intelligence tools with limited human labor: A proof of principle

Using RBknee and three other AI tools, the authors saved about 800 hours of radiologist reading time, manually reviewing only 16.0% of the images in the database.
Interobserver agreement and performance of concurrent AI assistance for radiographic evaluation of knee osteoarthritis

RBknee improved consistency and performance in KL grading, especially among junior readers. Interobserver agreement increased with AI support, and board-certified radiologists achieved near-perfect agreement when using RBknee, surpassing even the unassisted reference standard.
External validation of an artificial intelligence tool for radiographic knee osteoarthritis severity classification

RBknee achieved almost perfect agreement with the musculoskeletal radiology consultant consensus, matching the inter-reader agreement between the consultants themselves.