Abstract
Minimal joint space width (mJSW) is a radiographic measurement used in the diagnosis of hip osteoarthritis. A large variance when measuring mJSW highlights the need for a supporting diagnostic tool. This study aimed to estimate the reliability of a deep learning algorithm designed to measure the mJSW in pelvic radiographs and to estimate agreement between the algorithm and orthopedic surgeons, radiologists, and a reporting radiographer. The algorithm was highly consistent when measuring mJSW, with a mean difference of 0.00 mm. Human readers, however, were subject to variance, with a repeatability coefficient of up to 1.31 mm. Statistically significant, although not clinically significant, differences were found between the algorithm’s and all readers’ measurements, with mean measured differences ranging from −0.78 to −0.36 mm. In conclusion, the algorithm was highly reliable, and the mean measured difference between the human readers combined and the algorithm was low, i.e., −0.5 mm bilaterally. Given the consistency of the algorithm, it may be a useful tool for monitoring hip osteoarthritis.
Introduction
Osteoarthritis (OA) is a common global public health problem. Affecting large joints such as the hip and the knee, OA has become a major disabling condition, particularly among the elderly [1,2,3]. Hip dysplasia in the younger population may, however, lead to early onset of OA [4,5]. Globally, OA affects more than 230 million individuals. Hip OA accounts for approximately 32 million of these cases, making it a significant public health problem [6].
The initial diagnosis of hip OA is often based on the clinical presentation of the patient and supported by pelvic radiographs [2,4,7], where minimal joint space width (mJSW) is the key parameter, supported by osteophyte formation and subchondral sclerosis [8,9]. The radiographic definition and classification of hip OA lack consensus between healthcare professionals, and the mJSW defining hip OA has varied between 1.5 and 4.0 mm. An mJSW ≤ 2 mm has been found to have a strong association with self-reported hip pain in patients aged 60 years and older [8]. The final diagnosis of hip OA is often based on subjective radiographic findings combined with patient history and clinical findings [8,9].
In radiology, AI and machine learning (ML) have gained significant traction, as shown by the fact that around 75% of FDA-approved AI/ML-enabled medical devices pertain to radiology [10]. A study has demonstrated that when deep learning is used to grade joint space narrowing on pelvic radiographs as absent, mild, moderate, or severe, its performance is similar to that of expert radiologists [9]. In some hospitals, hip osteoarthritis is not routinely reported by expert radiologists, but often by orthopedic surgeons or radiographers. Thus, it is necessary to test the reliability of deep learning and its agreement with these other healthcare professionals.
The objectives of this study were to estimate the reliability of a deep learning algorithm designed to measure mJSW in anterior–posterior (AP) pelvic radiographs and to estimate agreement between the algorithm and trained healthcare personnel.
Methods
In this retrospective study, a deep learning algorithm trained to measure the mJSW of the hip was applied to 78 radiographs. For comparison, a senior and a junior radiologist, a senior and a junior orthopedic surgeon, and one senior reporting radiographer measured the mJSW on the same radiographs. Approval of the study was given by the Danish National Committee on Health Research Ethics (Project-ID: 2103745). The study was registered with the regional health authorities (Project-ID: 21/22036). All analyses were performed in accordance with the current Guidelines for Reporting Reliability and Agreement Studies [11,12].
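To make the terms reliability and agreement concrete, the following is a minimal sketch of how a mean difference, limits of agreement, and a repeatability coefficient are commonly computed for paired measurements. It assumes a Bland-Altman style analysis and a repeatability coefficient defined as 1.96 × √2 × within-subject standard deviation; the text above does not specify the exact formulas used in the study, and the function names and measurement values below are hypothetical and purely illustrative.

```python
import math
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired differences a - b.

    Assumption: 95% limits of agreement are bias +/- 1.96 * SD of differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def repeatability_coefficient(first, second):
    """Repeatability coefficient for two repeated measurements per subject.

    Assumption: within-subject SD is sqrt(sum(d^2) / (2n)), and the
    coefficient is 1.96 * sqrt(2) * within-subject SD.
    """
    diffs = [x - y for x, y in zip(first, second)]
    s_w = math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))
    return 1.96 * math.sqrt(2) * s_w


# Made-up mJSW values in mm, NOT data from the study.
reader = [3.4, 2.9, 4.1, 3.6, 1.8]
algorithm = [3.9, 3.5, 4.4, 4.2, 2.3]

bias, lower, upper = bland_altman(reader, algorithm)
print(f"bias = {bias:.2f} mm, limits of agreement = [{lower:.2f}, {upper:.2f}] mm")

# A deterministic algorithm returns identical values on a repeated run,
# so its repeatability coefficient under this definition is zero.
rc_algorithm = repeatability_coefficient(algorithm, algorithm)
print(f"algorithm repeatability coefficient = {rc_algorithm:.2f} mm")
```

A negative bias in this sketch means that the reader measured a narrower joint space than the algorithm on average.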
Results
The algorithm was not able to analyze 7 of the 78 radiographs. The remaining 71 images were analyzed by both the algorithm and readers and were therefore included in the study. For the 71 radiographs, the average age was 50.1 years, and the gender distribution was 36 females and 35 males. The mJSW values tended to be lower for the readers than for the algorithm. For the five readers, the mean measured mJSW for the left hip ranged from 3.27 to 3.59 mm, whereas it was 3.96 mm for the algorithm. On the right side, the corresponding measurements ranged from 3.27 to 3.65 mm for the readers and 4.05 mm for the algorithm.
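As a quick arithmetic check, the reported means can be subtracted to see how far the readers' averages lie from the algorithm's. The sketch below uses only the mean values quoted in the paragraph above; the dictionary layout and function name are illustrative choices. Differences of group means need not match exactly the mean paired differences reported elsewhere in the paper.

```python
# Mean mJSW values in mm, as reported in the Results paragraph above.
reported = {
    "left": {"readers_min": 3.27, "readers_max": 3.59, "algorithm": 3.96},
    "right": {"readers_min": 3.27, "readers_max": 3.65, "algorithm": 4.05},
}


def difference_range(side):
    """Return (lowest, highest) reader-minus-algorithm difference in mm."""
    low = round(side["readers_min"] - side["algorithm"], 2)
    high = round(side["readers_max"] - side["algorithm"], 2)
    return low, high


ranges = {name: difference_range(values) for name, values in reported.items()}
for name, (low, high) in ranges.items():
    print(f"{name} hip: reader minus algorithm from {low:+.2f} to {high:+.2f} mm")
```

Under this simple subtraction, the differences span roughly −0.78 to −0.37 mm across both hips, which is consistent with the readers measuring lower mJSW values than the algorithm.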
Discussion
This study tested a deep learning algorithm for measuring the mJSW on pelvic radiographs. The study found that the algorithm was highly reliable, although its measurements differed significantly from those of the human readers.
The algorithm offered consistent measurements and may therefore be a useful support tool in decision making regarding hip replacement and for quantitative monitoring of the mJSW. A highly consistent algorithm may also be particularly valuable in epidemiologic or multicenter studies correlating radiographic findings with clinical information, potentially with automated transfer of data from the algorithm to clinical databases. In future studies, correlating the algorithm's measurements with clinical findings could help to validate the measurements. Furthermore, future studies on how to incorporate the algorithm as an assistive tool for readers could be beneficial.
Open access
This research is open access.
BioMedInformatics is an international, peer-reviewed, open access journal on all areas of biomedical informatics, as well as computational biology and medicine, published quarterly online by MDPI.