A systematic quality scoring analysis to assess automated cardiovascular magnetic resonance segmentation algorithms
Author(s)
Type
Journal Article
Abstract
Background: The quantitative measures used to assess the performance of automated methods often do not reflect the clinical acceptability of contouring. A quality-based assessment of automated cardiac magnetic resonance (CMR) segmentation more relevant to clinical practice is therefore needed.
Objective: We propose a new method for assessing the quality of machine learning (ML) outputs. We evaluate its clinical utility by using it to systematically analyze the quality of an automated contouring algorithm.
Methods: A dataset of short-axis (SAX) cine CMR images from a clinically heterogeneous population (n = 217) was manually contoured by a team of experienced investigators. On the same images we derived automated contours using an ML algorithm. A contour quality scoring application randomly presented manual and automated contours to four blinded clinicians, who were asked to assign a quality score from a predefined rubric. Firstly, we analyzed the distribution of quality scores between the two contouring methods across all clinicians. Secondly, we analyzed the interobserver reliability between the raters. Finally, we examined whether scores varied with the type of contour, SAX slice level, and underlying disease.
Results: The overall distribution of scores between the two methods was significantly different, with automated contours scoring better than manual contours (OR (95% CI) = 1.17 (1.07–1.28), p = 0.001; n = 9401). There was substantial scoring agreement between raters for each contouring method independently, although agreement was significantly better for automated segmentation (automated: AC2 = 0.940, 95% CI, 0.937–0.943 vs manual: AC2 = 0.934, 95% CI, 0.931–0.937; p = 0.006). Next, we analyzed quality scores by contour type, SAX slice level, and underlying disease. Our approach helped identify patterns of lower segmentation quality, as observed for left ventricle epicardial and basal contours with both methods. Significant differences in quality between the two methods were also found in dilated cardiomyopathy and hypertension.
Conclusions: Our results confirm the ability of our systematic scoring analysis to determine the clinical acceptability of automated contours. This approach, which focuses on the clinical utility of the contours, could ultimately improve clinicians' confidence in artificial intelligence and its acceptability in the clinical workflow.
Date Issued
2022-02-15
Date Acceptance
2021-12-22
Citation
Frontiers in Cardiovascular Medicine, 2022, 8
ISSN
2297-055X
Publisher
Frontiers Media S.A.
Journal / Book Title
Frontiers in Cardiovascular Medicine
Volume
8
Copyright Statement
Copyright © 2022 Rauseo, Omer, Amir-Khalili, Sojoudi, Le, Cook, Hausenloy, Ang, Toh, Bryant, Chin, Paiva, Fung, Cooper, Khanji, Aung and Petersen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
License URL
Identifier
https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000763663000001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=a2bf6146997ec60c407a63945d4e92bb
Subjects
assessment
automated contouring
Cardiac & Cardiovascular Systems
cardiac magnetic resonance (CMR)
cardiac segmentation
Cardiovascular System & Cardiology
Life Sciences & Biomedicine
machine learning
quality control
Science & Technology
VALIDATION
Publication Status
Published
Article Number
816985
Date Publish Online
2022-02-15