A comparison of multiple deep learning-based auto-segmentation systems for head and neck cancer
Simon Temple, United Kingdom
PD-0313
Abstract
The Clatterbridge Cancer Centre, Medical Physics, Liverpool, United Kingdom
Purpose or Objective
Commercial software can be used to automatically delineate
organs at risk (OARs), with the potential for significant efficiency savings
in the radiotherapy treatment planning pathway and a simultaneous reduction in
inter- and intra-observer variability.
Vendors of commercial systems often claim that their own
system is superior to competitor systems. To date, limited research has
compared multiple systems using multiple comparison metrics and a common
patient cohort. This study addresses that gap.
Material and Methods
Four deep learning-based auto-segmentation systems, each
independently developed for commercial use, were used to delineate five
commonly contoured head and neck (H&N) OARs (brainstem, spinal cord, mandible,
and left and right parotid glands) for 30 H&N patient datasets. All systems
were running their latest available software version at the time of the study
(June 2021 to September 2021).
The resulting auto-segmented contours were compared to ‘gold
standard’ clinical contours created by Consultant Clinical Oncologists at our
centre. All data originated from patients entered into the PATHOS clinical
trial. The associated trial protocol includes clear anatomical guidelines for
OAR delineation. In addition, trial entry required all Oncologists to
undertake pre-trial OAR outlining Quality Assurance. A sample of patient data
was retrospectively reviewed during the trial to provide further assurance of
the quality of the contours used.
Two standard similarity metrics were used for the study: the
3D Dice Similarity Coefficient (DSC) and Added Path Length (APL).
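For readers unfamiliar with the two metrics, the following is a minimal sketch
of how they might be computed from binary masks. It is not the implementation
used in the study. It assumes each contour has already been rasterised to a
boolean NumPy volume with axial slices along the first axis. The function names
and the `pixel_mm` parameter are hypothetical. The APL shown is a simplified
pixel-counting variant: reference boundary pixels that are absent from the
auto-contour boundary, summed over slices.

```python
import numpy as np


def dice_3d(reference: np.ndarray, auto: np.ndarray) -> float:
    """3D Dice Similarity Coefficient, 2|A∩B| / (|A| + |B|), for binary volumes."""
    ref = reference.astype(bool)
    aut = auto.astype(bool)
    total = ref.sum() + aut.sum()
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(ref, aut).sum() / total)


def boundary_2d(mask: np.ndarray) -> np.ndarray:
    """Pixels of a 2D mask with at least one 4-connected background neighbour."""
    m = mask.astype(bool)
    p = np.pad(m, 1, constant_values=False)
    # A pixel is interior when all four in-plane neighbours are foreground.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return m & ~interior


def added_path_length(reference: np.ndarray, auto: np.ndarray,
                      pixel_mm: float = 1.0) -> float:
    """Simplified APL: reference-boundary pixels missing from the auto boundary.

    Counts pixels slice by slice (axis 0 assumed axial) and scales the count
    by an assumed isotropic in-plane pixel size in millimetres.
    """
    added = 0
    for ref_slice, auto_slice in zip(reference, auto):
        ref_edge = boundary_2d(ref_slice)
        auto_edge = boundary_2d(auto_slice)
        added += int(np.count_nonzero(ref_edge & ~auto_edge))
    return added * pixel_mm
```

Under these assumptions, a DSC close to 1 indicates high volumetric overlap,
while a lower APL indicates that less of the contour would need to be redrawn
to match the clinical reference.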
Results
Table 1 reports the mean and one standard deviation of both
metrics for all OARs and all systems tested. The values obtained for both 3D
DSC and APL are consistent with those of other recently published studies.
Performance differences between the four systems were not
statistically significant for either the 3D DSC or the APL metric.
Conclusion
Comparable levels of performance were observed across all
four systems. This indicates that deep learning-based auto-segmentation products
are developing at a similar pace in terms of the quality of contours produced.
It is therefore likely to be more beneficial to consider other
factors, such as cost and the range of contours offered, when evaluating such
a system for clinical use.