Copenhagen, Denmark
Onsite/Online

ESTRO 2022

Session Item

Saturday
May 07
16:55 - 17:55
Poster Station 1
07: Imaging & AI techniques
Stephanie Tanadini-Lang, Switzerland
1590
Poster Discussion
Physics
A comparison of multiple deep learning-based auto-segmentation systems for head and neck cancer
Simon Temple, United Kingdom
PD-0313

Abstract

Authors:

Simon Temple¹

¹ The Clatterbridge Cancer Centre, Medical Physics, Liverpool, United Kingdom

Purpose or Objective

Commercial software can be used to automatically delineate organs at risk (OARs), with the potential for significant efficiency savings in the radiotherapy treatment planning pathway and a simultaneous reduction in inter- and intra-observer variability.

Vendors of commercial systems often claim that their own system is superior to competitor systems. To date, there has been limited research comparing multiple systems using multiple comparison metrics and a common patient cohort. This study addresses that gap.

Material and Methods

Four different deep learning-based auto-segmentation systems, which had been independently developed for commercial use, were used to delineate five commonly contoured head and neck (H&N) OARs (brainstem, spinal cord, mandible, and left and right parotid) on 30 H&N patient datasets. All systems were running their latest available software version at the time of the study (June 2021 to September 2021).

The resulting auto-segmented contours were compared to ‘gold standard’ clinical contours created by Consultant Clinical Oncologists at our centre. All data originated from patients entered into the PATHOS clinical trial. The trial protocol includes clear anatomical guidelines for OAR delineation. In addition, trial entry involved pre-trial OAR outlining Quality Assurance, which all Oncologists were required to undertake. A sample of patient data was retrospectively reviewed during the trial to provide further assurance of the quality of the contours used.

Two standard similarity metrics, the 3D Dice Similarity Coefficient (DSC) and Added Path Length (APL), were used for the study.
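To make the two metrics concrete, the following is a minimal sketch and not the implementation used in the study. It assumes each structure is represented as a set of (slice, row, column) voxel indices, computes the 3D DSC as 2|A ∩ B| / (|A| + |B|), and approximates APL as the number of in-plane boundary voxels of the reference contour that are not already on the boundary of the auto-segmented contour. The function names, the voxel-set representation, the 4-connected boundary definition, and reporting APL in voxels rather than millimetres are all assumptions made for illustration.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]  # (slice, row, column); representation assumed for this sketch


def dice_3d(reference: Set[Voxel], auto: Set[Voxel]) -> float:
    """3D Dice Similarity Coefficient: 2|A n B| / (|A| + |B|)."""
    total = len(reference) + len(auto)
    if total == 0:
        return 1.0  # assumed convention: two empty structures agree perfectly
    return 2.0 * len(reference & auto) / total


def boundary(mask: Set[Voxel]) -> Set[Voxel]:
    """Voxels with at least one 4-connected in-plane neighbour outside the mask."""
    edge = set()
    for (z, y, x) in mask:
        neighbours = ((z, y - 1, x), (z, y + 1, x), (z, y, x - 1), (z, y, x + 1))
        if any(n not in mask for n in neighbours):
            edge.add((z, y, x))
    return edge


def added_path_length(reference: Set[Voxel], auto: Set[Voxel]) -> int:
    """Simplified APL: reference boundary voxels that the auto contour did not draw."""
    return len(boundary(reference) - boundary(auto))


# Toy example: a 4x4 square on one slice, and the same square shifted by one column.
reference = {(0, y, x) for y in range(4) for x in range(4)}
auto = {(0, y, x + 1) for y in range(4) for x in range(4)}

dsc = dice_3d(reference, auto)            # 12 shared voxels out of 16 + 16
apl = added_path_length(reference, auto)  # boundary voxels an editor would still need to draw
print(f"3D DSC = {dsc:.2f}, APL = {apl} voxels")
```

In this sketch a higher DSC indicates greater volumetric overlap, while a lower APL indicates that less manual editing would be needed to bring the auto-segmented contour to the reference.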

Results

Table 1 reports the mean and one standard deviation of both metrics for all OARs and all systems tested. The values obtained for both 3D DSC and APL agree well with those of other recently published studies.

Performance differences between the four systems were not statistically significant for either the 3D DSC or the APL metric.


Conclusion

Comparable levels of performance were observed between all four systems. This indicates that deep learning-based auto-segmentation products are developing at a similar pace in terms of the quality of contours produced.

When evaluating such a system for clinical use, it is therefore likely to be more beneficial to consider other factors, such as cost and the range of contours offered.