ESTRO 2022

Session Item

Monday

May 09

10:30 - 11:30

Room D5

Deep learning for image analysis

Chair: , The Netherlands;

Co-Chair: Catarina Veiga, United Kingdom

Overview: Proffered Papers are presented at one of the sessions scheduled in the main session halls. Each author will present orally for 7 minutes, followed by 3 minutes for discussion. Sessions will be streamed.

Session Type: Proffered Papers

Track: Physics

10:30 - 10:40

Auto-segmentation of low contrast organs at risk improves with minimal prior delineation input

Mathis Ersted Rasmussen, Denmark

Presentation Number: OC-0769

Abstract

Authors:

Mathis Ersted Rasmussen¹˒²˒³˒⁴, Jasper Albertus Nijkamp¹˒⁴, Jesper Grau Eriksen², Stine Sofia Korreman¹˒³˒⁴

¹Aarhus University Hospital, Danish Center for Particle Therapy, Aarhus, Denmark; ²Aarhus University Hospital, Department of Experimental Clinical Oncology, Aarhus, Denmark; ³Aarhus University Hospital, Department of Oncology, Aarhus, Denmark; ⁴Aarhus University, Department of Clinical Medicine, Aarhus, Denmark

Purpose or Objective

Deep learning auto-segmentation of organs-at-risk (OARs) in head-neck cancer performs well for structures with high visual contrast such as the parotid glands, mandible, spinal cord and brain. However, OARs such as the glottic larynx and the pharyngeal constrictor muscles (PCMs) can have low or no visual contrast, which results in inaccurate auto-segmentation. We aim to improve segmentation performance for these challenging OARs by including minimal manual delineations as input to deep learning auto-segmentation models.

Material and Methods

We used a data set with planning CTs and manual delineations of OARs from 301 patients previously treated with radiotherapy for head-neck cancer at our institution. Thirty cases were randomly selected for an external test set, leaving 271 cases eligible for training. Not all OARs were delineated for all patients. Hence, each model was trained and tested only on patients in whom the OAR of interest was present (Table 1).

To simulate minimal manual delineation (MMD) input, we extracted the most cranial and the most caudal slice of each OAR from our data set and supplied these together with the CT as input to 3D full-resolution nnUNets [Isensee, F et al. 2020] (CT+MMD). We included the lower, middle and upper PCM, the glottic larynx and the parotid glands. For reference, we also trained nnUNets without MMD (CT-only). Each model was a single-fold nnUNet trained for 1000 epochs with default parameters. The parotid glands were trained and evaluated as one OAR to avoid segmentation of the contralateral gland.
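The MMD simulation described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `minimal_manual_delineation` and `stack_ct_and_mmd` are hypothetical, the cranio-caudal direction is assumed to be array axis 0, and passing the MMD as a second input channel next to the CT is an assumption, since the abstract does not state how the two inputs were combined.

```python
import numpy as np


def minimal_manual_delineation(mask, axis=0):
    """Keep only the first and last delineated slice of a binary OAR mask.

    `axis` is assumed to be the cranio-caudal axis of the volume.
    """
    mask = np.asarray(mask).astype(bool)
    # Slice indices along `axis` that contain at least one OAR voxel.
    other_axes = tuple(i for i in range(mask.ndim) if i != axis)
    occupied = np.flatnonzero(mask.any(axis=other_axes))

    mmd = np.zeros_like(mask)
    if occupied.size == 0:
        return mmd  # OAR not delineated for this patient

    # Copy the two extreme slices; all slices in between stay empty.
    for idx in (occupied[0], occupied[-1]):
        sl = [slice(None)] * mask.ndim
        sl[axis] = idx
        mmd[tuple(sl)] = mask[tuple(sl)]
    return mmd


def stack_ct_and_mmd(ct, mmd):
    """Assumed input layout: channel 0 = CT, channel 1 = MMD."""
    return np.stack([np.asarray(ct, dtype=np.float32),
                     np.asarray(mmd, dtype=np.float32)], axis=0)


# Toy volume: the structure occupies slices 1 to 4 of a 6-slice scan.
toy_mask = np.zeros((6, 4, 4), dtype=bool)
toy_mask[1:5, 1:3, 1:3] = True
toy_mmd = minimal_manual_delineation(toy_mask)
toy_input = stack_ct_and_mmd(np.zeros((6, 4, 4)), toy_mmd)
```

In this toy example only slices 1 and 4 of `toy_mmd` carry delineation, mimicking a clinician who contours just the two end slices of the structure.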

We obtained the Dice-Sørensen Coefficient (DSC) and the 95th percentile Hausdorff distance (HD95) on the external test set with nnUNet’s built-in evaluation function. Differences between the paired models for each OAR were tested with a two-sided Wilcoxon signed-rank test at a significance level of 5 %.
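For readers unfamiliar with the evaluation, the sketch below shows one way to compute a DSC and to run a paired two-sided Wilcoxon signed-rank test. It is an illustration only and does not reproduce nnUNet's evaluation function: `dice` and `paired_wilcoxon` are hypothetical helper names, HD95 is omitted, and the test relies on `scipy.stats.wilcoxon`.

```python
import numpy as np
from scipy.stats import wilcoxon


def dice(pred, ref):
    """Dice-Sørensen coefficient of two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred).astype(bool)
    ref = np.asarray(ref).astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def paired_wilcoxon(scores_a, scores_b, alpha=0.05):
    """Two-sided Wilcoxon signed-rank test on per-patient paired scores.

    Returns the p-value and whether it falls below `alpha`.
    """
    result = wilcoxon(scores_a, scores_b, alternative="two-sided")
    p_value = float(result.pvalue)
    return p_value, p_value < alpha
```

A typical use would be to collect one DSC per test patient for the CT-only model and one for the CT+MMD model, in the same patient order, and pass the two lists to `paired_wilcoxon`.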

Results

DSC and HD95 were significantly better for all OARs with CT+MMD compared to CT-only (p ≤ 0.001, Figure 1). Low and no contrast OARs (PCMs and glottic larynx) benefited most, whereas the effect was marginal for the high contrast OAR (parotid glands). Median DSC increased from 0.56 to 0.86 for the glottic larynx, from 0.64 to 0.83 for the lower PCM, from 0.65 to 0.83 for the middle PCM, from 0.61 to 0.78 for the upper PCM and from 0.83 to 0.86 for the parotid glands.
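The absolute gains implied by these medians can be tabulated with a few lines of arithmetic. The snippet uses only the median DSC values reported above; the variable names are illustrative.

```python
# Median DSC per OAR as reported: (CT-only, CT+MMD).
median_dsc = {
    "glottic larynx": (0.56, 0.86),
    "lower PCM": (0.64, 0.83),
    "middle PCM": (0.65, 0.83),
    "upper PCM": (0.61, 0.78),
    "parotid glands": (0.83, 0.86),
}

# Absolute gain in median DSC, rounded to the reported precision.
gains = {
    oar: round(with_mmd - ct_only, 2)
    for oar, (ct_only, with_mmd) in median_dsc.items()
}

# OARs ordered from largest to smallest gain.
ranked = sorted(gains, key=gains.get, reverse=True)

for oar in ranked:
    print(f"{oar}: +{gains[oar]:.2f}")
```

The ordering puts the glottic larynx first and the parotid glands last, consistent with the statement that low and no contrast OARs benefited most.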

Conclusion

By training deep learning auto-segmentation models on CT along with just two manually delineated slices per OAR, we demonstrate major improvements of segmentation metrics for low and no contrast OARs such as PCMs and glottic larynx compared to CT-only models. For the included high contrast OAR (parotid glands) CT-only performed well, and no substantial improvements were observed with CT+MMD.
Our findings show that it may be beneficial for clinicians to manually delineate a few selected slices of low and no contrast OARs prior to running deep learning auto-segmentation models, as the subsequent predictions are likely to require much less revision than for predictions of fully automated models.