Vienna, Austria

ESTRO 2023

Session Item

Monday
May 15
10:30 - 11:30
Stolz 2
Automation
Cecile Wolfs, The Netherlands;
Wilko Verbakel, The Netherlands
3260
Mini-Oral
Physics
10:30 - 11:30
Single-click user input reduces false detection in deep learning head and neck tumor segmentation
Jintao Ren, Denmark
MO-0799

Abstract

Authors:

Jintao Ren (1,2,3), Jasper Nijkamp (1,3), Mathis Ersted Rasmussen (1,2,3), Jesper Grau Eriksen (4), Stine Sofia Korreman (1,2,3)

(1) Aarhus University, Department of Clinical Medicine, Aarhus, Denmark; (2) Aarhus University, Department of Oncology, Aarhus, Denmark; (3) Aarhus University Hospital, Danish Center for Particle Therapy, Aarhus, Denmark; (4) Aarhus University Hospital, Department of Experimental Clinical Oncology, Aarhus, Denmark

Purpose or Objective

Gross tumor volumes (GTV) of head and neck cancer (HNC) are difficult to identify in images, even with deep learning (DL), particularly when the primary tumor (GTV-T) and multiple nodal metastases (GTV-N) are present. Using only imaging for DL segmentation may lead to false segmentations. This study examines whether minimal user input, in which the oncologist only needs to single-click the lesions to be segmented, improves the detection ratio and segmentation performance of deep learning-based auto-segmentation for GTV-T and GTV-N of HNC.

Material and Methods

We included treatment planning CT, PET, and MRI (T1w mDixon and T2w) images of 567 HNC patients with a wide variety of tumor sites (larynx, pharynx, oral cavity, sinonasal, and salivary gland carcinomas). The clinical delineations of GTV-T and GTV-N were treated as two separate DL targets. The data were randomly split into training (n=375), validation (n=95), and test (n=97) sets.

To simulate user input clicks, we generated a dot of random size between 5 and 10 mm³ at a random location inside each distinct target volume. This simulated user input was fed, together with the CT, PET, T1w, and T2w MRI scans, to a 3D UNet. We compared its segmentation results with those of a UNet that used only the scans as input.
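The click simulation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `simulate_click`, the spherical shape of the dot, the interpretation of "size" as the sphere volume, and the choice not to clip the dot to the target are all assumptions. The input is assumed to be a binary mask of one distinct lesion.

```python
import numpy as np

def simulate_click(mask, spacing=(1.0, 1.0, 1.0), rng=None):
    """Return a binary click channel: one small ball placed inside a lesion mask.

    mask    : 3D boolean array of a single lesion (hypothetical input format)
    spacing : voxel size in mm along each axis
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = np.asarray(mask, dtype=bool)
    coords = np.argwhere(mask)
    if coords.size == 0:
        raise ValueError("mask is empty")

    # Random location inside the target volume.
    center = coords[rng.integers(len(coords))]

    # Assumption: the dot is a sphere whose volume is drawn from 5 to 10 mm^3.
    volume_mm3 = rng.uniform(5.0, 10.0)
    radius_mm = (3.0 * volume_mm3 / (4.0 * np.pi)) ** (1.0 / 3.0)

    # Squared physical distance (mm^2) of every voxel from the click center.
    grids = np.ogrid[tuple(slice(0, s) for s in mask.shape)]
    dist2 = sum(((g - c) * sp) ** 2 for g, c, sp in zip(grids, center, spacing))

    return dist2 <= radius_mm ** 2
```

In such a setup the click channels of all lesions in a patient would be combined and stacked with the image channels as an additional network input; how the authors encoded the channel is not stated in the abstract.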

We evaluated the detection ratio (%) on all distinct GTV-Ts and GTV-Ns. Segmentation performance was evaluated using the Dice Similarity Coefficient (Dice), the 95th-percentile Hausdorff distance (HD95), the mean surface distance (MSD), and the Surface-Dice with 2 mm tolerance. The voxel-based false discovery rate (FDR) and false negative rate (FNR) were used to measure false segmentation and were compared using a Wilcoxon signed-rank test (significance level p<0.05). FDR can be interpreted as an indicator of false positive segmentations, whereas FNR indicates false negative segmentations. For all metrics, the mean and 95% confidence interval (CI95, bootstrapping with 10000 samples) were reported.
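As an illustration of the voxel-based metrics and the CI95 reporting, the sketch below uses the standard definitions FDR = FP / (FP + TP) and FNR = FN / (FN + TP), and a percentile bootstrap of the mean. The function names are hypothetical, and the percentile method is an assumption: the abstract states only that 10000 bootstrap samples were used.

```python
import numpy as np

def fdr_fnr(pred, ref):
    """Voxel-based false discovery rate and false negative rate."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = np.count_nonzero(pred & ref)    # predicted and delineated
    fp = np.count_nonzero(pred & ~ref)   # predicted, not delineated
    fn = np.count_nonzero(~pred & ref)   # delineated, not predicted
    fdr = fp / (fp + tp) if (fp + tp) > 0 else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) > 0 else float("nan")
    return fdr, fnr

def bootstrap_mean_ci(values, n_boot=10000, alpha=0.05, rng=None):
    """Mean of per-patient values with a percentile-bootstrap confidence interval."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    # Resample patients with replacement, n_boot times.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return values.mean(), lo, hi
```

Here `fdr_fnr` would be applied per patient and per target (GTV-T or GTV-N), and the resulting per-patient values passed to `bootstrap_mean_ci`.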

Results

On the test set of 97 patients with 100 GTV-Ts and 177 GTV-Ns, incorporating the user input increased the detection ratio from 95% to 99% for GTV-T and from 79% to 99% for GTV-N. All segmentation metrics improved, especially for GTV-N (Figure 1-A). The mean (CI95) FDR for GTV-T/GTV-N decreased from 0.27 (0.23-0.30)/0.31 (0.26-0.37) to 0.22 (0.20-0.25)/0.16 (0.14-0.19), with p<0.001. The FNR of GTV-T was only marginally affected by user input, decreasing from 0.32 (0.28-0.36) to 0.29 (0.25-0.32) with p>0.05, whereas the FNR of GTV-N decreased significantly from 0.29 (0.25-0.32) to 0.17 (0.16-0.19) with p<0.001 (Figure 1-B). Two cases with substantial improvement in GTV-N segmentation are plotted in Figure 2.


Conclusion

We improved DL auto-segmentation of HNC GTVs on a highly diverse data set, where DL performance is typically poor. A single-click input per lesion reduced false negatives for GTV-N and false positives for both GTV-T and GTV-N. This suggests that clinicians' prior knowledge can supplement medical scans, improving the detection ratio of GTV-N and the segmentation performance for GTV in DL auto-segmentation.