Copenhagen, Denmark
Onsite/Online

ESTRO 2022

Session Item

Monday
May 09
10:30 - 11:30
Room D5
Deep learning for image analysis
Andre Dekker, The Netherlands;
Catarina Veiga, United Kingdom
Proffered Papers
Physics
10:50 - 11:00
Uncertainty map for error prediction in deep learning-based head and neck tumor auto-segmentation
Jintao Ren, Denmark
OC-0771

Abstract

Uncertainty map for error prediction in deep learning-based head and neck tumor auto-segmentation
Authors:

Jintao Ren1, Jonas Teuwen2, Jasper Nijkamp3, Mathis Rasmussen1, Jesper Eriksen4, Jan-Jakob Sonke2, Stine Korreman1

1Aarhus University Hospital, Danish Center for Particle Therapy, Department of Oncology, Aarhus, Denmark; 2Netherlands Cancer Institute, Department of Radiation Oncology, Amsterdam, The Netherlands; 3Aarhus University Hospital, Danish Center for Particle Therapy, Aarhus, Denmark; 4Aarhus University Hospital, Department of Experimental Clinical Oncology, Aarhus, Denmark

Purpose or Objective

Deep learning (DL)-based auto-segmentation has been shown to perform well in a variety of radiotherapy applications. Although auto-segmentation of the gross tumor volume (GTV) is acceptable for a large group of patients, it still fails in a subgroup. In this study, we investigate the use of uncertainty maps to visualize potential segmentation uncertainties and to indicate patient-level segmentation failure.

Material and Methods

We collected data from patients with head and neck squamous cell carcinoma (HNSCC; n=301), comprising larynx, pharynx, oral, sinonasal, and salivary gland carcinomas. For each patient, treatment planning CT, PET, and MRI (T1w mDixon and T2w) images were included, together with clinical delineations of the primary tumor (GTV-T) and nodal metastases (GTV-N). MRIs were deformably registered to the PET/CT. The union of GTV-T and GTV-N was treated as ground truth (GTV-Clinic) for the DL prediction (GTV-DL).

We trained a 3D UNet for 1000 epochs with five-fold cross-validation. At test time, for each patient, 50 stochastic samples were drawn from snapshot-saved UNet models with Monte Carlo dropout (p=0.1). The mean of all output softmax probability maps was used to derive both GTV-DL and the uncertainty map.
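The aggregation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which uncertainty measure was derived from the mean softmax map, so the binary entropy used here is an assumption, as are the 0.5 segmentation cut-off, the flattened-list representation of a volume, and the function names `binary_entropy` and `aggregate_mc_samples`.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a foreground probability; 0 at p=0 or p=1, 1 at p=0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def aggregate_mc_samples(samples, seg_threshold=0.5):
    """Combine stochastic forward passes into a segmentation and an uncertainty map.

    samples: list of flattened foreground-probability maps, one per
             Monte Carlo dropout sample (all of equal length).
    seg_threshold: assumed cut-off on the mean probability (not from the abstract).
    """
    n_samples = len(samples)
    n_voxels = len(samples[0])
    # Voxel-wise mean of the softmax foreground probability over all samples.
    mean_prob = [sum(s[i] for s in samples) / n_samples for i in range(n_voxels)]
    # Binary prediction from the mean probability.
    segmentation = [1 if p >= seg_threshold else 0 for p in mean_prob]
    # Assumed uncertainty measure: entropy of the mean probability.
    uncertainty = [binary_entropy(p) for p in mean_prob]
    return mean_prob, segmentation, uncertainty


# Two toy samples over three voxels (illustrative values only).
toy_samples = [[0.9, 0.5, 0.0], [0.7, 0.5, 0.0]]
mean_prob, segmentation, uncertainty = aggregate_mc_samples(toy_samples)
```

In this toy case the second voxel, where the mean probability sits at 0.5, receives the highest uncertainty, while the third voxel, where all samples agree on background, receives none.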

The uncertainty map is a heatmap representing prediction uncertainties. We compared the location of the thresholded uncertainty map, termed the uncertainty regions (UR), with the false predictions of GTV-DL, termed the error regions (ER), to assess whether UR can locate potential prediction errors. We used the Dice similarity coefficient (Dice) to quantify the degree of overlap between UR and ER.
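The comparison of UR and ER can be sketched as below, under stated assumptions: volumes are represented as flattened lists of 0/1 values, ER is taken to be every voxel where the prediction and the ground truth disagree, and an empty-versus-empty comparison is scored as 1.0 by convention. The function names and the toy values are hypothetical.

```python
def uncertainty_region(uncertainty, tau):
    """Binary mask of voxels whose uncertainty is at or above the threshold tau."""
    return [1 if u >= tau else 0 for u in uncertainty]


def error_region(prediction, ground_truth):
    """Binary mask of voxels where prediction and ground truth disagree
    (false positives and false negatives together; an assumed definition)."""
    return [1 if p != t else 0 for p, t in zip(prediction, ground_truth)]


def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal length."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(mask_a) + sum(mask_b)
    # Convention assumed here: two empty masks overlap perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example over five voxels (illustrative values only).
uncertainty = [0.1, 0.8, 0.9, 0.2, 0.75]
gtv_dl = [1, 1, 0, 0, 1]      # predicted mask
gtv_clinic = [1, 0, 1, 0, 1]  # ground-truth mask

ur = uncertainty_region(uncertainty, tau=0.7)
er = error_region(gtv_dl, gtv_clinic)
overlap = dice(ur, er)
```

Here UR covers both erroneous voxels plus one correctly predicted voxel, which is the situation the Dice between UR and ER is meant to quantify.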

To detect patient-level segmentation failure, we computed overlap metrics between the UR and GTV-DL (False Omission Rate, False Negative Rate, and Surface Dice) and used them to estimate the Dice of GTV-DL. A Gradient Boosting Regressor was used for the Dice estimation. We evaluated the regression using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R²).
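For reference, the three evaluation metrics can be computed as in the sketch below, which uses their standard definitions. It does not reproduce the regressor itself; the function name `regression_metrics` and the Dice values in the example are hypothetical and are not results from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for estimated versus actual values.

    Assumes y_true is not constant; otherwise R^2 is undefined.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(r) for r in residuals) / n
    # Root mean squared error: penalizes large residuals more strongly.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    # Coefficient of determination: 1 minus residual over total sum of squares.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Made-up actual and estimated Dice values for four patients.
actual_dice = [0.2, 0.4, 0.6, 0.8]
estimated_dice = [0.3, 0.4, 0.5, 0.8]
rmse, mae, r2 = regression_metrics(actual_dice, estimated_dice)
```

RMSE and MAE are expressed in Dice units, so they can be read directly against the 0 to 1 range of the target, whereas R² describes the share of between-patient variation in Dice that the estimate explains.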

Results

The 3D UNet achieved acceptable segmentation performance (Figure 1-A). Figure 1-B illustrates voxel-wise uncertainty estimation with three examples. In the first two cases, UR clearly indicates the location of errors; in the last case, however, UR does not correlate with ER. In this example, the images provide insufficient information, leaving the GTV location ambiguous. Figure 2 indicates that, using an uncertainty threshold of τ=0.7, the regressor could estimate the actual segmentation Dice with an RMSE of 0.14, MAE of 0.09, and R² of 0.4.



Conclusion

This work demonstrates the value of uncertainty estimation for DL-based auto-segmentation of the GTV in HNSCC. The uncertainty regions derived from the estimated uncertainty reveal potentially erroneous regions of the prediction. The feasibility of using uncertainty regions for patient-level failure detection was also demonstrated with a simple approach. This study contributes substantially to the clinical applicability of DL-based GTV segmentation.