Copenhagen, Denmark
Onsite/Online

ESTRO 2022

Session Item

Monday
May 09
14:15 - 15:15
Mini-Oral Theatre 2
22: AI, big data, automation
Eugenia Vlaskou Badra, Switzerland;
Stephanie Tanadini-Lang, Switzerland
3400
Mini-Oral
Interdisciplinary
Automatic delineation of head and neck gross tumor volume using multimodal information
Heleen Bollen, Belgium
MO-0886

Abstract

Automatic delineation of head and neck gross tumor volume using multimodal information
Authors:

Heleen Bollen1, Sandra Nuyts1, Siri Willems2, Frederik Maes2

1KU Leuven, Laboratory of Experimental Radiotherapy, Leuven, Belgium; 2KU Leuven, Processing Speech and Images (PSI), Leuven, Belgium

Purpose or Objective

Accurate radiotherapy (RT) of head and neck cancer (HNC) requires precise delineation of target volumes (TVs). Delineation is performed manually using several imaging modalities, e.g. CT and PET. Since delineation is highly experience- and perception-dependent, there is growing interest in automating the delineation process. The literature on automated delineation in HNC is limited to unimodal networks. The goal of our research was to create a 3D convolutional neural network (CNN) that uses information from multiple modalities to improve segmentation performance compared to unimodal approaches.

Material and Methods

The dataset consists of 70 patients with oropharyngeal cancer. For each patient, the planning CT image (pCT), PET imaging, and manual delineations of the primary (GTVp) and nodal (GTVn) gross tumor volumes, performed by two radiation oncologists, were available. The PET image was rigidly registered to the pCT using Eclipse (Varian Medical Systems, Palo Alto, CA). A 3D CNN was developed with two separate input pathways, one per modality, so that each pathway can focus on learning patterns specific to that modality. At certain points in the model, a connecting layer transfers information between the two pathways. At the end of the model, the pathways are concatenated, and a final classifier layer uses the combined information to predict the segmentation label. The performance of this approach was compared to unimodal approaches (a pCT model and a PET model) using the Dice similarity coefficient (DSC), the mean surface distance (MSD), and the 95% Hausdorff distance (HD95).
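The registration itself was performed in Eclipse, a commercial treatment planning system; purely as an illustration of the general step (not the authors' workflow), a comparable rigid PET-to-pCT registration could be sketched with the open-source SimpleITK library. File names and parameter values below are hypothetical.

# Illustrative rigid PET-to-pCT registration with SimpleITK.
# NOTE: the study used Eclipse (Varian); this open-source sketch only
# mirrors the general idea. Paths and parameters are hypothetical.
import SimpleITK as sitk

pct = sitk.ReadImage("pct.nii.gz", sitk.sitkFloat32)   # fixed image
pet = sitk.ReadImage("pet.nii.gz", sitk.sitkFloat32)   # moving image

reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetOptimizerAsRegularStepGradientDescent(
    learningRate=1.0, minStep=1e-4, numberOfIterations=200)
reg.SetInitialTransform(
    sitk.CenteredTransformInitializer(
        pct, pet, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY))
reg.SetInterpolator(sitk.sitkLinear)

transform = reg.Execute(pct, pet)               # rigid (Euler 3D) transform
pet_on_pct = sitk.Resample(pet, pct, transform, # PET resampled to pCT grid
                           sitk.sitkLinear, 0.0)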
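The abstract does not specify implementation details of the network; the following minimal sketch, assuming PyTorch, shows the described structure (two modality-specific pathways, connecting layers that transfer information between them, and a final classifier on the concatenated features). Depth, channel widths, and the form of the connecting layers are illustrative, not the authors' configuration.

# Minimal dual-pathway 3D CNN sketch (PyTorch assumed; all layer
# counts and channel widths below are illustrative).
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """3D convolution + batch norm + ReLU."""
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm3d(c_out),
        nn.ReLU(inplace=True),
    )

class DualPathwayCNN(nn.Module):
    def __init__(self, n_classes=3):  # e.g. background, GTVp, GTVn
        super().__init__()
        # One pathway per modality, so each can focus on learning
        # patterns specific to that modality.
        self.ct_blocks = nn.ModuleList([conv_block(1, 16), conv_block(16, 32)])
        self.pet_blocks = nn.ModuleList([conv_block(1, 16), conv_block(16, 32)])
        # Connecting layers transfer information between the pathways
        # (here: 1x1x1 convolutions over the concatenated feature maps).
        self.cross = nn.ModuleList([nn.Conv3d(32, 16, 1), nn.Conv3d(64, 32, 1)])
        # Final classifier on the concatenated pathways.
        self.classifier = nn.Conv3d(64, n_classes, kernel_size=1)

    def forward(self, ct, pet):
        for ct_blk, pet_blk, cross in zip(self.ct_blocks, self.pet_blocks, self.cross):
            ct, pet = ct_blk(ct), pet_blk(pet)
            shared = cross(torch.cat([ct, pet], dim=1))
            ct, pet = ct + shared, pet + shared  # exchange information
        # Concatenate both pathways and predict the segmentation label.
        return self.classifier(torch.cat([ct, pet], dim=1))

model = DualPathwayCNN()
ct_vol = torch.randn(1, 1, 32, 64, 64)    # (batch, channel, D, H, W)
pet_vol = torch.randn(1, 1, 32, 64, 64)   # co-registered PET volume
logits = model(ct_vol, pet_vol)           # (1, 3, 32, 64, 64) class scores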

Results

The multimodal approach performs best on all metrics for both the GTVp and the GTVn, as shown in Table 1. For the GTVp, the DSC improves from 48.0% (pCT model) and 48.9% (PET model) to 59.1% (pCT+PET model), while the GTVn reaches an average DSC of 62.8%. Adding PET information reduced the small false-positive spots in the delineation result compared to the pCT and PET models. A reduction of the absolute volume difference was seen for both GTVp and GTVn, as shown in Figure 1.
Table 1: 5-fold cross-validation results for the pCT model, the PET model, and the multimodal approach.
Figure 1: Absolute volume differences (ml) between manual and automatic delineation for the pCT model, the PET model, and the multimodal approach, with GTVp in purple and GTVn in green.
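
For reference, the DSC values in Table 1 follow the standard definition, DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal implementation on binary masks (array names are illustrative) could look like this:

# Standard Dice similarity coefficient for binary segmentation masks.
import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|), in [0, 1]."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, ref).sum() / denom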

Conclusion

Adding functional PET information improves the overall segmentation result compared to a unimodal network based on pCT input alone. Automated segmentation in HNC offers the possibility of implementing more advanced RT techniques, e.g. adaptive RT and proton therapy; however, the performance of existing unimodal networks has been insufficient for clinical implementation. Multimodal networks could offer a solution for automated delineation of TVs in HNC. We foresee adding MRI to the multimodal CNN by the start of the ESTRO conference.