ESTRO 2022

Session Item

Monday

May 09

10:30 - 11:30

Poster Station 2

20: Head and neck

Chair: Annett Linge, Germany

Overview: Poster Discussions are presented in one of the sessions scheduled at the two poster terminals in the exhibition area. Each author presents a digital poster orally for 2 minutes, followed by 2 minutes of discussion. Sessions will not be streamed, but authors are invited to upload pre-recordings to the online platform.

Session Type: Poster Discussion

Track: Clinical

Explainability of deep learning-based HPV status prediction in oropharyngeal cancer

Agustina La Greca, Switzerland

Presentation Number: PD-0820

Abstract

Authors:

Agustina La Greca¹,², Chiara Marchiori³, Marta Bogowicz¹, Javier Barranco-García¹, Ender Konukoglu⁴, Oliver Riesterer⁵,¹, Panagiotis Balermpas¹, Cristiano Malossi³, Matthias Guckenberger¹, Janita E. van Timmeren¹, Stephanie Tanadini-Lang¹

¹University Hospital Zurich, University of Zurich, Department of Radiation Oncology, Zurich, Switzerland; ²ETH Zurich, Department of Information Technology and Electrical Engineering, Computer Vision Laboratory, Zurich, Switzerland; ³IBM Research Zurich, AI Automation, Zurich, Switzerland; ⁴ETH Zurich, Department of Information Technology and Electrical Engineering, Computer Vision Laboratory, Zurich, Switzerland; ⁵Cantonal Hospital Aarau, Center for Radiation Oncology KSA-KSB, Aarau, Switzerland

Purpose or Objective

Patients with human papillomavirus (HPV)-positive oropharyngeal tumors have a more favorable prognosis than patients with HPV-negative tumors and are therefore potential candidates for treatment de-escalation. In clinical practice, HPV status is determined from biopsy samples. Medical image analysis tools have been proposed in the literature as complementary, non-invasive methods. In this study, we aimed to assess the diagnostic accuracy and explainability of deep learning (DL) for HPV status prediction from computed tomography (CT) images of oropharyngeal cancer (OPC) patients.

Material and Methods

One internal cohort (n₁=96) and two public cohorts (n₂=498; n₃=146) of OPC patients were used. The dataset was split into training (60%), validation (20%) and test (20%) sets, stratified by HPV status. All CT scans were resampled to an isotropic voxel size of 2 mm (2×2×2 mm³), and a sub-volume of 96×96×96 voxels was cropped. In the cranio-caudal direction, the sub-volume extended from the nasal columella to 96 voxels (192 mm) below it, i.e., approximately to the lung apices. In the axial plane, the crop was centered on the center of mass of the most cranial slice. Models Genesis, a publicly available 3D model pre-trained on lung CT, was fine-tuned for the classification task. The model with the highest F1-score on the validation set was selected and applied to the test set. Class activation maps (CAMs) of the test subjects from the internal dataset (n=25) were obtained post hoc with two explainability methods, Grad-CAM and Score-CAM. The CAMs were then thresholded at their 70th and 90th percentile values to select the most important regions (CAM^70th and CAM^90th). Their volumetric overlap with the gross tumor volume (GTV) was calculated using the Szymkiewicz–Simpson overlap coefficient. The primary tumor (GTV_pt) and the affected lymph nodes (GTV_ln) were evaluated separately and combined (GTV_all).
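
The stratified split described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the three cohorts were pooled before splitting and that HPV status is stored as a binary label per patient. Each class is shuffled and cut into 60/20/20 portions. The function name `stratified_split` and the fixed `seed` are hypothetical.

```python
import numpy as np

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split patient indices into (train, val, test), stratified by label.

    Hypothetical helper: each class (e.g. HPV-positive / HPV-negative) is
    shuffled independently and divided according to `fractions`, so every
    set keeps roughly the same class proportions as the full dataset.
    """
    labels = np.asarray(labels)
    if not np.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class only.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # remainder goes to the test set
    return tuple(np.sort(np.asarray(part, dtype=int)) for part in (train, val, test))
```

If all 740 patients (96 + 498 + 146) were pooled, this would give roughly 444/148/148 patients. Per-class rounding can cause small deviations from those counts. Splitting each class separately keeps the HPV-positive fraction approximately equal across the three sets.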

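The cropping rule can likewise be sketched in NumPy. The sketch assumes that the scan has already been resampled to 2 mm isotropic voxels (resampling is not shown) and that the slice index of the nasal columella is known. Several details are hypothetical choices, not taken from the abstract: the (z, y, x) axis order, the use of the columella slice as the "most cranial slice", the -500 HU body threshold for the center of mass, the air padding value and all names.

```python
import numpy as np

SIZE = 96          # sub-volume edge length in voxels (96 x 2 mm = 192 mm)
AIR_HU = -1024.0   # hypothetical padding value (air) where the crop leaves the scan

def crop_subvolume(ct, columella_z, body_threshold=-500.0, size=SIZE):
    """Crop a size^3 block from a CT already resampled to 2 mm isotropic voxels.

    Hypothetical conventions: `ct` is indexed (z, y, x) with z increasing
    caudally, `columella_z` is the slice index of the nasal columella, and the
    in-plane center is the center of mass of voxels above `body_threshold` HU
    in that most cranial slice of the sub-volume.
    """
    top = ct[columella_z]
    ys, xs = np.nonzero(top > body_threshold)
    if ys.size == 0:
        raise ValueError("no body voxels found in the most cranial slice")
    cy, cx = int(round(ys.mean())), int(round(xs.mean()))
    half = size // 2
    # Pad caudally and in-plane so that the crop never runs outside the array.
    padded = np.pad(ct, ((0, size), (half, half), (half, half)),
                    mode="constant", constant_values=AIR_HU)
    # In the padded array, original (y, x) maps to (y + half, x + half), so the
    # slice [cy, cy + size) is centered on the original row cy (same for x).
    return padded[columella_z:columella_z + size, cy:cy + size, cx:cx + size]
```

Padding rather than shifting the window keeps the center of mass at voxel index 48 of every crop, even for patients positioned near the edge of the field of view.
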
Results

The model achieved an AUC/accuracy/F1-score of 0.89/0.82/0.78, 0.83/0.77/0.70 and 0.87/0.79/0.74 on the training, validation and test sets, respectively. Figure 1 shows the Grad-CAM visual explanations for two test subjects. Of the 25 internal test cases, 19 were correctly classified. An overlap coefficient of at least 0.8 between GTV_all and Grad-CAM^70th was observed in 21 cases. The corresponding count for Score-CAM^70th was 24 cases. The overlap coefficients of GTV_all with Grad-CAM^90th and Score-CAM^90th were at least 0.5 for 13 subjects. Table 1 shows the mean overlap coefficients of GTV_pt, GTV_ln and GTV_all with the different CAMs.
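
The thresholding and overlap computation behind these counts can be sketched as follows. For two masks A and B, the Szymkiewicz–Simpson (overlap) coefficient is |A ∩ B| / min(|A|, |B|). The sketch makes two assumptions. First, the CAM has already been upsampled to the 96×96×96 input grid. Second, the percentile is taken over all CAM voxels, since the abstract does not say whether it was restricted to the body. The function names are hypothetical.

```python
import numpy as np

def threshold_cam(cam, percentile):
    """Keep the CAM voxels at or above the given percentile of all CAM values."""
    return cam >= np.percentile(cam, percentile)

def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson coefficient |A ∩ B| / min(|A|, |B|) of two boolean masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    smaller = min(int(a.sum()), int(b.sum()))
    if smaller == 0:
        raise ValueError("overlap coefficient is undefined for an empty mask")
    return float(np.logical_and(a, b).sum() / smaller)

def overlaps_for_subject(cam, gtv_pt, gtv_ln, percentiles=(70, 90)):
    """Overlap of each thresholded CAM with GTV_pt, GTV_ln and their union GTV_all."""
    gtvs = {"GTV_pt": gtv_pt,
            "GTV_ln": gtv_ln,
            "GTV_all": np.logical_or(gtv_pt, gtv_ln)}
    return {(p, name): overlap_coefficient(threshold_cam(cam, p), mask)
            for p in percentiles
            for name, mask in gtvs.items()}
```

The coefficient divides by the smaller mask, so it equals 1 whenever one mask lies entirely inside the other. When the GTV is smaller than the thresholded CAM, the coefficient therefore measures the fraction of the GTV covered by the CAM. A count such as the number of subjects with an overlap of at least 0.8 follows from thresholding the per-subject values.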

Conclusion

Two explainability methods were used to explore which CT regions were most relevant to HPV status prediction by a 3D DL model. Our study showed promising classification performance. It also showed volumetric overlap between the resulting heatmaps and both GTV_pt and GTV_ln. These findings help address reliability concerns about DL in diagnostics and bring its clinical application closer.