Explainability of deep learning-based HPV status prediction in oropharyngeal cancer
Agustina La Greca, Switzerland
PD-0820
Abstract
Authors: Agustina La Greca1,2, Chiara Marchiori3, Marta Bogowicz1, Javier Barranco-García1, Ender Konukoglu4, Oliver Riesterer5,1, Panagiotis Balermpas1, Cristiano Malossi3, Matthias Guckenberger1, Janita E. van Timmeren1, Stephanie Tanadini-Lang1
1University Hospital Zurich, University of Zurich, Department of Radiation Oncology, Zurich, Switzerland; 2ETH Zurich, Department of Information Technology and Electrical Engineering, Computer Vision Laboratory, Zürich, Switzerland; 3IBM Research Zurich, AI Automation, Zurich, Switzerland; 4ETH Zurich, Department of Information Technology and Electrical Engineering, Computer Vision Laboratory, Zurich, Switzerland; 5Cantonal Hospital Aarau, Center for Radiation Oncology KSA-KSB, Aarau, Switzerland
Purpose or Objective
Patients with human papillomavirus (HPV)-positive oropharyngeal tumors have a more favorable prognosis than their HPV-negative counterparts and are therefore potential candidates for treatment de-escalation. In clinical practice, HPV diagnosis requires the analysis of biopsy samples, while medical image analysis tools have been proposed in the literature as complementary non-invasive methods. In this study, we aimed to assess the diagnostic accuracy and explainability of deep learning (DL) for HPV status prediction in computed tomography (CT) images of oropharyngeal cancer (OPC) patients.
Material and Methods
One internal cohort (n1=96) and two public cohorts (n2=498; n3=146) of OPC patients were used. The dataset was split, stratified by HPV status, into training (60%), validation (20%) and test (20%) sets. All CT scans were resampled to isotropic 2x2x2 mm3 voxels, and a sub-volume of 96x96x96 voxels was cropped. In the craniocaudal direction, the sub-volume spanned from the nasal columella to 96 voxels below it, i.e., approximately to the start of the lungs. In the axial plane, the crop was centered on the center of mass of the most cranial slice. ModelsGenesis, a publicly available 3D model pre-trained on lung CT, was fine-tuned to perform the classification task. The model with the highest F1-score on the validation set was selected and applied to the test set. Class activation maps (CAMs) for the test subjects belonging to the internal dataset (n=25) were obtained post hoc with two explainability methods, Grad-CAM and Score-CAM. The CAMs were then thresholded at their 70th and 90th percentile values to select the most important regions (CAM70th and CAM90th), and their volumetric overlap with the gross tumor volume (GTV) was calculated with the Szymkiewicz–Simpson overlap coefficient for the primary tumor (GTVpt) and the affected lymph nodes (GTVln), separately and together (GTVall).
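For illustration, the stratified 60/20/20 split described above could be sketched as follows. This is a minimal NumPy sketch, not the authors' code: the function name `stratified_split`, the random seed, the pooling of the three cohorts before splitting, and the synthetic HPV labels (400 positive, 340 negative) are all assumptions. Only the total of 740 patients and the split fractions come from the text.

```python
import numpy as np


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/val/test, preserving class proportions.

    Hypothetical helper: shuffles the indices of each class separately and
    cuts each class at the requested fractions.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    parts = [[], [], []]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        parts[0].extend(idx[:n_train])
        parts[1].extend(idx[n_train:n_train + n_val])
        parts[2].extend(idx[n_train + n_val:])  # remainder goes to test
    return [np.array(sorted(p)) for p in parts]


# Synthetic labels (assumption): 1 = HPV-positive, 0 = HPV-negative.
# 96 + 498 + 146 = 740 patients in total, as in the three cohorts.
labels = np.array([1] * 400 + [0] * 340)
train, val, test = stratified_split(labels)

print(len(train), len(val), len(test))
print(labels[train].mean(), labels[val].mean(), labels[test].mean())
```

Splitting each class separately keeps the HPV-positive fraction nearly identical across the three sets, which is the point of stratification when the classes are imbalanced.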
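The percentile thresholding and overlap computation can be made concrete with a short sketch. The Szymkiewicz–Simpson overlap coefficient of two sets A and B is |A ∩ B| / min(|A|, |B|). The code below is an illustrative NumPy version under stated assumptions, not the authors' implementation: the function names `threshold_cam` and `overlap_coefficient`, the choice of computing the percentile over the whole 96x96x96 volume, and the synthetic activation map and GTV mask are all hypothetical.

```python
import numpy as np


def threshold_cam(cam: np.ndarray, percentile: float) -> np.ndarray:
    """Binary mask of voxels at or above the given percentile of the map."""
    return cam >= np.percentile(cam, percentile)


def overlap_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Szymkiewicz-Simpson coefficient: |A and B| / min(|A|, |B|)."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    smaller = min(int(a.sum()), int(b.sum()))
    if smaller == 0:
        return float("nan")  # undefined when either mask is empty
    return float(np.logical_and(a, b).sum() / smaller)


# Synthetic example (assumption): random activations with a boosted cube
# standing in for the delineated GTV.
rng = np.random.default_rng(0)
cam = rng.random((96, 96, 96))
gtv = np.zeros((96, 96, 96), dtype=bool)
gtv[40:56, 40:56, 40:56] = True
cam[gtv] += 1.0  # make the synthetic tumor region the most activated

cam70 = threshold_cam(cam, 70)
cam90 = threshold_cam(cam, 90)
print(overlap_coefficient(cam70, gtv), overlap_coefficient(cam90, gtv))
```

Because the denominator is the size of the smaller region, the coefficient equals 1 whenever the smaller mask lies entirely inside the larger one, regardless of how much larger the other mask is. This should be kept in mind when comparing the 70th and 90th percentile results, since the 70th percentile mask covers three times as many voxels.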
Results
The model achieved an AUC/accuracy/F1-score of 0.89/0.82/0.78, 0.83/0.77/0.70 and 0.87/0.79/0.74 on the training, validation, and test sets, respectively. Figure 1 shows the visual explanations obtained with Grad-CAM for two test subjects. Of the 25 internal test cases, 19 were correctly classified. The overlap coefficient between GTVall and Grad-CAM70th was at least 0.8 in 21 cases, and that between GTVall and Score-CAM70th was at least 0.8 in 24 cases. The overlap coefficients of GTVall with Grad-CAM90th and Score-CAM90th were at least 0.5 for 13 subjects. The mean overlap coefficients of GTVpt, GTVln and GTVall with the different CAMs are shown in Table 1.
Conclusion
Two explainability methods were used to explore which CT regions were most relevant to HPV status prediction by a 3D DL model. Our study showed promising classification performance and volumetric overlap between the resulting heatmaps and the GTVpt and GTVln. These findings help address concerns about the reliability of DL in diagnostics and bring its application in a clinical setting closer.