Vienna, Austria

ESTRO 2023


Radiomics, modelling and statistical methods
7011
Poster (Digital)
Physics
Machine learning and image-oriented methods for head and neck cancer treatment outcome prediction
Bao Ngoc Huynh, Norway
PO-2114

Abstract

Machine learning and image-oriented methods for head and neck cancer treatment outcome prediction
Authors:

Bao Ngoc Huynh1, Aurora Rosvoll Groendahl1, Oliver Tomic1, Ingerid Skjei Knudtsen2,3, Frank Hoebers4, Wouter van Elmpt4, Eirik Malinen3,5, Einar Dale6, Cecilia Marie Futsaether1

1Norwegian University of Life Sciences, Faculty of Science and Technology, Ås, Norway; 2Norwegian University of Science and Technology, Department of Circulation and Medical Imaging, Trondheim, Norway; 3Oslo University Hospital, Department of Medical Physics, Oslo, Norway; 4Maastricht University Medical Center, Department of Radiation Oncology (MAASTRO), Maastricht, The Netherlands; 5University of Oslo, Department of Physics, Oslo, Norway; 6Oslo University Hospital, Department of Oncology, Oslo, Norway

Purpose or Objective

Different machine learning (ML) methods, including deep learning (DL) with image-oriented approaches such as convolutional neural networks (CNNs), were used to predict disease-free survival (DFS) and overall survival (OS) in two cohorts of head and neck cancer (HNC) patients.

Material and Methods

HNC patients from two centers, Oslo University Hospital (OUS, N=139) and Maastricht University Medical Center (MAASTRO, N=99), with 18F-FDG PET/CT images acquired before radiotherapy were included. Two types of input data were analyzed: (D1) clinical factors and (D2) PET/CT images with delineated primary tumors (GTVp) and affected lymph nodes (GTVn). The prediction targets DFS and OS were treated as binary responses, in which class 1 indicated an event.

Seven models (M1-M7) of increasing complexity were trained and validated on the OUS dataset using nested 5-fold cross-validation. The external MAASTRO dataset was used for testing the models on previously unseen data. Five performance metrics were computed: (I) accuracy, (II) area under the receiver operating characteristic curve (AUC), (III) Matthews correlation coefficient (MCC), (IV) F1 score on class 1 and (V) F1 score on class 0. As the event ratios differed between the two datasets (DFS: 49% (OUS), 41% (MAASTRO); OS: 60% (OUS), 54% (MAASTRO)), all metrics were calculated from 1000 bootstrap samples drawn from each dataset using a 1:1 ratio between the two classes.
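A minimal sketch of this balanced bootstrap evaluation is given below, assuming scikit-learn and a 0.5 decision threshold; the variable names (y_true, y_prob) and the exact resampling details are illustrative, as the abstract does not specify the implementation.

```python
# Balanced bootstrap evaluation sketch: resample events and non-events 1:1,
# then average the five metrics over the bootstrap samples.
import numpy as np
from sklearn.metrics import (accuracy_score, roc_auc_score,
                             matthews_corrcoef, f1_score)

def balanced_bootstrap_metrics(y_true, y_prob, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    pos, neg = np.where(y_true == 1)[0], np.where(y_true == 0)[0]
    n = min(len(pos), len(neg))          # 1:1 ratio between the two classes
    rows = []
    for _ in range(n_boot):
        idx = np.concatenate([rng.choice(pos, n, replace=True),
                              rng.choice(neg, n, replace=True)])
        yt, yp = y_true[idx], y_prob[idx]
        yhat = (yp >= 0.5).astype(int)   # assumed threshold for class labels
        rows.append((accuracy_score(yt, yhat),
                     roc_auc_score(yt, yp),
                     matthews_corrcoef(yt, yhat),
                     f1_score(yt, yhat, pos_label=1),
                     f1_score(yt, yhat, pos_label=0)))
    # Mean of Accuracy, AUC, MCC, F1 (class 1), F1 (class 0) over bootstraps
    return np.mean(rows, axis=0)
```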

Prediction models based on clinical factors only (D1) were constructed using the conventional ML methods logistic regression (M1) and random forest (M2), as well as two DL approaches: a simple neural network (M3) and a neural network with interactions between network nodes (M4), designed to learn possible feature interactions within the data.
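For illustration, a possible setup for the two conventional ML baselines (M1, M2) with an inner cross-validation loop is sketched below; the hyperparameter grids, scoring choice and data variables are placeholder assumptions, not the authors' configuration.

```python
# Clinical-factor baselines M1 (logistic regression) and M2 (random forest),
# tuned with an inner stratified 5-fold loop as in nested cross-validation.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

m1 = GridSearchCV(LogisticRegression(max_iter=1000),
                  {"C": [0.01, 0.1, 1, 10]},
                  cv=inner_cv, scoring="roc_auc")

m2 = GridSearchCV(RandomForestClassifier(random_state=0),
                  {"n_estimators": [100, 300], "max_depth": [3, 5, None]},
                  cv=inner_cv, scoring="roc_auc")

# Hypothetical usage on one outer training fold of clinical features:
# m1.fit(X_clinical_train, y_train); m2.fit(X_clinical_train, y_train)
```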

A downscaled 3D version of the EfficientNet CNN was used to derive image patterns, potentially analogous to radiomics features, from the 3D image input (D2). With this CNN, three outcome prediction models were evaluated, each using a different combination of image inputs: CT and PET (M5); CT, PET and GTVp (M6); CT, PET, GTVp and GTVn (M7).
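The sketch below illustrates one plausible way the image inputs for M5-M7 could be assembled as CNN channels from co-registered volumes; the array names, shapes and channels-last ordering are assumptions, and the CNN architecture itself is not reproduced here.

```python
# Stack co-registered 3D volumes into a multi-channel array for the 3D CNN.
import numpy as np

def stack_channels(ct, pet, gtvp=None, gtvn=None):
    """Stack (z, y, x) volumes into a channels-last array of shape (z, y, x, C)."""
    channels = [ct, pet]
    if gtvp is not None:
        channels.append(gtvp)   # binary mask of the primary tumor
    if gtvn is not None:
        channels.append(gtvn)   # binary mask of affected lymph nodes
    return np.stack(channels, axis=-1)

# M5: stack_channels(ct, pet)
# M6: stack_channels(ct, pet, gtvp)
# M7: stack_channels(ct, pet, gtvp, gtvn)
```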

Results

For DFS (Figure 1), models based on clinical input data (M1-M4) had the poorest external validation results overall, owing to their markedly low MCC and class 0 F1 scores, which reflect a high number of false positive predictions on the MAASTRO dataset. The CNNs trained on CT and PET images, with and without the GTVp (M5 and M6), obtained the highest performance metrics.

For OS (Figure 2), only the CNN models (M5-M7) maintained their performance under external validation, while the other models (M1-M4) overfitted to the OUS dataset, as indicated by the decrease in performance on the MAASTRO set. Model M6, trained on CT, PET and GTVp, obtained the best performance on almost all metrics.

Conclusion

CNN models based on CT and PET images, without inclusion of clinical data, achieved better and more generalizable performance in a multi-center setting when predicting DFS and OS than models based solely on clinical information.