Vienna, Austria

ESTRO 2023


Urology
Poster (Digital)
Clinical
Comparison of machine learning methods to predict urinary incontinence in localized prostate cancer
Hajar Hasannejadasl, The Netherlands
PO-1523

Abstract

Authors:

Hajar Hasannejadasl1, Henk van de Poel2, Ben Vanneste3, Joep van Roermund4, Katja Aben5,6, Zhen Zheng1, Biche Osong1, Lambertus Kiemeney5, Inge Van Oort7, Renee Verwey8, Laura Hochstenbach8, Esther Bloemen9, Andre Dekker1, Rianne Fijten1

1Department of Radiation Oncology (Maastro), Maastricht University Medical Centre+, Maastricht, The Netherlands; 2Amsterdam University Medical Centers, Department of Urology, Amsterdam, The Netherlands; 3Department of Radiation Oncology (Maastro), Maastricht University Medical Centre+, Maastricht, The Netherlands; 4Maastricht University Medical Center+, Department of Urology, Maastricht, The Netherlands; 5Netherlands Comprehensive Cancer Organization, Department of Research & Development, Utrecht, The Netherlands; 6Radboud University Medical Center, Radboud Institute for Health Sciences, Nijmegen, The Netherlands; 7Department of Urology, Radboud University Medical Center, Nijmegen, The Netherlands; 8Zuyd University, Department of Applied Sciences, Heerlen, The Netherlands; 9Zuyd University, Department of Applied Sciences, Heerlen, The Netherlands

Purpose or Objective

Urinary incontinence (UI) is one of the most common side effects of prostate cancer treatment, yet it remains difficult to predict in routine clinical practice without artificial intelligence models. Balancing explainability and predictive performance is a prerequisite for the clinical adoption of a predictive model, but black-box models are often considered to predict better despite being less explainable. To determine which algorithm achieves the highest accuracy while remaining explainable, we applied three machine learning (ML) algorithms: logistic regression (LR), random forests (RF), and support vector machines (SVM). We then compared the performance of the resulting models to identify the best algorithm for predicting UI following treatment of localized prostate cancer.

Material and Methods

For our analyses, we used the ProZIB dataset, which comprises demographics, clinical data, and patient-reported outcome measures (PROMs) from 69 Dutch hospitals, collected by the Netherlands Comprehensive Cancer Organization. The dataset contains information on 964 men with localized prostate cancer and was used for both training and external validation. To perform an external validation in accordance with the TRIPOD type 3 guidelines, the data were split by location, so that each hospital's data were used exclusively for either training or validation. Six models were generated for two time points: three models for UI 1 year after treatment and three for UI 2 years after treatment.
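The site-based split and three-algorithm setup described above can be sketched as follows. This is a minimal illustration with synthetic data standing in for the ProZIB dataset; the feature count, hospital assignment, and outcome definition are hypothetical, not the study's actual variables.

```python
# Illustrative sketch (synthetic data, NOT the ProZIB dataset): a TRIPOD
# type 3 style split in which each hospital contributes to either the
# training set or the validation set, never both.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 964                                          # cohort size from the abstract
X = rng.normal(size=(n, 5))                      # stand-in clinical/PROM features
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)  # stand-in UI outcome
hospital = rng.integers(0, 69, size=n)           # 69 Dutch hospitals

# Hold out a subset of hospitals entirely for external validation.
val_sites = set(rng.choice(69, size=20, replace=False))
is_val = np.array([h in val_sites for h in hospital])
X_tr, y_tr = X[~is_val], y[~is_val]
X_va, y_va = X[is_val], y[is_val]

# The three algorithms compared in the study.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "external-validation accuracy:", round(model.score(X_va, y_va), 2))
```

Splitting by hospital rather than by patient ensures the validation cohort comes from institutions the model has never seen, which is what distinguishes external validation from a random hold-out.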

Results

Analyses were conducted on 847 and 670 patients with localized prostate cancer for the 1- and 2-year models, respectively. In external validation, LR outperformed the other models for the 1-year outcome, with an accuracy of 0.76, a sensitivity of 0.82, and an AUC of 0.79. For all 2-year models, however, performance differed markedly between the training and validation sets. The accuracy of the 2-year models ranged from 0.60 (LR and SVM) to 0.65 (RF), and both sensitivity and specificity differed considerably across models. Figure 1 shows the performance results of the generated models. The importance of the features in each ML model for predicting UI is shown in Figure 2. Feature importance varied among the ML models: four variables were selected by all models for the 1-year outcome and two for the 2-year outcome. Substantial overlap was observed between the variables selected by the RF and SVM algorithms.
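The three metrics reported above (accuracy, sensitivity, AUC) can be computed as sketched below. The labels and predicted probabilities here are illustrative stand-ins, not the study's validation data; a 0.5 probability threshold is assumed for the class predictions.

```python
# Illustrative sketch of the reported evaluation metrics; y_true and
# y_prob are made-up stand-ins, not the study's validation results.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])                  # observed UI outcome
y_prob = np.array([0.2, 0.8, 0.4, 0.6, 0.9, 0.3, 0.7, 0.55])  # model probabilities
y_pred = (y_prob >= 0.5).astype(int)   # assumed 0.5 classification threshold

acc = accuracy_score(y_true, y_pred)
sens = recall_score(y_true, y_pred)    # sensitivity = recall of the positive class
auc = roc_auc_score(y_true, y_prob)    # AUC uses probabilities, not hard labels
print(round(acc, 2), round(sens, 2), round(auc, 2))
```

Note that accuracy and sensitivity depend on the chosen threshold, whereas the AUC is threshold-free, which is why all three are reported together.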



Figure 1. Performance results of 1-year and 2-year models


Figure 2. A comparison of selected variables in different models

Conclusion

The 2-year models failed to achieve satisfactory results, indicating that these models are not reproducible regardless of the algorithm used. For the 1-year outcome, the model based on LR, an inherently explainable algorithm, outperformed RF and SVM in external validation. Our findings demonstrate that a non-black-box prediction model can offer high performance while remaining interpretable for both patients and care providers.