ESTRO 2022

Session Item

Radiomics, modelling and statistical methods

Session Type: Poster (digital)

Track: Physics

Regularized distributed Cox regression: a model for federated feature selection in survival analysis

BENEDETTA GOTTARDELLI, Italy

Presentation Number: PO-1768

Abstract

Authors:

Benedetta Gottardelli¹, Carlotta Masciocchi², Antonella Martino³, Luca Boldrini³, Ciro Mazzarella³, Giulio Grassi⁴, Mariangela Massaccesi³, Vincenzo Valentini³, Andrea Damiani²

¹Università Cattolica del Sacro Cuore, Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Rome, Italy; ²Fondazione Policlinico Universitario Agostino Gemelli, Gemelli Generator Real World Data, Rome, Italy; ³Fondazione Policlinico Universitario Agostino Gemelli, Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Rome, Italy; ⁴King’s College London, Computer Science, Rome, Italy

Purpose or Objective

Radiomic studies typically involve a large number of features, which makes it difficult for a single institution, or even for multiple centres in a distributed learning setting, to enrol enough patients to obtain statistically significant survival analysis results. In these cases, feature selection methods can reduce the study variables to a smaller group that includes only the most outcome-relevant ones, thus loosening the constraint on cohort size. The aim of this work is to develop and validate, on both simulated and real-world data, a new algorithm that performs feature selection for Cox Proportional Hazards survival analysis in a privacy-preserving setting through distributed learning.

Material and Methods

Lasso regularization was chosen as the feature selection method because, during model training, it gradually shrinks the absolute values of the coefficients of less significant covariates and sets the coefficients of non-significant or redundant features to 0. The algorithm for distributed Cox regression with Lasso regularization was implemented in Python 3.7; we also designed and developed an automated testing and validation platform to generate survival data, simulate federated learning and compare the results of the new algorithm with those of any given ground-truth solution. We assessed the algorithm’s performance by calculating the mean absolute error (MAE) between its regression coefficients and those obtained from a state-of-the-art centralised Python algorithm from the “sksurv” library.
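
The abstract does not spell out how the distributed fit works, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes a Breslow-type partial likelihood, assumes that sites agree to share the pooled list of distinct event times, and scales the log-likelihood by 1/n before applying the L1 penalty. All function names, the step size and the toy data are hypothetical. Each site sends only risk-set aggregates, and the server combines them into the gradient and then applies a soft-thresholding (proximal) step, which is what drives small coefficients to exactly 0.

```python
import math

def local_summaries(X, times, events, beta, grid):
    """Aggregates a site would share for each global event time in `grid`."""
    p = len(beta)
    risk = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    s0 = [0.0] * len(grid)            # sum of exp(x.beta) over the risk set
    s1 = [[0.0] * p for _ in grid]    # sum of x * exp(x.beta) over the risk set
    d = [0] * len(grid)               # number of local events at each time
    xsum = [0.0] * p                  # sum of covariates over local events
    for row, t_i, e_i, r in zip(X, times, events, risk):
        if e_i:
            d[grid.index(t_i)] += 1
            for k in range(p):
                xsum[k] += row[k]
        for m, t in enumerate(grid):
            if t_i >= t:              # patient is still at risk at time t
                s0[m] += r
                for k in range(p):
                    s1[m][k] += r * row[k]
    return s0, s1, d, xsum

def federated_gradient(site_summaries, p):
    """Server side: gradient of the pooled partial log-likelihood."""
    grad = [0.0] * p
    for _, _, _, xsum in site_summaries:
        for k in range(p):
            grad[k] += xsum[k]
    for m in range(len(site_summaries[0][0])):
        s0 = sum(s[0][m] for s in site_summaries)
        d = sum(s[2][m] for s in site_summaries)
        for k in range(p):
            s1 = sum(s[1][m][k] for s in site_summaries)
            grad[k] -= d * s1 / s0
    return grad

def soft_threshold(z, thr):
    """Proximal operator of the L1 penalty: shrink toward 0, clip at 0."""
    if z > thr:
        return z - thr
    if z < -thr:
        return z + thr
    return 0.0

def lasso_step(beta, grad, n_total, alpha, step):
    """One proximal-gradient update (assumed 1/n scaling of the likelihood)."""
    return [soft_threshold(b + step * g / n_total, step * alpha)
            for b, g in zip(beta, grad)]

# Toy data for two hypothetical sites: (covariates, times, event indicators).
site_a = ([[1.0, 0.0], [0.0, 1.0]], [2.0, 5.0], [1, 0])
site_b = ([[1.0, 1.0], [0.0, 0.0]], [3.0, 4.0], [1, 1])
sites = (site_a, site_b)

beta = [0.0, 0.0]
grid = sorted({t for _, ts, es in sites for t, e in zip(ts, es) if e})
summaries = [local_summaries(X, ts, es, beta, grid) for X, ts, es in sites]
grad = federated_gradient(summaries, p=2)
beta = lasso_step(beta, grad, n_total=4, alpha=0.1, step=0.5)
```

Because the risk-set sums are additive across sites, the gradient assembled by the server equals the one computed on the pooled data, and no patient-level rows leave a site. Whether sharing exact event times is acceptable is a separate privacy decision that this sketch does not address.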

The first step of the validation process involved testing the algorithm on several simulated survival datasets with a number of covariates ranging from 10 to 100, varying the regularization parameter (α) from 0.1 to 0.7. Secondly, for further validation, we used a real dataset of 20 radiomic features extracted from RT-planning CT scans of patients affected by lung cancer, with Overall Survival as the model outcome.
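
The abstract does not say how the simulated datasets were generated. As an illustration only, a standard way to simulate survival data under a Cox model is to draw event times from an exponential distribution whose rate is the baseline hazard multiplied by exp(x·β), and to censor them with an independent exponential time. The function name, the baseline and censoring rates, and the sparse coefficient vector below are all assumptions made for this sketch.

```python
import math
import random

def simulate_cox_data(n, beta, baseline_rate=0.1, censor_rate=0.05, seed=0):
    """Draw n patients with standard-normal covariates and right-censored times."""
    rng = random.Random(seed)
    X, times, events = [], [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        # Constant baseline hazard scaled by the Cox relative risk exp(x.beta).
        hazard = baseline_rate * math.exp(sum(b * v for b, v in zip(beta, x)))
        t_event = rng.expovariate(hazard)
        t_censor = rng.expovariate(censor_rate)
        X.append(x)
        times.append(min(t_event, t_censor))     # observed follow-up time
        events.append(int(t_event <= t_censor))  # 1 = event, 0 = censored
    return X, times, events

# Hypothetical ground truth: 10 covariates, only the first two are relevant.
true_beta = [0.8, -0.5] + [0.0] * 8
X, times, events = simulate_cox_data(200, true_beta)
```

With a known sparse `true_beta`, a Lasso-penalised fit can be checked both against a centralised solver and against which covariates it should have excluded.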

Results

We report, in Table 1, 17 tests, 15 of which were done on simulated datasets and two on real-world data. For the simulated data, MAE was overall lower than 0.01. For the real-world data testing, the algorithm was trained on 189 lung-cancer patients with two different regularization parameters. For both values of α, the coefficients of the distributed model on real-world data differed slightly more from the centralised ground truth than on the simulated data, resulting in a larger MAE that was nevertheless lower than 0.03 overall. As expected, for both types of data, the number of excluded variables (0-coefficients) increases with α.
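
The two quantities reported here, the MAE between coefficient vectors and the number of excluded variables, can be computed as in the short sketch below. The helper names are hypothetical and the coefficient values are made up for illustration; they are not results from this study.

```python
def mean_absolute_error(coef_a, coef_b):
    """Mean absolute difference between two coefficient vectors."""
    if len(coef_a) != len(coef_b):
        raise ValueError("coefficient vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(coef_a, coef_b)) / len(coef_a)

def count_excluded(coef, tol=0.0):
    """Number of covariates whose coefficient was shrunk to 0 (within tol)."""
    return sum(1 for c in coef if abs(c) <= tol)

# Made-up coefficients standing in for a distributed and a centralised fit.
distributed = [0.50, 0.0, -0.20]
centralised = [0.52, 0.0, -0.19]

mae = mean_absolute_error(distributed, centralised)
n_excluded = count_excluded(distributed)
```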

Conclusion

To the best of our knowledge, this is the first implementation of a Cox Proportional Hazards model with Lasso feature selection for federated learning. In the near future, the model will be tested in a non-simulated distributed setting on a real multi-institution retrospective study.