Deep learning-based segmentation considering observer variation - evaluation in prostate MRI for BT
PD-0498
Abstract
Authors: Arkadiy Dushatskiy1, Peter A. N. Bosman1,5, Karel A. Hinnen2, Jan Wiersma2, Henrike Westerveld3, Bradley Pieters2, Tanja Alderliesten4
1Centrum Wiskunde & Informatica, Evolutionary Intelligence, Amsterdam, The Netherlands; 2Amsterdam UMC, University of Amsterdam, Radiation Oncology, Amsterdam, The Netherlands; 3Erasmus Medical Center, Radiation Oncology, Rotterdam, The Netherlands; 4Leiden University Medical Center, Radiation Oncology, Leiden, The Netherlands; 5Delft University of Technology, Algorithmics, Delft, The Netherlands
Purpose or Objective
Recently, we proposed a novel deep learning-based method for (semi-)automatic scan segmentation that can output multiple segmentations reflecting the observer variation present in the training set. Here, our goal is to assess its potential for integration into clinical practice by comparing the automatically produced segmentations to the clinically approved segmentation and to one produced by a classical deep learning method (CDLM). Specifically, we consider prostate segmentation on MRI scans acquired for brachytherapy.
Material and Methods
In contrast to a CDLM, our method can capture and exploit the observer variation inherently present in the data; for example, the produced segmentations may correspond to different observer groups. For clinical use, this means that a clinician can select the preferred segmentation among multiple automatically produced ones (here, two), potentially requiring less or no manual correction.
Our method uses a multi-head U-Net with a ResNeXt-50 encoder; the CDLM uses the same network but with a single head. In our method, the heads are trained on separate subsets of the training data, obtained with an optimization algorithm (a minimal sketch of the multi-head design is given below). The dataset was split into 40/13/13 scans for training/validation/testing; the test set was used for the evaluation study.
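To illustrate the multi-head idea, the following is a minimal PyTorch sketch (not the authors' code): a shared ResNeXt-50 encoder from torchvision feeds several independent decoder heads, one per observer group. The lightweight upsampling heads stand in for the full U-Net decoders with skip connections; class names and shapes are illustrative assumptions.

    import torch
    import torch.nn as nn
    from torchvision.models import resnext50_32x4d

    class MultiHeadSegmenter(nn.Module):
        def __init__(self, n_heads=2, n_classes=1):
            super().__init__()
            backbone = resnext50_32x4d(weights=None)
            # Shared encoder: everything up to the final pooling/classifier layers.
            self.encoder = nn.Sequential(*list(backbone.children())[:-2])
            # One decoder head per observer group; a real U-Net decoder with
            # skip connections would replace these simple upsampling stacks.
            self.heads = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(2048, 256, kernel_size=3, padding=1),
                    nn.ReLU(inplace=True),
                    nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
                    nn.Conv2d(256, n_classes, kernel_size=1),
                )
                for _ in range(n_heads)
            ])

        def forward(self, x):
            features = self.encoder(x)
            # One segmentation logit map per head.
            return [head(features) for head in self.heads]

    model = MultiHeadSegmenter(n_heads=2)
    logits_per_head = model(torch.randn(1, 3, 256, 256))  # two candidate segmentations

During training, each head would receive gradients only from its assigned training subset, so the heads specialize toward different annotation styles while sharing one encoder.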
We used MRI scans previously acquired for HDR prostate brachytherapy with catheters in situ. For each scan, four prostate segmentation variants were presented: 1) the clinically used segmentation (reference); 2) the segmentation produced by the CDLM; 3-4) the two segmentations produced by our method. The variants were labeled so as not to reveal their origin, enabling an unbiased, blinded study. For each scan, an experienced radiation oncologist was asked to grade the individual slices and the whole volumetric prostate segmentation, and finally to rank the presented segmentations. Grades ranged from 1 to 4, meaning that a segmentation: 1) should be rejected; 2) requires major manual correction; 3) requires minor manual correction; 4) can be approved without correction. For our method, we use the better grade (or rank) of the two produced segmentation variants (per slice or per scan) because, at the time of use, a clinician can choose the preferred one.
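The best-of-two scoring rule can be stated in a few lines of Python (illustrative only; the function name is ours, not from the study):

    # Since the clinician can pick the preferred variant, the multi-head method
    # is credited with the better of its two grades (1=reject ... 4=approve as-is).
    def best_grade(grade_head_a: int, grade_head_b: int) -> int:
        return max(grade_head_a, grade_head_b)

    # Example: head A needs minor correction (3), head B can be approved (4).
    assert best_grade(3, 4) == 4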
We tested for statistically significant differences between the segmentation methods using the chi-squared test at a significance level of p=0.05.
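For concreteness, a chi-squared comparison of two methods' grade distributions could be run as follows with SciPy; this is a sketch of the assumed setup, and the counts are placeholders, not study data.

    from scipy.stats import chi2_contingency

    # Rows: methods; columns: number of slices graded 1, 2, 3, 4.
    contingency = [
        [2, 5, 20, 73],   # hypothetical counts for method A
        [4, 8, 25, 63],   # hypothetical counts for method B
    ]
    chi2, p_value, dof, expected = chi2_contingency(contingency)
    significant = p_value < 0.05  # significance level used in the study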
Results
The figures show the main results and example segmentations. In both per-slice and per-scan grading, our method produces acceptable segmentations more often than the CDLM. However, the differences between our method and the CDLM, and between our method and the reference segmentations, are not statistically significant. Our method produces segmentations that are on average ranked better (p-value=0.01) than both the CDLM and the reference.
Conclusion
Deep learning-based automatic segmentation can produce high-quality segmentations. Our method, which produces multiple segmentation variants instead of a single one, was found to give the best results in the evaluation, ranking even better than the reference segmentations.