Systematic input evaluation for deep learning-based pre-treatment quality assurance
MO-0548
Abstract
Authors: Cecile Wolfs1, Frank Verhaegen1
1GROW - School for Oncology, Maastricht University Medical Center+, Radiation Oncology (Maastro), Maastricht, The Netherlands
Purpose or Objective
In pre-treatment quality assurance (QA) with electronic portal imaging device
(EPID) dosimetry, gamma analysis with standard criteria and thresholds on gamma
pass rates are commonly used for dose comparison and error detection. However, studies
show that deep learning (DL) methods provide higher sensitivity for detecting
errors, because full dose comparison images can be used as input and error
causes can be identified [1-3]. While gamma analysis is the traditional dose
comparison method of choice, other comparison methods (e.g. dose difference
maps) could further improve error detection when using DL. Moreover, image
preprocessing steps, such as normalization and image resizing, are known to
influence DL model performance. The objective of this work is to systematically
evaluate the impact of different dose comparison and image preprocessing
methods on the performance of a DL model for error identification in
pre-treatment QA.
Material and Methods
For 53 VMAT treatment plans of 46 lung cancer patients, mechanical errors were
simulated (MLC leaf positions, monitor unit scaling, collimator rotation). Two DL
classification levels were assessed: error type (Level 1), and error magnitude
(Level 2). Portal dose images were predicted using treatment plans with and
without errors, and subsequently compared using the dose comparison methods
listed in Table 1. Preprocessing consisted of cropping the dose comparison
images by applying a 10% low dose threshold, normalizing the pixel values (min/max
or mean/stdev; Table 1) and resizing to a square image size (Table 1). Combining
every classification level, dose comparison method, normalization
method and image size yielded 144 input datasets. A DL network architecture
consisting of blocks of 2 convolutional layers and a max pooling layer,
followed by dense layers was used. The exact network (e.g. number of
convolutional blocks) and hyperparameters (e.g. learning rate) were optimized
for each input set.
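
The dose comparison and preprocessing steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names are hypothetical, and several details that the abstract does not specify are assumptions: the 10% threshold is taken relative to the maximum of the error-free (reference) portal dose image, cropping is done to the bounding box of the above-threshold region, the relative dose difference is expressed as a percentage of the reference maximum, and resizing uses nearest-neighbour sampling.

```python
import numpy as np


def relative_dose_difference(reference, evaluated):
    """Dose difference in percent of the reference maximum (assumed convention)."""
    reference = np.asarray(reference, dtype=float)
    evaluated = np.asarray(evaluated, dtype=float)
    return 100.0 * (evaluated - reference) / reference.max()


def dose_ratio(reference, evaluated, eps=1e-6):
    """Pixel-wise ratio; eps guards against division by zero outside the field."""
    reference = np.asarray(reference, dtype=float)
    evaluated = np.asarray(evaluated, dtype=float)
    return evaluated / np.maximum(reference, eps)


def crop_to_threshold(comparison, reference, fraction=0.10):
    """Crop to the bounding box where reference >= fraction * max (assumption)."""
    reference = np.asarray(reference, dtype=float)
    mask = reference >= fraction * reference.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return np.asarray(comparison, dtype=float)[
        rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1
    ]


def normalize(image, method="minmax"):
    """Min/max scaling to [0, 1] or mean/stdev standardization."""
    image = np.asarray(image, dtype=float)
    if method == "minmax":
        span = image.max() - image.min()
        return (image - image.min()) / span if span > 0 else np.zeros_like(image)
    if method == "meanstd":
        std = image.std()
        return (image - image.mean()) / std if std > 0 else np.zeros_like(image)
    raise ValueError(f"unknown normalization method: {method}")


def resize_square(image, size):
    """Nearest-neighbour resize to size x size (interpolation is an assumption)."""
    image = np.asarray(image, dtype=float)
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    return image[np.ix_(rows, cols)]


def preprocess(reference, evaluated, norm="meanstd", size=64):
    """Ratio comparison, then crop, normalize and resize, in that order."""
    comparison = dose_ratio(reference, evaluated)
    cropped = crop_to_threshold(comparison, reference)
    return resize_square(normalize(cropped, norm), size)
```

Calling `preprocess(reference, evaluated, norm="minmax", size=128)` on a pair of predicted portal dose images would give one network input; swapping `dose_ratio` for `relative_dose_difference` or changing `norm` and `size` would give the other variants of the same plan.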
Results
Figure 1 shows that using relatively simple dose comparison methods such as ratio
analysis or relative dose differences provides the highest DL model performance,
although gamma analysis with strict criteria (particularly in the
distance-to-agreement) also performs well. Mean/stdev normalization
particularly improves Level 2 classification. Higher image resolution improves
error identification, as more details of the dose comparison images are
preserved.
Conclusion
The choice of dose comparison method has a larger impact on error identification for
pre-treatment QA using DL than image preprocessing does. Model performance
can improve by applying mean/stdev normalization and high image resolution, but
the latter needs more computational resources and longer training times. While this
is not a major issue for 2D images, it may be for 2D images per treatment
segment or for 3D reconstructed dose volumes.
1. Nyflot et al. 2019 Med Phys 46: 456-464
2. Potter et al. 2020 Med Phys 47: 4711-4720
3. Kimura et al. 2021 Med Phys 48: 4769-4783