Copenhagen, Denmark
Onsite/Online

ESTRO 2022


Implementation of new technology and techniques
Poster (digital)
Physics
Comprehensive evaluation of ProtegeAI Prostate 2.0 auto-segmentation: time-gain and accuracy
Nicolas Jullian, Belgium
PO-1676

Abstract

Authors:

Nicolas Jullian1, Zelda Paquier2,3, Manuela Burghelea2,3, Dirk Van Gestel1, Nick Reynaert2,3, Akos Gulyban2,3

1Institut Jules Bordet, Université Libre de Bruxelles (ULB), Radiation Oncology, Brussels, Belgium; 2Institut Jules Bordet, Medical Physics, Brussels, Belgium; 3Université Libre de Bruxelles (ULB), Radiophysics and MRI physics laboratory, Brussels, Belgium

Purpose or Objective

The aim of this study was to evaluate the time gain and accuracy of the MIM ProtegeAI 2.0 auto-segmentation solution (version 7.1.5, MIM Software Inc., Cleveland, OH, USA). A second objective was to assess intra-observer variability and familiarization bias when using auto-segmentation.

Material and Methods

Twenty-five patients with prostate cancer were included. For each patient, a planning CT scan (from the L1/L2 vertebrae to 3 cm below the ischial tuberosities, 3 mm slice thickness) was acquired, followed by auto-segmentation with the ProtegeAI Prostate 2.0 model (AI) and manual delineation by a single observer (Manual). Femur_L/_R, PenileBulb, Rectum, SeminalVes and Bladder were evaluated; five other AI-generated OARs did not match our institutional template and were therefore not evaluated. The following times were measured: scoring of the AI delineations (AIscor: major, minor or no correction needed), correction of the AI delineations (AIcor), total AI time (AIscor + AIcor) and manual delineation (Manual). Time gain was also calculated per individual OAR. Half of the cohort started with AIscor and AIcor followed by Manual, while the other half started with Manual followed by AIscor and AIcor. To evaluate familiarization bias, Manual and AIcor times were each compared between the two groups. Time gain and bias were evaluated with t-tests at a p<0.05 significance level. The Dice Similarity Coefficient (DSC), 95th-percentile Hausdorff distance (HD95) and median surface distance (MSD) were also determined for the AI/AIcor, AI/Manual and AIcor/Manual comparisons. The AIcor/Manual comparison was used to define intra-observer variability, as both contours were considered clinically acceptable.
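The abstract does not state how DSC, HD95 and MSD were implemented. The Python sketch below shows one common way to compute them for two binary masks stored as sets of (x, y, z) voxel indices; it is not MIM's implementation. Assumptions: surface voxels are those with a 6-connected neighbour outside the mask, HD95 is taken as the larger of the two directed 95th-percentile surface distances (definitions vary between tools), MSD is the median of the pooled distances in both directions, and the voxel spacing of (1.0, 1.0, 3.0) mm is hypothetical apart from the 3 mm slice thickness. Nearest-surface distances are found by brute force, which is adequate for toy masks but far too slow for real CT structures.

```python
import math
from statistics import median

# Assumption: 6-connected neighbourhood used to decide which voxels lie on the surface.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a, b):
    """Dice Similarity Coefficient, 2|A & B| / (|A| + |B|), for sets of voxel indices."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Voxels of the mask that have at least one 6-connected neighbour outside it."""
    return {
        v for v in mask
        if any((v[0] + dx, v[1] + dy, v[2] + dz) not in mask for dx, dy, dz in NEIGHBOURS)
    }


def directed_distances(src, dst, spacing):
    """For each voxel in src, the distance in mm to the nearest voxel in dst (brute force)."""
    def to_mm(v):
        return [v[i] * spacing[i] for i in range(3)]
    dst_mm = [to_mm(q) for q in dst]
    return [min(math.dist(to_mm(p), q) for q in dst_mm) for p in src]


def percentile(values, q):
    """Linearly interpolated percentile (same convention as NumPy's default)."""
    s = sorted(values)
    k = (len(s) - 1) * q / 100
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def surface_metrics(a, b, spacing=(1.0, 1.0, 3.0)):
    """Return (HD95, MSD) in mm between the surfaces of masks a and b.

    Assumed definitions: HD95 = max of the two directed 95th percentiles,
    MSD = median of the pooled directed distances. Spacing is hypothetical.
    """
    sa, sb = surface(a), surface(b)
    d_ab = directed_distances(sa, sb, spacing)
    d_ba = directed_distances(sb, sa, spacing)
    hd95 = max(percentile(d_ab, 95), percentile(d_ba, 95))
    msd = median(d_ab + d_ba)
    return hd95, msd


# Toy example: two 4x4x4 voxel cubes offset by one voxel along x.
cube = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
shifted = {(x + 1, y, z) for x, y, z in cube}
print(dice(cube, shifted))             # 0.75
print(surface_metrics(cube, shifted))  # (1.0, 0.0)
```

In this toy case the overlap is only 0.75 while most surface voxels coincide (MSD of 0 mm), which illustrates why volume-overlap and surface-distance metrics need not rank comparisons the same way.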

Results

A total of 235 contours were generated by AI (5 min per patient). For 20 patients, AI failed to generate Kidney_L/_R. Major, minor or no correction was needed in 14%, 72% and 14% of delineations, respectively. Manual delineation took on average 12:25 (min:sec; range: 8:21-21:59), while AIscor and AIcor took 1:55 (range: 1:21-3:32) and 6:18 (range: 2:49-14:14), respectively (figure 1). AI yielded a time gain of up to 13:06, with an average of 4:12 (p<0.001), although for two patients the AI workflow took longer than Manual (by 3:05 and 2:08). Per OAR, the average time gain was 0:42 (range: -0:11 to 1:45). A familiarization bias was observed for Manual (p=0.029): manual delineation was on average 2:25 faster when the AI workflow was performed first, whereas no significant bias was observed for AIcor (p=0.168). Good DSC values (>0.8) were observed for AI/AIcor, while HD95 and MSD (figure 2) showed larger discrepancies. Agreement between Femur (AI and AIcor) and Femoral Head (Manual) was moderate, owing to a difference in the intended delineation. Compared with AI/AIcor, intra-observer variability (AIcor/Manual) was worse in terms of DSC and better in terms of HD95 and MSD.
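Several of the reported figures can be cross-checked with simple arithmetic, since the mean of the per-patient differences equals the difference of the mean times. The Python sketch below converts the min:sec values to seconds; the helper names are hypothetical. Two points are inferences rather than statements from the abstract: that the per-OAR average corresponds to the per-patient gain spread over the six evaluated OARs, and that the model outputs 11 structures per patient (the six evaluated plus the five not evaluated, which would include Kidney_L/_R).

```python
# The six OARs evaluated in the study.
EVALUATED_OARS = ["Femur_L", "Femur_R", "PenileBulb", "Rectum", "SeminalVes", "Bladder"]
NOT_EVALUATED = 5  # AI structures outside the institutional template
PATIENTS = 25


def to_seconds(mmss):
    """Convert a 'min:sec' string such as '12:25' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(seconds):
    """Format an integer number of seconds as 'min:sec' (negative values allowed)."""
    sign = "-" if seconds < 0 else ""
    minutes, secs = divmod(abs(seconds), 60)
    return f"{sign}{minutes}:{secs:02d}"


manual = to_seconds("12:25")                        # mean Manual time
total_ai = to_seconds("1:55") + to_seconds("6:18")  # mean AIscor + mean AIcor
# The mean of per-patient differences equals the difference of the means.
gain = manual - total_ai
# Assumption: per-OAR gain approximated as the per-patient gain over the evaluated OARs.
per_oar = round(gain / len(EVALUATED_OARS))

# Assumption: 11 AI structures per patient, minus Kidney_L/_R missing in 20 patients.
contours = PATIENTS * (len(EVALUATED_OARS) + NOT_EVALUATED) - 20 * 2

print(to_mmss(total_ai), to_mmss(gain), to_mmss(per_oar), contours)
# 8:13 4:12 0:42 235
```

The computed average gain (4:12), per-OAR gain (0:42) and contour count (235) reproduce the reported values; the total AI time of 8:13 is not stated in the abstract but follows from 1:55 + 6:18.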


Conclusion

ProtegeAI Prostate 2.0 auto-segmentation provides an average time gain of more than 4 minutes per patient, with most contours requiring only minor corrections. The realistic time gain is likely higher: performing AIscor and AIcor before manual delineation significantly reduced the manual delineation time, which lowers the measured gain. Intra-observer variability remains a substantial source of differences, especially in terms of DSC.