Clinical evaluation of autosegmentation using AI with manual segmentation of breast tissue
Remus-Cosmin Stoica, Romania
PO-1641
Abstract
Authors: Remus-Cosmin Stoica1, Cristina Pop-Casandra2, Razvan George Curca3, Adrian Marian Radu4, Bogdan Chivu4, Stefanel Cornel Vlad3, Beatrice Anghel1, Tiberiu Popescu5, Dragos Grama6, Marius Stanescu6, Lucian Bicsi6, Dragos Dușe6
1Sanador, Radiation Oncology, Bucharest, Romania; 2“Prof. Dr. Ion Chiricuta” Institute of Oncology, Radiation Oncology, Cluj-Napoca, Romania; 3Neolife Medical Center, Radiation Oncology, Bucharest, Romania; 4“Prof. Dr. Alexandru Trestioreanu” Institute of Oncology, Radiation Oncology, Bucharest, Romania; 5Amethyst Radiotherapy Group, Radiation Oncology, Cluj-Napoca, Romania; 6Synaptiq Tehnologies, AI Research, Cluj-Napoca, Romania
Purpose or Objective
In radiotherapy, large inter-observer variability has been shown to influence the delineation of target volumes and nearby organs at risk (OARs) in breast cancer treatment preparation. This study evaluates the quality of the breast contours produced by a deep learning Artificial Intelligence (AI) network and compares them with manual contours. The network was trained on a curated dataset of breast volumes in breast cancer patients, delineated according to the “ESTRO consensus guideline on target volume delineation for elective radiation therapy of early-stage breast cancer”.
Material and Methods
This comparison uses the DICOM-RT datasets of 10 breast cancer patients. The patients were initially treated for early-stage breast cancer in a local chain of radiotherapy clinics. Manual contouring was performed by 5 referring Radiation Oncologists (ROs) following the ESTRO consensus guidelines and serves as the reference, hereafter called the Gold Standard (GS). The contours generated by the AI were then corrected by the same 5 ROs (AI-corrected, AI-c).
We perform automatic segmentation using deep learning algorithms trained on a small database (37 left and 41 right breast delineations). We analyze the variability between the Gold Standard and the AI-corrected contours quantitatively by computing three indexes: Dice Similarity Coefficient (DSC), 95th-percentile Hausdorff Distance (95HD), and Mean Distance to Conformity (MDC). Finally, we conduct an A/B experiment in which GS and AI-corrected breast contours are mixed and 3 expert ROs are asked to grade them from 1 to 3 (1: acceptable, 2: acceptable after minor corrections, 3: acceptable after major corrections). The experiment gives a qualitative perspective on the differences between the manual and AI-corrected contouring procedures.
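To make two of the three indexes concrete, the following is a minimal sketch of how DSC and 95HD could be computed for a pair of contours. It is not the implementation used in this study. It assumes each contour has been rasterized to a set of 2D pixel coordinates; the function names (`dice`, `boundary`, `hd95`), the 4-neighbour boundary rule, the nearest-rank percentile, and the choice of taking the larger of the two directed 95th-percentile distances are all illustrative assumptions, and other 95HD conventions exist. MDC is not covered here.

```python
import math

def dice(a, b):
    """Dice Similarity Coefficient between two sets of pixel coordinates."""
    if not a and not b:
        return 1.0  # two empty masks are treated as identical (assumption)
    return 2.0 * len(a & b) / (len(a) + len(b))

def boundary(mask):
    """Pixels of the mask with at least one 4-neighbour outside the mask."""
    offsets = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (r, c)
        for (r, c) in mask
        if any((r + dr, c + dc) not in mask for dr, dc in offsets)
    }

def directed_distances(src, dst, spacing):
    """For each point in src, the distance to its nearest point in dst."""
    sy, sx = spacing
    return [
        min(math.hypot((r - r2) * sy, (c - c2) * sx) for (r2, c2) in dst)
        for (r, c) in src
    ]

def percentile(values, q):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def hd95(a, b, spacing=(1.0, 1.0)):
    """95th-percentile Hausdorff distance between the boundaries of a and b.

    Both masks must be non-empty. Brute force, so only suitable for
    small masks. `spacing` is the (row, column) pixel size.
    """
    sa, sb = boundary(a), boundary(b)
    d_ab = directed_distances(sa, sb, spacing)
    d_ba = directed_distances(sb, sa, spacing)
    return max(percentile(d_ab, 95), percentile(d_ba, 95))

# Hypothetical toy example: a 10x10 square "GS" contour and an "AI-c"
# contour that is the same square shifted by one column.
gs = {(r, c) for r in range(10) for c in range(10)}
ai_c = {(r, c) for r in range(10) for c in range(1, 11)}
print("DSC :", dice(gs, ai_c))
print("95HD:", hd95(gs, ai_c))
```

For real 3D DICOM-RT structures, the same definitions would be applied to voxel masks with the scan's voxel spacing, typically using an array library rather than Python sets.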
Results
Our quantitative analysis of the similarity indexes shows a statistically significant variability between the GS and the AI-corrected contours (mean DSC 0.95, mean 95HD 2.54, and mean MDC 0.66).
In the independent grading by the 3 expert ROs, Grade 1 was assigned to 66.67% of the GS and 78.33% of the AI-c contours; Grade 2 was assigned to 33.33% of the GS and 21.67% of the AI-c contours. No contour was assigned Grade 3 (major corrections).
Conclusion
Combining AI algorithms with RO review can successfully generate delineations of good and even superior quality (the AI prediction properly generates the lateral and medial borders of the breast tissue). Differences between the data sets are found primarily in the cranio-caudal direction. Retraining AI algorithms on standardized reference datasets has the potential to further enhance performance, increasing the use of AI algorithms in clinical investigations and research.