We included treatment-planning CT, PET, and MRI (T1w mDixon and T2w) images for 567 HNC patients with a wide variety of tumor sites (larynx, pharynx, oral cavity, sinonasal, and salivary gland carcinomas). Clinical GTV-T and GTV-N delineations were treated as two separate DL targets. The data were randomly split into training (n=375), validation (n=95), and test (n=97) sets.
To simulate user clicks, we generated a dot of random size between 5 and 10 mm³ at a random location inside each distinct target volume. This simulated user input was used together with the CT, PET, T1w, and T2w MRI scans as input to a 3D UNet. We compared its segmentation results with those of a UNet that used only the scans as input.
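The click simulation can be illustrated with a minimal sketch. It assumes that the dot is a sphere whose volume is drawn uniformly between 5 and 10 mm³, that the centre is sampled uniformly from the voxels of one target, and that the result is a binary volume on the image grid. The function name `simulate_click` and its arguments are hypothetical, and the actual implementation may differ.

```python
import numpy as np

def simulate_click(target_mask, spacing_mm, rng):
    """Return a binary click volume for ONE distinct target (hypothetical helper).

    target_mask : 3D boolean array marking a single GTV-T or GTV-N component
    spacing_mm  : voxel size along each axis, in mm
    rng         : numpy.random.Generator
    """
    # Pick a random voxel inside the target as the click centre.
    candidates = np.argwhere(target_mask)
    centre = candidates[rng.integers(len(candidates))]

    # Assumption: dot volume is uniform in [5, 10] mm^3 and the dot is a sphere.
    volume_mm3 = rng.uniform(5.0, 10.0)
    radius_mm = (3.0 * volume_mm3 / (4.0 * np.pi)) ** (1.0 / 3.0)

    # Squared physical distance (mm^2) of every voxel from the click centre.
    grids = np.meshgrid(*[np.arange(n) for n in target_mask.shape], indexing="ij")
    dist2 = sum(((g - c) * s) ** 2 for g, c, s in zip(grids, centre, spacing_mm))

    # Voxels within the radius form the dot; the centre voxel is always included.
    return dist2 <= radius_mm ** 2
```

One plausible way to feed this to the network, again an assumption, is to stack the click volume as an extra channel next to the four scans, for example `np.stack([ct, pet, t1w, t2w, click.astype(np.float32)])`.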
We evaluated the detection ratio (%) over all distinct GTV-Ts and GTV-Ns. Segmentation performance was evaluated using the Dice similarity coefficient (Dice), 95th-percentile Hausdorff distance (HD95), mean surface distance (MSD), and Surface Dice with a 2 mm tolerance. The voxel-based false discovery rate (FDR) and false negative rate (FNR) were used to measure false segmentation and were compared using a Wilcoxon signed-rank test (significance level p<0.05). FDR indicates false positive segmentation, whereas FNR indicates false negative segmentation. For all metrics, the mean and 95% confidence interval (CI95, bootstrapping with 10,000 samples) were reported.
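As a sketch of the voxel-based error rates and the confidence interval, the code below uses the standard definitions FDR = FP / (FP + TP) and FNR = FN / (FN + TP), and a percentile bootstrap of the mean. The percentile method, the fixed seed, and the function names `fdr_fnr` and `bootstrap_ci95` are assumptions made for illustration; only the 10,000 bootstrap samples and the 95% level come from the text.

```python
import numpy as np

def fdr_fnr(pred, ref):
    """Voxel-based false discovery rate and false negative rate for one case."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = np.count_nonzero(pred & ref)    # predicted and delineated
    fp = np.count_nonzero(pred & ~ref)   # predicted but not delineated
    fn = np.count_nonzero(~pred & ref)   # delineated but missed
    # Undefined ratios (empty prediction or empty reference) are returned as NaN.
    fdr = fp / (fp + tp) if (fp + tp) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fdr, fnr

def bootstrap_ci95(values, n_boot=10_000, seed=0):
    """Mean and percentile-bootstrap 95% CI of the mean (assumed method)."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample cases with replacement; one row of indices per bootstrap sample.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    low, high = np.percentile(boot_means, [2.5, 97.5])
    return values.mean(), low, high
```

Per-case FDR or FNR values from the two models could then be passed pairwise to a Wilcoxon signed-rank test, for instance `scipy.stats.wilcoxon`, if SciPy is available.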