Synthetic Paired Data Generation for Medical Imaging: Bridging the Gap Toward Faithfully Reproducing Patient-Dependent Conditional Structure
The performance of supervised learning in digital medical imaging modalities such as ultrasound and low-dose cone-beam CT (CBCT) depends critically on the availability of paired datasets. These datasets must capture variability across patients, anatomical structures, and disease presentations, while providing accurate, consistent labels aligned with the measured images. Diagnostic tasks, including segmentation and detection, are particularly dependent on such paired data, requiring reliable annotations such as lesion localization, bounding regions, and clinically meaningful diagnostic labels. Consequently, robust model training requires large-scale datasets with high-quality annotations spanning diverse patient populations. In real clinical settings, however, such high-quality paired datasets are often unavailable due to the limited representation of abnormal cases, the absence of ground truth, inter-observer variability in annotations, patient-specific image heterogeneity, and the inherent variabil...