Early-fusion hybrid CNN-transformer models for multiclass ovarian tumor ultrasound classification
Author/s
Garcia Atutxa, Igor; Martínez Más, José; Bueno Crespo, Andrés; Villanueva Flores, Francisca
Date
2025-10-15
Discipline/s
Engineering, Industry and Construction
Subject/s
Ovarian cancer
Ultrasound imaging
Deep learning
CNN
Vision transformer
Hybrid model
Early diagnosis
Abstract
Ovarian cancer remains the deadliest gynecologic malignancy, and transvaginal ultrasound (TVS), the first-line test, still suffers from limited specificity and operator dependence. We introduce a learned early-fusion (joint projection) hybrid that couples EfficientNet-B7 (local descriptors) with a Swin Transformer (hierarchical global context) to classify eight ovarian tumor categories from 2D TVS. Using the public, de-identified OTU-2D dataset (n = 1,469 images across eight histopathologic classes), we conducted patient-level, stratified 5-fold cross-validation repeated 10×. To address class imbalance while preventing leakage, training used train-only oversampling, ultrasound-aware augmentations, and strong regularization; validation/test folds were never resampled. The hybrid achieved AUC 0.9904, accuracy 92.13%, sensitivity 92.38%, and specificity 98.90%, outperforming single CNN or ViT baselines. A soft ensemble of the top hybrids further improved performance to AUC 0.991, accur...
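
The leakage-safe protocol described in the abstract (patient-level stratified folds, with oversampling applied to the training portion only) can be illustrated with a minimal sketch. This is not the authors' code: the function names (`patient_level_folds`, `oversample_train`, `split_fold`), the round-robin fold assignment, the assumption that each patient carries a single class label, and plain random oversampling to the majority-class count are all illustrative choices made here.

```python
import random
from collections import defaultdict


def patient_level_folds(patient_ids, labels, n_folds=5, seed=0):
    """Map each patient to one fold, spreading each class across folds.

    Assumption: every image of a patient shares one label, so the first
    label seen for a patient is used for stratification.
    """
    patient_label = {}
    for pid, y in zip(patient_ids, labels):
        patient_label.setdefault(pid, y)

    patients_by_class = defaultdict(list)
    for pid, y in patient_label.items():
        patients_by_class[y].append(pid)

    rng = random.Random(seed)
    fold_of = {}
    for y in sorted(patients_by_class):
        pids = sorted(patients_by_class[y])
        rng.shuffle(pids)
        # Round-robin assignment keeps class proportions similar per fold.
        for i, pid in enumerate(pids):
            fold_of[pid] = i % n_folds
    return fold_of


def oversample_train(train_idx, labels, seed=0):
    """Randomly duplicate minority-class images up to the majority count."""
    rng = random.Random(seed)
    idx_by_class = defaultdict(list)
    for i in train_idx:
        idx_by_class[labels[i]].append(i)
    target = max(len(v) for v in idx_by_class.values())

    balanced = []
    for y in sorted(idx_by_class):
        idx = idx_by_class[y]
        balanced.extend(idx)
        balanced.extend(rng.choice(idx) for _ in range(target - len(idx)))
    return balanced


def split_fold(patient_ids, labels, fold, n_folds=5, seed=0):
    """Return (oversampled train indices, untouched validation indices)."""
    fold_of = patient_level_folds(patient_ids, labels, n_folds, seed)
    train = [i for i, p in enumerate(patient_ids) if fold_of[p] != fold]
    val = [i for i, p in enumerate(patient_ids) if fold_of[p] == fold]
    # Only the training indices are resampled; validation stays as-is.
    return oversample_train(train, labels, seed), val
```

Under these assumptions, repeating the cross-validation amounts to calling `split_fold` with a different `seed` per repetition. Because folds are assigned per patient before any resampling, duplicated images can never cross from training into validation.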





