Deep Learning-based Multi-class Classification of Breast Cancer Pathology: A Comparative Study on Full-field Digital Mammograms Using the CBIS-DDSM Dataset
Yu, Qiong (2025)
All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025112529541
Abstract
The purpose of this thesis was to develop and evaluate a deep learning–based approach for the automatic classification of breast cancer pathology using full-field digital mammograms from the publicly available CBIS-DDSM dataset. The objective was to explore the performance and generalizability of different convolutional neural network (CNN) architectures in a three-class classification task consisting of benign, malignant, and benign without callback categories.
The study was conducted as an experimental research project using Python and the PyTorch framework. A total of 2,785 mammogram images were curated, preprocessed, and divided into patient-exclusive training and validation folds, so that no patient's images appeared in both the training and validation sets of a fold. Multiple CNN architectures, including ResNet18, ResNet34, ResNet50, and EfficientNet-b0/b1, were implemented and optimized through systematic adjustment of hyperparameters such as learning rate, batch size, and regularization. Model performance was evaluated using five-fold cross-validation and assessed by balanced accuracy, F1-score, average precision, ROC-AUC, and confusion matrices.
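The idea of a patient-exclusive split can be illustrated with a minimal sketch. It is not the thesis's actual procedure, which the abstract does not describe; the function name `patient_exclusive_folds`, the toy patient IDs, and the simple shuffled round-robin assignment are all assumptions made for illustration, and the sketch does not stratify by class.

```python
import random
from collections import defaultdict


def patient_exclusive_folds(patient_ids, n_folds=5, seed=0):
    """Assign image indices to folds so that no patient spans two folds.

    Hypothetical helper: patients are shuffled and dealt round-robin,
    so fold sizes are only approximately equal and classes are not balanced.
    """
    # Group image indices by the patient they belong to.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patients (not images) with a fixed seed for reproducibility.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Deal whole patients into folds, keeping all their images together.
    folds = [[] for _ in range(n_folds)]
    for i, pid in enumerate(patients):
        folds[i % n_folds].extend(by_patient[pid])
    return folds


# Toy data (assumed): 10 images from 6 patients, split into 3 folds.
patient_ids = ["P1", "P1", "P2", "P3", "P3", "P3", "P4", "P5", "P5", "P6"]
folds = patient_exclusive_folds(patient_ids, n_folds=3)
```

In each cross-validation round, one fold would serve as the validation set and the remaining folds as the training set. Because patients are assigned as whole units, images of the same patient cannot leak between training and validation.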
The results indicated that progressively deeper and more efficient CNN architectures improved diagnostic performance. The best-performing model, EfficientNet-b0, achieved a mean balanced accuracy of 68.3% and a macro-AUC of 79.9%, representing an 11% improvement over the baseline ResNet18. The model demonstrated stable performance across folds and effective recognition of minority classes, particularly benign without callback cases.
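Balanced accuracy, the headline metric above, is the mean of the per-class recalls, which is why it is informative when classes such as benign without callback are under-represented. The following is a minimal sketch using made-up labels, not the thesis's data or code; the function name and the toy predictions are assumptions for illustration.

```python
def balanced_accuracy(y_true, y_pred, classes):
    """Mean of per-class recall; classes absent from y_true are skipped."""
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        if total == 0:
            continue
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Toy, imbalanced example (assumed labels, not results from the study).
classes = ["benign", "malignant", "benign_without_callback"]
y_true = ["benign"] * 4 + ["malignant"] * 4 + ["benign_without_callback"] * 2
y_pred = [
    "benign", "benign", "benign", "malignant",       # benign recall: 3/4
    "malignant", "malignant", "benign", "benign",    # malignant recall: 2/4
    "benign_without_callback", "benign",             # minority recall: 1/2
]

score = balanced_accuracy(y_true, y_pred, classes)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here the per-class recalls are 0.75, 0.5, and 0.5, so the balanced accuracy is their mean (about 0.583), while plain accuracy is 0.6. The two diverge because plain accuracy weights each image equally, whereas balanced accuracy weights each class equally.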
The conclusions of this study suggest that optimized CNN architectures can provide reliable assistance in mammography-based breast cancer screening by reducing diagnostic workload and improving classification consistency. Future work should focus on expanding datasets and on integrating attention mechanisms or explainable AI frameworks to enhance model interpretability and clinical applicability.
