Publication:
Detection and classification of breast cancer calcifications using machine learning with augmentation technique

Date
2025-07
Authors
Borhanuddin, Nurul Syuhaida
Abstract
Breast cancer calcifications are among the earliest indicators of malignancy, but they are often difficult to detect because of their subtle appearance and the reliance on subjective radiological interpretation. This study aimed to enhance the detection and classification of breast calcifications in mammographic images by combining image preprocessing and augmentation techniques with machine learning. A total of 234 annotated mammographic images were collected from the Picture Archiving and Communication System at Hospital Pakar Universiti Sains Malaysia. These images were augmented using transformations including rotation, flipping, Gaussian blur, and elastic deformation, giving a total dataset of 2574 images to improve variability and reduce the risk of overfitting. Preprocessing techniques such as grayscale conversion, contrast enhancement using CLAHE, and top-hat filtering were applied to improve the visibility of calcification features. Five machine learning models were evaluated: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), Logistic Regression (LR), and a Soft Voting ensemble model. Model performance was measured using accuracy, precision, recall, specificity, and F1 score. Validation was performed using 5-fold cross-validation, and statistical significance was tested with the Friedman test and the Wilcoxon signed-rank test. The KNN model achieved the highest average accuracy at 87.61%, followed by SVM at 79.07%, RF at 78.64%, and LR at 69.62%. The findings suggest that the KNN model was particularly effective at distinguishing between benign and malignant calcifications because of its sensitivity to local feature patterns. Although Logistic Regression had the shortest training time, it performed worst on all evaluation metrics, indicating that training speed alone is not a sufficient measure of diagnostic utility.
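The evaluation protocol described above (five classifiers, 5-fold cross-validation, and five metrics) might be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the study's code: the feature matrix is synthetic stand-in data from `make_classification`, the hyperparameters are library defaults or arbitrary choices, and the `scaled` helper is a hypothetical convenience introduced here. Specificity is computed as recall of the negative class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic features stand in for features extracted from
# preprocessed mammograms; these are not the study's data.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=8, random_state=0)


def scaled(estimator):
    """Hypothetical helper: standardize features before a scale-sensitive model."""
    return make_pipeline(StandardScaler(), estimator)


# Hyperparameters are illustrative defaults, not the study's settings.
base_models = {
    "SVM": scaled(SVC(probability=True, random_state=0)),  # probabilities needed for soft voting
    "KNN": scaled(KNeighborsClassifier(n_neighbors=5)),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "LR": scaled(LogisticRegression(max_iter=1000)),
}
models = dict(base_models)
# Soft voting averages the predicted class probabilities of the base models.
models["SoftVoting"] = VotingClassifier(
    estimators=list(base_models.items()), voting="soft"
)

scoring = {
    "accuracy": "accuracy",
    "precision": "precision",
    "recall": "recall",  # sensitivity
    "specificity": make_scorer(recall_score, pos_label=0),  # recall of class 0
    "f1": "f1",
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the five test folds.
    results[name] = {m: float(scores[f"test_{m}"].mean()) for m in scoring}
    print(name, {m: round(v, 3) for m, v in results[name].items()})
```

When augmented copies are part of the dataset, one common precaution is to keep all copies of a source image in the same fold (for example with `GroupKFold`), so that near-duplicates do not appear on both sides of a split. The abstract does not state how the folds were constructed.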
The results also highlight that proper augmentation and preprocessing not only improve model accuracy but also contribute to more balanced performance across sensitivity and specificity. Statistical validation confirmed that the differences in performance among the models were significant, reinforcing the reliability of the findings. This study demonstrates that machine learning models, when supported by proper data preparation strategies, can serve as effective tools in the development of computer-assisted diagnostic systems for early breast cancer detection.
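As an illustration of the statistical validation step, the sketch below runs a Friedman test across models and then pairwise Wilcoxon signed-rank tests using SciPy. The model names (`model_a` to `model_d`) and the per-fold accuracy values are hypothetical placeholders chosen only to make the example run; they are not results from the study.

```python
from itertools import combinations

from scipy.stats import friedmanchisquare, wilcoxon

# HYPOTHETICAL per-fold accuracies in percent (one value per CV fold).
# Placeholder numbers for illustration only, NOT the study's results.
fold_accuracy = {
    "model_a": [90, 85, 88, 92, 87],
    "model_b": [80, 79, 81, 78, 84],
    "model_c": [77, 83, 74, 80, 78],
    "model_d": [70, 64, 69, 65, 70],
}

# Friedman test: do the models differ overall, treating each fold as a block?
friedman_stat, friedman_p = friedmanchisquare(*fold_accuracy.values())
print(f"Friedman chi-square = {friedman_stat:.2f}, p = {friedman_p:.4f}")

# Pairwise follow-up: Wilcoxon signed-rank test on the paired fold scores.
pairwise_p = {}
for (name_1, acc_1), (name_2, acc_2) in combinations(fold_accuracy.items(), 2):
    result = wilcoxon(acc_1, acc_2)
    pairwise_p[(name_1, name_2)] = float(result.pvalue)
    print(f"{name_1} vs {name_2}: W = {result.statistic}, p = {result.pvalue:.4f}")
```

The Friedman test serves as the omnibus check across all models, and the Wilcoxon tests compare models two at a time on the same folds. With several pairwise comparisons, a multiple-comparison correction such as Bonferroni or Holm is usually applied to the pairwise p-values.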
Keywords
Breast cancer, calcification