1 Department of Mathematics and Natural Sciences, BRAC University, Dhaka, Bangladesh.
2 Department of Management Information System, International American University, CA 90010, USA.
3 Department of Business Administration, International American University, CA 90010, USA.
4 Department of Engineering/Industrial Management, Westcliff University, Irvine, CA 92614, USA.
International Journal of Science and Research Archive, 2025, 15(01), 1811-1822
Article DOI: 10.30574/ijsra.2025.15.1.1162
Received on 13 March 2025; revised on 22 April 2025; accepted on 24 April 2025
Monkeypox is a zoonotic disease that is difficult to diagnose because its lesions resemble those of other rash-producing illnesses such as measles and chickenpox. Traditional deep learning (DL) methods, especially convolutional neural networks (CNNs), often generalize poorly when trained on small, imbalanced datasets. They also tend to lack interpretability and computational efficiency, which limits their use in real-time, resource-constrained settings. This study introduces a lightweight, explainable DL framework based on EfficientFormerV2, which combines convolutional inductive biases with efficient token-mixing strategies. We used the publicly available Monkeypox Skin Image Dataset (MSID), which contains 770 images across four categories: Monkeypox, Chickenpox, Measles, and Normal. Through preprocessing and augmentation, we expanded the dataset to 4,000 images, improving class representation and reducing overfitting. We evaluated five models (EfficientFormerV2, T2T-ViT, DeiT, Xception, and MobileNetV4) under 10-fold stratified cross-validation, using F1-score, specificity, PR AUC, and Matthews Correlation Coefficient (MCC) as metrics. EfficientFormerV2 performed best, achieving an F1-score of 98.73%, specificity of 99.63%, PR AUC of 99.86%, and MCC of 94.15%. We used Grad-CAM visualizations to produce class-specific heatmaps that show which image regions drive each prediction. The framework combines an efficient architecture, data-centric augmentation, and explainable AI (XAI) to deliver high accuracy and low-latency predictions, making it suitable for real-time monkeypox screening, especially in low-resource settings.
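The evaluation protocol summarized in the abstract (stratified folds plus F1-score, specificity, and MCC) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' code. The function names, the macro averaging across classes, the round-robin fold assignment, and the confusion matrix in the usage lines are all assumptions made for illustration; the abstract does not state how per-class scores were averaged. PR AUC is omitted because it requires per-sample prediction scores rather than a confusion matrix.

```python
import math
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so each fold keeps similar class ratios."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0  # carries over between classes so fold sizes stay balanced
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


def per_class_counts(cm):
    """Yield (tp, fp, fn, tn) per class; cm rows are true labels, columns predictions."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        tn = total - tp - fp - fn
        yield tp, fp, fn, tn


def macro_f1(cm):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for tp, fp, fn, _ in per_class_counts(cm):
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def macro_specificity(cm):
    """Unweighted mean of per-class specificity = TN / (TN + FP)."""
    scores = []
    for _, fp, _, tn in per_class_counts(cm):
        denom = tn + fp
        scores.append(tn / denom if denom else 0.0)
    return sum(scores) / len(scores)


def multiclass_mcc(cm):
    """Multiclass Matthews correlation coefficient from a confusion matrix."""
    n = len(cm)
    s = sum(sum(row) for row in cm)                          # total samples
    c = sum(cm[i][i] for i in range(n))                      # correct predictions
    t = [sum(cm[i]) for i in range(n)]                       # true count per class
    p = [sum(cm[r][i] for r in range(n)) for i in range(n)]  # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                    (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0


if __name__ == "__main__":
    # Hypothetical labels: 4 classes x 1,000 images, mirroring only the
    # stated size of the augmented dataset (the class balance is assumed).
    labels = [c for c in range(4) for _ in range(1000)]
    folds = stratified_folds(labels, k=10)
    print("fold sizes:", [len(f) for f in folds])

    # Made-up confusion matrix for one fold; NOT results from the paper.
    cm = [[98, 1, 1, 0],
          [2, 96, 2, 0],
          [1, 2, 97, 0],
          [0, 0, 0, 100]]
    print("macro F1:", macro_f1(cm))
    print("macro specificity:", macro_specificity(cm))
    print("MCC:", multiclass_mcc(cm))
```

In a real pipeline each fold would serve once as the held-out set while the model trains on the other nine, and the metrics would be averaged over the ten runs.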
Skin Lesion; Vision Transformer; Hybrid Deep Learning; Explainable AI (XAI); Monkeypox
Sanjida Akter, Mohammad Rasel Mahmud, Md Ariful Islam, Md Ismail Hossain Siddiqui and Anamul Haque Sakib. Efficient and interpretable monkeypox detection using vision transformers with explainable visualizations. International Journal of Science and Research Archive, 2025, 15(01), 1811-1822. Article DOI: https://doi.org/10.30574/ijsra.2025.15.1.1162.
Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.







