LEVit-Skin: A balanced and interpretable transformer-CNN model for multi-class skin cancer diagnosis

Anamul Haque Sakib; Md Ismail Hossain Siddiqui; Sanjida Akter; Abdullah Al Sakib; Mohammad Rasel Mahmud

doi:10.30574/ijsra.2025.15.1.1166

Anamul Haque Sakib ¹, Md Ismail Hossain Siddiqui ², Sanjida Akter ³, Abdullah Al Sakib ⁴,* and Mohammad Rasel Mahmud ⁵

¹ Department of Business Administration, International American University, Los Angeles, CA 90010, USA.

² Department of Engineering/Industrial Management, Westcliff University, Irvine, CA 92614, USA.

³ Department of Mathematics and Natural Sciences, BRAC University, Dhaka, Bangladesh.

⁴ Department of Information Technology, Westcliff University, Irvine, CA 92614, USA.

⁵ Department of Management Information System, International American University, Los Angeles, CA 90010, USA.

Research Article

International Journal of Science and Research Archive, 2025, 15(01), 1860-1873

DOI url: https://doi.org/10.30574/ijsra.2025.15.1.1166

Publication history

Received on 13 March 2025; revised on 22 April 2025; accepted on 24 April 2025

Abstract

Skin cancer is a major cause of death, making early detection essential. This study presents LEVit, an explainable and class-balanced deep learning framework designed for multiclass skin lesion classification. LEVit is a hybrid architecture that combines a Vision Transformer (ViT) with a Convolutional Neural Network (CNN). We evaluated LEVit on two benchmark dermoscopic datasets: HAM10000, which consists of 10,015 images across 7 classes, and ISIC 2019, with 25,331 images spanning 8 classes. Both datasets have notable class imbalances. To address this issue, we applied advanced augmentation techniques to oversample minority classes, ensuring a uniform class distribution and enhancing the model's ability to generalize. LEVit captures both local lesion textures and global spatial relationships through its integrated convolutional and self-attention modules. We compared its performance against four state-of-the-art models (NASNet, SqueezeNet, SE-Net, and Xception) across four metrics: F1 Score, Specificity, Matthews Correlation Coefficient (MCC), and Precision-Recall Area Under the Curve (PR AUC). LEVit achieved an F1 Score of 98.11% and a PR AUC of 98.57% on the ISIC 2019 dataset, and an F1 Score of 96.11% and a PR AUC of 96.62% on HAM10000. For interpretability, we used Grad-CAM to generate class-specific heatmaps, which highlight the key areas of lesions that influence the model's predictions. This work demonstrates that balanced training and a hybrid architecture can enhance both classification accuracy and interpretability in skin cancer diagnostics, addressing limitations of existing models and paving the way for reliable clinical applications.
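As an illustration of three of the reported metrics, the following minimal Python sketch computes F1 Score, Specificity, and MCC from a multi-class confusion matrix. It is not the authors' evaluation code. The function names, the `cm[true][pred]` layout, and the use of macro averaging (an unweighted mean over classes) are assumptions made for this example, since the abstract does not state how per-class scores were aggregated. The MCC uses the standard multi-class generalization. PR AUC is omitted because it requires per-class prediction scores rather than a confusion matrix.

```python
import math


def per_class_counts(cm):
    """Return (tp, fp, fn, tn) for each class of a square matrix cm[true][pred]."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    stats = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(k)) - tp   # predicted c, true class otherwise
        tn = total - tp - fn - fp
        stats.append((tp, fp, fn, tn))
    return stats


def macro_f1(cm):
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    scores = []
    for tp, fp, fn, _ in per_class_counts(cm):
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def macro_specificity(cm):
    """Unweighted mean of per-class specificity, TN / (TN + FP), one-vs-rest."""
    scores = []
    for _, fp, _, tn in per_class_counts(cm):
        denom = tn + fp
        scores.append(tn / denom if denom else 0.0)
    return sum(scores) / len(scores)


def multiclass_mcc(cm):
    """Multi-class Matthews Correlation Coefficient from a confusion matrix."""
    k = len(cm)
    s = sum(sum(row) for row in cm)                          # total samples
    c = sum(cm[i][i] for i in range(k))                      # correctly classified
    t = [sum(cm[i]) for i in range(k)]                       # true count per class
    p = [sum(cm[r][i] for r in range(k)) for i in range(k)]  # predicted count per class
    num = c * s - sum(ti * pi for ti, pi in zip(t, p))
    den = math.sqrt((s * s - sum(pi * pi for pi in p)) *
                    (s * s - sum(ti * ti for ti in t)))
    return num / den if den else 0.0


if __name__ == "__main__":
    # Toy 3-class confusion matrix, invented purely for illustration.
    toy = [[8, 1, 1],
           [2, 7, 1],
           [0, 2, 8]]
    print(macro_f1(toy), macro_specificity(toy), multiclass_mcc(toy))
```

Specificity and MCC both account for true negatives, which is one reason they are often reported alongside F1 on class-imbalanced datasets such as HAM10000 and ISIC 2019.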

Keywords

Skin cancer; Vision transformer; Deep learning; Explainable AI (XAI); Medical imaging.

Download Article PDF

https://journalijsra.com/sites/default/files/fulltext_pdf/IJSRA-2025-1166.pdf

How to cite this article

Anamul Haque Sakib, Md Ismail Hossain Siddiqui, Sanjida Akter, Abdullah Al Sakib and Mohammad Rasel Mahmud. LEVit-Skin: A balanced and interpretable transformer-CNN model for multi-class skin cancer diagnosis. International Journal of Science and Research Archive, 2025, 15(01), 1860-1873. Article DOI: https://doi.org/10.30574/ijsra.2025.15.1.1166.
