A comprehensive analysis of jailbreak vulnerabilities in large language models

Ali MD Sojib; Hossen MD Nafew; Hasan MD Ridoy

doi:10.30574/ijsra.2025.16.3.2588

Ali MD Sojib ^* , Hossen MD Nafew and Hasan MD Ridoy

Department of Electronics Information Engineering, School of Electronics Information Engineering, China West Normal University, Nanchong, Sichuan, China.

Review Article

International Journal of Science and Research Archive, 2025, 16(03), 846-856

Article DOI: 10.30574/ijsra.2025.16.3.2588

DOI url: https://doi.org/10.30574/ijsra.2025.16.3.2588

Publication history

Received on 03 August 2025; revised on 14 September 2025; accepted on 18 September 2025

Abstract

Large Language Models (LLMs) have surged in popularity due to their impressive ability to generate human-like text. Despite the widespread use of large language models, there is a growing concern about their disregard for human ethics and their potential to produce harmful content. While many LLMs are aligned with safeguards, there is a category of prompt injection attacks known as jailbreaks, specifically designed to bypass these protections and generate malicious output. Despite extensive research on novel jailbreak attacks and potential defenses, there is limited exploration into accurately evaluating the success of these attacks. In this paper, we introduce seven evaluation methods used in re- search to determine the effectiveness of jailbreak attempts. We conduct a comprehensive analysis of these seven methods with a particular focus on their accuracy. Our research aims to advance the discussion on improving the safety and alignment of LLMs with human values and to contribute to the development of more robust and secure LLM-based applications. Code is available at github.com/cenacle e18-4yp-An-Empirical-Study- On-Prompt-Injection-Attacks-And-Defenses. Due to the weaknesses of these basic evaluation methods, there is a risk of misrepresenting the actual effectiveness of jailbreak attacks and the security vulnerabilities of the models. Therefore, this research provides a comprehensive analysis of the often-overlooked limitations of each evaluation method and their accuracy. Our goal is to lay the foundation for the development of more standardized, reliable, and measurable evaluation metrics to determine the success of an attack. This will lay the foundation for future security research, while enabling the creation of more secure and user-friendly LLM applications.

Keywords

Large Language Models; Jail Breaking; Evaluation Methods; Prompt Injection Attacks

Download Article PDF

https://journalijsra.com/sites/default/files/fulltext_pdf/IJSRA-2025-2588.pdf

Preview Article PDF

How to cite this article

Ali MD Sojib, Hossen MD Nafew and Hasan MD Ridoy. A comprehensive analysis of jailbreak vulnerabilities in large language models. International Journal of Science and Research Archive, 2025, 16(03), 846-856. Article DOI: https://doi.org/10.30574/ijsra.2025.16.3.2588.

Copyright information

A comprehensive analysis of jailbreak vulnerabilities in large language models

Ali MD Sojib * , Hossen MD Nafew and Hasan MD Ridoy

Preview Article PDF

Ali MD Sojib ^* , Hossen MD Nafew and Hasan MD Ridoy