Department of Electronics Information Engineering, School of Electronics Information Engineering, China West Normal University, Nanchong, Sichuan, China.
International Journal of Science and Research Archive, 2025, 16(03), 846-856
Article DOI: 10.30574/ijsra.2025.16.3.2588
Received on 03 August 2025; revised on 14 September 2025; accepted on 18 September 2025
Large Language Models (LLMs) have surged in popularity due to their impressive ability to generate human-like text. Despite the widespread use of large language models, there is a growing concern about their disregard for human ethics and their potential to produce harmful content. While many LLMs are aligned with safeguards, there is a category of prompt injection attacks known as jailbreaks, specifically designed to bypass these protections and generate malicious output. Despite extensive research on novel jailbreak attacks and potential defenses, there is limited exploration into accurately evaluating the success of these attacks. In this paper, we introduce seven evaluation methods used in re- search to determine the effectiveness of jailbreak attempts. We conduct a comprehensive analysis of these seven methods with a particular focus on their accuracy. Our research aims to advance the discussion on improving the safety and alignment of LLMs with human values and to contribute to the development of more robust and secure LLM-based applications. Code is available at github.com/cenacle e18-4yp-An-Empirical-Study- On-Prompt-Injection-Attacks-And-Defenses. Due to the weaknesses of these basic evaluation methods, there is a risk of misrepresenting the actual effectiveness of jailbreak attacks and the security vulnerabilities of the models. Therefore, this research provides a comprehensive analysis of the often-overlooked limitations of each evaluation method and their accuracy. Our goal is to lay the foundation for the development of more standardized, reliable, and measurable evaluation metrics to determine the success of an attack. This will lay the foundation for future security research, while enabling the creation of more secure and user-friendly LLM applications.
Large Language Models; Jail Breaking; Evaluation Methods; Prompt Injection Attacks
Preview Article PDF
Ali MD Sojib, Hossen MD Nafew and Hasan MD Ridoy. A comprehensive analysis of jailbreak vulnerabilities in large language models. International Journal of Science and Research Archive, 2025, 16(03), 846-856. Article DOI: https://doi.org/10.30574/ijsra.2025.16.3.2588.
Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution Liscense 4.0







