CaGeLLM: Exploiting security-related vulnerabilities in Agentic AI systems through adversarial prompt categorization & generation

MASTER Assignment

Type: Master M-CS

Period: October 2025 - March 2026

Student: Boeve, T. (Twan, Student M-CS)

Date Final project: March 19, 2026

Thesis

Supervisors:

Abstract:

As Large Language Models (LLMs) become increasingly widespread and integrated into modern applications, concerns regarding their security and reliability continue to grow. Recent incidents, ranging from manipulated model behavior to leaked credentials, highlight a lack of awareness and understanding among both users and developers. Existing benchmarks for assessing the robustness of LLMs against adversarial prompts focus primarily on resistance to jailbreaking techniques designed to bypass safeguards and elicit unintended content. Arguably more critical are cybersecurity-related risks, particularly those involving agentic AI systems that can autonomously interact with external components of their environment, such as connected tools, plugins, or databases. Furthermore, existing benchmarks and prompt repositories often do not make clear which prompts are effective, reproducible, and suitable for testing specific vulnerabilities. This research addresses these limitations by introducing a tool that converts high-level prompt descriptions into concrete, effective adversarial prompts. An LLM is fine-tuned on a corpus of over 27,400 prompts, which were categorized using a separate, dedicated LLM that was itself evaluated. The resulting system can generate tailored adversarial prompts to fill gaps where reliable test inputs are needed. The effectiveness of the generated prompts is evaluated in two representative cybersecurity risk categories, data leakage and tool usage, demonstrating the tool's potential to support more systematic and effective AI red teaming.