
The TERMINATOR is Back With More Context: Exploring the Influence of RAG on Automated Security Mitigations

MASTER Assignment

Type: Master M-CS

Period: August 2025 - January 2026

Student: Milojković, S. (Stefan, Student M-CS)

Date Final project: January 27, 2026

Thesis

Supervisors:

Z.L. Kucsván, MSc

Abstract:

Cyber-incident response faces several key challenges: it requires system-specific knowledge, it demands manual effort to interpret incident response playbooks (predefined step-by-step procedures for handling security incidents) and to adapt their actions to a given system, and it must be carried out rapidly. To address these challenges, we build upon the existing TERMINATOR system, which leverages Large Language Models (LLMs) to translate the steps of a cyber-incident response playbook into command-line commands. We aim to improve the system's ability to select utilities and generate commands that carry out the playbook steps. To this end, we use Retrieval Augmented Generation (RAG), a technique that combines a generic large language model with the retrieval of relevant external knowledge during response generation, to supplement a generic LLM in performing cybersecurity incident response. We test this setup on simulations of compromised systems.

Our results show that the added RAG components do not consistently outperform the original system for common utilities (well-known programs with a command-line interface), owing to the strong pre-trained knowledge of the LLM and the negative effects of cognitive overload, defined as a decline in performance caused by providing the model with more information than is beneficial for the task. However, RAG significantly improves performance in scenarios involving never-before-seen utilities, for which no prior model knowledge is available. Our findings indicate that current limitations stem mainly from the suboptimal performance of the system's RAG components rather than from the command generation capabilities of the LLM itself, and they highlight the need for a more error-tolerant, adaptive RAG-TERMINATOR system and for context integration to enable reliable LLM-powered incident response automation.
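
As an illustration of the retrieval step that RAG adds in front of command generation, the following is a minimal sketch and not the TERMINATOR implementation. It assumes a tiny hand-written knowledge base of utility descriptions and a simple word-overlap ranking; the names DOCS, retrieve, and build_prompt are hypothetical, and the call to the LLM itself is left out.

```python
import re
from collections import Counter

# Hypothetical documentation snippets standing in for the external
# knowledge base; a real system would index manual pages or similar.
DOCS = {
    "ss": "ss shows socket statistics and lists open network connections and listening ports",
    "kill": "kill sends a signal to a process identified by its process id to terminate it",
    "chmod": "chmod changes the file mode bits that control read write and execute permissions",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, doc):
    """Count the tokens shared by query and doc (bag-of-words overlap)."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(step, docs, k=1):
    """Return the k (utility, description) pairs that best match the step."""
    ranked = sorted(docs.items(), key=lambda item: score(step, item[1]), reverse=True)
    return ranked[:k]


def build_prompt(step, docs, k=1):
    """Prepend the retrieved descriptions to the playbook step.

    The resulting string is what would be sent to the LLM (assumed format).
    """
    context = "\n".join(f"[{name}] {text}" for name, text in retrieve(step, docs, k))
    return f"Context:\n{context}\n\nPlaybook step: {step}\nCommand:"


if __name__ == "__main__":
    print(build_prompt("List open network connections and listening ports", DOCS))
```

Keeping k small in this sketch reflects the cognitive overload effect described in the abstract: every additional retrieved snippet enlarges the prompt, which is not guaranteed to help the model.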