February 15, 2022: Cyber Security Master Thesis Assignment: SecBERT

Master assignment

Cyber security: SecBERT

Type: Master CS (Cybersecurity)

Period: Start date: March 2022

Student: Liberato, M. (Matteo, Student M-CS)

If you are interested, please contact:

Description:

SecBERT
Recent advances in neural networks and Natural Language Processing (NLP) techniques such as BERT and GPT-3 have opened up tremendous opportunities for automatic text processing, such as text classification and knowledge extraction. While existing techniques work well for generic texts, specific subdomains such as cyber security often pose domain-specific challenges: they use jargon that is absent from general vocabulary, or require domain-specific knowledge to understand a text and subsequently extract relevant information from it.
    This project aims to develop a new neural network inspired by BERT (SecBERT) that is trained specifically for analysing cyber threat reports. We start by training SecBERT on generic NLP tasks such as masked language modelling (predicting masked words in a sentence) and next sentence prediction (predicting whether a candidate next sentence is likely to follow a given first sentence). After this initial phase, we will explore how SecBERT can be used to improve state-of-the-art analysis tools for cyber threat reports, looking into (amongst others) improved word embeddings and knowledge extraction techniques. You are also free to explore your own interests in applying SecBERT to the analysis of cyber threat reports.
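
    To make the two training tasks concrete, the sketch below shows one way training examples for them could be built in plain Python. It is an illustration under stated assumptions, not the project's implementation: the function names mask_tokens and make_nsp_pairs, the whitespace tokenisation and the toy sentences are hypothetical, and the 15% masking rate with an 80/10/10 replacement split is borrowed from the original BERT recipe rather than prescribed by this assignment.

```python
import random

MASK = "[MASK]"  # placeholder token, as used by BERT-style models


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Build one masked-language-modelling example.

    Returns (masked_tokens, labels). labels[i] holds the original token
    at positions the model must predict, and None everywhere else.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:            # 80%: replace with [MASK]
                masked.append(MASK)
            elif roll < 0.9:          # 10%: replace with a random vocabulary token
                masked.append(rng.choice(vocab))
            else:                     # 10%: keep the original token
                masked.append(tok)
        else:
            labels.append(None)
            masked.append(tok)
    return masked, labels


def make_nsp_pairs(sentences, rng=None):
    """Build next-sentence-prediction examples from one ordered document.

    Each example is (first, second, label): label 1 if `second` really
    follows `first`, label 0 if it is a randomly chosen other sentence.
    Assumption: negatives come from the same document for simplicity.
    """
    if len(sentences) < 3:
        raise ValueError("need at least 3 sentences to sample negatives")
    rng = rng or random.Random()
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            pairs.append((sentences[i], sentences[i + 1], 1))
        else:
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            pairs.append((sentences[i], rng.choice(candidates), 0))
    return pairs


# Toy "threat report" (made-up text, for illustration only).
report = [
    "the dropper writes a payload to disk",
    "the payload contacts a remote server",
    "the server replies with encrypted commands",
    "the implant deletes its own log files",
]
tokens = report[1].split()
vocab = sorted({word for sentence in report for word in sentence.split()})

print(mask_tokens(tokens, vocab, rng=random.Random(0)))
for pair in make_nsp_pairs(report, rng=random.Random(0)):
    print(pair)
```

    A model trained on such examples sees domain jargon in context, which is the motivation for training on cyber threat reports rather than on generic text.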

Difficulty:
    medium to as difficult as you like :)

Requirements:

Related papers:

https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf (Explained in code here: http://nlp.seas.harvard.edu/2018/04/03/attention.html)

Datasets: