Anna Sperotto PhD defense

On October 14, Anna Sperotto defended her thesis, “Flow-based intrusion detection”. The defense took place in Waaier 4 at 14:45.

Abstract: In recent years, the spread of 1 to 10 Gbps technology has paved the way for a flourishing landscape of new, high-bandwidth Internet services. As users, we depend on the Internet in our daily lives, not only for simple tasks such as checking e-mail but also for managing private and financial information. Entrusting such information to the Internet, however, has also made the network an alluring target for hackers. The research community has responded to this threat with increased interest in intrusion detection. With the number of attacks growing almost exponentially, and attackers' motivations shifting from ideological to economic, researchers are focusing on new techniques to detect intruders in time and prevent damage. Our studies in intrusion detection, however, showed that further research is needed in two areas in particular: the creation of shared data sets to validate Intrusion Detection Systems (IDSs), and the development of automatic procedures to tune IDS parameters.

This thesis contributes a structured approach to intrusion detection that focuses on (i) shared ground-truth data sets and (ii) automatic parameter tuning. We develop this approach around network flows. Flows offer an aggregated view of network traffic by reporting the number of packets and bytes exchanged over the network, and therefore drastically reduce the amount of data to be analyzed. In this thesis, we aim to detect anomalies in flow-based time series, which describe how the number of flows, packets and bytes changes over time.
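To make the notion of a flow-based time series concrete, the following is a minimal sketch of how individual flow records could be aggregated into per-interval counts of flows, packets and bytes. The record layout (`FlowRecord` with `start`, `packets`, `octets`), the function name `flow_time_series` and the 60-second bin size are illustrative assumptions, not the thesis's actual tooling; each flow is also simply attributed to the bin in which it starts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowRecord:
    # Hypothetical minimal flow record: start timestamp in seconds,
    # number of packets and number of bytes (octets) in the flow.
    start: float
    packets: int
    octets: int


def flow_time_series(records, bin_size=60.0):
    """Aggregate flow records into per-bin counts of flows, packets and bytes.

    Assumption: each flow is attributed entirely to the bin containing its
    start time (long flows could instead be spread over several bins).
    Returns three lists covering every bin from the first to the last.
    """
    if not records:
        return [], [], []
    origin = min(r.start for r in records)
    flows, packets, octets = defaultdict(int), defaultdict(int), defaultdict(int)
    for r in records:
        b = int((r.start - origin) // bin_size)
        flows[b] += 1
        packets[b] += r.packets
        octets[b] += r.octets
    n_bins = max(flows) + 1  # empty bins in between are reported as 0
    return ([flows[i] for i in range(n_bins)],
            [packets[i] for i in range(n_bins)],
            [octets[i] for i in range(n_bins)])
```

For example, three flows starting at 0 s, 30 s and 130 s fall into bins 0, 0 and 2, giving a flow series of `[2, 0, 1]`; the packet and byte series are built the same way. The point of the aggregation is that an anomaly detector then looks at three short series instead of every packet.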

Ground-truth data sets are fundamental during development, both for validation and, if publicly available, for comparing different IDSs. We tackle the problem of ground-truth generation in two complementary ways.

First, we create ground-truth information for flow-based intrusion detection manually. We do so by means of a honeypot-based data collection and monitoring setup, specifically tuned to (i) offer an attractive target for attackers and (ii) provide enhanced logging capabilities that support the labeling of the collected data. The outcome of this work is a publicly released, labeled flow-based data set. To the best of our knowledge, no such data set existed before.
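As an illustration of how honeypot logs can support labeling, here is a minimal sketch that assigns each flow the type of the nearest log event from the same source address within a time window. The data classes, the function `label_flows`, the 300-second window and the event names are all hypothetical; the thesis's actual correlation rules between flows and logs may differ.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    src_ip: str
    start: float  # flow start time in seconds


@dataclass
class LogEvent:
    src_ip: str
    time: float   # event time in seconds
    kind: str     # hypothetical event type taken from the honeypot's logs


def label_flows(flows, events, window=300.0):
    """Label each flow with the kind of the closest log event from the same
    source IP within +/- window seconds; otherwise label it 'unlabeled'."""
    by_ip = {}
    for e in events:
        by_ip.setdefault(e.src_ip, []).append(e)
    labels = []
    for f in flows:
        candidates = [e for e in by_ip.get(f.src_ip, [])
                      if abs(e.time - f.start) <= window]
        if candidates:
            closest = min(candidates, key=lambda e: abs(e.time - f.start))
            labels.append(closest.kind)
        else:
            labels.append("unlabeled")
    return labels
```

Matching on source address and time is only one possible design; a real setup would likely also use ports, protocols and the honeypot's own session identifiers to reduce ambiguous matches.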

Second, we generate ground-truth information automatically, by producing artificial flow, packet and byte time series for both benign and attack traffic. For this purpose we rely on Hidden Markov Models (HMMs), which provide compact, probabilistic representations of flow-based time series and can also be used to generate them.
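The generative use of an HMM can be sketched as follows: a hidden state evolves according to a transition matrix, and in each time bin the current state emits an observation. This minimal sketch assumes discrete hidden states with Poisson-distributed flow counts as emissions; the function names and all parameter values are illustrative assumptions, not the models fitted in the thesis.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's method for drawing a Poisson variate (adequate for modest lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_categorical(rng, probs):
    """Draw index i with probability probs[i]."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against floating-point rounding


def generate_hmm_series(start, trans, rates, length, seed=None):
    """Generate a synthetic flows-per-bin series from a discrete-state HMM.

    start[i]    : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    rates[i]    : mean flows per bin emitted in state i (Poisson emission)
    Returns (hidden_states, observations), both of the given length.
    """
    rng = random.Random(seed)
    states, obs = [], []
    s = sample_categorical(rng, start)
    for _ in range(length):
        states.append(s)
        obs.append(sample_poisson(rng, rates[s]))
        s = sample_categorical(rng, trans[s])
    return states, obs
```

A usage example with two hypothetical states, a quiet one and a busy one: `generate_hmm_series([0.9, 0.1], [[0.95, 0.05], [0.2, 0.8]], [20.0, 80.0], length=100, seed=1)`. Because the hidden state sequence is returned alongside the observations, every generated bin comes with its own label, which is what makes such models useful for ground-truth generation.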

Finally, we address the automatic tuning of IDSs. The performance of an IDS is governed by the trade-off between detecting all anomalies (at the cost of raising too many false alarms) and missing some anomalies (while issuing few false alarms). We developed an optimization procedure that treats this trade-off systematically and mathematically by tuning the system parameters automatically.
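The trade-off can be illustrated with the simplest possible tunable parameter, a single alarm threshold on an anomaly score. This sketch is a plain grid search that minimizes a weighted sum of missed attacks and false alarms on labeled data; it is a swapped-in textbook technique for illustration, not the optimization procedure developed in the thesis, and the function name and cost weights are assumptions.

```python
def tune_threshold(scores, labels, fn_weight=1.0, fp_weight=1.0):
    """Pick the alarm threshold that minimizes a weighted error cost.

    scores : anomaly score per time bin (higher means more suspicious)
    labels : ground-truth flag per time bin (True means attack)
    An alarm is raised when score >= threshold. Candidates are the observed
    scores plus +inf ("never alarm"). Ties go to the lowest threshold.
    Returns (best_threshold, false_negatives, false_positives).
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    for t in candidates:
        # Missed anomalies: attack bins whose score stays below the threshold.
        fn = sum(1 for s, y in zip(scores, labels) if y and s < t)
        # False alarms: benign bins whose score reaches the threshold.
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        cost = fn_weight * fn + fp_weight * fp
        if best is None or cost < best[0]:
            best = (cost, t, fn, fp)
    return best[1], best[2], best[3]
```

Raising `fp_weight` relative to `fn_weight` pushes the chosen threshold up (fewer alarms, more misses), and the reverse pushes it down; this is exactly the tension described above, made explicit through the cost weights. Note that the search depends on labeled data, which is why ground truth and parameter tuning go hand in hand.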