19th February 2009
Flow-based intrusion detection is a successful security mechanism in cases where payload inspection is hardly feasible, for example in high-speed networks. Despite the many contributions in this area, benchmarking of flow-based IDSs is still an open issue. We propose the first publicly available, labeled data set for flow-based intrusion detection. The data set aims to be realistic, i.e., representative of real traffic, and complete from a labeling perspective. Our experience shows that a measurement setup providing additional labeling information is crucial. We therefore based our setup on a honeypot running widely deployed services, such as ssh, ftp and http. The honeypot was directly connected to the Internet, ensuring that it was exposed to attacks. The final data set consists of 14.2M flows, more than 98% of which have been labeled.