The Hogzilla Dataset

The Hogzilla Dataset is composed of network flows extracted from the CTU-13 Botnet [1] and the ISCX 2012 IDS [2] datasets. Each flow has 192 behavioral features and is labeled using the Snort IDS, the nDPI library, and the labels of the original dataset. The resulting dataset contains behavioral information about Botnet traffic, taken from the CTU-13 dataset, and about normal traffic, taken from the ISCX 2012 IDS dataset. This dataset was used in [3,4].

For more information, see the accompanying article [5] (in press).

CSV Files

Basic import instructions

Importing in Apache Spark (Scala)

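import org.apache.spark.sql.SparkSession

// In spark-shell a session already exists; getOrCreate() reuses it
val sparkSession = SparkSession.builder().appName("Hogzilla").getOrCreate()
val data = sparkSession.read.format("csv")
                       .option("header", "true")   // first line holds the column names
                       .load("csvfile.csv")
data.show(2)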
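Spark's CSV reader only infers column types when asked to, so the snippet above reads every column as a string. If the feature columns hold numbers, as behavioral flow features usually do, it can be more convenient to let Spark infer the types. The following is a minimal sketch, not part of the original instructions: the object name `HogzillaCsv`, the `load` helper, and the `local[*]` master are illustrative assumptions, while `header` and `inferSchema` are standard options of Spark's CSV reader.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not provided with the dataset.
object HogzillaCsv {
  // Read a Hogzilla CSV file with typed columns.
  // "header" takes column names from the first line; "inferSchema" makes
  // Spark scan the data an extra time and pick numeric types where possible.
  def load(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HogzillaCsv")
      .master("local[*]") // assumption: run locally for a quick inspection
      .getOrCreate()
    val data = load(spark, args.headOption.getOrElse("csvfile.csv"))
    // Report how many columns and flows were read, then the inferred types
    println(s"columns: ${data.columns.length}, flows: ${data.count()}")
    data.printSchema()
    spark.stop()
  }
}
```

Type inference costs an additional pass over the file; if the files are loaded repeatedly, supplying an explicit schema with `.schema(...)` avoids that pass.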

Importing in R

data <- read.csv("csvfile.csv")
head(data, 2)  # print only the first two flows, as data.show(2) does in Spark

References

[1] S. Garcia, M. Grill, H. Stiborek, A. Zunino, An empirical comparison of botnet detection methods, Computers and Security 45 (2014) 100–123.

[2] A. Shiravi, H. Shiravi, M. Tavallaee, A. A. Ghorbani, Toward developing a systematic approach to generate benchmark datasets for intrusion detection, Computers and Security 31 (3) (2012) 357–374.

[3] P. A. A. Resende, A. C. Drummond, HTTP and contact-based features for Botnet detection using Random Forest on Apache Spark, Computers and Security, in press.

[4] P. A. A. Resende, A. C. Drummond, An active labeling approach for behavioral-based Intrusion Detection Systems, Computers and Security, in press.

[5] P. A. A. Resende, A. C. Drummond, Network flows of normal and Botnet traffic extracted from the CTU-13 and ISCX 2012 datasets and labeled by Snort IDS and lib nDPI, Data in Brief, in press.