The Hogzilla Dataset is composed of network flows extracted from the CTU-13 Botnet [1] and the ISCX 2012 IDS [2] datasets. Each flow has 192 behavioral features and is labeled by the Snort IDS, the nDPI library, and the original dataset. The resulting dataset contains behavioral information on Botnet traffic, taken from the CTU-13 dataset, and on normal traffic, taken from the ISCX 2012 IDS dataset. This dataset was used in [3,4].
For more information, see the article describing the dataset, currently in press [5].
CSV Files
- Hz-CTU13_1.csv.zip (12,955 flows)
- Hz-CTU13_2.csv.zip (15,695 flows)
- Hz-CTU13_3.csv.zip (32,473 flows)
- Hz-CTU13_4.csv.zip (30 flows)
- Hz-CTU13_5.csv.zip (847 flows)
- Hz-CTU13_6.csv.zip (1,492 flows)
- Hz-CTU13_7.csv.zip (38 flows)
- Hz-CTU13_8.csv.zip (4,344 flows)
- Hz-CTU13_9.csv.zip (88,983 flows)
- Hz-CTU13_10.csv.zip (1,125 flows)
- Hz-CTU13_11.csv.zip (64 flows)
- Hz-CTU13_12.csv.zip (6,112 flows)
- Hz-CTU13_13.csv.zip (35,587 flows)
- Hz-ISCX2012_testbed-11jun.csv.zip (325,757 flows)
- Hz-ISCX2012_testbed-16jun.csv.zip (464,988 flows)
Basic import instructions
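Importing in Apache Spark (Scala)
// sparkSession is an existing SparkSession (available as `spark` in spark-shell).
// Unzip the downloaded .csv.zip first; "csvfile.csv" stands for the extracted file.
val data = sparkSession.read.format("csv")
  .option("header", "true")
  .load("csvfile.csv")
data.show(2)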
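The files listed under CSV Files are ZIP archives, while the import snippet reads a plain CSV file, so an extraction step is needed in between. The following is a minimal sketch of that step using only the JDK's java.util.zip and java.nio.file classes. The names UnzipHogzilla and extractCsv are hypothetical helpers introduced here for illustration, and the code assumes Scala 2.13 (on Scala 2.12, scala.collection.JavaConverters._ would replace the scala.jdk import). It also assumes each archive holds one or more entries ending in .csv, which this page does not state explicitly.

```scala
import java.nio.file.{Files, Path, Paths, StandardCopyOption}
import java.util.zip.ZipFile
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of the Hogzilla tooling.
object UnzipHogzilla {
  /** Extracts every .csv entry of `zipPath` into `outDir` and returns the extracted paths. */
  def extractCsv(zipPath: Path, outDir: Path): List[Path] = {
    Files.createDirectories(outDir)
    val zip = new ZipFile(zipPath.toFile)
    try {
      zip.entries().asScala
        .filter(e => !e.isDirectory && e.getName.toLowerCase.endsWith(".csv"))
        .map { entry =>
          // Keep only the file name so an entry cannot be written outside outDir.
          val target = outDir.resolve(Paths.get(entry.getName).getFileName)
          val in = zip.getInputStream(entry)
          try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
          finally in.close()
          target
        }
        .toList // force the lazy iterator while the archive is still open
    } finally zip.close()
  }
}
```

For example, UnzipHogzilla.extractCsv(Paths.get("Hz-CTU13_1.csv.zip"), Paths.get("data")) would return the extracted paths, and the string form of one of them can be passed to .load(...) in place of "csvfile.csv".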
Importing in R
data <- read.csv("csvfile.csv")
head(data, 2)
References
[1] S. Garcia, M. Grill, H. Stiborek, A. Zunino, An empirical comparison of botnet detection methods, Computers and Security 45 (2014) 100–123.
[2] A. Shiravi, H. Shiravi, M. Tavallaee, A. A. Ghorbani, Toward developing a systematic approach to generate benchmark datasets for intrusion detection, Computers and Security 31 (3) (2012) 357–374.
[3] P. A. A. Resende, A. C. Drummond, HTTP and contact-based features for Botnet detection using Random Forest on Apache Spark, Computers and Security, in press.
[4] P. A. A. Resende, A. C. Drummond, An active labeling approach for behavioral-based Intrusion Detection Systems, Computers and Security, in press.
[5] P. A. A. Resende, A. C. Drummond, Network flows of normal and Botnet traffic extracted from the CTU-13 and ISCX 2012 datasets and labeled by Snort IDS and lib nDPI, Data in Brief, in press.