Hogzilla HTTP K-means clustering

HTTP flows flagged by this method behave similarly to flows that Snort tags as malicious (priority 1). If one of your hosts is associated with such an alert, you should investigate it. The host is probably infected by malware.

The event’s note contains the domain name involved in the suspicious flow. Search for it on Google, VirusTotal, Malwr or any other malware database to check whether the domain name is associated with malicious code.

Relevant applications to remove Malware

Technical Details

Some steps of the Hogzilla IDS HTTP k-means clustering algorithm are described below.

  • Select from HBase the features listed in the table below for all HTTP flows containing at least two packets
  • Normalize the data and group the points into 32 clusters using k-means
  • Stratify the points by (cluster, flow classification from nDPI)
  • Generate alerts for the strata in which the proportion of Snort events is larger than a threshold

Features used

flow:avg_packet_size
flow:packets_without_payload
flow:avg_inter_time
flow:flow_duration
flow:max_packet_size
flow:bytes
flow:packets
flow:min_packet_size
flow:packet_size-0
flow:inter_time-0
flow:packet_size-1
flow:inter_time-1
flow:packet_size-2
flow:inter_time-2
flow:packet_size-3
flow:inter_time-3
flow:packet_size-4
flow:inter_time-4
flow:http_method
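
The steps above can be illustrated with a minimal sketch. This is not Hogzilla's actual code. All function names, the sample values, the threshold of 0.5 and the use of min-max scaling are assumptions made for illustration, since the steps only say that the data is normalized. For brevity the toy data uses two features (standing in for flow:avg_packet_size and flow:avg_inter_time) and 2 clusters instead of 32. A categorical feature such as flow:http_method would first need a numeric encoding, which is not shown here.

```python
import random
from collections import defaultdict

def min_max_normalize(rows):
    """Scale every feature column to [0, 1] (assumed normalization method)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def suspicious_strata(labels, ndpi_classes, snort_flags, threshold):
    """Strata (cluster, nDPI class) whose share of Snort-tagged flows exceeds threshold."""
    totals, tagged = defaultdict(int), defaultdict(int)
    for label, app, flagged in zip(labels, ndpi_classes, snort_flags):
        totals[(label, app)] += 1
        tagged[(label, app)] += int(flagged)
    return {s for s, n in totals.items() if tagged[s] / n > threshold}

# Invented sample flows: (avg_packet_size, avg_inter_time).
flows = [[60, 0.1], [62, 0.1], [58, 0.2], [1400, 5.0], [1450, 5.5], [1500, 4.8]]
ndpi_classes = ["HTTP"] * 6                            # hypothetical nDPI labels
snort_flags = [False, False, False, True, True, False]  # tagged by Snort?

labels = kmeans(min_max_normalize(flows), k=2)          # Hogzilla uses 32 clusters
strata = suspicious_strata(labels, ndpi_classes, snort_flags, threshold=0.5)

# Every flow that falls into a suspicious stratum raises an alert,
# including flows that Snort itself did not tag.
alerts = [i for i, (l, a) in enumerate(zip(labels, ndpi_classes)) if (l, a) in strata]
print(alerts)
```

In this toy run the last flow is not tagged by Snort, but it shares a stratum with two tagged flows, so it is reported as well. That is the sense in which flagged flows "behave similarly" to flows Snort considers malicious.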

Tests in lab

  • Coming soon

Comments

  • The number of clusters (32) was chosen heuristically, based on laboratory results

References

  • An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. Available for free at http://www-bcf.usc.edu/~gareth/ISL/ , but you should buy it!