Hogzilla DNS K-means clustering

DNS flows flagged by this method behave similarly to flows that Snort tags as malicious (priority 1). If one of your hosts is associated with such an alert, you should investigate it: the host is probably infected with malware.

The event's note contains the domain name involved in the suspicious flow. Search for it on Google, VirusTotal, Malwr, or any other malware database to check whether the domain name is associated with malicious code.

Relevant applications to remove Malware

Technical Details

The main steps of the Hogzilla IDS DNS k-means clustering algorithm are described below.

  • Select from HBase the features listed in the table below for all DNS flows containing at least two packets
  • Normalize the data and group the points into 9 clusters using k-means
  • Stratify the points by (cluster, flow classification from nDPI)
  • Generate alerts for the strata in which the proportion of Snort events exceeds a threshold
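
The normalization, clustering, stratification and alerting steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Hogzilla implementation: min-max scaling is an assumed normalization scheme, the k-means is a basic Lloyd's iteration, and the function names (normalize, kmeans, strata_to_alert), the seed and the threshold value are hypothetical.

```python
import random
from collections import defaultdict

def normalize(points):
    """Min-max scale every feature column to [0, 1] (assumed scheme)."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by zero
    return [[(v - lo) / sp for v, lo, sp in zip(p, lows, spans)] for p in points]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Basic Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def strata_to_alert(labels, ndpi_classes, snort_flags, threshold):
    """Return the (cluster, nDPI class) strata whose share of
    Snort-tagged flows is larger than the threshold."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [tagged flows, all flows]
    for label, proto, tagged in zip(labels, ndpi_classes, snort_flags):
        counts[(label, proto)][0] += int(tagged)
        counts[(label, proto)][1] += 1
    return {s for s, (hits, total) in counts.items() if hits / total > threshold}
```

With real data, the call would look like kmeans(normalize(vectors), 9), followed by strata_to_alert on the resulting labels. Every flow that falls in a returned stratum would raise an alert, whether or not Snort tagged that particular flow.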

Features used
flow:avg_packet_size
flow:packets_without_payload
flow:avg_inter_time
flow:flow_duration
flow:max_packet_size
flow:bytes
flow:packets
flow:min_packet_size
flow:packet_size-0
flow:inter_time-0
flow:dns_num_queries
flow:dns_num_answers
flow:dns_ret_code
flow:dns_bad_packet
flow:dns_query_type
flow:dns_rsp_type
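
As a rough illustration of how one flow could be turned into a point for clustering, the sketch below builds a numeric vector from the columns listed above. It assumes each flow arrives as a dictionary mapping column names to values (the actual HBase access is not shown), treats missing columns as 0.0, and applies the "at least two packets" filter. The names FEATURES and to_vector are hypothetical.

```python
# Column names in the order they appear in the feature table.
FEATURES = [
    "flow:avg_packet_size", "flow:packets_without_payload",
    "flow:avg_inter_time", "flow:flow_duration",
    "flow:max_packet_size", "flow:bytes", "flow:packets",
    "flow:min_packet_size", "flow:packet_size-0", "flow:inter_time-0",
    "flow:dns_num_queries", "flow:dns_num_answers", "flow:dns_ret_code",
    "flow:dns_bad_packet", "flow:dns_query_type", "flow:dns_rsp_type",
]

def to_vector(row):
    """Return the feature vector for one flow, or None when the flow
    has fewer than two packets and should be skipped.

    `row` is assumed to be a dict of column name -> number or numeric
    string; defaulting missing columns to 0.0 is an assumption too.
    """
    if float(row.get("flow:packets", 0)) < 2:
        return None
    return [float(row.get(name, 0.0)) for name in FEATURES]
```

Flows for which to_vector returns None are simply left out of the data set before normalization.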

Tests in lab

  • Coming soon

Comments

  • The number of clusters (9) was chosen heuristically, based on some laboratory results

References

  • An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. Available for free at http://www-bcf.usc.edu/~gareth/ISL/, but you should buy it!