Table 22 Summary of ML for flow feature-based anomaly detection

From: A comprehensive survey on machine learning for networking: evolution, applications and research opportunities

| Ref. | ML Technique | Dataset | Features | Evaluation Settings | Evaluation Results |
| --- | --- | --- | --- | --- | --- |
| Kayacik et al. [232] | Unsupervised Hierarchical SOM (Offline) | KDD Cup [257] | 6 TCP features | 494,021 training records; 311,029 records in test set 1; 4,898,431 records in test set 2; Platforms: SOM-Toolbox [12] & SOM PAK [250]; 3-level SOM w/ # Epochs: 4000 | DR Test-set 1: 89%; FP Test-set 1: 4.6%; DR Test-set 2: 99.7%; FP Test-set 2: 1.7% |
| Kim et al. [242] | Supervised SVM (Offline) | KDD Cup [257] | Selected using GA | Training set: kddcup.data.gz [257]; Testing set: corrected.gz [257]; Detect only DoS attacks; 10-fold cross validation; GA ran for 20 generations | DR w/ Neural Kernel: 99%; DR w/ Radial Kernel: 87%; DR w/ Inverse Multi-Quadratic Kernel: 77% |
| Jiang et al. [220] | Unsupervised Improved NN (Offline) | KDD Cup [256, 257] | All 41 features | 40,459 training records; 429,742 testing records; Cluster Radius Thresh r=[0.2-0.27] | DR DoS: 99.10%-99.15%; DR Probe: 64.72%-80.27%; DR U2R: 25.49%-60.78%; DR R2L: 6.34%-8.67%; DR new attacks: 32.44%-42.12%; FP: 0.05%-1.30% |
| Zhang et al. [495] | Unsupervised Random Forests (Offline) | KDD Cup [257] | 40 features labeled by service type | 4 datasets used with % of attack connections: 1%, 2%, 5%, 10%; Platform used: Weka [288] | 1% attacks: FP: 1%, DR: 95%; 10% attacks: FP: 1%, DR: 80% |
| Ahmed et al. [7] | Supervised Kernel Function (Online) | From Abilene backbone network | Number of packets, number of individual IP flows | 2 timeseries binned at 5 min intervals; Timeseries dimensions = F x T; F = 121 flows, T = 2016 timesteps | T#1 DR: 21/34-30/34, FP: 0-19; T#2 DR: 28/44-39/44, FP: 5-16 |
| Shon et al. [411] | Unsupervised Soft-margin SVM and OCSVM (Offline) | KDD Cup [257]; data collected from Dalhousie U. | Selected using GA | SVM Toolkits [88, 396]; 100,000 packets for training; 1,000-1,500 packets for testing; GA run for 100 generations; 3-fold cross validation | KDD w/ 9 attack types DR: 74.4%; Dalhousie Dataset DR: 99.99%; KDD w/ 9 attack types FN: 31.3%; Dalhousie Dataset FP: 0.01% |
| Giacinto et al. [165] | Unsupervised Multiple Classifiers (Offline) | KDD Cup [257] | 29 features for HTTP; 34 features for FTP; 16 features for ICMP; 31 features for Mail; 37 features for Misc; 29 features for Private & Other | 494,020 training records; 311,029 testing records; 1.5% of data records are attacks | v-SVC DR: 67.31%-94.25%; v-SVC FP: 0.91%-9.62% |
| Hu et al. [198] | Supervised Decision stumps with AdaBoost (Offline) | KDD Cup [257] | All 41 features | 494,021 training records; 311,029 testing records; Pentium IV with 2.6-GHz CPU and 256-MB RAM; Platform used: Matlab 7 | DR: 90.04%-90.88%; FP: 0.31%-1.79%; Mean training time: 73 sec |
| Muniyandi et al. [327] | Unsupervised K-Means, C4.5 DT (Offline) | KDD Cup [257] | All 41 features | 15,000 training records; 2,500 testing records; Intel Pentium Core 2 Duo CPU 2.20GHz, 2.19GHz, 0.99GB of RAM w/ Microsoft Windows XP (SP2); Platform: Weka 3.5 [288] | DR: 99.6%; FP: 0.1%; Precision: 95.6%; Accuracy: 95.8%; F-measure: 94.0% |
| Panda et al. [345] | Unsupervised RF, ND, END (Offline) | NSL-KDD [438] | All 41 features | 25,192 training instances; IBM PC of 2.66GHz CPU with 40GB HDD and 512 MB RAM; 10-fold cross validation | TP: 99.5%; FP: 0.1%; F-measure: 99.7%; Precision: 99.9%; Recall: 99.9%; Time to build model: 18.13 sec |
| Boero et al. [64] | Supervised RBF-SVM (Offline) | Normal: from U. of Genoa; Malwares: [126, 292, 348, 351] | 7 SDN OpenFlow features | RBF Complexity par: 20; RBF kernel par: 2 | Normal-TP: 86%; Normal-FP: 1.6%; Malware-TP: 98.4%; Malware-FP: 13.8% |
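
The results column mostly reports detection rate (DR), false positives (FP), precision, and F-measure. As a reading aid, the sketch below shows how these metrics are commonly derived from confusion-matrix counts. It assumes the standard definitions: DR as recall on the attack class and FP as the false positive rate. The individual studies may define or report them differently (Ahmed et al. [7], for example, appear to give FP as counts). The `ConfusionCounts` class, the function names, and the example counts are hypothetical and do not come from any of the cited papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary (attack vs. normal) outcome counts."""
    tp: int  # attack records flagged as attacks
    fp: int  # normal records flagged as attacks
    tn: int  # normal records passed as normal
    fn: int  # attack records passed as normal


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def detection_rate(c: ConfusionCounts) -> float:
    # Assumed definition: DR = TP / (TP + FN), i.e. recall on the attack class.
    return _ratio(c.tp, c.tp + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    # Assumed definition: FP rate = FP / (FP + TN).
    return _ratio(c.fp, c.fp + c.tn)


def precision(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def f_measure(c: ConfusionCounts) -> float:
    # F1 score written in terms of counts: 2TP / (2TP + FP + FN).
    return _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    counts = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
    print(f"DR: {detection_rate(counts):.2%}")
    print(f"FP rate: {false_positive_rate(counts):.2%}")
    print(f"Precision: {precision(counts):.2%}")
    print(f"F-measure: {f_measure(counts):.2%}")
```

Because the studies differ in test sets, attack mixes, and metric definitions, figures from different rows are not directly comparable even when they carry the same label.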