Table 22 Summary of ML for flow feature-based anomaly detection

From: A comprehensive survey on machine learning for networking: evolution, applications and research opportunities

Ref.

ML Technique

Dataset

Features

Evaluation: Settings

Evaluation: Results

Kayacik et al. [232]

Unsupervised Hierarchical SOM (Offline)

KDD Cup [257]

6 TCP features

- 494,021 training records
- 311,029 records in test set 1
- 4,898,431 records in test set 2
- Platforms: SOM-Toolbox [12] & SOM PAK [250]
- 3-level SOM w/ # Epochs: 4000

DR Test-set 1: 89%
FP Test-set 1: 4.6%
DR Test-set 2: 99.7%
FP Test-set 2: 1.7%

Kim et al. [242]

Supervised SVM (Offline)

KDD Cup [257]

selected using GA

- Training set: kddcup.data.gz [257]
- Testing set: corrected.gz [257]
- Detect only DoS attacks
- 10-fold cross validation
- GA ran for 20 generations

DR w/ Neural Kernel: 99%
DR w/ Radial Kernel: 87%
DR w/ Inverse Multi-Quadratic Kernel: 77%

Jiang et al. [220]

Unsupervised Improved NN (Offline)

KDD Cup [256, 257]

all 41 features

- 40,459 training records
- 429,742 testing records
- Cluster radius threshold r = [0.2-0.27]

DR DoS: 99.10%-99.15%
DR Probe: 64.72%-80.27%
DR U2R: 25.49%-60.78%
DR R2L: 6.34%-8.67%
DR new attacks: 32.44%-42.12%
FP: 0.05%-1.30%

Zhang et al. [495]

Unsupervised Random Forests (Offline)

KDD Cup [257]

40 features labeled by service type

- 4 datasets used with % of attack connections: 1%, 2%, 5%, 10%
- Platform used: Weka [288]

1% attacks: FP: 1%, DR: 95%
10% attacks: FP: 1%, DR: 80%

Ahmed et al. [7]

Supervised Kernel Function (Online)

From Abilene backbone network

number of packets, number of individual IP flows

- 2 timeseries binned at 5 min intervals
- Timeseries dimensions = F × T
- F = 121 flows, T = 2016 timesteps

T#1: DR: 21/34-30/34, FP: 0-19
T#2: DR: 28/44-39/44, FP: 5-16

Shon et al. [411]

Unsupervised Soft-margin SVM and OCSVM (Offline)

KDD Cup [257]
Data collected from Dalhousie U.

selected using GA

- SVM Toolkits [88, 396]
- 100,000 packets for training
- 1,000-1,500 packets for testing
- GA run for 100 generations
- 3-fold cross validation

KDD w/ 9 attack types DR: 74.4%
Dalhousie Dataset DR: 99.99%
KDD w/ 9 attack types FN: 31.3%
Dalhousie Dataset FP: 0.01%

Giacinto et al. [165]

Unsupervised Multiple Classifiers (Offline)

KDD Cup [257]

29 features for HTTP
34 features for FTP
16 features for ICMP
31 features for Mail
37 features for Misc
29 features for Private&Other

- 494,020 training records
- 311,029 testing records
- 1.5% of data records are attacks

ν-SVC DR: 67.31%-94.25%
ν-SVC FP: 0.91%-9.62%

Hu et al. [198]

Supervised Decision stumps with AdaBoost (Offline)

KDD Cup [257]

all 41 features

- 494,021 training records
- 311,029 testing records
- Pentium IV with 2.6-GHz CPU and 256-MB RAM
- Platform used: Matlab 7

DR: 90.04%-90.88%
FP: 0.31%-1.79%
Mean training time: 73 sec

Muniyandi et al. [327]

Unsupervised K-Means, C4.5 DT (Offline)

KDD Cup [257]

all 41 features

- 15,000 training records
- 2,500 testing records
- Intel Pentium Core 2 Duo CPU 2.20GHz, 2.19GHz, 0.99GB of RAM w/ Microsoft Windows XP (SP2)
- Platform: Weka 3.5 [288]

DR: 99.6%
FP: 0.1%
Precision: 95.6%
Accuracy: 95.8%
F-measure: 94.0%

Panda et al. [345]

Unsupervised RF, ND, END (Offline)

NSL-KDD [438]

all 41 features

- 25,192 training instances
- IBM PC of 2.66GHz CPU with 40GB HDD and 512 MB RAM
- 10-fold cross validation

TP: 99.5%
FP: 0.1%
F-measure: 99.7%
Precision: 99.9%
Recall: 99.9%
Time to build model: 18.13 sec

Boero et al. [64]

Supervised RBF-SVM (Offline)

Normal: from U. of Genoa
Malware: [126, 292, 348, 351]

7 SDN OpenFlow features

- RBF complexity parameter: 20
- RBF kernel parameter: 2

Normal-TP: 86%
Normal-FP: 1.6%
Malware-TP: 98.4%
Malware-FP: 13.8%
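
The Results column reports detection rate (DR), false positives (FP), precision, accuracy and F-measure without defining them. As a reading aid, the sketch below computes these quantities from confusion-matrix counts using their conventional definitions. The function name, the dictionary keys and the example counts are hypothetical and are not taken from any of the surveyed works. Individual papers may use different definitions: some rows above report FP as a raw count rather than a rate.

```python
def _ratio(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def detection_metrics(tp, fp, tn, fn):
    """Conventional anomaly-detection metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fn: attacks missed
    fp: normal traffic flagged          tn: normal traffic passed
    (Assumed definitions; the surveyed papers may deviate from them.)
    """
    dr = _ratio(tp, tp + fn)                  # detection rate = recall = TP rate
    fpr = _ratio(fp, fp + tn)                 # false positive rate
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f_measure = _ratio(2 * precision * dr, precision + dr)
    return {
        "detection_rate": dr,
        "false_positive_rate": fpr,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = detection_metrics(tp=90, fp=5, tn=895, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a small share of attack records, as in several of the settings above, a low false positive rate can still correspond to many false alarms in absolute terms, which is why DR and FP are best read together.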