Table 23 Summary of ML for payload-based anomaly detection

From: A comprehensive survey on machine learning for networking: evolution, applications and research opportunities

| Ref. | ML technique | Dataset | Features | Evaluation settings | Results |
|---|---|---|---|---|---|
| Zanero et al. [493] | Unsupervised; two-tier SOM-based architecture (offline) | Normal: KDD Cup [257]; Attack: scans from Nessus [44] | Packet headers and payload | 2,000 training packets; 2,000 testing packets; 10x10 SOM trained for 10,000 epochs; platform used: SOM toolbox [12] | Improves DR by 75% over a single-tier SOM |
| Wang et al. [459] | Unsupervised; centroid model (offline) | KDD Cup [257] & CUCS | Payload of TCP traffic | 2 weeks of training data; 3 weeks of testing data; inside-network TCP data only; incremental learning | DR with the payload of a packet: 58.8%; DR with the first 100 bytes of a packet: 56.7%; DR with the last 100 bytes of a packet: 47.4%; DR with all payloads of a connection: 56.7%; DR with the first 1000 bytes of a connection: 52.6%; training time: 4.6-26.2 s; testing time: 1.6-16.1 s |
| Perdisci et al. [356] | Supervised; ensemble of single-class SVMs (offline) | Normal: KDD Cup [257]; Normal: GATECH; Attack: CLET [117]; Attack: PBA [149]; Generic [204] | Payload | 50% of dataset for training; 50% of dataset for testing; 11 OCSVMs trained with 2ν-grams, ν = 1...10; 5-fold cross-validation on KDD Cup; 7-fold cross-validation on GATECH; 2 GHz dual-core AMD Opteron processor and 8 GB RAM | Generic DR at FP 10^-5: 60%; shell-code DR at FP 10^-5: 90%; CLET DR at FP 10^-5: 90%; detection time on KDD Cup: 10.92 ms; detection time on GATECH: 17.11 ms |
| Gornitz et al. [171] | Supervised; SVDD (online) | Normal: from Fraunhofer Inst.; Attack: Metasploit | Payload | 2,500 training network events; 1,250 testing network events; active learning; fraction of labeled data: 1.5% | DR: 96%; FP: 0.0015% |
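
The table names a centroid model over TCP payloads (Wang et al. [459]) and reports results as detection rate (DR) and false positive rate (FP), but does not show what such a model computes. The following is a minimal sketch of one way a payload centroid detector could work, assuming 1-gram byte frequencies as features and a smoothed per-byte distance as the anomaly score. The function names, the smoothing constant `alpha`, and the threshold are illustrative assumptions, not details taken from the cited work.

```python
from collections import Counter
from math import sqrt


def byte_histogram(payload):
    """Relative frequency of each of the 256 byte values (1-gram features)."""
    if not payload:
        return [0.0] * 256
    counts = Counter(payload)  # iterating bytes yields ints 0..255
    n = len(payload)
    return [counts.get(b, 0) / n for b in range(256)]


def fit_centroid(payloads):
    """Per-byte mean and (population) standard deviation over normal payloads."""
    hists = [byte_histogram(p) for p in payloads]
    m = len(hists)
    mean = [sum(h[i] for h in hists) / m for i in range(256)]
    std = [sqrt(sum((h[i] - mean[i]) ** 2 for h in hists) / m)
           for i in range(256)]
    return mean, std


def anomaly_score(payload, mean, std, alpha=0.001):
    """Sum of per-byte deviations from the centroid, scaled by std.

    alpha is an assumed smoothing constant that avoids division by zero
    for byte values never seen in training.
    """
    h = byte_histogram(payload)
    return sum(abs(h[i] - mean[i]) / (std[i] + alpha) for i in range(256))


def detection_and_fp_rate(attack_scores, normal_scores, threshold):
    """DR: fraction of attacks flagged; FP: fraction of normal traffic flagged."""
    dr = sum(s > threshold for s in attack_scores) / len(attack_scores)
    fp = sum(s > threshold for s in normal_scores) / len(normal_scores)
    return dr, fp


# Toy data (hypothetical): ASCII requests as "normal", a NOP sled as "attack".
normal = [b"GET /index.html HTTP/1.1",
          b"GET /about.html HTTP/1.1",
          b"GET /home.html HTTP/1.0"]
attack = [b"\x90" * 32 + b"\xcd\x80"]

mean, std = fit_centroid(normal)
normal_scores = [anomaly_score(p, mean, std) for p in normal]
attack_scores = [anomaly_score(p, mean, std) for p in attack]
dr, fp = detection_and_fp_rate(attack_scores, normal_scores, threshold=100.0)
```

In this sketch the threshold trades DR against FP: lowering it flags more attacks but also more normal payloads, which is why results in the table are stated as a DR at a given FP.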