Table 7 Summary of early (⋆), sub-flow-based (†) and encrypted (‡) flow feature-based traffic classification

From: A comprehensive survey on machine learning for networking: evolution, applications and research opportunities

| Ref. | ML technique | Dataset | Features | Classes | Evaluation settings | Evaluation results |
|---|---|---|---|---|---|---|
| Bernaille et al. [55] ⋆ | Unsupervised k-means | Proprietary: univ. network | Packet size and direction of the first P packets in a flow | eDonkey, FTP, HTTP, Kazaa, NNTP, POP3, SMTP, SSH, HTTPS, POP3S | P = 5, k = 50 | Accuracy > 80% |
| TIE [108, 121] ⋆ | Supervised J48 DT, k-NN, Random Tree, RIPPER, MLP, NB | Proprietary: Univ. Napoli campus network | Payload size statistics and inter-packet time statistics of the first N packets, bidirectional flow duration and size, transport protocol | BitTorrent, SMTP, Skype2Skype, POP, HTTP, SOULSEEK, NBNS, QQ, DNS, SSL, RTP, EDONKEY | N = 1–10 | Overall accuracy = 98.4% with the BKS (Behavior Knowledge Space) combiner over J48, Random Tree, RIPPER and PL, N = 10 |
| Nguyen et al. [337] † | Supervised NB, C4.5 DT | Proprietary: home network, univ. network, game server | Inter-packet arrival time statistics, inter-packet length variation statistics and IP packet length statistics of N consecutive packets | Enemy Territory (online game), VoIP, Other | N = 25 | Median values. C4.5 DT: Enemy Territory recall = 99.3%, precision = 97%; VoIP recall = 95.7%, precision = 99.2%. NB: Enemy Territory recall = 98.9%, precision = 87%; VoIP recall = 99.6%, precision = 95.4% |
| Erman et al. [137] ⋆ | Semi-supervised k-means | Proprietary: Univ. Calgary | Number of packets, average packet size, total bytes, total header bytes, total payload bytes (caller to callee and vice versa) | P2P, HTTP, CHAT, EMAIL, FTP, STREAMING, OTHER | k = 400; 13 layers, whose packet milestones (number of packets) grow exponentially (8, 16, 32, …) | Flow accuracy > 94%, byte accuracy 70–90% |
| Li et al. [270] ⋆ | Supervised C4.5 DT, C4.5 DT with AdaBoost, NBKE | Proprietary | A subset of 12 of the 248 features of [321], computed over the first N packets | WEB, MAIL, BULK, Attack, P2P, DB, Service, Interactive | N = 5 | C4.5 DT: accuracy > 99%; the Attack class is the exception, with moderate-to-high recall |
| Jin et al. [222] ⋆ | Supervised AdaBoost | Proprietary: ISP network, labeled as in [176] | Lowsrcport, highsrcport, duration, mean packet size, mean packet rate, toscount, tcpflags, dstinnet, lowdstport, highdstport, packet, byte, tos, numtosbytes, srcinnet | Business, Chat, DNS, FileSharing, FTP, Games, Mail, Multimedia, NetNews, SecurityThreat, VoIP, Web | Number of binary classifiers (k): TCP = 12, UDP = 8 | Error rate: TCP = 3%, UDP = 0.4% |
| Bonfiglio et al. [69] ‡ | Supervised NB, Pearson's χ² test | Proprietary: univ. network, ISP network | Message size, average inter-packet gap | Skype | NB decision threshold B_min = −5; χ² threshold Thr = 150 | NB ∧ χ², UDP: E2E FP = 0.01%, FN = 29.98% and E2O FP = 0.0%, FN = 9.82% (univ. dataset); E2E FP = 0.01%, FN = 24.62% and E2O FP = 0.11%, FN = 2.40% (ISP dataset). TCP: negligible FP |
| Alshammari et al. [17] ‡ | Supervised AdaBoost, SVM, NB, RIPPER, C4.5 DT | AMP [457], MAWI [474], DARPA99 [278], proprietary from Dalhousie Univ. | Packet size, packet inter-arrival time, number of packets, number of bytes, flow duration, protocol (forward and backward directions) | SSH, Skype | N/A | C4.5 DT: SSH: DR = 95.9%, FPR = 2.8% (Dalhousie); DR = 97.2%, FPR = 0.8% (AMP); DR = 82.9%, FPR = 0.5% (MAWI). Skype: DR = 98.4%, FPR = 7.8% (Dalhousie) |
| Shbair et al. [409] ‡ | Supervised C4.5 DT, RF | Synthetic trace | Statistical features from the encrypted payload and from [253] (client to server and vice versa) | Service provider (number of services): Uni-lorraine.fr (15), Google.com (29), akamaihd.net (6), Googlevideo.com (1), Twitter.com (3), Youtube.com (1), Facebook.com (4), Yahoo.com (19), Cloudfront.com (1) | N/A | RF (service provider): precision = 92.6%, recall = 92.8%, F-measure = 92.6%. RF (service): accuracy of 95–100% for the majority of service providers, given > 100 connections per HTTPS service |

N/A: not available.
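The early-classification idea in the Bernaille et al. [55] row (clustering flows by the size and direction of their first P packets) can be sketched as follows. This is a minimal sketch under several assumptions that the table does not state: packets arrive as hypothetical `(size, sent_by_client)` tuples, direction is encoded as the sign of the size, distances are Euclidean, centroids are seeded with farthest-first initialisation, and each cluster is named after the majority application of its training flows. The toy data uses k = 2 rather than the k = 50 reported in the table.

```python
from collections import Counter

P = 5  # leading packets inspected per flow (P = 5 in the table)

def flow_vector(packets, p=P):
    """Signed sizes of the first p packets: + for client->server, - for server->client.

    `packets` is a list of (size_in_bytes, sent_by_client) tuples; this input
    format is a hypothetical choice made for the sketch.
    """
    if len(packets) < p:
        raise ValueError("flow has fewer than p packets")
    return [size if by_client else -size for size, by_client in packets[:p]]

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda j: dist2(vec, centroids[j]))

def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda pt: min(dist2(pt, c) for c in centroids))
        centroids.append(list(far))
    assign = []
    for _ in range(iters):
        assign = [nearest(pt, centroids) for pt in points]  # assignment step
        for j in range(k):  # update step: move each centroid to the mean of its members
            members = [pt for pt, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def label_clusters(assign, labels, k):
    """Name each cluster after the most frequent application among its training flows."""
    mapping = {}
    for j in range(k):
        members = [lab for a, lab in zip(assign, labels) if a == j]
        mapping[j] = Counter(members).most_common(1)[0][0] if members else "unknown"
    return mapping

def classify(packets, centroids, mapping):
    """Early classification: only the first P packets of a new flow are used."""
    return mapping[nearest(flow_vector(packets), centroids)]

# Toy training flows, made up for illustration (not data from the paper).
train = [
    ([(310, True), (1460, False), (1460, False), (60, True), (1460, False)], "HTTP"),
    ([(290, True), (1460, False), (1380, False), (60, True), (1460, False)], "HTTP"),
    ([(40, False), (48, True), (520, True), (610, False), (96, True)], "SSH"),
    ([(44, False), (40, True), (500, True), (640, False), (112, True)], "SSH"),
]
points = [flow_vector(pkts) for pkts, _ in train]
centroids, assign = kmeans(points, k=2)
mapping = label_clusters(assign, [lab for _, lab in train], k=2)
new_flow = [(305, True), (1460, False), (1420, False), (60, True), (1460, False), (52, True)]
print(classify(new_flow, centroids, mapping))  # HTTP
```

Only the first P packets of `new_flow` are consulted, which is what makes the decision "early": the sixth packet is ignored.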
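The TIE [108, 121] row reports its best result with a BKS (Behavior Knowledge Space) combiner over several base classifiers. A minimal sketch of a BKS combiner is shown below: from labeled training flows it learns which true class most often accompanies each combination of base-classifier outputs. The base-classifier outputs are hypothetical strings, and the fallback for combinations never seen in training (plain majority voting) is an assumption of the sketch, not necessarily TIE's policy.

```python
from collections import Counter, defaultdict

class BKSCombiner:
    """Behavior Knowledge Space (BKS) combiner for an ensemble of base classifiers.

    fit() records, for every tuple of base-classifier outputs seen in training,
    how often each true class occurred; predict() returns the most frequent
    true class for that tuple.
    """

    def __init__(self):
        self.table = defaultdict(Counter)

    def fit(self, base_outputs, true_labels):
        for outputs, label in zip(base_outputs, true_labels):
            self.table[tuple(outputs)][label] += 1
        return self

    def predict(self, outputs):
        key = tuple(outputs)
        if key in self.table:
            return self.table[key].most_common(1)[0][0]
        # Tuple never seen in training: fall back to a plain majority vote
        # (one possible policy; rejecting the sample is another).
        return Counter(outputs).most_common(1)[0][0]

# Hypothetical outputs of four base classifiers on labeled training flows.
train_outputs = [
    ("HTTP", "HTTP", "SSL", "HTTP"),
    ("HTTP", "HTTP", "SSL", "HTTP"),
    ("HTTP", "HTTP", "HTTP", "HTTP"),
    ("DNS", "DNS", "DNS", "DNS"),
]
truth = ["SSL", "SSL", "HTTP", "DNS"]
bks = BKSCombiner().fit(train_outputs, truth)
print(bks.predict(("HTTP", "HTTP", "SSL", "HTTP")))  # SSL
print(bks.predict(("SMTP", "SMTP", "POP", "SMTP")))  # SMTP (unseen tuple, majority fallback)
```

The first prediction shows what BKS adds over voting: three of the four classifiers say HTTP, but the training data taught the combiner that this particular disagreement pattern usually means SSL.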
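The sub-flow features in the Nguyen et al. [337] row (inter-packet arrival time, inter-packet length variation and IP packet length statistics over N consecutive packets) could be extracted roughly as below. This is a sketch under assumptions: the choice of min/max/mean/population standard deviation as "statistics", the sliding window advancing by a hypothetical `step` parameter, and the input as parallel lists of timestamps and IP packet lengths are all illustrative, not taken from the paper.

```python
import statistics

N = 25  # packets per sub-flow (N = 25 in the table)

def _stats(values):
    """Min, max, mean and population standard deviation of a list of numbers."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),
    }

def subflow_features(timestamps, lengths, n=N, step=1):
    """Feature dicts for every window of n consecutive packets, advancing by `step`."""
    if len(timestamps) != len(lengths):
        raise ValueError("timestamps and lengths must have the same length")
    if n < 2:
        raise ValueError("a sub-flow needs at least two packets")
    windows = []
    for start in range(0, len(lengths) - n + 1, step):
        ts = timestamps[start:start + n]
        ln = lengths[start:start + n]
        iat = [b - a for a, b in zip(ts, ts[1:])]             # inter-packet arrival times
        variation = [abs(b - a) for a, b in zip(ln, ln[1:])]  # inter-packet length variation
        windows.append({
            "iat": _stats(iat),
            "len_variation": _stats(variation),
            "ip_len": _stats(ln),
        })
    return windows

# Synthetic 30-packet stream: one packet every 20 ms, lengths alternating 80/100 bytes.
timestamps_ms = [20 * i for i in range(30)]
lengths = [80 if i % 2 == 0 else 100 for i in range(30)]
feats = subflow_features(timestamps_ms, lengths)
print(len(feats))  # 6
```

Because each window is classified on its own, a classifier trained this way can label traffic from any point of a long-lived flow, not only from its start.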
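The Erman et al. [137] row reports flow accuracy and byte accuracy separately. The helper below illustrates, on made-up numbers, how the two metrics can diverge: misclassifying a single large flow costs little flow accuracy but a lot of byte accuracy. The `(true_label, predicted_label, n_bytes)` input format is an assumption of the sketch, and the example is not meant to reproduce the figures in the table.

```python
def flow_and_byte_accuracy(flows):
    """Return (flow accuracy, byte accuracy) for (true_label, predicted_label, n_bytes) triples.

    Flow accuracy counts every flow equally; byte accuracy weights each flow
    by the number of bytes it carried.
    """
    flows = list(flows)
    total_bytes = sum(n for _, _, n in flows)
    if not flows or total_bytes == 0:
        raise ValueError("need at least one flow carrying a non-zero number of bytes")
    correct = [(t, p, n) for t, p, n in flows if t == p]
    flow_acc = len(correct) / len(flows)
    byte_acc = sum(n for _, _, n in correct) / total_bytes
    return flow_acc, byte_acc

# Made-up example: 19 small HTTP flows classified correctly, one large P2P flow missed.
flows = [("HTTP", "HTTP", 10_000)] * 19 + [("P2P", "HTTP", 1_000_000)]
flow_acc, byte_acc = flow_and_byte_accuracy(flows)
print(f"flow accuracy = {flow_acc:.2f}, byte accuracy = {byte_acc:.2f}")
# flow accuracy = 0.95, byte accuracy = 0.16
```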
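The Bonfiglio et al. [69] row combines Naive Bayes with Pearson's χ² test and lists a χ² threshold Thr = 150. To our understanding, their χ² test is applied to groups of payload bits to judge how random they look; the table itself does not say what the test is run on. The sketch below only shows Pearson's goodness-of-fit statistic against a uniform distribution over the values of a hypothetical 4-bit field, with a hypothetical decision rule (statistic at or below Thr means "consistent with uniform").

```python
from collections import Counter

THR = 150.0  # chi-square threshold Thr = 150 from the table

def pearson_chi2(observed, expected):
    """Pearson's statistic: sum over bins of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_vs_uniform(values, n_bins):
    """Chi-square of observed symbol counts against a uniform distribution over n_bins symbols."""
    if not values:
        raise ValueError("need at least one observation")
    counts = Counter(values)
    observed = [counts.get(v, 0) for v in range(n_bins)]
    expected = [len(values) / n_bins] * n_bins
    return pearson_chi2(observed, expected)

def looks_random(values, n_bins, thr=THR):
    """Hypothetical decision rule: a statistic at or below thr counts as 'consistent with uniform'."""
    return chi2_vs_uniform(values, n_bins) <= thr

# A 4-bit field observed in 160 packets (synthetic data).
uniform_field = [v for v in range(16) for _ in range(10)]  # every value exactly 10 times
constant_field = [5] * 160                                  # deterministic field
print(chi2_vs_uniform(uniform_field, 16), looks_random(uniform_field, 16))    # 0.0 True
print(chi2_vs_uniform(constant_field, 16), looks_random(constant_field, 16))  # 2400.0 False
```

Encrypted or randomised fields drive the statistic toward small values, while fixed or slowly changing fields produce large ones; that contrast is what makes such a test useful on encrypted traffic.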