Table 7 Summary of Early, Sub-flow-based and Encrypted flow feature-based traffic classification

From: A comprehensive survey on machine learning for networking: evolution, applications and research opportunities

| Ref. | ML Technique | Dataset | Features | Classes | Evaluation: Settings | Evaluation: Results |
|---|---|---|---|---|---|---|
| Bernaille et al. [55] | Unsupervised: k-Means | Proprietary: univ. network | Packet size and direction of first P packets in a flow | eDonkey, FTP, HTTP, Kazaa, NNTP, POP3, SMTP, SSH, HTTPS, POP3S | P = 5, k = 50 | Accuracy > 80% |
| TIE [108, 121] | Supervised: J48 DT, k-NN, Random Tree, RIPPER, MLP, NB | Proprietary: Univ. Napoli campus network | Payload size stats and inter-packet time stats of first N packets, bidirectional flow duration and size, transport protocol | BitTorrent, SMTP, Skype2Skype, POP, HTTP, SOULSEEK, NBNS, QQ, DNS, SSL, RTP, EDONKEY | N = 1...10 | Overall accuracy = 98.4% with BKS (J48, Random Tree, RIPPER, PL) combiner, N = 10 |
| Nguyen et al. [337] | Supervised: NB, C4.5 DT | Proprietary: home network, univ. network, game server | Inter-packet arrival time statistics, inter-packet length variation statistics, IP packet length statistics of N consecutive packets | Enemy Territory (online game), VoIP, Other | N = 25 | C4.5 DT: Enemy Territory - recall = 99.3%, precision = 97%; VoIP - recall = 95.7%, precision = 99.2%. NB: Enemy Territory - recall = 98.9%, precision = 87%; VoIP - recall = 99.6%, precision = 95.4% (median) |
| Erman et al. [137] | Semi-supervised: k-Means | Proprietary: Univ. Calgary | Number of packets, average packet size, total bytes, total header bytes, total payload bytes (caller to callee and vice versa) | P2P, HTTP, CHAT, EMAIL, FTP, STREAMING, OTHER | k = 400, 13 layers, packet milestones (number of packets) in layers are separated exponentially (8, 16, 32, …) | Flow accuracy > 94%, byte accuracy 70-90% |
| Li et al. [270] | Supervised: C4.5 DT, C4.5 DT with AdaBoost, NBKE | Proprietary | A subset of 12 from 248 features [321] of first N packets | WEB, MAIL, BULK, Attack, P2P, DB, Service, Interactive | N = 5 | C4.5 DT: accuracy > 99%; Attack is an exception with moderate-high recall |
| Jin et al. [222] | Supervised: AdaBoost | Proprietary: ISP network, labeled as in [176] | Lowsrcport, highsrcport, duration, mean packet size, mean packet rate, toscount, tcpflags, dstinnet, lowdstport, highdstport, packet, byte, tos, numtosbytes, srcinnet | Business, chat, DNS, FileSharing, FTP, Games, Mail, Multimedia, NetNews, SecurityThreat, VoIP, Web | Number of binary classifiers (k): TCP = 12, UDP = 8 | Error rate: TCP = 3%, UDP = 0.4% |
| Bonfiglio et al. [69] | Supervised: NB, Pearson's χ² test | Proprietary: univ. network, ISP network | Message size, average inter-packet gap | Skype | NB decision threshold B_min = −5, χ²(Thr) = 150 | NB χ²: UDP - E2E: FP = 0.01%, FN = 29.98%; E2O: FP = 0.0%, FN = 9.82% (univ. dataset); E2E: FP = 0.01%, FN = 24.62%; E2O: FP = 0.11%, FN = 2.40% (ISP dataset). TCP - negligible FP |
| Alshammari et al. [17] | Supervised: AdaBoost, SVM, NB, RIPPER, C4.5 DT | AMP [457], MAWI [474], DARPA99 [278], proprietary from Univ. Dalhousie | Packet size, packet inter-arrival time, number of packets, number of bytes, flow duration, protocol (forward and backward direction) | SSH, Skype | N/A | C4.5 DT: SSH - DR = 95.9%, FPR = 2.8% (Dalhousie), DR = 97.2%, FPR = 0.8% (AMP), DR = 82.9%, FPR = 0.5% (MAWI); Skype - DR = 98.4%, FPR = 7.8% (Dalhousie) |
| Shbair et al. [409] | Supervised: C4.5 DT, RF | Synthetic trace | Statistical features from encrypted payload and [253] (client to server and vice versa) | Service Provider (number of services): Uni-lorraine.fr (15), Google.com (29), akamihd.net (6), Googlevideo.com (1), Twitter.com (3), Youtube.com (1), Facebook.com (4), Yahoo.com (19), Cloudfront.com (1) | N/A | RF (service provider): precision = 92.6%, recall = 92.8%, F-measure = 92.6%. RF (service): accuracy in 95-100% for majority of service providers; > 100 connections per HTTPS service |
N/A: Not available