Table 6 Summary of unsupervised flow feature-based traffic classification

From: A comprehensive survey on machine learning for networking: evolution, applications and research opportunities

| Ref. | ML Technique | Dataset | Features | Classes | Evaluation: Settings | Evaluation: Results |
| --- | --- | --- | --- | --- | --- | --- |
| Liu et al. [283] | Unsupervised k-Means | Proprietary: campus network | Packet-level and flow-level features | WWW, MAIL, P2P, FTP (CONTROL, PASV, DATA), ATTACK, DATABASE, SERVICES, INTERACTIVE, MULTIMEDIA, GAMES | k = 80 | Average accuracy ≈ 90%, minimum recall = 70% |
| Zander et al. [492] | Unsupervised AutoClass | NLANR [457] | Packet-level and flow-level features | AOL Messenger, Napster, Half-Life, FTP, Telnet, SMTP, DNS, HTTP | Intra-class homogeneity (H) | Mean accuracy = 86.5% |
| Erman et al. [136] | Unsupervised AutoClass | Univ. Auckland [457] | Packet-level and flow-level features | HTTP, SMTP, DNS, SOCKS, IRC, FTP (control, data), POP3, LIMEWIRE, FTP | N/A | Accuracy = 91.2% |
| Erman et al. [135] | Unsupervised DBSCAN | Univ. Auckland [457], proprietary: Univ. Calgary | Packet-level and flow-level features | HTTP, P2P, SMTP, IMAP, POP3, MSSQL, OTHER | eps = 0.03, minPts = 3, number of clusters = 190 | Overall accuracy = 75.6%, average precision > 95% (7/9 classes) |
| Erman et al. [138] | Unsupervised k-Means | Proprietary: university network | Packet-level and flow-level features from unidirectional flows | Web, EMAIL, DB, P2P, OTHER, CHAT, FTP, STREAMING | k = 400 | Server-to-client: avg. flow accuracy = 95%, avg. byte accuracy = 79%; Web: precision = 97%, recall = 97%; P2P: precision = 82%, recall = 77% |

  1. N/A: Not available
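
The Results column mixes several metrics (flow accuracy, byte accuracy, per-class precision and recall). As a minimal sketch of how they can differ, the Python below scores a clustering under the assumption that each cluster is labeled with the majority class of its flows; the table does not state how the cited works labeled their clusters, so that mapping, the function names, and the toy data are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def majority_label_map(cluster_ids, true_labels):
    """Assumed labeling rule: each cluster takes its most common true class."""
    by_cluster = defaultdict(Counter)
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster[cid][label] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in by_cluster.items()}


def evaluate(cluster_ids, true_labels, flow_bytes):
    """Return (flow accuracy, byte accuracy, {class: (precision, recall)})."""
    mapping = majority_label_map(cluster_ids, true_labels)
    predicted = [mapping[cid] for cid in cluster_ids]
    correct = [p == t for p, t in zip(predicted, true_labels)]

    # Flow accuracy counts every flow equally.
    flow_accuracy = sum(correct) / len(correct)
    # Byte accuracy weights each flow by the bytes it carried.
    byte_accuracy = sum(b for b, ok in zip(flow_bytes, correct) if ok) / sum(flow_bytes)

    per_class = {}
    for cls in set(true_labels):
        tp = sum(1 for p, t in zip(predicted, true_labels) if p == cls and t == cls)
        predicted_as = sum(1 for p in predicted if p == cls)
        actual = sum(1 for t in true_labels if t == cls)
        precision = tp / predicted_as if predicted_as else 0.0
        recall = tp / actual
        per_class[cls] = (precision, recall)
    return flow_accuracy, byte_accuracy, per_class


# Hypothetical toy data: six flows in three clusters, one large P2P flow
# that lands in a Web-dominated cluster.
cluster_ids = [0, 0, 0, 1, 1, 2]
true_labels = ["Web", "Web", "P2P", "P2P", "P2P", "Web"]
flow_bytes = [100, 200, 5000, 300, 400, 1000]

flow_acc, byte_acc, per_class = evaluate(cluster_ids, true_labels, flow_bytes)
print(f"flow accuracy = {flow_acc:.3f}, byte accuracy = {byte_acc:.3f}")
for cls, (precision, recall) in sorted(per_class.items()):
    print(f"{cls}: precision = {precision:.2f}, recall = {recall:.2f}")
```

In this toy case a single misclassified large flow leaves flow accuracy high while byte accuracy drops sharply, which is the kind of gap visible between the 95% flow accuracy and 79% byte accuracy reported for [138].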