AdaR [461]

Partially decentralized LSPI (ε-greedy)

Unicast routing (WSN)

Simulations · 400 sensors · 20 data sources · 1 sink

State: \(\mathcal {N}_{i}\) Reward: function of · node load · residual energy · hop cost to sink · link reliability

Next-hop nodes to destination

· S = #nodes · A = #neighbors

Compared to Q-learning: · faster convergence (by 40 episodes) · less sensitive to initial parameters

FROMS [151]

Q-learning (variant of ε-greedy)

Multicast routing (WSN)

OMNeT++ Mobility Framework with 50 random topologies · 50 nodes · 5 sources · 45 sinks

State: \((\mathcal {N}^{k}_{i}, D_{k})\) Reward: function of hop cost

\(\{a_{1} \cdots a_{m}\}\), where \(a_{k} = (\mathcal {N}^{k}_{j}, D_{k})\) and \(\mathcal {N}^{k}_{j}\) = next hop along the path to sink \(D_{k}\)

· S = #nodes · A = #neighbors

Compared to directed diffusion: · up to 5× higher delivery rate · ≈20% lower overhead

QPR [24]

Variant of Q-learning (ε-greedy)

Localization-aware routing to achieve a trade-off between packet delivery rate, ETX, and network lifetime (WSN)

Simulations · 50 different topologies · 100 nodes

State: \(\mathcal {N}_{i}\) Reward: function of · distance(\(\mathcal {N}_{i}\), \(\mathcal {N}_{j}\)) · distance(\(\mathcal {N}_{j}\), d) · energy at \(\mathcal {N}_{j}\) · ETX · \(\mathcal {N}_{j}\)’s neighbors, for any neighbor \(\mathcal {N}_{j}\) and destination d

Next-hop nodes to destination

· S = #nodes · A = #neighbors

Delivery rate: · 25% higher than GPSR Network lifetime: · 3× longer than GPSR · 4× longer than EFE
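
The three WSN schemes above share one tabular pattern: the state is the current node, the action is the choice of next-hop neighbor, and values are learned under an ε-greedy policy over S = #nodes states and A = #neighbors actions. A minimal sketch of that pattern, using a hypothetical three-node line topology and a simple hop-cost reward (the topology, rewards, and parameters are illustrative assumptions, not taken from AdaR, FROMS, or QPR):

```python
import random
from collections import defaultdict

# Toy line topology (illustrative): packets originate at "A", sink is "C".
NEIGHBORS = {"A": ["B"], "B": ["A", "C"], "C": []}
SINK = "C"

Q = defaultdict(float)  # Q[(node, next_hop)]; S = #nodes, A = #neighbors

def epsilon_greedy(node, eps, rng):
    """Explore a random neighbor with probability eps, else exploit."""
    nbs = NEIGHBORS[node]
    if rng.random() < eps:
        return rng.choice(nbs)
    return max(nbs, key=lambda nb: Q[(node, nb)])

def q_update(node, nxt, reward, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update for the (node, next-hop) pair."""
    best_next = max((Q[(nxt, nb)] for nb in NEIGHBORS[nxt]), default=0.0)
    Q[(node, nxt)] += alpha * (reward + gamma * best_next - Q[(node, nxt)])

def route_one_packet(rng, eps=0.2):
    """Forward one packet hop by hop, learning as it travels."""
    node = "A"
    while node != SINK:
        nxt = epsilon_greedy(node, eps, rng)
        reward = 0.0 if nxt == SINK else -1.0  # hop cost until delivery
        q_update(node, nxt, reward)
        node = nxt

rng = random.Random(0)
for _ in range(200):
    route_one_packet(rng)
```

After a couple of hundred packets, node B's value for forwarding toward the sink dominates its value for backtracking to A, so the greedy policy settles on the shortest path.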

Ref.

Technique (selection)

Application (network)

Dataset

Features^{a}

Action set

Evaluation: Settings^{a} · Improvement^{b}

Xia et al. [482]

DRQ-learning (greedy)

Spectrum-aware routing (CRN)

OMNeT++ simulations · stationary multi-hop CRN · 10 nodes · 2 PUs

State: \(\mathcal {N}_{i}\) Reward: #available channels between current node and next-hop node

Next-hop nodes to destination

· S = #nodes · A = #neighbors

Compared to Q-routing: · 50% faster at lower activity level. Compared to Q-routing and SP-routing: · lower converged end-to-end delay

QELAR [197]

Model-based Q-learning (greedy)

Distributed energy-efficient routing (underwater WSN)

Simulations (ns-2) · 250 sensors in a 500^{3} m^{3} space · 100 m transmission range · fixed source/sink · 1 m/s maximum speed for intermediate nodes

State: \(\mathcal {N}_{i}\) Reward: function of the residual energy of the node receiving the packet and the energy distribution among its neighbor nodes

Next-hop nodes to destination ∪ packet withdrawal

· S = #nodes · A = 1 + #neighbors

Compared to Q-learning: · faster convergence (40 fewer episodes) · less sensitive to initial parameters
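
QELAR's action set is the node's neighbors plus an explicit packet-withdrawal action (hence A = 1 + #neighbors), and its reward weighs the receiving node's residual energy against the energy distribution in its neighborhood. A hedged sketch of these two ingredients; the weights and the exact form of the balance term are illustrative assumptions, not QELAR's precise formula:

```python
WITHDRAW = "withdraw"

def action_set(neighbors):
    """Action set: forward to any neighbor, or withdraw the packet
    (A = 1 + #neighbors)."""
    return list(neighbors) + [WITHDRAW]

def energy_reward(residual, neighbor_residuals, w_res=0.7, w_bal=0.3):
    """Illustrative energy-aware reward: favor receivers with high
    residual energy that also sit in an energy-balanced neighborhood."""
    mean_nb = sum(neighbor_residuals) / len(neighbor_residuals)
    balance_penalty = abs(residual - mean_nb)  # unbalanced regions score worse
    return w_res * residual - w_bal * balance_penalty
```

Coupling the reward to the neighborhood's energy distribution, rather than to the receiver alone, is what spreads forwarding load and extends network lifetime.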

Lin et al. [277]

n-step TD (greedy)

Delay-sensitive application routing (multi-hop wireless ad hoc networks)

Simulations · 2 users transmitting video sequences to the same destination node · 3∼4-hop wireless network

State: current channel states and queue sizes at the nodes in each hop Reward: goodput at destination

Next-hop nodes to destination

· \(S=n_{q}^{N}\times n_{c}^{H}\) · \(A=(N_{h}^{2})^{H-1}\times N_{h}\), where \(N\) = #nodes, \(N_{h}\) = #nodes at hop \(h\), \(H\) = #hops, \(n_{q}\) = #queue states, \(n_{c}\) = #channel states

Complexity ≈ 2×10^{8} for the 3-hop network. With 95% less information exchange: · ∼10% higher PSNR · slightly slower convergence (+1∼2 s)

dAdaptOR [59]

Q-learning with adaptive learning rate (ε-greedy)

Opportunistic routing (multi-hop wireless ad hoc networks)

Simulations on QualNet with 36 randomly placed wireless nodes in a 150 m × 150 m area

State: \(\mathcal {N}_{i}\) Reward: · fixed negative transmission cost if the receiver is not the destination · fixed positive reward if the receiver is the destination · 0 if the packet is withdrawn

Next-hop nodes to destination ∪ packet withdrawal

· S = #nodes · A = 1 + #neighbors

After convergence (≈300 s): · ETX comparable to a topology-aware routing algorithm · >30% improvement over greedy SR, greedy ExOR, and SRCR with a single flow · improvement decreases with the number of flows
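
dAdaptOR's distinguishing element is a learning rate that adapts per state–action pair rather than staying fixed. One common realization, sketched here as an assumption (the paper's exact schedule may differ), is a count-based step size α = 1/visits(s, a), which satisfies the usual stochastic-approximation decay conditions:

```python
from collections import defaultdict

class AdaptiveAlphaQ:
    """Tabular Q-learning whose step size decays per (state, action)
    visit count -- an illustrative stand-in for dAdaptOR's schedule."""

    def __init__(self, gamma=0.9):
        self.Q = defaultdict(float)
        self.visits = defaultdict(int)
        self.gamma = gamma

    def update(self, s, a, reward, next_values):
        """next_values: Q-values of the actions available in the next
        state (empty when the packet is delivered or withdrawn)."""
        self.visits[(s, a)] += 1
        alpha = 1.0 / self.visits[(s, a)]  # decays as 1/n per pair
        target = reward + self.gamma * max(next_values, default=0.0)
        self.Q[(s, a)] += alpha * (target - self.Q[(s, a)])
```

With a 1/n step size, each Q-entry tracks the running average of its targets, which is what makes the scheme robust to how the table is initialized.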

QAR [276]

Centralized SARSA (ε-greedy)

QoS-aware adaptive routing (SDN)

Sprint GIP network trace-driven simulations [418] · 25 switches, 53 links

State: \(\mathcal {N}_{i}\) Reward: function of delay, loss, throughput

Next-hop nodes to destination

· S = #nodes · A = #neighbors

Compared to Q-learning with QoS-awareness: · faster convergence (20 fewer episodes)
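
QAR uses SARSA rather than Q-learning: the update bootstraps on the next-hop action the policy actually takes, not on the greedy maximum. The difference is one term; a minimal sketch (generic SARSA notation, not QAR-specific symbols):

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update:
        Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))
    Q-learning would use max_a' Q(s',a') in place of Q(s',a')."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
    return Q[(s, a)]

# Usage: one update after forwarding from switch "n1" via "n2",
# where the policy has already chosen "n3" as the following hop.
Q = defaultdict(float)
sarsa_update(Q, "n1", "n2", 1.0, "n2", "n3", alpha=0.5)
```

Because the target follows the ε-greedy behavior policy itself, SARSA's value estimates account for exploration traffic, which suits a centralized SDN controller that both learns and installs the routes.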
