RRG: redundancy reduced gossip protocol for real-time N-to-N dynamic group communication
© Luk et al.; licensee Springer. 2013
Received: 18 October 2012
Accepted: 23 April 2013
Published: 17 May 2013
Real-time group communication is an indispensable part of many interactive multimedia applications over the Internet. In scenarios that involve large group sizes, sporadic sources, high user churn, and random network failures, gossip-based protocols can potentially provide advantages over structure-based group communication algorithms in ease of deployment, scalability, and resiliency against churn and failures. In this paper, we propose a novel protocol called Redundancy Reduced Gossip for real-time N-to-N group communication. We show that our proposed protocol can achieve a considerably lower traffic load than conventional push-based gossip protocols and conventional push-pull gossip protocols for the same probability of successful delivery, with higher performance gains in networks with smaller delays. We derive a mathematical model for estimating the frame non-delivery probability and the traffic load from overhead, and demonstrate the general correctness of the model by simulation. We implement a functioning prototype conferencing system using the proposed protocol, complete with functions including NTP synchronization, dynamic group size estimation, redundancy suppression, and other features needed for proper operation. We perform experiments over the campus network and PlanetLab, and the prototype system demonstrates the ability of our protocol to maintain robust performance in real-world network environments.
The delay requirement of real-time communication is stringent and is generally assumed to be comparable to what is required for conversational voice: the one-way delay should be kept below 400 ms. Protocols for streaming are typically not designed with this stringent delay requirement in mind.
Communication among the group members is N-to-N in that a random number of active sources may generate voice, video, and control data information to be distributed to all other members at the same time. Protocols that consider individual sources in isolation may not be optimal in such a scenario.
The peers are sporadic, meaning that each peer may switch rapidly between the active and idle states.
There is a high degree of user churn, meaning that users may join and leave the group dynamically at will.
Structure-based approaches require participating nodes to form a certain deterministic structure, often a tree constructed by heuristics as a solution to a delay-constrained minimum Steiner tree problem [25, 26, 29–34, 36]. In such tree-based systems, bandwidth usage is very efficient as no duplicated messages are sent. The total bandwidth consumption can be further reduced by incorporating the mixing of audio streams within the structure [22–24, 35], or by combining with IP multicast in the LAN. In N-to-N group communication, multiple peers may generate information concurrently. Therefore, the authors of [29, 33, 37] have argued that multiple source-specific multicast trees should be constructed instead of just one shared multicast tree. Other optimizations have also been proposed, such as resource sharing among trees of different sessions, and the 2-hop delay-bounded tree [40, 41]. Examples of structure-based approaches that are not tree-based include chain-based overlay using layered coding and snow-ball chunk.
Previous studies have shown that if the user churn is low so that the structure is stable, and if the network loss rate is also low, then structure-based systems can perform very well. In the presence of user churn and network degradation, however, structure-based systems may become unreliable because the overhead for tree maintenance and message recovery may increase with a snowball effect, as pointed out in [45–47]. Based on experience learned from the evolution of live streaming protocols, Zhang et al. have also concluded that structure-based multicast protocols are impractical on the Internet because of user churn and network degradation dynamics. Churn-coping strategies for structured approaches were discussed in [21, 37], but due to the fundamental limitation of structured approaches, the tolerated churn rate is very low. For example, a churn rate of 4/minute with a group size of 50–200 requires 2 seconds of recovery time. Chu et al. have also acknowledged the poor transient performance in larger group sizes in their work. A scheme using multiple distribution trees with Multiple Description Coding (MDC) has been proposed as a churn-coping measure, but this scheme can only be used for traffic types where MDC is applicable, i.e., video. MDC is not applicable to gaming control data, and it is questionable whether it is applicable to voice.
Gossip-based protocols have been considered by many researchers to be reliable in a probabilistic sense as their randomized nature helps to “route around” peer churn and network degradation. Gossip-based protocols were first examined for information dissemination in what is known as randomized rumor spreading or the epidemic algorithm. In a gossip-based protocol, each cycle of information spreading consists of multiple phases of gossip; in each phase, peers operate in parallel and each peer communicates with one or more randomly selected partners (Figure 1c). In synchronous gossip, a phase is launched simultaneously by all peers, and one phase is completed before the start of the next phase. Synchronous gossip assumes that the period of a phase is larger than the one-way delay between any pair of nodes, a condition that is unrealistic for real-time communication. In asynchronous gossip, peers do not operate in synchronous phases but gossip asynchronously in response to messages received. In either synchronous or asynchronous gossip, the number of phases, or the number of times a message is relayed in a cycle, must be limited to a very small number independent of the population size for real-time communication because of the stringent delay requirement. This real-time requirement also leads to the use of push rather than pull to reduce the amount of time needed for each phase. For example, Verma et al. have proposed the use of an adaptive fanout to control an asynchronous infection pattern over a limited number of phases in a push manner, and Georgiou et al. have derived the probability of successful rumor spreading in relation to the number of gossip targets under a given number of phases. To the best of our knowledge, all of the existing asynchronous gossip schemes for real-time communication use one push or push-pull operation in one phase, and each gossip phase is independent of other phases.
These push protocols usually produce a large number of duplicated messages and thus have a low bandwidth utilization efficiency.
In this paper, we propose a new asynchronous way of gossiping with limited delay. In our scheme, a peer establishes connectivity with multiple peers and uses a limited number of push-pull operations in each information spreading cycle. These repeated push-pulls between two peers during each cycle (details in Sec. III-A) result in a much smaller number of duplicated messages compared to conventional push-based gossip protocols and conventional push-pull gossip protocols for real-time applications [43, 44]. Hence, we name our protocol Redundancy Reduced Gossip (RRG).
It is worth noting that some gossip protocols proposed for ad hoc networks also use fixed connectivity [55–57]. However, their connectivity is confined to near-neighbor links. In our scheme, the connectivity is randomly established among all participants. Melamed et al. also proposed the use of a gossip push to fixed downstream neighbors, but none of the related works [47, 55–57] uses multiple push-pulls in each cycle.
A novel protocol, called Redundancy Reduced Gossip, for real-time N-to-N dynamic group communication is proposed. The protocol allows the distribution of information from an arbitrary number of random sources within a group, with low latency, minimal membership maintenance, and without assumptions on the underlying network condition. The proposed protocol can achieve a given successful delivery probability with a considerably lower traffic load than conventional push gossip protocols and conventional push-pull gossip protocols for real-time communication.
A mathematical model is developed and presented for analyzing the frame non-delivery probability and overhead of RRG. The model provides useful insights into the design of our protocol. It can also be used to evaluate the performance of other related protocols.
A Linux-based prototype system running the protocol is implemented and tested. Some details and challenges of the implementation are described. Experimental results of the system operating over a LAN as well as over PlanetLab are collected and analyzed. The prototype system demonstrates the ability of our protocol to maintain robust performance in real-world network environments.
The rest of the paper is organized as follows. Section II presents an overview of related works. Section III describes the RRG protocol. Section IV presents the performance evaluation results from a mathematical model and from the simulator. Section V presents the prototype design, challenges and network experiment results. Section VI concludes the paper.
II. Related works
Using gossip for real-time task execution systems has been proposed in the papers [59, 60], but the research focuses and methods in these works are different from ours. Huang et al. have proposed a gossip-based super-node architecture for query and routing in 1-to-1 information dissemination. Han et al. have adopted an existing adaptive fanout gossip model for peer discovery and applied the model to a real-time distributable thread scheduling problem.
Push-pull gossip has been studied in the papers [51, 61, 62]. For example, in one of these works, one push-pull is used in one phase for the computation of aggregate information. Karp et al. and Khambatti et al. have proposed the use of push-gossip followed by pull-gossip in two separate stages. They try to combine the expediency of push-gossip with the lower redundancy of pull-gossip. However, these two-phase solutions [51, 62] are not applicable to real-time communications.
Three-phase pull, or lazy push gossip, is studied in the streaming papers [48, 64–66]. It is important to note that streaming applications have a less stringent delay requirement (buffer build-out delays of 10–30 seconds are quoted in these papers). Each execution of the three-phase cycle of advertise-request-delivery is targeted to deliver information to only a single layer of peers. In RRG, each execution of the greeting-response-closure cycle is targeted to deliver information from all sources to all peers.
Using gossip to establish a random graph for information dissemination has been proposed in the papers [28, 67, 68]. Liang et al. have proposed the use of an on-demand tree for short-lived interactions. However, the proposed scheme does not maintain the spanning tree for a prolonged period of time; hence, no repair mechanism to cope with failures is possible. Chunkyspread uses a simple controlled flooding mechanism over a random graph maintained by Swaplinks for tree construction. It also uses multiple trees to react quickly to membership changes. However, the tree heights are not bounded in the protocol. In contrast, the information dissemination of RRG is strictly bounded to around 3 hops to support the real-time communication delay constraint of 400 ms. Carvalho et al. have proposed to probabilistically combine lazy push gossip and pure push gossip to obtain an emergent structure. The use of lazy push gossip, however, hinders its applicability to real-time communications, as discussed above.
Asynchronous gossip has been studied for other purposes as well [70, 71]. Boyd et al. have used asynchronous gossip to address the “averaging problem” in sensor networks. Ram et al. have studied asynchronous gossip for summing the component functions in a distributed multi-agent system. These protocols are not targeted at real-time N-to-N group communications.
Deb, Médard and Choute have studied N-to-N gossip with and without network coding. Their primary contribution is to quantify the gain of network coding in a multiple-source scenario. They assume synchronous gossiping with only one gossip target per peer per phase. Their study is not applicable to real-time group communications.
Several studies [46, 47, 50, 63] have proposed combining the gossiping and structure-based approaches. These hybrid approaches combine the bandwidth efficiency of structure-based approaches with the churn-coping capability of gossiping approaches. Gossip is employed in the recovery of lost packets after the initial delivery by a structure-based approach. Gupta et al. have used gossip in a sub-tree topology to reduce the traffic load. Our RRG scheme can be extended to include these techniques.
Birman et al., Gu et al. and Lao et al. have proposed combining infrastructure and peer-to-peer approaches for real-time group communications. The objective of our paper is different from theirs. We focus on a pure peer-to-peer approach without any infrastructure support.
This paper differs from our original Globecom conference version in three aspects. First, the protocol has been improved by the incorporation of a delayed response strategy. Second, additional performance evaluations are presented, which include the traffic load performance under different scenarios and with user churn. Third, a prototype system running the protocol has been implemented and tested over the HKUST campus network and PlanetLab. In this paper, we also identify the fact that the performance gain of our protocol is higher in networks with small delays.
III. Proposed N-to-N gossiping protocol
A. Protocol description
Our N-to-N gossiping protocol consists of n nodes, or peers, that operate in cycles. (The terms “peer” and “node” are used interchangeably in this paper.) Each cycle is initiated at fixed intervals and is identified by a global cycle ID. For simplicity, we assume that there is a global synchronization of the cycle ID and frame rate, and that this synchronization is achieved through the use of NTP. Other mechanisms to achieve synchronization are possible, but we assume that NTP is used so that we can focus on other aspects of our protocol. The use of a global cycle ID eliminates the need for a peer to manage the sequence numbering of sources individually and the need to transmit sequence numbers of individual chunks in a packet. Each peer in a cycle can generate at most one information frame (e.g., a voice frame) to be distributed to the remaining n-1 peers through a multi-phase gossiping mechanism. The key to our protocol is the use of a synchronous global cycle ID and synchronous media generation. By “synchronous media generation” we mean that the packet generation rates are exactly the same for all active nodes. Most N-to-N real-time communication protocols in the literature have either assumed an asynchronous operation or have assumed a synchronous operation without addressing how this synchronicity is achieved. With asynchronous operation, we would need to transmit and process individual sequence numbers as well as to perform frequency alignment across multiple streams. Also, the bundling of information from different sources into one transmitted packet could not be done in as straightforward a manner; in our protocol, we simply need to bundle information frames with the same cycle ID.
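To make the role of the global cycle ID concrete, the following is a minimal sketch of how a cycle ID could be derived from an NTP-disciplined clock and used to bundle frames from different sources. It assumes a 20 ms cycle interval and that the cycle ID is simply the timestamp divided by that interval; both the mapping and the names `cycle_id` and `bundle_by_cycle` are illustrative assumptions, not part of the protocol specification.

```python
from collections import defaultdict

CYCLE_INTERVAL_MS = 20  # assumption: one cycle per 20 ms media frame


def cycle_id(now_ms, interval_ms=CYCLE_INTERVAL_MS):
    """Map a (synchronized) timestamp in milliseconds to a global cycle ID."""
    return now_ms // interval_ms


def bundle_by_cycle(frames):
    """Group (timestamp_ms, source, payload) frames by their cycle ID.

    Frames that share a cycle ID can travel in one message without any
    per-source sequence numbers.
    """
    bundles = defaultdict(dict)
    for ts_ms, source, payload in frames:
        bundles[cycle_id(ts_ms)][source] = payload
    return dict(bundles)
```

With this mapping, two peers whose clocks agree to within one interval label frames generated at the same time with the same or adjacent cycle IDs.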
To meet the real-time requirement, we limit the number of phases to 3. In other words, in each cycle, each peer will be engaged in a 3-phase gossip with a random set of other peers, regardless of the number of frames to be distributed. Successive cycles can overlap each other in time. For ease of illustration, we first describe our protocol as if the launch of cycles and phases by different peers are synchronized. But in our protocol this is, in fact, not the case and more details will be added later on.
If a node is already in possession of an information frame to be spread in a specific cycle, the node is called an infected node.
Phase 1 (the greeting phase)
In this phase, each node randomly selects a small number of nodes and sends a GREETING message to each of them. If a node is already infected when it launches its greeting phase, its GREETING message will contain information frames that it has already received. The selected nodes are called the “children” of the selecting node, which is called the “parent” of its selected children.
Phase 2 (the response phase)
During this phase, a node will send a RESPONSE message to all of its parents. If the child node is already infected at the beginning of this phase, the RESPONSE message contains all its received original frames (from different sources); if un-infected, the RESPONSE message contains no real data. In the example of Figure 2b, peer 5 and peer 1 are the parents of peer 8. Since peer 8 is infected by peer 1 during the greeting phase, peer 8 will send the information frame from peer 1 to peer 5 during the response phase.
Phase 3 (the closure phase)
In this phase, only an infected parent node will send a CLOSURE message, containing all of its received original frames (i.e. from different sources) to its children. Un-infected nodes will not send out anything. In the example shown in Figure 2c, peer 3 is a child of peer 2 and peer 8. Thus both nodes will send a CLOSURE message to peer 3. Peer 6 remains un-connected after all phases.
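The three phases can be illustrated with a small simulation. This is a sketch of the idealized, perfectly synchronized case for a single source only; it ignores link delay, loss, churn, and the asynchronous behavior of the real protocol, and the function names are hypothetical.

```python
import random


def rrg_cycle(n, b, source, rng):
    """Simulate one idealized, synchronous RRG cycle for a single source.

    Returns the set of nodes holding the source's frame after three phases.
    """
    # Each node picks b distinct children uniformly among the other nodes.
    children = {
        u: rng.sample([v for v in range(n) if v != u], b) for u in range(n)
    }
    parents = {u: [] for u in range(n)}
    for u, kids in children.items():
        for v in kids:
            parents[v].append(u)

    infected = {source}

    # Phase 1 (GREETING): nodes infected at launch push to their children.
    infected |= {v for u in infected for v in children[u]}

    # Phase 2 (RESPONSE): nodes infected at the start of this phase
    # deliver the frame to all of their parents.
    snapshot = set(infected)
    infected |= {p for u in snapshot for p in parents[u]}

    # Phase 3 (CLOSURE): nodes infected at the start of this phase
    # deliver the frame to all of their children.
    snapshot = set(infected)
    infected |= {v for u in snapshot for v in children[u]}
    return infected


def non_delivery_rate(n, b, trials, seed=0):
    """Monte Carlo estimate of the fraction of non-source peers missed."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        missed += n - len(rrg_cycle(n, b, source=0, rng=rng))
    return missed / (trials * (n - 1))
```

For example, `non_delivery_rate(100, 8, 200)` gives a rough estimate of the miss rate at fanout 8 under these simplifying assumptions.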
B. More details
1) Cycle Launches through NTP
2) Redundancy suppression and active peer list
Each gossip message that a peer sends out contains an Active Peer List (APL) that lists the source of each information frame that it has already received for that cycle with the actual information frame attached if appropriate. From the APL, the receiving peer extracts the information frames that it needs, and also avoids sending back to the sender information frames that the sender already has - this mechanism is referred to as redundancy suppression and its purpose is to reduce the total amount of traffic. Thus, in the APL of a message, some listed entries will have an information frame attached and some entries will not.
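Redundancy suppression can be sketched as a simple filter over the sources listed in the received APL. The representation below (a dict of frames keyed by source, and a reply APL as a list of `(source, payload_or_None)` pairs) is an assumption made for illustration, and `build_reply_apl` is a hypothetical name.

```python
def build_reply_apl(own_frames, peer_apl_sources):
    """Build the APL entries for a reply to a peer.

    own_frames: dict mapping source -> payload held locally for this cycle.
    peer_apl_sources: sources listed in the APL just received from the peer,
        i.e. frames the peer already has.

    Every known source is listed, but a payload is attached only when the
    peer has not already reported holding that frame.
    """
    already_there = set(peer_apl_sources)
    return [
        (src, None if src in already_there else payload)
        for src, payload in sorted(own_frames.items())
    ]
```

An entry with `None` corresponds to a listed source without an attached information frame.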
Each APL entry contains the complete contact information of a peer in 6 bytes: 4 bytes for the IP address and 2 bytes for the port number. The APL is included in every message that a peer transmits. The length of the APL is variable as the number of peers may vary. Therefore, the APL is encoded in the Type-Length-Value (TLV) format, with an 8-bit type field and a 16-bit length field.
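A possible byte layout for the contact portion of the APL is sketched below. The type code `0x01`, the use of network byte order, and the omission of attached information frames are assumptions for illustration; only the field widths (8-bit type, 16-bit length, 4-byte IPv4 address, 2-byte port) come from the description above.

```python
import socket
import struct

APL_TYPE = 0x01  # hypothetical type code; not specified in the text


def encode_apl(peers):
    """Encode [(ipv4_string, port), ...] as one TLV element."""
    value = b"".join(
        socket.inet_aton(ip) + struct.pack("!H", port) for ip, port in peers
    )
    # 1-byte type, 2-byte length, then 6 bytes per peer.
    return struct.pack("!BH", APL_TYPE, len(value)) + value


def decode_apl(data):
    """Decode a TLV element produced by encode_apl."""
    tlv_type, length = struct.unpack_from("!BH", data, 0)
    if tlv_type != APL_TYPE:
        raise ValueError("not an APL element")
    value = data[3:3 + length]
    if len(value) != length or length % 6:
        raise ValueError("truncated or malformed APL")
    return [
        (socket.inet_ntoa(value[i:i + 4]),
         struct.unpack_from("!H", value, i + 4)[0])
        for i in range(0, length, 6)
    ]
```

With a 16-bit length field and 6 bytes per entry, a single element of this form can describe up to 10922 peers.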
3) Bootstrapping
Any newly arrived peer is required to contact the bootstrapping point to acquire a list of peer contacts in the group, the current estimated community size (see Sec. III-B5 below), and the current cycle ID. The list of peer contacts does not need to be 100% correct because the new peer can learn the membership information from subsequent gossip, but inaccuracies in the list will affect the protocol performance. More details are provided in Sec. IV-D. The newly arrived peer uses the current estimated size to determine its fanout value. Any peer in the group can be the bootstrapping point.
4) Dynamic group membership
A new peer joins the group through the bootstrapping point. Afterwards, its contact information is learned by other peers through the APL contained in the exchanged messages. Peers independently detect and remove a departed peer when that peer does not respond to a GREETING within a timeout.
5) Fanout estimation
In our protocol, each peer independently decides how many peers to initiate gossip with, based on its estimate of the current group size, a target information non-delivery probability, and the estimated non-delivery probability from Eq. 6, which we derive in Sec. IV-A. We adopt the gossip-based size estimation algorithm proposed by Jelasity et al. and extend it to support asynchronous operation. The details of the algorithm are beyond the scope of this paper and are omitted.
IV. Performance evaluation
In this section, we present the analytical model for a key performance measure in the proposed protocol: the frame non-delivery probability as a function of fanout. We also compare, through simulation, the performance of the proposed RRG with that of the conventional push gossip approach in [43, 44] and the conventional push-pull gossip protocols in [54, 61]. The evaluation metric is the non-delivery rate as a function of the traffic load. The effectiveness of the redundancy suppression is then presented. Lastly, the impact of churn is studied.
A. Analytical model for frame non-delivery probability in redundancy reduced gossip
In this section, an analytical model to study the non-delivery rate of information frames in the proposed N-to-N gossiping protocol is developed. The analysis is based on a perfectly-synchronized Redundancy Reduced Gossip protocol. Although a real implementation, as discussed in Sec. III above, has to be asynchronous, the synchronous assumption allows us to obtain a closed-form formula that provides some useful insights into the design tradeoffs of the protocol.
For source node s, its family tree (Figure 4) consists of the following members: parents, children, co-parents of parents of s, step children, grandchildren, and siblings. “Parent” and “children” were defined in Sec. III-A. The definitions of the other members of the family tree are given below.
“Grandchild”: A child’s child.
“Sibling”: Two nodes having the same parent are called siblings of each other.
“Co-parent”: A node and s are called co-parents of each other if they share any node as a common child.
For a node to receive the broadcast message from source $s$ in a cycle, the node must belong to the family tree of $s$. If not, the node will fail to receive the message. In the following we analyze the non-delivery probability $p_l$ for a given node.
We first define the fanout as $b$; $b$ is also the number of children of $s$.
Let us define the following random variables:
$m_p$: the number of parents of $s$.
$m_g$: the number of grandchildren of $s$.
$m_{sb}$: the number of siblings of $s$.
$m_c$: the number of co-parents with $s$.
$m_s$: the number of step children of $s$.
We also define the following probabilities:
$p_p$: the probability that a given node is a parent of $s$.
$p_c$: the probability that a given node is a co-parent to $s$.
$p_l$: the probability that the broadcast message of a cycle cannot be delivered to a particular node.
Since each parent has $b$ children, $E(m_g)$, the expected total number of grandchildren, has order $O(b^2)$. Likewise, $E(m_{sb})$, the expected total number of siblings, also has order $O(b^2)$.
for a large $n$. Note that the ratio of the standard deviation to the mean, $(n p_c (1-p_c))^{0.5} / ((n-1) p_c)$, is very small, and hence the probability mass is concentrated around the mean. Since each co-parent can have $b$ children, the expected total number of step children has order $O(b^3)$. Among the six member types of the family tree (children, grandchildren, parents, co-parents, step children, and siblings), the number of step children is one order higher (in $b$) than the rest. This means that the size of the family tree of $s$ is determined by the number of step children, which has order $O(b^3)$. Note that $O(b^3)$ must be of the same order as $n$ (i.e., $O(n)$). If the order is higher, many received frames will be duplicates and wasted; if lower, many nodes will not be part of the family tree and will not be able to receive the information frame from $s$. This leads us to formulate $b$ as $b = c\,n^{1/3}$, where $c$ is a constant, and to focus on $c$ in our analysis below.
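The scaling rule $b = c\,n^{1/3}$ can be turned into a small helper for experimenting with fanout values. This is a sketch only: rounding up to a whole number of children is an assumption, and `fanout` and `implied_c` are hypothetical names. It does not implement Eq. 6.

```python
import math


def fanout(n, c):
    """Fanout b = c * n**(1/3), rounded up to a whole number of children."""
    return max(1, math.ceil(c * n ** (1.0 / 3.0)))


def implied_c(n, b):
    """Constant c implied by a given group size n and fanout b."""
    return b / n ** (1.0 / 3.0)
```

For instance, a fanout of 8 in a group of 100 corresponds to $c \approx 1.72$, i.e. $c^3 = b^3/n = 5.12$.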
B. Frame non-delivery probability versus traffic load
Traffic load is another key performance metric in gossiping protocols. We define the traffic load $D$ as the measured average number of copies of the same information frame that each peer receives. In other words, $D$ is a measure of the redundancy, or the bandwidth usage efficiency, of the protocol. Without redundancy suppression, $D$ should be on the order of $c^3$. With redundancy suppression, $D$ is found to be smaller. By first focusing on the measured $D$, we are disregarding the protocol overhead, which we will come back to examine later.
In the following, we compare the frame non-delivery probability $p_l$ as a function of the traffic load $D$ for the proposed gossip protocol, for the conventional push approach (used by most existing gossiping approaches [43, 44]), and for the conventional push-pull approach [54, 61].
The model for the conventional push approach is as follows: a source node sends the information frame to $b$ randomly selected nodes (phase 1), and each selected node will then push the frame to $b$ other randomly selected nodes (phase 2), etc., with the selection of receiving nodes in each phase done independently. In addition, we include a buffer-map in the conventional approach to reduce redundancy [45–47]: a sending peer will avoid pushing to peers that are already marked in the buffer-map in messages that it has received, and will mark in the buffer-map of the outgoing message the peers that will receive the information frame. Note that the buffer-map is more efficient than a list in this case because a list may contain almost all peers in the last phase of gossiping. We ignore the complexity involved in maintaining the mapping of buffer-map to peers in the presence of user churn.
The model for the conventional push-pull approach is as follows: each node randomly selects $b$ nodes and initiates a two-way push-pull with each selected node (phases 1 and 2). After that, unlike RRG, each node pushes all the information frames it possesses to another $b$ randomly selected nodes (phase 3). Phases 1 and 2 are independent of phase 3. The same buffer-map scheme as described for the conventional push approach above is included [45–47].
While closed-form formulae for the loss rate exist for many of the cases, a meaningful comparison of the loss rates requires that the comparison be done for the same traffic load. However, there is no closed-form formula to estimate the traffic load for protocols under various redundancy suppression schemes. Therefore, an event-driven simulation was developed for the comparison. The information frame is encoded at 8 kbps with a 20 ms sampling interval, i.e., 20 bytes per cycle per peer. We simulate the message propagation, node and link failures, network topology, and link delay. As is common practice in simulating peer-to-peer algorithms in existing works [38, 46, 53], we do not simulate the network-level packet details (such as specific queuing delays) in order to make the simulation scalable. The link delay $x_{ij}$ from node $i$ to node $j$ is assumed to be a random variable drawn from a Weibull distribution Weibull(a, b), where a is the scale parameter and b = 1.5 is the shape parameter (distinct from the fanout $b$), which gives a long-tailed delay. The average number of active peers is less than 3.
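Two of the simulation parameters above can be reproduced in a few lines. The sketch below computes the frame size from the bit rate and sampling interval, and draws link delays from a Weibull distribution with shape 1.5 using Python's standard `random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` is the shape. The helper names and the scale value used in any experiment are illustrative assumptions.

```python
import math
import random


def frame_bytes(bitrate_bps, interval_ms):
    """Payload size of one information frame in bytes."""
    return bitrate_bps * interval_ms // 8000


def sample_link_delay(scale_ms, rng, shape=1.5):
    """Draw one link delay (ms) from Weibull(scale, shape)."""
    return rng.weibullvariate(scale_ms, shape)


def mean_link_delay(scale_ms, shape=1.5):
    """Theoretical mean of the Weibull delay: scale * Gamma(1 + 1/shape)."""
    return scale_ms * math.gamma(1.0 + 1.0 / shape)
```

For shape 1.5 the mean delay is about 0.90 times the scale parameter, so the scale can be chosen to match a target average delay.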
Another important simulation issue is related to the clock accuracy. The proposed protocol uses NTP to acquire time information. Due to the inherent timing inaccuracy in NTP, the cycle launch time at every node is not perfectly synchronized. As stated in RFC 1305, the timing accuracy of NTP is in the range of a few tens of milliseconds. The cycle launch time of peers is therefore modeled as uniformly distributed within 50 ms. As discussed earlier, $d_s$ (the delay artificially added before sending out RESPONSE and CLOSURE messages) is set to 50 ms.
Traffic load in Figure 6 is measured in terms of the average number of copies of an information frame received by each peer. But each message contains protocol headers, and the resulting overheads of the compared protocols differ. When $n = 100$ and the average number of active peers is less than 3, with $c = 2$, the overhead in RRG is around 20% of the total traffic. Most of the overhead is contributed by the APL, where membership information is carried and requires 6 bytes per peer. In the same settings, the overheads of the conventional push gossip and the conventional push-pull gossip are around 40% of the total traffic. Most of that overhead is contributed by the buffer-map, which is at least 12 bytes per gossip message.
It is important to note that our protocol generates a smaller number of messages than the fully connected peer-to-peer overlay approach (Figure 1c) in N-to-N communication. The total number of messages in the fully connected overlay approach is $n(n-1)$, while that of our protocol is $3bn$. If $n = 100$ with a target $p_l$ of $10^{-2}$, $b$ is only 8 from Eq. 6. The number of messages of our protocol is then only around 24% of that of the fully connected overlay approach.
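As a check of the arithmetic in this comparison, with $n = 100$ and $b = 8$:

$$
3bn = 3 \times 8 \times 100 = 2400, \qquad n(n-1) = 100 \times 99 = 9900, \qquad \frac{3bn}{n(n-1)} = \frac{2400}{9900} \approx 0.24 .
$$

More generally, the ratio is $3b/(n-1)$, so with $b$ growing as $n^{1/3}$ the relative saving increases with the group size.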
C. Effectiveness of redundancy suppression
D. Impact of churn on the proposed protocol
V. Prototype implementation and experimentation
The proposed protocol has been implemented in C on the Linux platform. An automated testing and measurement framework is also developed to support sound I/O from audio script to enable large-scale unattended testing for statistical measurement and, in the future, sound quality measurements. The prototype has been deployed on the HKUST campus network and on the PlanetLab testbed. In the following, we discuss several important aspects of the implementation.
A. Design and architecture
ThreadNetIn: it receives packets from a network interface card (NIC);
ThreadAudioInHW: it reads audio frames from a sound card buffer;
ThreadAudioInFile: it reads audio frames from an audio script file;
MainThread: it parses packets, implements voice activity detection and processes the gossiping logic;
ThreadAudioOutJitterBuffer: it implements a jitter buffer and mixes audio frames from multiple remote parties;
ThreadAudioOutHW: it writes frames to a sound card buffer;
ThreadAudioOutFile: it writes frames to a file and records statistical information for performance evaluation;
ThreadNetOut: it sends packets to the NIC;
ThreadTimerService: it registers and invokes callback events after a specified elapsed time.
ThreadAudioInHW and ThreadAudioInFile are pluggable and interchangeable, and so are ThreadAudioOutHW and ThreadAudioOutFile. The design enables either human- or machine-based mouth-to-ear voice measurement, or large-scale remote unattended testing without actual sound I/O. All modules are connected by the producer-consumer design pattern. Each consumer thread has a single work queue shared by possibly multiple producers.
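The producer-consumer arrangement can be sketched as follows. The prototype itself is written in C; this Python sketch only illustrates the pattern of one consumer thread draining a single work queue fed by several producers, and the tags `net_in` and `audio_in` merely stand in for threads such as ThreadNetIn and ThreadAudioInFile.

```python
import queue
import threading

STOP = None  # sentinel telling the consumer to exit


def consumer(work_queue, handle):
    """Drain one work queue until the sentinel arrives."""
    while True:
        item = work_queue.get()
        if item is STOP:
            break
        handle(item)


def producer(work_queue, tag, count):
    """Stand-in for an input thread that feeds the consumer."""
    for i in range(count):
        work_queue.put((tag, i))


def run_demo():
    main_queue = queue.Queue()  # the single queue owned by the consumer
    handled = []
    worker = threading.Thread(
        target=consumer, args=(main_queue, handled.append)
    )
    feeders = [
        threading.Thread(target=producer, args=(main_queue, tag, 3))
        for tag in ("net_in", "audio_in")
    ]
    worker.start()
    for t in feeders:
        t.start()
    for t in feeders:
        t.join()
    main_queue.put(STOP)  # all producers are done
    worker.join()
    return handled
```

Because only the consumer thread touches its own state, producers need no locking beyond what the queue provides.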
The MainThread generates a GREETING, RESPONSE or CLOSURE message according to the gossiping logic. The compiled message is encapsulated in UDP and passed to ThreadNetOut. Each frame is marked with a cycle ID, and each peer is synchronized using NTP.
Our protocol is implemented on Linux platforms. Because Linux is not a real-time operating system, real-time scheduling is difficult. To solve this problem, we implement a clock-driven event dispatching module to handle all time-related events, such as periodic sampling of sound cards, periodic cycle launch, de-jittering, playback, and scheduling of the next wake-up instances. The clock-driven dispatching module invokes events according to a local clock in order to prevent the timing errors caused by factors such as system load, imperfect usleep(), and thread switching.
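One way to realize clock-driven dispatching is to schedule each periodic event against absolute deadlines, so that an imprecise sleep delays one invocation but does not accumulate as drift. The following is a sketch of that idea under stated assumptions, not the prototype's actual module: the class name, the injected `clock` and `sleep` callables, and the single-threaded loop are all hypothetical.

```python
import heapq
import itertools


class ClockDrivenDispatcher:
    """Invoke periodic callbacks at absolute deadlines read from a clock."""

    def __init__(self, clock, sleep):
        self._clock = clock    # e.g. time.monotonic
        self._sleep = sleep    # e.g. time.sleep
        self._heap = []
        self._seq = itertools.count()  # tie-breaker for equal deadlines

    def call_every(self, period, callback, first_deadline):
        heapq.heappush(
            self._heap, (first_deadline, next(self._seq), period, callback)
        )

    def run_until(self, end_time):
        while self._heap and self._heap[0][0] <= end_time:
            deadline, _, period, callback = heapq.heappop(self._heap)
            delay = deadline - self._clock()
            if delay > 0:
                self._sleep(delay)  # may oversleep; that is tolerated
            callback(deadline)
            # The next deadline is derived from the previous deadline,
            # not from the current time, so oversleeping does not drift.
            heapq.heappush(
                self._heap,
                (deadline + period, next(self._seq), period, callback),
            )
```

In real use one would pass `time.monotonic` and `time.sleep`; injecting them also makes the dispatcher easy to exercise with a simulated clock.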
Another issue is that overloaded machines could cause inaccurate latency measurements, but finding idle machines is difficult on PlanetLab. We observe that for a 30-party conferencing experiment, a maximum of 4 instances of our prototype peer may run concurrently on an idle dual-core Pentium 4 3.2 GHz (Pentium D 940). To tackle this problem, we use PlanetLab CoMon to identify less loaded machines (which are quite rare) and then use the Sirius Calendar Service to reserve their CPU time.
C. Network experimentation results
We use our prototype system to measure the frame non-delivery probability p l against c, where c3 = b3/n, to compare against the analytical result presented in Figure 5 in Sec. IV-A. We run two sets of experiments: one set over the HKUST campus LAN only and the second set over PlanetLab.
List of the PlanetLab machines in use
Experimental scenarios with larger delays (e.g. LAN A=100 and PlanetLab) do not benefit much from the bonus relay, as explained in Sec. IV-B and Figure 7. This explains why the results shown in Figures 12, 13 and 14 indicate that scenarios with larger delays produce results much closer to the analytical ones than scenarios with smaller delays.
Finally, we use the prototype system to study the network behavior when the network experiences membership changes. In the case of a 30% decrease in group size, the non-delivery probability converges in less than 70 cycles (1.4 sec). In the case of a sudden 30% increase in group size, the convergence time is much shorter, as new peers are quickly learned by other peers through their GREETINGs and subsequent gossip exchanges. Results are omitted here as they are similar to Figure 9 in Sec. IV-C.
VI. Future direction
There are several directions for future improvement of the proposed RRG protocol. One is to make it adaptive to traffic conditions. The RRG protocol as proposed changes its connectivity each cycle. If some links have low latency, however, we can give higher priority to those links when the topology is generated in the next cycle. This has been explored in other gossip-based protocols [28, 63, 68].
To see if the same idea can improve RRG, we have simulated an extension of RRG, named RRG-adaptive, which has some capability to adapt to traffic conditions. The simulation setup for the improved protocol is the same as in Sec. V-B. Peers use the latency information learned in past cycles and sort the other peers in descending order of latency as p = [p1, p2, …, pn-1]. The selection of new peers in the next cycle follows a PDF in which λ is the parameter tuning the degree of preference for choosing lower-latency peers. Of course, other distributions can also be used as long as they prefer lower-latency peers.
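To make the role of λ concrete, the sketch below draws peers from a list sorted in descending order of latency, using weights that grow exponentially with the peer's rank. This exponential-in-rank weighting is an assumption chosen for illustration, in the spirit of the remark that other distributions preferring lower-latency peers can be used; it is not the PDF used in the simulation. The function names and the `fanout` parameter are hypothetical, and λ is assumed to be non-negative and moderate (for example 1 or 10).

```python
import math
import random


def selection_weights(num_peers, lam):
    """Return normalised weights for peers sorted by descending latency.

    Index 0 is the highest-latency peer and the last index the lowest-latency
    peer. ASSUMPTION: exponential-in-rank weighting, not the paper's PDF.
    """
    if num_peers < 1:
        raise ValueError("need at least one candidate peer")
    denom = max(num_peers - 1, 1)
    # Exponents lie in [-lam, 0], so exp() cannot overflow for lam >= 0.
    raw = [math.exp(lam * (i / denom - 1.0)) for i in range(num_peers)]
    total = sum(raw)
    return [w / total for w in raw]


def choose_peers(sorted_peers, fanout, lam, rng=random):
    """Draw up to `fanout` distinct peers, favouring the low-latency end."""
    candidates = list(sorted_peers)
    weights = selection_weights(len(candidates), lam)
    chosen = []
    for _ in range(min(fanout, len(candidates))):
        # rng.choices accepts relative weights, so no renormalisation is needed.
        idx = rng.choices(range(len(candidates)), weights=weights)[0]
        chosen.append(candidates.pop(idx))
        weights.pop(idx)
    return chosen
```

With λ = 0 the weights are uniform, which corresponds to selection without any latency preference; larger λ concentrates the selection on the lowest-latency peers.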
[Table: Recovery time of RRG/RRG-adaptive under user churn. Column: recovery time (sec). Rows include RRG-adaptive with λ = 1 and RRG-adaptive with λ = 10.]
Another area for further work is to consider ways of suppressing redundant information delivery to further improve bandwidth efficiency. There is also increasing interest in allowing the sources to adjust their coding rates to match the network conditions and peer capabilities (e.g. multi-rate and adaptive coded video sources) [42, 83]. The idea is also applicable to RRG.
VII. Conclusion
In this paper, we present a novel protocol, called Redundancy Reduced Gossip, for real-time N-to-N dynamic group communication. The protocol allows multiple sources to distribute information across a group with low latency, minimal membership maintenance, and without an assumption on the underlying network condition. We have shown that a considerably lower traffic load than conventional push gossip protocols and conventional push-pull gossip protocols can be achieved with the same probability of successful delivery. We have also shown that better performance can be achieved in networks with smaller delays and when a delay response strategy is added to RRG, which is an asynchronous gossip protocol. We have derived a mathematical model for the frame non-delivery probability and overhead of the protocol. This model provides important insights into the design of our protocol and has been used to evaluate the performance of other related protocols. A functional prototype system has been implemented in C on the Linux platform. Its design is described, and it has been used to evaluate the performance of our protocol over our campus network as well as over a less organized global network (PlanetLab). Our experiments demonstrate that our protocol can maintain a robust performance in real-world network environments.
This work was supported by Hong Kong Research Grant Council project 620410.
- IBM: IBM lotus sametime. [Online]. 2010. Available: http://www-01.ibm.com/software/lotus/sametime/
- Google: Google talk. [Online]. 2010. Available: http://www.google.com/talk/about.html
- Skype Limited: Skype video call. [Online]. 2011. Available: http://www.skype.com/intl/en-us/features/allfeatures/video-call/
- Cisco: WebEx solutions. [Online]. 2010. Available: http://www.webex.com/about-webex/index.html
- Cisco: Cisco TelePresence. [Online]. 2010. Available: http://www.cisco.com/web/go/telepresence
- Polycom Inc: Polycom telepresence solutions. [Online]. 2010. Available: http://www.polycom.com/products/telepresence_video/
- Second Life: Second life. [Online]. 2010. Available: http://secondlife.com/whatis/
- Google: Google hangouts. [Online]. 2012. Available: http://www.google.com/+/learnmore/hangouts/
- EditGrid: EditGrid. [Online]. 2010. Available: http://www.editgrid.com
- Valve Corporation: Counter-strike. [Online]. 2010. Available: http://www.valvesoftware.com/games/
- Blizzard Entertainment: World of warcraft. [Online]. 2010. Available: http://us.blizzard.com/en-us/games/wow/
- Twitter: Twitter. [Online]. 2010. Available: http://twitter.com/about
- Facebook: Facebook. [Online]. 2010. Available: http://www.facebook.com/
- Five Minutes: Happy farm. [Online]. 2010. Available: http://apps.facebook.com/happyfarmers
- Zynga: Café world. [Online]. 2010. Available: http://www.facebook.com/cafeworld?
- ITU: One-Way Transmission Time. In ITU-T recommendation G.114. Geneva, Switzerland: International Telecommunication Union; 1993.
- Deering SE, Cheriton DR: Multicast routing in datagram internetworks and extended LANs. ACM Trans Comput Syst 1990, 8: 85–110. 10.1145/78952.78953
- Lennox J, Schulzrinne H: A protocol for reliable decentralized conferencing. In NOSSDAV '03: Proceedings of the 13th International Workshop on Network and Operating Systems Support for Digital Audio and Video. Monterey, CA, USA; 2003:72–81.
- Kundan S, Gautam N, Henning S: Centralized conferencing using SIP. In Proc of the 2nd IP-Telephony Workshop. New York, USA: New York City; 2001.
- Baset SA, Schulzrinne H: An analysis of the Skype peer-to-peer internet telephony protocol. In Proceedings IEEE INFOCOM 2006 25TH IEEE International Conference on Computer Communications. Los Alamitos, CA, USA: IEEE Computer Society Press; 2004:1–11.
- Chu Y-H, Rao SG, Seshan S, Zhang H: A case for end system multicast. IEEE Journal on Selected Areas in Communications 2002, 20: 1456–1471.
- Li J: MutualCast: A Serverless Peer-to-Peer Multiparty Real-Time Audio Conferencing System. Multimedia and Expo, 2005. ICME 2005. IEEE International Conference; 2005:602–605.
- Irie M, Hyoudou K, Nakayama Y: Tree-based Mixing: A New Communication Model for Voice-Over-IP Conferencing Systems. Proceedings of 2005 Internet and Multimedia Systems, and Applications 2005, 353–358.
- Gu X, Wen Z, Yu PS, Shae ZY: Supporting multi-party voice-over-IP services with peer-to-peer stream processing. Proceedings of the 13th Annual ACM International Conference on Multimedia 2005, 303–306.
- Hosseini M, Georganas ND: End System Multicast routing for multi-party videoconferencing applications. Comput Commun 2006, 29: 2046–2065. 10.1016/j.comcom.2005.12.009
- Tseng S, Huang Y, Lin C: Genetic algorithm for delay- and degree-constrained multimedia broadcasting on overlay networks. Comput Commun 2006, 29: 3625–3632. 10.1016/j.comcom.2006.06.003
- Akkus IE, Civanlar MR, Ozkasap O: Peer-to-peer multipoint video conferencing using layered video. Antalya: Image Processing, 2006 IEEE International Conference; 2006:3053–3056.
- Venkataraman V, Yoshida K, Francis P: Chunkyspread: Heterogeneous unstructured tree-based peer-to-peer multicast. Santa Barbara, CA: Network Protocols, 2006. ICNP'06. Proceedings of the 2006 14th IEEE International Conference; 2006:2–11.
- Luo C, Wang W, Tang J, Sun J, Li J: A Multiparty Videoconferencing System Over an Application-Level Multicast Protocol. IEEE Transactions on Multimedia 2007, 9: 1621–1632.
- Fahmy S, Minseok K: Characterizing Overlay Multicast Networks and Their Costs. IEEE/ACM Transactions on Networking 2007, 15: 373–386.
- Banik SM, Radhakrishnan S, Sekharan CN: Multicast Routing with Delay and Delay Variation Constraints for Collaborative Applications on Overlay Networks. IEEE Transactions on Parallel and Distributed Systems 2007, 18: 421–431.
- Lao L, Cui J, Gerla M, Chen S: A Scalable Overlay Multicast Architecture for Large-Scale Applications. IEEE Transactions on Parallel and Distributed Systems 2007, 18: 449–459.
- Tu W, Sreenan CJ, Jia W: Worst-Case Delay Control in Multigroup Overlay Networks. IEEE Transactions on Parallel and Distributed Systems 2007, 18: 1407–1419.
- Tseng S, Lin C, Huang Y: Ant colony-based algorithm for constructing broadcasting tree with degree and delay constraints. Expert Syst Appl 2008, 35: 1473–1481. 10.1016/j.eswa.2007.08.018
- Zimmermann R, Liang K: Spatialized audio streaming for networked virtual environments. In Proceeding of the 16th ACM International Conference on Multimedia. Vancouver, British Columbia, Canada; 2008:299–308.
- Nari S, Rabiee HR, Abedi A, Ghanbari M: An efficient algorithm for overlay multicast routing in videoconferencing applications. San Francisco, CA: Computer Communications and Networks, 2009. ICCCN 2009. Proceedings of 18th International Conference; 2009:1–6.
- Chia-Hui H, Kai-Wei K, Ho-Ting W: An application layer multi-source multicast with proactive route maintenance. Singapore: TENCON 2009–2009 IEEE Region 10 Conference; 2009:1–6.
- Liu Y: Delay Bounds of Chunk-Based Peer-to-Peer Video Streaming. IEEE/ACM Transactions on Networking 2010, 18: 1195–1206.
- Liang C, Zhao M, Liu Y: Optimal Bandwidth Sharing in Multiswarm Multiparty P2P Video-Conferencing Systems. IEEE/ACM Transactions on Networking 2011, 19: 1704–1716.
- Chen X, Chen M, Li B, Zhao Y, Wu Y, Li J: Celerity: Towards low-delay multi-party conferencing over arbitrary network topologies. In Proceedings of the 21st International Workshop on Network and Operating Systems Support for Digital Audio and Video (ACM NOSSDAV 2011). Vancouver, Canada; 2011.
- Chen X, Chen M, Li B, Zhao Y, Wu Y, Li J: Celerity: A low-delay multi-party conferencing solution. In Proceedings of the 19th ACM international conference on Multimedia. Scottsdale, Arizona; 2011.
- Akkus IE, Ozkasap O, Civanlar MR: Peer-to-peer multipoint video conferencing with layered video. J Netw Comput Appl 2011, 34: 137–150. 10.1016/j.jnca.2010.08.006
- Verma S, Wei Tsang O: Controlling gossip protocol infection pattern using adaptive fanout. In Distributed Computing Systems, 2005. ICDCS 2005. Columbus, OH: Proceedings. 25th IEEE International Conference; 2005:665–674.
- Georgiou C, Gilbert S, Guerraoui R, Kowalski DR: On the complexity of asynchronous gossip. Canada, Toronto: Proceedings of the Twenty-Seventh ACM Symposium on Principles of Distributed Computing; 2008:135–144.
- Ozkasap O, Xiao Z, Birman KP: Scalability of two reliable multicast protocols. Ithaca, NY: Cornell University; 1999.
- Tang C, Chang RN, Ward C: GoCast: Gossip-enhanced overlay multicast for fast and dependable group communication. In Dependable Systems and Networks, 2005. DSN 2005. Proceedings. International Conference; 2005:140–149.
- Melamed R, Keidar I: Araneola: A scalable reliable multicast system for dynamic environments. Cambridge, MA, USA: Network Computing and Applications, 2004. (NCA 2004). Proceedings. Third IEEE International Symposium; 2004:5–14.
- Zhang X, Liu J, Li B, Yum TSP: CoolStreaming/DONet: A Data-driven Overlay Network for Peer-to-Peer Live Media Streaming. In INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies, vol 3. Miami: Proceedings IEEE; 2005:2102.
- Padmanabhan VN, Wang HJ, Chou PA, Sripanidkulchai K: Distributing streaming media content using cooperative networking. In Proceedings of the 12th International Workshop on Network and Operating Systems Support for Digital Audio and Video. NY, USA: ACM New York; 2002:177–186.
- Gupta I, Kermarrec AM, Ganesh AJ: Efficient and adaptive epidemic-style protocols for reliable and scalable multicast. IEEE Transactions on Parallel and Distributed Systems 2006, 17: 593–605.
- Karp R, Schindelhauer C, Shenker S, Vocking B: Randomized rumor spreading. Redondo Beach, CA: Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium; 2000:565–574.
- Eugster PT, Guerraoui R, Kermarrec A, Massoulié L: From Epidemics to Distributed Computing. IEEE Computer 2004, 37: 60–67.
- Kermarrec AM, Massoulie L, Ganesh AJ: Probabilistic reliable dissemination in large-scale systems. IEEE Transactions on Parallel and Distributed Systems 2003, 14: 248–258.
- Demers A, Greene D, Hauser C, Irish W, Larson J, Shenker S, Sturgis H, Swinehart D, Terry D: Epidemic algorithms for replicated database maintenance. In Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing. NY, USA: ACM New York; 1987:1–12.
- Chandra R, Ramasubramanian V, Birman K: Anonymous gossip: Improving multicast reliability in mobile ad-hoc networks. Distributed Computing Systems, 2001. 21st International Conference; 2001:275–283.
- Luo J, Eugster PT, Hubaux JP: Route driven gossip: Probabilistic reliable multicast in ad hoc networks. In INFOCOM 2003. Twenty-Second Annual Joint Conference of the IEEE Computer and Communications Societies, vol 3. San Francisco, CA; 2003:2229–2239.
- Haas ZJ, Halpern JY, Li L: Gossip-based ad hoc routing. IEEE/ACM Transactions on Networking 2006, 14: 479–491.
- Chun B, Culler D, Roscoe T, Bavier A, Peterson L, Wawrzoniak M, Bowman M: PlanetLab: an overlay testbed for broad-coverage services. SIGCOMM Comput Commun Rev 2003, 33: 3–12. 10.1145/956993.956995
- Fei H, Ravindran B, Jensen ED: RT-P2P: A scalable real-time peer-to-peer system with probabilistic timing assurances. Shanghai, China: Embedded and Ubiquitous Computing, 2008. EUC '08. IEEE/IFIP International Conference; 2008:97–103.
- Kai H, Ravindran B, Jensen ED: RTG-L: Dependably scheduling real-time distributable threads in large-scale, unreliable networks. Melbourne, Qld: Dependable Computing, 2007. PRDC 2007. 13th Pacific Rim International Symposium; 2007:314–321.
- Kempe D, Dobra A, Gehrke J: Gossip-based computation of aggregate information. Cambridge, MA, USA: Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium; 2003:482–491.
- Mujtaba MK: Push-pull gossiping for information sharing in peer-to-peer communities. In Proc. International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA). Las Vegas; 2003:1393–1399.
- Leitao J, Pereira J, Rodrigues L: Epidemic broadcast trees. Beijing, China: Reliable Distributed Systems, 2007. SRDS 2007. 26th IEEE International Symposium; 2007:301–310.
- Zhang X, Liu J: Gossip based streaming. In Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters. NY, USA: ACM New York; 2004:250–251.
- Li HC, Clement A, Wong EL, Napper J, Roy I, Alvisi L, Dahlin M: BAR gossip. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association; 2006:191–204.
- Frey D, Guerraoui R, Kermarrec AM, Monod M: Boosting gossip for live streaming. Delft: Peer-to-Peer Computing (P2P), 2010 IEEE Tenth International Conference; 2010:1–10.
- Liang J, Ko SY, Gupta I, Nahrstedt K: MON: On-demand overlays for distributed system management. Edinburgh: Proceedings of USENIX WORLDS; 2005.
- Carvalho N, Pereira J, Oliveira R, Rodrigues L: Emergent structure in unstructured epidemic multicast. Dependable Systems and Networks, 2007. DSN'07. 37th Annual IEEE/IFIP International Conference; 2007:481–490.
- Vishnumurthy V, Francis P: On overlay construction and random node selection in heterogeneous unstructured P2P networks. In Proceedings of IEEE INFOCOM’06. Barcelona, Spain; 2006.
- Boyd S, Ghosh A, Prabhakar B, Shah D: Randomized gossip algorithms. IEEE Transactions on Information Theory 2006, 52: 2508–2530.
- Ram SS, Nedic A, Veeravalli VV: Asynchronous gossip algorithms for stochastic optimization. Shanghai, China: Decision and Control, 2009 Held Jointly with the 2009 28th Chinese Control Conference. CDC/CCC 2009. Proceedings of the 48th IEEE Conference on; 2009:3581–3586.
- Deb S, Medard M, Choute C: Algebraic gossip: a network coding approach to optimal multiple rumor mongering. IEEE Transactions on Information Theory 2006, 52: 2486–2507.
- Birman KP, Hayden M, Ozkasap O, Xiao Z, Budiu M, Minsky Y: Bimodal multicast. ACM Trans Comput Syst 1999, 17: 41–88. 10.1145/312203.312207
- Luk VWH, Wong AKS, Ouyang RW, Lea CT: Gossip-based delay-sensitive N-to-N information dissemination protocol. IEEE Global Communications Conference IEEE GLOBECOM 2008; 2008:1–5.
- Mills D: RFC1305. Internet Engineering Task Force; 1992.
- Jelasity M, Montresor A, Babaoglu O: Gossip-based aggregation in large dynamic networks. ACM Trans Comput Syst 2005, 23: 219–252. 10.1145/1082469.1082470
- Wikipedia: Producer consumer problem. [Online]. 2010. Available: http://en.wikipedia.org/wiki/Producer-consumer_problem
- Jim M, Paul E: Usleep(3) - linux man page. [Online]. 2010. Available: http://linux.die.net/man/3/usleep
- Park K, Pai VS: CoMon: a mostly-scalable monitoring system for PlanetLab. SIGOPS Oper Syst Rev 2006, 40: 65–74. 10.1145/1113361.1113374
- PlanetLab: Sirius calendar service. [Online]. 2010. Available: https://www.planet-lab.org/db/sirius/index.php
- PlanetLab: Sirius upgrade. [Online]. 2010. Available: http://www.planet-lab.org/node/5
- Liang C, Guo Y, Liu Y: Is random scheduling sufficient in p2p video streaming? Beijing, China: Distributed Computing Systems, 2008. ICDCS'08. the 28th International Conference; 2008:53–60.
- Ponec M, Sengupta S, Chen M, Li J, Chou PA: Multi-rate peer-to-peer video conferencing: A distributed approach using scalable coding. In Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on; 2009:1406–1413.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.