Neighborhoods and bands: an analysis of the origins of spam
© Fonseca et al.; licensee Springer. 2015
Received: 17 September 2014
Accepted: 29 March 2015
Published: 11 May 2015
Despite the continuous efforts to mitigate spam, the volume of such messages continues to grow and identifying spammers is still a challenge. Spam traffic analysis is an important tool in this context, allowing network administrators to understand the behavior of spammers, both as they obfuscate messages and try to hide inside the network. This work adds to that body of information by analyzing the sources of spam to understand to what extent they explain the traffic observed. Our results show that, in many cases, an Autonomous System (AS) represents an interesting neighborhood to observe, with most ASes falling into four basic types: heavy and light senders, which tend to have many or very few spammer machines respectively, frequent small offenders, where spammer machines appear every now and then but disappear in a short time, and conniving ASes, where most machines do not send spam, but a few are heavy, continuous senders. Not only that, but also by grouping machines based on the campaigns that they send together, we define the notion of SpamBands. Those bands identify groups of machines that are probably controlled by the same spammer, and our findings show that they often span multiple ASes. The identification of AS neighborhood types and SpamBands may simplify the combat against spam, focusing efforts at the sources as a whole, possibly improving blacklists by grouping machines found in a same AS or SpamBands.
Keywords: Spam traffic; Autonomous system; Network bad neighborhoods
In the last two decades, there has been a steady increase in the use of the Internet, which has led to an increase in the problems related to the sending of spam messages. Spam generates a large volume of data: email service providers estimate that between 40% and 80% of electronic messages are spam. In addition, spam is often related to the propagation of phishing and malware. Because of those factors, the losses caused by spam traffic are estimated in billions of dollars.
To try to counter those effects, the battle against spammers takes place on several fronts. For example, much has been done to develop filters based on message content, defining rules to identify patterns of obfuscation observed in spam messages. Besides that, in recent years, multiple efforts have focused on understanding spam traffic within the network. The goal in that case is to find elements that can be used to identify the machines that send the messages before they traverse the network and consume resources of mail servers at the destination. This work fits in this line, analyzing where the machines used by spammers are located.
Recently, the term Internet Bad Neighborhoods was created to identify contiguous ranges of IP address space that contain a significant number of machines with unwanted behavior. The principle behind the original concept was that machines with similar (bad) behavior that share an IP prefix probably belong to the same problematic network. Later, the concept was extended to refer to network segments propagating unwanted traffic, regardless of the number of machines involved. The granularity chosen for those analyses was that of /24 address ranges.
Our analysis is based on similar principles, but we chose the level of Autonomous Systems (AS) to identify possible bad neighborhoods. Autonomous Systems, by nature, identify an IP address range under the control of a single entity responsible for defining usage policies, routing, and administrative procedures to be applied to all machines installed in that range. In this sense, two IP addresses belonging to the same AS would have a much greater chance of exhibiting similar behavior than two IP addresses on subnets with adjacent addresses (in the IP space), but belonging to different ASes. After all, an AS with security flaws in its policies is at risk of becoming a potential Bad Neighborhood, since the probability of machines belonging to that AS being infected and starting to send spam is high. It should be pointed out, though, that some ASes may be under split management; that could be better treated by considering BGP announced prefixes, but that information was not available in the collected data.
Our data contain spam traffic collected at various points around the globe, which gives us multiple vantage points over the network. By grouping the machines that generate the traffic based on their origin AS, we created a profile of the behavior of each Autonomous System observed during the experiment. Our results show that the majority of machines sending spam are concentrated in a few ASes and that only 15 of them are responsible for over 80% of the observed traffic. By using data mining techniques, we observed that there are similarities between some of them, and that it is possible to group them in four categories representing the AS from the point of view of its role in the distribution of spam. This finding is one of the main contributions of this work since, until now, it was common to assume that there would be only two types of generators of spam: “light” and “heavy” transmitters. Our characterization indicates that, in addition to spam-free and bad neighborhoods (where bad behavior, that is, spamming, is quite common), there are also good neighborhoods, where such behavior is not the norm but appears from time to time in one machine or another and usually disappears shortly afterwards, and conniving neighborhoods, where misbehaving machines are not widespread but limited to only a few hosts, which nevertheless tend to be heavy senders.
2 Related work
This work analyzes spam based on its behavior as seen “inside the network”, not based on data from destination mail servers. In that sense, it relates to the work of Ramachandran and Feamster, one of the first to study spam from network-level features. Although we consider such features, since we have access to the spam content, we combine different views. Duan, Gopalan and Yuan have recently provided a similar analysis, although based on a single point of observation (a large university campus).
The definition of Bad Neighborhoods, mentioned in the introduction, is due to van Wanrooij and Pras, who proposed the concept as an extension of the use of blacklists in spam detection. In their study, each 24-bit IP prefix (/24) would be a neighborhood and bad neighborhoods would be those with a large number of machines sending spam. Moreira Moura et al. focused on the analysis of these neighborhoods and extended the definition to include IP networks with few transmitters, but with a high volume of traffic, following the classification of “heavy” and “light” transmitters previously proposed by Pathak, Hu and Mao. In our work, we observe that each AS can be seen as a neighborhood, because an autonomous system naturally defines an area with similar machines, since there is a single management for the whole AS and a common routing policy for all machines.
Many studies analyze spam traffic using messages collected at the destination mail servers. Gomes et al. showed features that can be used to separate legitimate messages from spam messages, using data collected from only one specific point of the network. In our work, spam messages were collected by low-interaction honeypots installed in 10 different countries and located in transit networks. This provided a more global view of spam traffic, offering a different perspective.
Kokkodis and Faloutsos showed results that indicate that the activities of botnets are scattered in the IP address space, reducing the effectiveness of anti-spam filters based on addresses and hindering the work of network administrators. Our work, despite confirming the existence of spammers in a very large number of networks, shows that most of the spam messages come from a small number of ASes, a result that can be used in the development of new techniques for spam detection, as well as in the design of initiatives to act against such sources.
Some of our analyses are based on the concept of spam campaigns. Our definition is based on the identification of frequent patterns in the content, using data mining techniques. Other approaches have been proposed, like the use of regular expressions. Our approach fits better with our processing pipeline, where multiple data mining algorithms are applied to derive different views, such as those in this paper.
This paper is based on previous work, so far available only in Portuguese. In a first paper, we performed a detailed analysis of spam messages collected over three months around the world to observe Bad Neighborhoods. With the same dataset, we developed the concept of SpamBands, another way to analyze the origin of spammers. (All the major concepts from those papers are included here, to provide a complete source in English.) In the current work we extend our analysis of both concepts to cover data from approximately one year, and for the first time we use both concepts, Neighborhoods and SpamBands, to study the relationship between them. That allowed us to identify new patterns, such as the strong correlation between IP addresses in SpamBands and bad neighborhoods, and a topological relationship among spammers, since the IP addresses from a SpamBand usually come from just a few ASes. We hope that our findings can help drive the spam community’s efforts to combat spammers closer to their origin.
Three aspects of our methodology deserve attention: the collection architecture, our technique to identify spam campaigns, and our technique to define SpamBands. They are presented next.
3.1 Collection infrastructure
The dataset used in this work was collected using twelve low-interaction honeypots installed in ten different countries: two in Brazil, two in the United States and one in each of Argentina, Australia, Austria, Ecuador, Netherlands, Norway, Taiwan and Uruguay. That means we had collectors present in four different continents, allowing the study to have a global view of spam traffic. By doing that, we avoided the problem of location bias, which may be present in several studies in the literature, whose data often come from a single collection point. Furthermore, none of the honeypots used in the analysis showed any signs of having been subjected to any form of attack.
The honeypots used in this paper are machines that simulate machines of interest to spammers, such as open SMTP mail relays and HTTP and SOCKS open proxies. Their goal is to lure spammers to identify them as vulnerable machines and use them to try to deliver spam messages. In practice, the honeypots do not deliver spam messages to the intended recipients; instead, the messages are stored locally and periodically collected to a central storage. The behavior of the honeypots, however, is such that it makes the spammer believe that the delivery was successful. That is corroborated by the fact that most machines continue to abuse the honeypots throughout the collection period.
It should be noted that our analysis is guided by the traffic that was directed to our honeypots. There may be spammers that do not make use of proxies/relays to deliver their messages, and those are not considered in this analysis. However, it is highly unlikely that a heavy spammer, using a dedicated server farm, would remain in activity without such a technique: it would be easily identified by blacklists and blocked, since it uses few origin IP addresses. On the other hand, if a botnet delivers spam directly to the target mail servers all the time, we would not see it in our data.
Along with each message received, additional information is collected and stored by the system. This information includes the protocol used by the spammer to connect to the honeypot (SMTP, HTTP or SOCKS), the network prefix and AS of origin, and the status of the source IP in blacklists like Spamhaus XBL and PBL at the moment each message was delivered, among others. All that is obtained at the time the message is received, so that we have a snapshot of things as they were at the moment the spammer tried to send each message. Thus, our analysis considers the information available at the time of the transmission and not during a later query, which could cause errors. That is essential, for example, for the analysis of blacklist contents, which might change between the time of collection and analysis.
Later, during the analysis, some ASes deserved further study. In those cases, based on their AS numbers, we gathered data available on the Internet to get more details about their activities. Based on the activities that were identified during that search, we classified the ASes as providers (general, DSL, corporate), hosting/co-location services, etc.
3.2 Spam campaigns and spam bands
To better understand the behavior of spammers, we used the concept of spam campaigns. A campaign is a set of messages that share a common goal (similar content) and a common dissemination strategy. We used the FPCluster algorithm to group messages based on their various attributes and to identify the obfuscation strategies used. That algorithm builds a frequent pattern tree, which is then used to extract the message clustering patterns, which in turn identify the campaigns [12,17].
Through the identification of campaigns, we detect the influence of each orchestrated campaign on the spam traffic collected, as well as the emergence of new IP addresses that join a given campaign. Based on those observations, Fazzion et al. developed a method that can identify groups of transmitters that are correlated, called SpamBands. Since that work was published in Portuguese, the method is described here for completeness.
From G, we can define a SpamBand as a dense sub-graph. Such sub-graphs can be obtained by several graph clustering algorithms found in the literature, which can be quite complex and hard to calibrate. Our strategy is simpler and interactive. Initially, each SpamBand is a connected component.
In some cases, however, one IP address may be found connected to more than one such sub-graph. That may be due to IP address reassignment, or use of NAT. To handle that, in a second step, we evaluate those cases, which can require splitting certain connected components in order to isolate sub-graphs with higher density.
The process to identify SpamBands is presented in Algorithm 1. The algorithm receives three parameters: the graph (G), the minimum betweenness threshold to be considered (threshold_bt) and the maximum number of IP addresses that can be removed in order to split a connected component (threshold_ips).
The first step is to determine the connected components in G, which constitute the initial approximation of the SpamBands. Next, we identify dense sub-graphs in each connected component by exploring the betweenness concept, which measures the centrality degree of a node in the graph. This metric indicates the number of shortest paths among all pairs of nodes in the graph that pass through a given node. Our premise is that when some nodes have a high value of betweenness, beyond what would be expected for a strongly connected graph, chances are that those nodes are connecting two (or more) sub-graphs which are, themselves, internally dense. Thus, by removing those nodes, we are emphasizing the separation of those internally dense sub-graphs. This removal is based on the parameters threshold_bt, which is the lower bound of betweenness that a node must have in order to be removed, and threshold_ips, which defines the maximum number of nodes that can be removed in order to split a component. Algorithm 1 initially verifies which nodes satisfy the betweenness threshold in each connected component and next verifies that their removal does not lead to very small graphs. If it is possible to remove the nodes, each resultant component is inserted in S. If not, the current component is inserted in S. The algorithm returns the set S, which holds all SpamBands.
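The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the graph is a dictionary mapping each IP address to the set of its neighbors, betweenness is the unnormalized shortest-path count computed with Brandes' algorithm, and the check against very small resulting graphs is reduced to the threshold_ips limit. The function names and the toy graph are hypothetical.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def betweenness(adj):
    """Unnormalized betweenness (Brandes), unweighted undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def find_spambands(adj, threshold_bt, threshold_ips):
    """Split components by removing up to threshold_ips high-betweenness nodes."""
    bands = []
    for comp in connected_components(adj):
        sub = {u: adj[u] & comp for u in comp}
        bridges = {u for u, b in betweenness(sub).items() if b >= threshold_bt}
        if bridges and len(bridges) <= threshold_ips:
            rest = {u: sub[u] - bridges for u in comp - bridges}
            bands.extend(connected_components(rest))
        else:
            bands.append(comp)  # assumption: keep the component unchanged
    return bands


# Hypothetical toy graph: two triangles joined by node "x", plus a separate pair.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "x": {"c", "d"},
    "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
    "g": {"h"}, "h": {"g"},
}
bands = find_spambands(toy, threshold_bt=9, threshold_ips=1)
```

In the toy graph, node "x" lies on every shortest path between the two triangles, so it is the only node that reaches the chosen threshold; removing it leaves the two triangles as separate bands, while the isolated pair is kept as is.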
4 Collected data
Our analysis considers approximately one year of collection, from May 9, 2012, until March 31, 2013, resulting in nearly four billion messages (14 TB). By analyzing a long period, we avoid any impact due to atypical behavior, which could occur in a short period of time.
[Table: Global view of the data. Column header recovered: Messages (×10^6). Table body not available.]
The number of IP addresses using the SOCKS and HTTP protocols is much smaller than the number of IP addresses that used the SMTP protocol. Nevertheless, the number of messages sent using HTTP and SOCKS is larger. This shows that there is not a direct relationship between the number of machines and the number of messages sent. This division is a sign of the differentiation between spammers: some adopt strategies based on high volume over a certain protocol, while others may send lower volumes, using more machines, over another protocol. In fact, during our analysis we will see that there are more factors to be considered.
5 Neighborhood analysis
In this work, we advocate that ASes can be used for the identification of the limits of the neighborhoods, instead of /24 prefixes, as used in the original definition. That provides a more natural aggregation of addresses, given that fixed-length prefixes are not adequate for all cases.
To show that, in Table 1 we observe that spam messages come from many different networks, since 3,226 distinct autonomous systems appeared in the collected data. It is interesting to notice that most of those spam messages were sent by a very small number of autonomous systems, with fifty of them responsible for over 85% of all traffic. Thus, analyzing the behavior of spam at the source can direct the efforts to fight spam, as it can identify which neighborhoods have the worst behavior and, consequently, are more likely to send spam messages.
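The concentration of traffic in a few ASes can be measured as the share of all messages sent by the k largest senders. The sketch below illustrates that computation; the function name and the message counts are hypothetical (the AS numbers are taken from the private-use range) and do not come from the dataset.

```python
from collections import Counter


def top_k_share(messages_per_as, k):
    """Fraction of all messages sent by the k ASes with the most messages."""
    total = sum(messages_per_as.values())
    if total == 0:
        return 0.0
    top = Counter(messages_per_as).most_common(k)
    return sum(count for _, count in top) / total


# Hypothetical message counts per AS, for illustration only.
example = {64512: 700, 64513: 200, 64514: 60, 64515: 30, 64516: 10}
share_top2 = top_k_share(example, 2)  # the two largest ASes send 900 of 1000 messages
```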
5.1 Distribution of IP addresses in autonomous systems
[Table: Number of IP addresses observed per AS. Column headers recovered: IP addr. per AS (x); #Msgs (×10^6). Row labels recovered: x = 1; x ≥ 100. Table body not available.]
The ASes that have fewer than 10 machines sending spam account for over 83% of the total number of ASes. Nevertheless, they send only 7.66% of the messages and correspond, in number of machines, to 1.7% of the total. Thus, we believe that, in terms of neighborhoods, these ASes are not characterized as bad neighborhoods; rather, their security policies seem to be implemented correctly, given the small number of spamming IP addresses and the low volume of traffic generated by them. On the other hand, 95 autonomous systems (2.94%) have more than 319,000 IP addresses in the dataset (97.41%) and are responsible for 71% of the spam traffic, which corresponds to almost 3 billion messages. Those neighborhoods show bad behavior, possibly due to weak security policies. Thus, direct efforts to understand and improve the behavior of those networks might have an impact on the overall traffic.
Figure 2 also shows the distribution of IP addresses present in each blacklist. The two blacklists considered here are XBL, which lists IP addresses detected as infected, and PBL, which lists IP ranges declared by ISPs as being used for dynamic hosts, which should not send mail directly. Finally, we consider IPs that were not found in any of the blacklists considered (No BL). There is a very small number of Autonomous Systems that do not have IP addresses in XBL, about 15%, as we can see in Figure 2(b). In addition, approximately 60% of the ASes have all their IP addresses in XBL. This result might suggest that most IP addresses are detected by the XBL, but in fact a good portion of the ASes listed there (49%) have only one spamming IP address (therefore, 100% of their addresses are in the XBL).
[Table: Overview of blacklists. Column header recovered: #Messages (×10^6). Table body not available.]
5.2 Analysis of neighborhoods with higher transmission power
[Table: 15 most important autonomous systems. Column headers recovered: Msgs (×10^6); IP No BL. Table body not available.]
Some autonomous systems (10297 and 29802) have similar characteristics in virtually all aspects. They have a small number of IP addresses in our dataset, most of which use the SOCKS and HTTP protocols to send spam and do not belong to any blacklist. AS 2497 is also very similar, despite having a larger number of machines. AS 4725, in turn, differs only by having a large number of IP addresses in the PBL blacklist. The machines of those neighborhoods behave like dedicated servers used to send spam: they use the SOCKS and HTTP protocols, meaning they do not contact any mail host directly (only through proxies), each sends a large number of messages, and most of them are not in any blacklist.
In contrast, we find some neighborhoods with completely different characteristics, such as Autonomous Systems 3462 and 4134. Both have more than 100,000 IP addresses in our dataset, the vast majority of the machines observed used the SMTP protocol to send spam and most of them were in some blacklist. In addition, AS 4134 has very striking features, with more than 99% of its IP addresses sending spam messages using the SMTP protocol and about 17,000 of them in XBL.
5.3 Grouping of autonomous systems
Because of the evidence mentioned in Section 5.2, we looked for a way to group the ASes observed and classify them according to their characteristics. For this, we use the X-means clustering algorithm, considering the characteristics of each neighborhood as attributes. Unlike other clustering algorithms, X-means automatically estimates the number of clusters to use.
To perform the clustering, we used as features the characteristics that better represent the Autonomous Systems in our analysis. The attributes carry information such as number of IP addresses observed, number of messages per day, percentage of the IP addresses in blacklists, percentage of IP addresses using each protocol, and the average number of messages sent per IP address. Those attributes proved to be a good set to identify the neighborhoods, because they define the major elements of behavior that machines on those networks can present.
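The attributes listed above can be assembled into one feature vector per AS before clustering. The sketch below shows one possible way to do that; the record layout, the field names and the sample values are hypothetical, only the SMTP share is computed among the protocols for brevity, and the X-means step itself is not reproduced here.

```python
from collections import defaultdict


def as_features(records, days):
    """Build per-AS attributes from per-IP records.

    records: iterable of (asn, ip, protocol, blacklist, n_msgs), one entry per
    distinct source IP; blacklist is "XBL", "PBL" or None (assumed layout).
    days: length of the observation period in days.
    """
    by_as = defaultdict(list)
    for asn, _ip, protocol, blacklist, n_msgs in records:
        by_as[asn].append((protocol, blacklist, n_msgs))

    features = {}
    for asn, rows in by_as.items():
        n_ips = len(rows)
        total = sum(m for _, _, m in rows)
        features[asn] = {
            "n_ips": n_ips,
            "msgs_per_day": total / days,
            "pct_blacklisted": 100.0 * sum(b is not None for _, b, _ in rows) / n_ips,
            "pct_smtp": 100.0 * sum(p == "SMTP" for p, _, _ in rows) / n_ips,
            "msgs_per_ip": total / n_ips,
        }
    return features


# Hypothetical records using documentation addresses and private-use AS numbers.
sample = [
    (64512, "192.0.2.1", "SMTP", "XBL", 10),
    (64512, "192.0.2.2", "SMTP", None, 30),
    (64513, "198.51.100.7", "HTTP", None, 1000),
]
feats = as_features(sample, days=10)
```

Each resulting dictionary corresponds to one point in the attribute space; in practice the values would probably need to be normalized before being passed to a clustering implementation, since they are on very different scales.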
[Table: Features of each group. Column headers recovered: Msgs (×10^6); No. IP Addresses; Msgs/IP (×10^3). Table body not available.]
If we consider the neighborhoods that sent the most spam messages, studied in Section 5.2, we see that the clustering placed ASes with similar characteristics in the same group, and separated those with different behaviors. Autonomous Systems 10297, 29802, and 2497, whose machines behave like dedicated servers, ended up in group 2, which is responsible for most of the spam traffic despite having fewer IP addresses. Moreover, that group has few machines in blacklists, which is a necessary feature for machines that send a large volume of messages, since otherwise they would not be effective.
Most machines from Autonomous Systems 3462 and 4134 behave like bots and those two neighborhoods are part of group 4. That group includes ASes that have a large number of IP addresses, most of them in blacklists. It is also observed that most of the IP addresses in that group sent a small number of messages.
By analyzing the 15 neighborhoods highlighted in Section 5.2, we found that none of them are in groups 1 or 3, as can be seen in Table 4. This result was already expected, since the ASes in group 1 have a very small number of IP addresses and those from group 3 have few machines that send spam and are responsible for few spam messages. Thus, the neighborhoods that send more spam were allocated to the other two groups: ASes that have a lot of IP addresses and those that send a large amount of spam messages.
We believe these results may be used in at least two important ways: to help guide the policies that network managers apply to data from ASes known to fall into a certain class, and to help the network community identify organizations that may need some guidance on how to handle their security (those with a large number of low-volume spamming IPs), or that may require some pressure to act against server-heavy spammers among their clients.
5.3.1 Group 1
The main characteristic of the ASes of this group is the small number of spamming machines, as we can see in Figure 3(b). Almost 60% of the Autonomous Systems here have only a single IP address in the dataset and none of them have more than one hundred IP addresses. This explains why that group, even encompassing 64% of the ASes, is responsible for only 9.5% of the spam traffic generated. Furthermore, most of the IP addresses in this group are in the XBL blacklist, which characterizes infected machines, probably belonging to botnets.
5.3.2 Group 2
The Autonomous Systems in this group also have a small number of spamming IP addresses — 57% of them had only one IP address in the dataset. Moreover, a very small percentage of neighborhoods here (2%) have more than one hundred machines. However, even with a small number of IP addresses, the average number of spam messages sent by each of them is very large, more than 162 thousand, as can be seen in Table 5. Those features (few machines, with heavy spam traffic) suggest that most of the ASes here house machines that act as dedicated servers to send spam, probably with the connivance of the network administrators. In our opinion, an unwanted bot that would start behaving that way would not go unnoticed by a network administrator that did not accept such practice, and it would not remain limited to a few machines if the network administrator was careless enough not to bother about it. One final interesting aspect is that, in this group, most of the IP addresses are not in any blacklist. Considering the volume of traffic they generate, that would only be possible if they consistently abuse intermediary machines to hide from blacklist detection.
As mentioned earlier, ASes 10297, 29802 and 2497 were assigned to this group. Like the other ASes in the group that were studied, they are characterized by offering hosting and co-location services, which fits the profile just described.
5.3.3 Group 3
The graph in Figure 7(b) shows a behavior similar to that seen in groups 1 and 2, but the number of ASes with only one IP address is lower, just under 40%. What marks this group is the large number of IP addresses that use the SMTP protocol, over 99% of them, surpassing any other group. In addition, about 64% of the machines in this group are in XBL. This suggests the presence of bots, but the low number of IP addresses suggests that there are fewer compromised machines in those ASes.
5.3.4 Group 4
This group contains the ASes with the largest numbers of machines observed, as can be seen in Figure 9(b): over 20% of the neighborhoods have more than 1,000 IP addresses, and some of them have more than 100,000 machines. Thus, even accounting for much of the spam traffic, the number of spam messages per IP address is the lowest among the groups, only 3,000. Moreover, the great majority of the IP addresses are in blacklists and use the SMTP protocol. For all this, we have strong evidence that many of the machines belonging to this group are part of botnets. Because of the large number of machines in this situation, these ASes are classified as bad neighborhoods, where, apparently, management policies and network maintenance are not able to prevent the proliferation of infected machines.
ASes 3462 and 4134, which are part of this group, have been classified as ISPs with DSL networks. This suggests that the group is composed predominantly of home users' machines infected by some type of malware.
6 SpamBands analysis
Figure 11(b) shows a linear regression of the number of SpamBands per day for each honeypot. The linear trends reveal lines with low inclination (almost constant) adding to the impression that the variation observed in Figure 11(a) is regular and is due to some kind of obfuscation. Another interesting result is about honeypot EC-01. That honeypot was attacked by more SpamBands than any other, although no clear reason for that was found.
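The linear trend for one honeypot can be obtained with an ordinary least-squares fit of the daily SpamBand counts against the day index. The sketch below is only an illustration of that fit; the function name and the sample series are hypothetical.

```python
def linear_trend(counts):
    """Least-squares slope and intercept for counts[i] observed on day i."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two days of observations")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical series: a constant number of SpamBands per day gives slope 0.
flat_slope, flat_intercept = linear_trend([5, 5, 5, 5])
```

A slope close to zero, as in the flat series above, corresponds to the almost constant lines described for Figure 11(b).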
6.1 Relationship between SpamBands and ASes
6.1.1 SpamBands activities in different neighborhoods
Furthermore, most of the SpamBands that contain ASes from group 2 use the HTTP or SOCKS protocol. This was expected, since group 2 seems to have many dedicated server machines used to send spam. On the other hand, the SpamBands with ASes in group 4 use the SMTP protocol. This result was also expected, because most machines in these neighborhoods seem to belong to botnets.
6.1.2 SpamBands clustering
In this section we analyze the clustering coefficient inside the SpamBands to verify whether IP addresses in SpamBands interact more with other IP addresses in their own AS than with IP addresses from other neighborhoods. The internal clustering coefficient (ICC) of a SpamBand is the average of the clustering coefficient in each AS considering only the internal connections, i.e., connections among IP addresses that belong to the same Autonomous System. On the other hand, the external clustering coefficient (ECC) of a SpamBand is the average of the clustering coefficient in each AS considering only the external connections, i.e., connections among IP addresses of different Autonomous Systems.
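One possible reading of these two definitions is sketched below. It assumes that a SpamBand is given as an adjacency dictionary, that the clustering coefficient of an AS is the mean local clustering coefficient of its IP addresses, and that the ICC and ECC average those per-AS values after keeping only intra-AS or only inter-AS edges, respectively. The function names, the mapping as_of and the toy band are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, u):
    """Fraction of pairs of neighbors of u that are themselves connected."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return 2.0 * links / (k * (k - 1))


def keep_edges(adj, as_of, internal):
    """Keep only intra-AS edges (internal=True) or inter-AS edges (internal=False)."""
    return {
        u: {v for v in nbrs if (as_of[u] == as_of[v]) == internal}
        for u, nbrs in adj.items()
    }


def mean_over_ases(adj, as_of):
    """Average, over ASes, of the mean local clustering of each AS's nodes."""
    by_as = {}
    for u in adj:
        by_as.setdefault(as_of[u], []).append(local_clustering(adj, u))
    per_as = [sum(vals) / len(vals) for vals in by_as.values()]
    return sum(per_as) / len(per_as)


def icc(adj, as_of):
    return mean_over_ases(keep_edges(adj, as_of, True), as_of)


def ecc(adj, as_of):
    return mean_over_ases(keep_edges(adj, as_of, False), as_of)


# Hypothetical band: a triangle inside AS 64512 and one host in AS 64513.
band = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
as_of = {"a": 64512, "b": 64512, "c": 64512, "d": 64513}
band_icc, band_ecc = icc(band, as_of), ecc(band, as_of)
```

In this toy band the triangle gives AS 64512 an internal coefficient of 1 and the single host in AS 64513 contributes 0, so the ICC is 0.5, while the only inter-AS edge forms no triangle and the ECC is 0.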
Several efforts are under way to combat spam, but this task has been made difficult by the technical sophistication of spammers. This paper tries to shed some light on the sources of spam messages, to help the development of techniques and policies to fight spam at its origin. Our results show that, although spam messages are being sent from various networks, most of the traffic is concentrated in a few Autonomous Systems, and that can be used to identify spam sources and fight them. Moreover, we grouped ASes into four categories based on their spam dissemination behavior. Those groups show that we can identify good and bad neighborhoods: some with many infected machines, others with just a few on-and-off senders that get shut down quickly, and others which are conniving with a few heavy spammers.
By identifying machines that participate together in a spam campaign (SpamBands), we observed that most campaigns originated from neighborhoods of a single type, or may include hosts in the two types of heavy sending neighborhoods at the same time. All that can be used to identify major sources of spam to help stop that kind of traffic.
As future work, we plan to conduct further analysis on each of the neighborhood categories found to better understand the differences among them. We also intend to better understand the behavior of the category considered good neighborhoods and check whether security policies used to define the behavior of those autonomous systems can serve as a model to others.
This work was partially funded by NIC.Br, Fapemig, CAPES, CNPq and InWeb.
- Orman H (2013) The compleat story of phish. Int Comput IEEE 17(1): 87–91.
- Newman MEJ, Forrest S, Balthrop J (2002) Email networks and the spread of computer viruses. Phys Rev E 66: 035101.
- Sipior JC, Ward BT, Bonner PG (2004) Should spam be on the menu? Commun ACM 47(6): 59–63.
- Guerra PHC, Guedes D, Wagner Meira J, Hoepers C, Chaves MHPC, Steding-Jessen K (2010) Exploring the spam arms race to characterize spam evolution. In: Proceedings of the 7th Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (CEAS), Redmond, WA.
- van Wanrooij W, Pras A (2010) Filtering spam from bad neighborhoods. Int J Netw Manag 20(6): 433–444.
- Moreira Moura GC, Sadre R, Pras A (2011) Internet bad neighborhoods: the spam case. In: Festor O, Lupu E (eds) 7th International Conference on Network and Services Management (CNSM 2011), Paris, France, 1–8. IEEE Communications Society, USA.
- Ramachandran A, Feamster N (2006) Understanding the Network-Level Behavior of Spammers. SIGCOMM Comput Commun Rev 36(4): 291–302.
- Duan Z, Gopalan K, Yuan X (2011) An empirical study of behavioral characteristics of spammers: Findings and implications. Comput Commun 34(14): 1764–1776.
- Pathak A, Hu YC, Mao ZM (2008) Peeking into spammer behavior from a unique vantage point. In: Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats. LEET’08, 3–139. USENIX Association, Berkeley, CA, USA. http://dl.acm.org/citation.cfm?id=1387709.1387712.
- Gomes LH, Almeida RB, Bettencourt LMA, Almeida V, Almeida JM (2005) Comparative Graph Theoretical Characterization of Networks of Spam and Legitimate Email. In: Proceedings of the Second Conference on Email and Anti-Spam - CEAS 2005. CEAS, Stanford, CA, USA.
- Kokkodis M, Faloutsos M (2009) Spamming botnets: Are we losing the war? In: Proceedings of the 6th Conference on E-mail and Anti-spam (CEAS), Mountain View, CA.
- Guerra PHC, Pires DEV, Guedes D, Wagner Meira J, Hoepers C, Steding-Jessen K (2008) A campaign-based characterization of spamming strategies. In: Proceedings of the 5th Conference on E-mail and Anti-spam (CEAS), Mountain View, CA.
- Xie Y, Yu F, Achan K, Panigrahy R, Hulten G, Osipkov I (2008) Spamming botnets: signatures and characteristics. In: Bahl V, Wetherall D, Savage S, Stoica I (eds) SIGCOMM, 171–182. ACM, Seattle, WA.
- Fonseca O, Las-Casas PHB, Fazzion E, Guedes D, Jr. WM, Hoepers C, Steding-Jessen K, Chaves MHP (2014) Vizinhanças ou condomínios: uma análise da origem de spams com base na organização de sistemas autônomos. In: Brazilian Symposium on Computer Networks and Distributed Systems (SBRC) (In Portuguese), Florianópolis, Brazil.
- Fazzion E, Las-Casas PHB, Fonseca O, Guedes D, Jr. WM, Hoepers C, Steding-Jessen K, Chaves MHP (2014) Spambands: Uma metodologia para identificação de fontes de spam agindo sob uma coordenação. In: Brazilian Symposium on Information Security and Computer Systems (SBSeg) (In Portuguese), Belo Horizonte, Brazil.
- Steding-Jessen K, Vijaykumar NL, Montes A (2008) Using Low-Interaction Honeypots to Study the Abuse of Open Proxies to Send Spam. INFOCOMP J Comput Sci 7(1): 45–53.
- Totti LC, Moreira REA, Fazzion E, Fonseca O, Wagner Meira J, Guedes D, Hoepers C, Steding-Jessen K, Chaves MHP (2012) Impacto da Evolução Temporal na Detecção de Spammers na Rede de Origem. In: SBRC 2012, Ouro Preto, Brazil.
- Almeida H, Guedes D, Meira W, Zaki MJ (2011) Is there a best quality metric for graph clusters? In: Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I, 44–59, Athens, Greece.
- Pelleg D, Moore AW (2000) X-means: Extending k-means with efficient estimation of the number of clusters. In: ICML, 727–734, San Francisco, CA.
- John JP, Moshchuk A, Gribble SD, Krishnamurthy A (2009) Studying Spamming Botnets Using Botlab. In: 6th USENIX Symp. on Networked Systems Design and Implementation, Boston, USA.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.