Table 3. Number of messages delivered to a user’s inbox as a
fraction of those injected for test accounts at free email providers
and a commercial spam filtering appliance. The test account for the
Barracuda appliance was not included in the Postcard campaign.
Spam filter   Gmail      Yahoo       Hotmail   Barracuda
Pharmacy      0.00683%   0.00173%    None      0.131%
Postcard      0.00176%   0.000542%   None      N/A
April Fool    0.00226%   None        None      0.00826%
Our results for Storm spam campaigns show that the
spam conversion rate is quite low. For example, out of 350
million pharmacy campaign emails only 28 conversions
resulted (and no crawler ever completed a purchase, so errors
in crawler filtering play no role). However, a very low conversion rate does not necessarily imply low revenue or profitability. We discuss the implications of the conversion rate on the
spam conversion proposition further in Section 8.
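To make the scale of this rate concrete, the arithmetic can be sketched as follows. The function and constant names are illustrative only, and the 350 million figure is the rounded count quoted above, so the result is approximate.

```python
def conversion_rate(conversions: int, emails_sent: int) -> float:
    """Return conversions per email sent (a fraction, not a percentage)."""
    if emails_sent <= 0:
        raise ValueError("emails_sent must be positive")
    return conversions / emails_sent


# Figures quoted in the text; 350 million is a rounded count.
PHARMACY_EMAILS = 350_000_000
PHARMACY_CONVERSIONS = 28

rate = conversion_rate(PHARMACY_CONVERSIONS, PHARMACY_EMAILS)
emails_per_conversion = PHARMACY_EMAILS / PHARMACY_CONVERSIONS

print(f"conversion rate: {rate:.1e} ({rate * 100:.6f}%)")
print(f"roughly one conversion per {emails_per_conversion:,.0f} emails")
```

Under these rounded inputs, the rate is on the order of one conversion per 12.5 million emails sent.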
Time-to-click: The conversion pipeline shows what fraction
of spam ultimately resulted in visits to the advertised sites.
However, it does not reflect the latency between when the
spam was sent and when a user clicked on it. The longer it
takes users to act, the longer the scam hosting infrastructure will need to remain available to extract revenue from the
spam. 2 Put another way, how long does a spam-advertised
site need to be online to collect potential revenue?
Figure 6 shows the cumulative distribution of the “time-to-click”
for accesses to the pharmacy site. The time-to-click is the time from when spam is sent (when a proxy
forwards a spam workload to a worker bot) to when a user
“clicks” on the URL in the spam (when a host first accesses
the Web site). The graph shows three distributions: accesses
by all users, by the users who visited the purchase
page (“converters”), and by the automated crawlers (14,716
such accesses).
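As an illustration of the definition above, the following minimal sketch computes time-to-click values and an empirical cumulative distribution over them. The timestamps are hypothetical, and the function names are ours rather than part of the measurement infrastructure.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def time_to_click(sent_at: float, first_access_at: float) -> float:
    """Seconds from the spam being sent to the first access of the site."""
    return first_access_at - sent_at


def empirical_cdf(samples: Sequence[float]) -> Callable[[float], float]:
    """Return a function giving the fraction of samples that are <= t."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one sample")

    def cdf(t: float) -> float:
        return bisect_right(ordered, t) / len(ordered)

    return cdf


# Hypothetical (sent, first access) timestamps in seconds, for illustration.
events = [(0, 5), (0, 30), (100, 3_700), (0, 86_400), (50, 604_850)]
delays = [time_to_click(sent, accessed) for sent, accessed in events]
cdf = empirical_cdf(delays)

for label, seconds in [("20 s", 20), ("1 day", 86_400), ("1 week", 604_800)]:
    print(f"fraction of clicks within {label}: {cdf(seconds):.2f}")
```

Computing one such distribution per population (all users, converters, crawlers) would give the three curves being compared.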
Figure 6. Time-to-click distributions for accesses to the pharmacy site. [Plot: fraction of clicks (0 to 1) versus time to click (1 s to 1 month) for crawlers, users, and converters.]
104 Communications of the ACM | September 2009 | Vol. 52 | No. 9
The user and crawler distributions show distinctly different behavior. Almost 30% of the crawler accesses are within
20 s of worker bots sending spam. This behavior suggests
that these crawlers are configured to scan sites advertised
in spam immediately upon delivery. Another 10% of crawler
accesses have a time-to-click of 1 day, suggesting crawlers
configured to access spam-advertised sites periodically
in batches. In contrast, only 10% of the user population
accesses spam URLs immediately, and the remaining distribution is smooth without any distinct modes. The distributions for all users and users who “convert” are roughly
similar, suggesting little correlation between time-to-click
and whether a user visiting a site will convert. While most
user visits occur within the first 24 h, 10% of times-to-click
are a week to a month, indicating that advertised sites need
to be available for long durations to capture full revenue
potential.
6. EFFECTS OF BLACKLISTING
A major effect on the efficacy of spam delivery is the
employment by numerous ISPs of address-based blacklisting to reject email from hosts previously reported as sourcing spam. To assess the impact of blacklisting, during the
course of our experiments we monitored the Composite
Blocking List (CBL), 6 a blacklist source used by the operators of some of our institutions. At any given time the CBL
lists on the order of 4–6 million IP addresses that have
sent email to various spamtraps. We were able to monitor the CBL from March 21–April 2, 2008, from the start
of the pharmacy campaign until the end of the April Fool
campaign.
We downloaded the current CBL blacklist every half hour,
enabling us to determine which worker bots in our measurements were present on the list and how their arrival on the
list related to their botnet activity. Of 40,864 workers that
sent delivery reports, fully 81% appeared on the CBL. Of those
appearing at some point on the list, 77% were on the list
prior to our observing their receipt of spamming directives,
appearing first on the list 4.4 days (median) earlier. Of those
not initially listed but then listed subsequently, the median
interval until listing was 1.5 h, strongly suggesting that the
spamming activity we observed them being instructed to
conduct quickly led to their detection and blacklisting.
Of hosts never appearing on the list, more than 75% never
reported successful delivery of spam, indicating that the
reason for their lack of listing was simply their inability to
effectively annoy anyone.
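The breakdown above can be sketched in code as follows. This is only an illustration under stated assumptions: each worker is reduced to two timestamps (when we first observed it receiving spamming directives, and when it first appeared in a blacklist snapshot, or None if it never did), the data are hypothetical, and the function names are ours.

```python
from statistics import median
from typing import Optional

HOUR = 3600.0


def classify(first_directive: float, first_listed: Optional[float]) -> str:
    """Label a worker by when it was listed relative to its first directive."""
    if first_listed is None:
        return "never_listed"
    if first_listed < first_directive:
        return "listed_before"
    return "listed_after"


def summarize(workers: list) -> dict:
    """Summarize listing behavior for (first_directive, first_listed) pairs."""
    gaps = {"never_listed": [], "listed_before": [], "listed_after": []}
    for first_directive, first_listed in workers:
        label = classify(first_directive, first_listed)
        gap = None if first_listed is None else abs(first_listed - first_directive)
        gaps[label].append(gap)

    before, after = gaps["listed_before"], gaps["listed_after"]
    listed = len(before) + len(after)
    return {
        "fraction_listed": listed / len(workers),
        "fraction_listed_before": len(before) / listed if listed else 0.0,
        "median_lead_hours": median(before) / HOUR if before else None,
        "median_lag_hours": median(after) / HOUR if after else None,
    }


# Hypothetical workers: (first directive seen, first blacklist appearance).
workers = [
    (100 * HOUR, 0.0),         # listed long before any directive
    (100 * HOUR, 90 * HOUR),   # listed shortly before
    (100 * HOUR, 101 * HOUR),  # listed 1 h after
    (100 * HOUR, 102 * HOUR),  # listed 2 h after
    (100 * HOUR, None),        # never listed
]
summary = summarize(workers)
print(summary)
```

Because the list was downloaded every half hour, listing times recovered this way are accurate only to within that polling interval.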
We would expect that the impact of blacklisting on spam
delivery strongly depends on the domain targeted in a given
email, since some domains incorporate blacklist feeds such
as the CBL into their mailer operations and others do not.
To explore this effect, Figure 7 plots the per-domain delivery rate: the number of spam emails that workers reported
as successfully delivered to the domain divided by the number
attempted to that domain. The x-axis shows the delivery rate
for spams sent by a worker prior to its appearance in the
CBL, and the y-axis shows the rate after its appearance in
the CBL. We limit the plot to the 10,879 domains to which
workers attempted to deliver at least 1,000 spams. We plot