or more, and 90 worker bots connected to all of our proxies.
On average, worker bots remained connected for 40 min,
although over 40% of workers connected for less than a minute. The longest connection lasted almost 81 h.
The workers were instructed to send postcard spam to
83,665,479 addresses, of which 74,901,820 (89.53%) are unique.
The April Fool campaign targeted 38,651,124 addresses, of
which 36,909,792 (95.49%) are unique. Pharmacy spam targeted 347,590,389 addresses, of which 213,761,147 (61.50%) are unique.
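The unique-address percentages quoted above can be rechecked directly from the address counts. The following is a minimal sketch for that arithmetic only; the dictionary keys and the helper name `unique_share` are our own labels, not identifiers from the measurement infrastructure.

```python
# Address counts quoted in the text: (targeted, unique) per campaign.
# The campaign keys are illustrative labels chosen for this sketch.
campaigns = {
    "postcard": (83_665_479, 74_901_820),
    "april_fool": (38_651_124, 36_909_792),
    "pharmacy": (347_590_389, 213_761_147),
}


def unique_share(targeted: int, unique: int) -> float:
    """Return unique addresses as a percentage of targeted addresses."""
    return 100.0 * unique / targeted


# Round to two decimals, matching the precision used in the text.
shares = {
    name: round(unique_share(targeted, unique), 2)
    for name, (targeted, unique) in campaigns.items()
}

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% unique")
```

The rounded values agree with the percentages reported in the text.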
Spam conversion pipeline: Conceptually, we break down
spam conversion into a pipeline with five “filtering” stages.
Figure 5 illustrates this pipeline and shows the type of filtering at each stage. The pipeline starts with delivery lists
of target email addresses sent to worker bots (Stage A). For
a wide range of reasons, workers will successfully deliver
only a subset of their messages to an MTA (Stage B). At this
point, spam filters at the site correctly identify many messages as spam, and drop them or place them aside in a spam
folder. The remaining messages have survived the gauntlet
and appear in a user’s inbox as valid messages (Stage C).
Users may delete or otherwise ignore them, but some users
will act on the spam, click on the URL in the message, and
visit the advertised site (Stage D). These users may browse
the site, but only a fraction “convert” on the spam (Stage E)
by attempting to purchase products (pharmacy) or by downloading and running an executable (self-propagation).
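To make the five stages concrete, the sketch below represents the pipeline as an ordered list of per-stage counts and expresses each count as a percentage of Stage A. It is only an illustration under our own assumptions: the `StageCount` type and the `rates_relative_to_a` helper are hypothetical names, and the counts in `example` are made-up placeholders, not measurements from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageCount:
    stage: str   # pipeline stage letter, "A" through "E"
    label: str   # short description of what is counted
    count: int   # number of messages or users surviving to this stage


def rates_relative_to_a(counts):
    """Map each stage letter to its count as a percentage of Stage A."""
    targeted = counts[0].count
    if targeted <= 0:
        raise ValueError("Stage A must have a positive count")
    return {c.stage: 100.0 * c.count / targeted for c in counts}


# Placeholder counts for illustration only; NOT results from the paper.
example = [
    StageCount("A", "targeted addresses", 1_000_000),
    StageCount("B", "delivered to an MTA", 250_000),
    StageCount("C", "reached an inbox", 50_000),
    StageCount("D", "user site visits", 100),
    StageCount("E", "conversions", 2),
]

rates = rates_relative_to_a(example)
for c in example:
    print(f"Stage {c.stage} ({c.label}): {rates[c.stage]:.4f}% of Stage A")
```

Reporting every stage relative to Stage A, rather than relative to the preceding stage, keeps the rates comparable even when an intermediate stage (such as inbox delivery) cannot be estimated.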
We show the spam flow in two parts, “crawler” and
“converter,” to differentiate between real and masquerading
users. For example, the delivery lists given to workers contain
honeypot email addresses. Workers deliver spam to these
honeypots, which then use crawlers to access the sites referenced by the URL in the messages. Since we want to measure
the spam conversion rate for actual users, we separate out
the effects of automated processes like crawlers, including
only clicks we believe to be user-generated in our results.
Table 2 shows the effects of filtering at each stage of the
conversion pipeline for both the self-propagation and pharmaceutical campaigns. The number of targeted addresses
(A) is simply the total number of addresses on the delivery
lists received by the worker bots during the measurement
period, excluding the test addresses we injected.
Figure 5. The spam conversion pipeline.
We obtain an estimate of the number of messages delivered to a mail server (B) by relying on delivery reports generated by the workers. The number of messages delivered to a
user’s inbox (C) is a much harder value to estimate. We do
not know what spam filtering, if any, is used by each mail
provider, and then by each user individually, and therefore
cannot reasonably estimate this number in total. It is possible, however, to determine this number for individual mail
providers or spam filters. The three mail providers and the
spam filtering appliance we used in this experiment had a
method for separating delivered mails into “junk” and inbox
categories. Table 3 gives the number of messages delivered
to a user’s inbox for the free email providers, which together
accounted for about 16.5% of addresses targeted by Storm
(Table 3), as well as our department’s commercial spam
filtering appliance. It is important to note that these are
results from one spam campaign over a short period of time
and should not be used as measures of the relative effectiveness for each service. That said, we observe that the popular
Web mail providers all do a very good job at filtering the
campaigns we observed, although it is clear they use different methods (e.g., Hotmail rejects most Storm spam at the
mail server level, while Gmail accepts a significant fraction
only to filter it later as junk).
The number of visits (D) is the number of accesses to our
emulated pharmacy and postcard sites, excluding any crawlers. We note that crawler requests came from a small fraction of hosts but accounted for the majority of all requests to
our sites. For the pharmacy site, for instance, of the 11,720
unique IP addresses seen accessing the site with a valid
unique identifier, only 10.2% were blacklisted as crawlers.
In contrast, 55.3% of all unique identifiers used in requests
originated from these crawlers. For all nonimage requests
made, 87.43% were made by blacklisted IP addresses.
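The three crawler statistics above are computed over different denominators: unique IP addresses, unique identifiers, and nonimage requests. The sketch below shows one way such shares could be derived from a request log and an IP blacklist. It assumes a simplified log record (`Request`) and hypothetical helper names; the toy log uses documentation-range IP addresses and is not data from our sites.

```python
from typing import NamedTuple


class Request(NamedTuple):
    ip: str          # source IP address of the request
    identifier: str  # unique identifier embedded in the spam URL
    is_image: bool   # True if the request fetched an image


def crawler_shares(requests, blacklist):
    """Return crawler percentages by IP, by identifier, and by nonimage request."""
    blacklist = set(blacklist)
    if not requests:
        raise ValueError("request log is empty")
    ips = {r.ip for r in requests}
    ids = {r.identifier for r in requests}
    crawler_ids = {r.identifier for r in requests if r.ip in blacklist}
    pages = [r for r in requests if not r.is_image]
    if not pages:
        raise ValueError("request log contains no nonimage requests")
    crawler_pages = [r for r in pages if r.ip in blacklist]
    return {
        "ip_share": 100.0 * len(ips & blacklist) / len(ips),
        "identifier_share": 100.0 * len(crawler_ids) / len(ids),
        "nonimage_request_share": 100.0 * len(crawler_pages) / len(pages),
    }


def user_requests(requests, blacklist):
    """Keep only requests that did not come from a blacklisted (crawler) IP."""
    blacklist = set(blacklist)
    return [r for r in requests if r.ip not in blacklist]


# Toy log for illustration only: one crawler IP probing several identifiers.
log = [
    Request("192.0.2.1", "id1", False),
    Request("192.0.2.1", "id2", False),
    Request("192.0.2.1", "id3", False),
    Request("192.0.2.2", "id4", False),
    Request("192.0.2.2", "id4", True),
    Request("192.0.2.3", "id5", False),
    Request("192.0.2.4", "id6", False),
]
blacklist = {"192.0.2.1"}

shares = crawler_shares(log, blacklist)
visits = user_requests(log, blacklist)
print(shares)
print(len(visits), "requests attributed to users")
```

In the toy log a single blacklisted host is a minority of the IP addresses yet is responsible for half of the identifiers and half of the nonimage requests, the same qualitative pattern as the figures reported above.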
The number of conversions (E) is the number of visits to
the purchase page of the pharmacy site, or the number of
executions of the fake self-propagation program.
Table 2. Filtering at each stage of the spam conversion pipeline for the self-propagation and pharmacy
campaigns. Percentages refer to the conversion rate relative to Stage A.
D—user site visits