automatically switch to other idle proxies (indeed, when our
proxies fail we see workers quickly switch away). Second, our
proxies are passive actors and do not engage themselves in
any behavior that is intrinsically objectionable; they do not
send spam email, they do not compromise hosts, nor do
they even contact worker bots asynchronously. Indeed, their
only function is to provide a conduit between worker bots
making requests and master servers providing responses.
Finally, where we do modify C&C messages in transit, these
actions themselves strictly reduce harm. Users who click on
spam altered by these changes will be directed to one of our
innocuous doppelganger Web sites. Unlike the sites
normally advertised by Storm, our sites do not infect users with
malware and do not collect user credit card information.
Thus, no user should receive more spam due to our involvement, but some users will receive spam that is less dangerous than it would otherwise be.
Needless to say, we encourage no one to recreate our
experiments without the utmost preparation and care.
Interacting with thousands of compromised machines
that are sending millions of spam messages is a very delicate procedure, and while we encourage other researchers
to build upon our work, we ask that these experiments only
be attempted by qualified professionals with no less forethought, legal consultation, or safeguards than those outlined here.
5. Experimental Results
We now present the overall results of our rewriting experiment. We first describe the spam workload observed by our
C&C rewriting proxy. We then characterize the effects of filtering on the spam workload along the delivery path from
worker bots to user inboxes, as well as the number of users
who browse the advertised Web sites and act on the content
there.
Campaign datasets: Our study covers three spam campaigns summarized in Table 1. The “Pharmacy” campaign
is a 26-day sample (19 active days) of an ongoing Storm campaign advertising an online pharmacy. The “Postcard” and
“April Fool” campaigns are two distinct, serial instances
of self-propagation campaigns, which attempt to install
an executable on the user’s machine under the guise of
being postcard software. For each campaign, Figure 3
shows the number of messages per hour assigned to bots
for mailing.
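As a quick arithmetic check on the campaign summary, the following minimal sketch recomputes the campaign window lengths and the overall email total from the figures in Table 1. The `CAMPAIGNS` layout, the helper names, and the `emails_per_worker` metric are illustrative assumptions and are not part of the original analysis; the year is a placeholder, since the table gives only month and day.

```python
from datetime import date

# Figures copied from Table 1. The year is a placeholder (assumption): the
# table gives only month and day, and any year yields the same span lengths
# because every date falls in March or April.
YEAR = 2008

# name -> (start date, end date, workers, emails)
CAMPAIGNS = {
    "Pharmacy": (date(YEAR, 3, 21), date(YEAR, 4, 15), 31_348, 347_590_389),
    "Postcard": (date(YEAR, 3, 9), date(YEAR, 3, 15), 17_639, 83_665_479),
    "April Fool": (date(YEAR, 3, 31), date(YEAR, 4, 2), 3_678, 38_651_124),
}


def span_days(start: date, end: date) -> int:
    """Length of a campaign window in days, counting both endpoints."""
    return (end - start).days + 1


def total_emails(campaigns: dict) -> int:
    """Sum of the per-campaign email counts."""
    return sum(emails for _, _, _, emails in campaigns.values())


def emails_per_worker(workers: int, emails: int) -> float:
    """Average emails assigned per worker (hypothetical, illustrative metric)."""
    return emails / workers


if __name__ == "__main__":
    for name, (start, end, workers, emails) in CAMPAIGNS.items():
        print(name, span_days(start, end), round(emails_per_worker(workers, emails)))
    print("Total emails:", total_emails(CAMPAIGNS))
```

The recomputed total agrees with the 469,906,992 emails reported in Table 1, and the Pharmacy window spans 26 days when both endpoints are counted, consistent with the "26-day sample" described above. The sketch does not compute a worker total because the table does not report one.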
Storm’s authors have shown great cunning in exploiting
the cultural and social expectations of users—hence the
April Fool campaign was rolled out for a limited run around
April 1. Our Web site was designed to mimic the earlier
Table 1. Campaigns used in the experiment.
Campaign     Dates               Workers   Emails
Pharmacy     March 21–April 15   31,348    347,590,389
Postcard     March 9–March 15    17,639    83,665,479
April Fool   March 31–April 2    3,678     38,651,124
Total                                      469,906,992
102 Communications of the ACM | September 2009 | Vol. 52 | No. 9
Figure 3. Number of email messages assigned per hour for each campaign. [Plot: emails assigned per hour (millions, 0 to 3) by date, Mar 07 to Apr 16; series: Postcard, Pharmacy, April Fool.]
Postcard campaign and thus our data probably does not perfectly reflect user behavior for this campaign, but the two are
similar enough in nature that we surmise that any impact is
small.
We began the experiment with eight proxy bots, of which
seven survived until the end. Figure 4 shows a timeline of the
proxy bot workload. The number of workers connected to
each proxy is roughly uniform across all proxies (23 worker
bots on average), but shows strong spikes corresponding to
new self-propagation campaigns. At peak, 539 worker bots
were connected to our proxies at the same time.
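A peak-concurrency figure such as the one above can be derived from a log of connection intervals with a simple sweep over sorted events. The sketch below is one way to do this and is not the method used in the study, which the text does not describe; the `(connect_time, disconnect_time)` session format and the function name are hypothetical.

```python
def peak_concurrency(sessions):
    """Return the largest number of sessions open at the same instant.

    `sessions` is an iterable of (connect_time, disconnect_time) pairs
    (hypothetical log format). Each session is treated as the half-open
    interval [connect_time, disconnect_time).
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a worker connects
        events.append((end, -1))    # a worker disconnects

    # Tuples sort by time first, then by delta. Because -1 sorts before +1,
    # a disconnect at time t is processed before a connect at the same t,
    # so back-to-back sessions are not counted as overlapping.
    events.sort()

    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    # Toy data, not measurements from the experiment.
    toy_sessions = [(0, 10), (5, 15), (10, 20), (12, 13)]
    print(peak_concurrency(toy_sessions))
```

Timestamps can be any mutually comparable values (for example, seconds since the start of the experiment). Running the same function on the sessions of a single proxy would give that proxy's individual peak.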
Most workers connected to our proxies only once: 78% of
the workers connected a single time, 92%
at most twice, and 99% at most five times. The most prolific
worker IP address, a host in an academic network in North
Carolina, USA, contacted our proxies 269 times; further
inspection identified this as a NAT egress point for 19 individual infections. Conversely, most workers do not connect
to more than one proxy: 81% of the workers only connected
to a single proxy, 12% to two, 3% to four, 4% connected to five
Figure 4. Timeline of proxy bot workload. [Plot: number of connected workers (0 to 600) over time, Mar 24 to Apr 06; one series per proxy, Proxy 1 through Proxy 8.]