how much is filtered by popular antispam solutions, and,
most importantly, how many users “click-through” to the
site being advertised (response rate) and how many of those
progress to a “sale” or “infection” (conversion rate).
The remainder of this paper is structured as follows.
Section 2 describes the economic basis for spam and
reviews prior research in this area. Section 3 describes the
Storm botnet, and Section 4 describes our experimental
methodology for botnet infiltration. Section 5 describes our
spam filtering and conversion results, Section 6 analyzes the
effects of blacklisting on spam delivery, and Section 7 analyzes
the possible influences on spam responses. We synthesize our
findings in Section 8 and conclude.
2. BACKGROUND
Direct marketing has a rich history, dating back to the nineteenth-century distribution of the first mail-order catalogs.
What makes direct marketing so appealing is that one can
directly measure its return on investment. For example,
the Direct Mail Association reports that direct mail sales
campaigns produce a response rate of 2.15% on average. 4
Meanwhile, rough estimates of direct mail cost per mille—the
cost to address, produce and deliver materials to a thousand
targets—range between $250 and $1000. Thus, following
these estimates it might cost $250,000 to send out a million
solicitations, which might then produce 21,500 responses.
The cost of developing these prospects (roughly $12 each)
can be directly computed and, assuming each prospect
completes a sale of an average value, one can balance this
revenue directly against the marketing costs to determine
the profitability of the campaign. As long as the product of
the conversion rate and the marginal profit per sale exceeds
the marginal delivery cost, the campaign is profitable.
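The arithmetic above can be reproduced with a short sketch. The figures for the mailing (one million solicitations, $250 per mille, a 2.15% response rate) come from the text; the function names and the $50 marginal profit per sale are hypothetical, chosen only to illustrate the break-even condition.

```python
def campaign_economics(n_sent, cost_per_mille, response_rate):
    """Return (total_cost, responses, cost_per_response) for one mailing."""
    total_cost = n_sent / 1000 * cost_per_mille
    responses = n_sent * response_rate
    return total_cost, responses, total_cost / responses


def is_profitable(conversion_rate, profit_per_sale, delivery_cost_per_message):
    """Break-even test from the text: the conversion rate times the marginal
    profit per sale must exceed the marginal delivery cost per message."""
    return conversion_rate * profit_per_sale > delivery_cost_per_message


# Figures quoted in the text: 1,000,000 pieces at $250 per mille, 2.15% response.
total_cost, responses, cost_per_response = campaign_economics(1_000_000, 250.0, 0.0215)
print(f"cost=${total_cost:,.0f}  responses={responses:,.0f}  "
      f"cost per response=${cost_per_response:.2f}")

# Hypothetical: every responder buys, each sale yields $50 of marginal profit,
# and each piece costs $0.25 to deliver ($250 per mille).
print(is_profitable(0.0215, 50.0, 0.25))
```

Under these assumptions the expected profit per piece (0.0215 × $50) exceeds the $0.25 delivery cost, so the campaign is profitable. Lowering the assumed profit per sale to $10 reverses the result.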
Given this underlying value proposition, it is not at all
surprising that bulk direct email marketing emerged very
quickly after email itself. The marginal cost to send an email
is tiny and, thus, an email-based campaign can be profitable
even when the conversion rate is negligible. Unfortunately, a
perverse byproduct of this dynamic is that sending as much
spam as possible is likely to maximize profit. 8
While spam has long been understood to be an economic
problem, it is only recently that there has been significant
effort in modeling spam economics and understanding the
value proposition from the spammer’s point of view. Rarely
do spammers talk about financial aspects of their activities
themselves, though such accounts do exist. 10, 13 Judge et al.
speculate that response rates as low as 0.000001 are sufficient to maintain profitability. 12
However, the work most closely related to our own
is the set of papers concerning “Stock Spam.” 5, 7, 9 Stock
spam refers to the practice of sending positive “touts” for
a low-volume security in order to manipulate its price and
thereby profit on an existing position in the stock. What dis-tinguishes stock spam is that it is monetized through price
manipulation and not via a sale. Consequently, it is not necessary to measure the conversion rate to understand profitability. Instead, profitability can be inferred by correlating
stock spam message volume with changes in the trading volume and price for the associated stocks.
100 COMMUNICATIONS OF THE ACM | SEPTEMBER 2009 | VOL. 52 | NO. 9
3. THE STORM BOTNET
The measurements in this paper are carried out using the
Storm botnet and its spamming agents. Storm is a peer-to-peer botnet that propagates via spam (usually by directing
recipients to download an executable from a Web site).
Storm hierarchy: There are three primary classes of
machines that the Storm botnet uses when sending spam.
Worker bots make requests for work and, upon receiving
orders, send spam as requested. Proxy bots act as conduits
between workers and master servers. Finally, the master
servers provide commands to the workers and receive their
status reports. In our experience there are a very small number of master servers (typically hosted at so-called
“bullet-proof” hosting centers) and these are likely managed by the
botmaster directly.
However, the distinction between worker and proxy is one
that is determined automatically. When Storm first infects a
host it tests if it can be reached externally. If so, then it is
eligible to become a proxy. If not, then it becomes a worker.
All of the bots we ran as part of our experiment existed as
proxy bots, being used by the botmaster to ferry commands
between master servers and the worker bots responsible for
the actual transmission of spam messages.
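The role assignment described above can be summarized in a minimal sketch. The names `Role` and `assign_role` are hypothetical, and the reachability result is passed in as a boolean because the text does not describe how Storm performs the test.

```python
from enum import Enum


class Role(Enum):
    WORKER = "worker"  # requests work and sends spam
    PROXY = "proxy"    # relays between workers and master servers


def assign_role(externally_reachable: bool) -> Role:
    """Mirror the rule in the text: a host that can be reached from outside
    is eligible to become a proxy; otherwise it becomes a worker.
    How reachability is tested is not modeled here (assumption)."""
    return Role.PROXY if externally_reachable else Role.WORKER


# A globally reachable host, such as the ones used in the experiment,
# ends up as a proxy; a host behind NAT or a firewall ends up as a worker.
print(assign_role(True).value)
print(assign_role(False).value)
```

This is why infecting a globally reachable host is enough to obtain a proxy bot: the role follows from reachability alone.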
4. METHODOLOGY
Our measurement approach is based on botnet infiltration—
that is, insinuating ourselves into a botnet’s “command
and control” (C&C) network, passively observing the spam-related commands and data it distributes and, where
appropriate, actively changing individual elements of
these messages in transit. Storm’s architecture lends itself
particularly well to infiltration since the proxy bots, by
design, interpose on the communications between individual worker bots and the master servers that direct them.
Moreover, since Storm compromises hosts indiscriminately (normally using malware distributed via social engineering Web sites), it is straightforward to create a proxy bot
on demand by infecting a globally reachable host under our
control with the Storm malware.
Figure 1 also illustrates our basic measurement infrastructure. At the core, we instantiate eight unmodified Storm
proxy bots within a controlled virtual machine environment.
The network traffic for these bots is then routed through a
centralized gateway, providing a means for blocking unanticipated behaviors (e.g., participation in DDoS attacks)
and an interposition point for parsing C&C messages and
“rewriting” them as they pass from proxies to workers. Most
critically, by carefully rewriting the spam template and dictionary entries sent by master servers, we arrange for worker
bots to replace the intended site links in their spam with
URLs of our choosing. From this basic capability we synthesize experiments to measure the click-through and conversion rates for several large spam campaigns.
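The link substitution step can be illustrated with a simplified sketch. It is not the rewriter used in the experiment: the Storm template and dictionary formats are not specified in this section, so the sketch treats a payload as plain text and replaces every HTTP(S) URL it finds. The function name, the URL pattern, and the `.example` addresses are all assumptions.

```python
import re

# Assumption: links appear as plain http:// or https:// URLs in the payload.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def rewrite_links(payload: str, replacement_url: str) -> tuple:
    """Replace every URL in payload with replacement_url.

    Returns (rewritten_payload, number_of_links_replaced).
    A callable is used as the replacement so that the URL is inserted
    literally, without backslash or group-reference processing.
    """
    return URL_PATTERN.subn(lambda _match: replacement_url, payload)


# Hypothetical template text passing through the gateway.
template = "Great prices at http://original.example/shop today"
rewritten, count = rewrite_links(template, "http://measurement.example/landing")
print(rewritten)
print(count)
```

Performing the substitution at the gateway, rather than inside the bots, is what allows the proxy bots themselves to remain unmodified.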
C&C protocol rewriting: Our runtime C&C protocol rewriter
consists of two components. A custom router redirects
potential C&C traffic to a fixed IP address and port, where a
user-space proxy server accepts incoming connections and
impersonates the proxy bots. This server in turn forwards
connections back into the router, which redirects the traffic