\[
E_{w_{m,i}}(p_{m,i}) = \alpha + \beta_1 FP^{l}_{m,i-1} + \beta_2 E(p_{m,i-1}) + u_m + \varepsilon_{m,i}.
\]
An observation, $E_{w_{m,i}}(p_{m,i})$, is the
median popularity $p_{m,i}$ (that is, the score)
of the posts containing protomeme m on day i.
Since post scores are count data with a skewed
distribution, we used a Poisson mixed model.
Each observation has weight $w_{m,i}$, the number
of posts containing m on day i, thus giving less
weight to the (presumably noisy) information on
protomemes that were not used very
much on day i.
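As a minimal sketch of how such observations could be assembled, the snippet below groups posts by protomeme and day, then takes the median score as the observation and the post count as its weight. The `(day, score, protomemes)` tuple layout, the function name `daily_observations`, and the sample data are illustrative assumptions, not the authors' data format.

```python
from collections import defaultdict
from statistics import median


def daily_observations(posts):
    """Return {(protomeme, day): (median_score, weight)}.

    `posts` is an iterable of (day, score, protomemes) tuples; this
    layout is a hypothetical one chosen for the example.
    """
    scores = defaultdict(list)
    for day, score, protomemes in posts:
        # Count each post once per protomeme it contains.
        for m in set(protomemes):
            scores[(m, day)].append(score)
    # Median score is the observation; number of posts is its weight.
    return {key: (median(vals), len(vals)) for key, vals in scores.items()}


# Made-up sample data, for illustration only.
posts = [
    (1, 10, ["cat", "funny"]),
    (1, 30, ["cat"]),
    (1, 500, ["cat"]),
    (2, 4, ["funny"]),
]
obs = daily_observations(posts)
```

In this toy input, the single very high score barely moves the median for "cat" on day 1, which illustrates why a median is a reasonable summary for skewed score data.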
$FP^{l}_{m,i-1}$ indicates whether protomeme m
was on the front page on day i − 1. The
parameter l is the number of posts hitting the front page each day. The default front page in Reddit includes 25
posts (30 for Hacker News). So, every
day, at least 25 (30) posts hit the front
page. However, users can increase the
length of their front page. Moreover, as the day passes, front-page posts
get replaced by other posts. As a result,
there is no way to know how many posts
hit any particular user’s front page on
any given day (see the online appendix).
For this reason, we ran the regression
multiple times with different values of l to
test the robustness of our results to
different front-page sizes.
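The front-page indicator and the sweep over l can be sketched as follows. This is an illustration under stated assumptions: the "front page" on a day is approximated by that day's l top-scoring posts, days are consecutive integers, and the names `front_page_flags` and `fp`, the tuple layout, and the sample data are all hypothetical.

```python
from collections import defaultdict


def front_page_flags(posts, l):
    """Return the set of (protomeme, day) pairs whose indicator is 1.

    A pair (m, i) is flagged when m appeared in one of the l
    top-scoring posts on day i - 1 (an assumed front-page proxy).
    """
    by_day = defaultdict(list)
    for day, score, protomemes in posts:
        by_day[day].append((score, protomemes))
    flags = set()
    for day, day_posts in by_day.items():
        top = sorted(day_posts, key=lambda p: p[0], reverse=True)[:l]
        for _, protomemes in top:
            for m in protomemes:
                flags.add((m, day + 1))  # the effect is registered the day after
    return flags


def fp(flags, m, day):
    """Indicator value (0 or 1) for protomeme m on the given day."""
    return 1 if (m, day) in flags else 0


# Made-up sample data, for illustration only.
posts = [
    (1, 500, ["cat"]),
    (1, 30, ["dog"]),
    (1, 10, ["funny"]),
]
# Recompute the indicator for several front-page sizes, as a robustness sweep would.
flags_by_l = {l: front_page_flags(posts, l) for l in (1, 2, 3)}
```

Each entry of `flags_by_l` would feed one run of the regression, so the estimated coefficient can be compared across front-page sizes.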
$E(p_{m,i-1})$ denotes the protomeme’s
popularity on the day before i, controlling for existing trends; $u_m$ is a
protomeme-level random effect, which we use to control
for the fact that different observations
can refer to the same protomeme; and
$\varepsilon_{m,i}$ is the error term.
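To make the role of each term concrete, the sketch below evaluates the right-hand side of the model (without the error term) for one protomeme, once with the front-page indicator at 0 and once at 1. Two things here are assumptions not stated in the text: the log link, which is the usual choice for Poisson models, and every coefficient value, which is made up for illustration and is not an estimate from the study.

```python
import math


def linear_predictor(alpha, beta1, beta2, fp_prev, prev_pop, u_m):
    """Intercept + front-page effect + previous-day popularity + random effect."""
    return alpha + beta1 * fp_prev + beta2 * prev_pop + u_m


def expected_median(alpha, beta1, beta2, fp_prev, prev_pop, u_m):
    # ASSUMPTION: log link, so the expected value is exp(linear predictor).
    return math.exp(linear_predictor(alpha, beta1, beta2, fp_prev, prev_pop, u_m))


# Hypothetical values, chosen only to illustrate the computation.
coef = dict(alpha=1.0, beta1=-0.1, beta2=0.02, u_m=0.3)
off_fp = expected_median(fp_prev=0, prev_pop=20.0, **coef)
on_fp = expected_median(fp_prev=1, prev_pop=20.0, **coef)

# Under a log link, the indicator scales the expectation by exp(beta1).
ratio = on_fp / off_fp
```

Under these assumptions, a negative coefficient on the front-page indicator translates into a multiplicative drop in the expected median score, holding the previous day's popularity and the protomeme's random effect fixed.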
Figure 2 outlines our estimates of $\beta_1$
for different values of l. The effect of
being on the front page is negative: the
protomeme is expected to have a lower
median score than on a business-as-usual
day. This expectation confirms
the theory of dissimilarity-driven suc-
ty; scores cluster around each
protomeme’s average. Third, a
protomeme’s recent history might
explain this variation. If, for example,
a protomeme was getting declining
scores, we might expect it to
get even lower scores after a random
popularity spike. Finally, there
is a fat tail in score distributions, so
the average alone is not meaningful.
To give more solid evidence for
the theory about viral connections,
we tested the median popularity of a
protomeme on a particular day using the following mixed model, or
“MED Model”
Figure 1. Distributions of score and number of posts per day of the observed memes, overall, and focusing on only those created the day
after the meme was among the 25 top-scoring posts; included is the average and 95% confidence intervals.
[Figure 1 plots: four panels, (a) through (d). x-axis: Status (Overall, After FP); y-axes: Avg. Daily Score and Posts Per Day.]
Figure 2. Evolution of β1 coefficients for increasing l in the model and in its associated null
model; thin lines represent 95% confidence intervals.
[Figure 2 plots: two panels, (a) and (b). x-axis: l; y-axis: β; series: Observation, Null Model.]