popular websites. The two crawls, conducted in May 2016, successfully generated causality trees for the homepages of 71,217 domains in Alexa and 62,086 domains in .COM. Failures resulted from timeouts and unresolvable domains, which were expected especially for .COM, since the zone file contains domains that may not host an active website.
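A minimal Python sketch of a per-domain crawl loop makes the failure accounting above concrete. The `Browser` protocol, its methods, the two exception classes, the `http://` homepage URL, and `FakeBrowser` are all hypothetical stand-ins, not the authors' instrumented Chromium. The loop only mirrors the procedure described for the crawler: a fresh browser per site, scrolling to trigger dynamic content, and a fixed 60-second dwell instead of page-load events. It records each failure by cause.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Protocol


class CrawlTimeout(Exception):
    """Hypothetical: the page did not respond within the driver's timeout."""


class UnresolvableDomain(Exception):
    """Hypothetical: DNS resolution failed for the domain."""


class Browser(Protocol):
    """Hypothetical driver interface, not the authors' crawler API."""

    def visit(self, url: str) -> None: ...
    def scroll_to_bottom(self) -> None: ...
    def causality_tree(self) -> dict: ...
    def close(self) -> None: ...


@dataclass
class CrawlResult:
    trees: Dict[str, dict] = field(default_factory=dict)
    failures: Dict[str, str] = field(default_factory=dict)

    def success_rate(self) -> float:
        total = len(self.trees) + len(self.failures)
        return len(self.trees) / total if total else 0.0


def crawl(domains: Iterable[str],
          new_browser: Callable[[], Browser],
          dwell_seconds: float = 60.0,
          sleep: Callable[[float], None] = time.sleep) -> CrawlResult:
    """Visit each homepage with a fresh browser and record a tree or a failure."""
    result = CrawlResult()
    for domain in domains:
        browser = new_browser()  # fresh instance: no state carried between sites
        try:
            browser.visit(f"http://{domain}/")
            browser.scroll_to_bottom()  # trigger lazily loaded content
            sleep(dwell_seconds)        # fixed dwell instead of load events
            result.trees[domain] = browser.causality_tree()
        except CrawlTimeout:
            result.failures[domain] = "timeout"
        except UnresolvableDomain:
            result.failures[domain] = "unresolvable"
        finally:
            browser.close()             # discard all state before the next site
    return result


class FakeBrowser:
    """Stub used only to exercise crawl() without a real browser."""

    def visit(self, url: str) -> None:
        if "dead" in url:
            raise UnresolvableDomain(url)
        if "slow" in url:
            raise CrawlTimeout(url)
        self.url = url

    def scroll_to_bottom(self) -> None:
        pass

    def causality_tree(self) -> dict:
        return {"root": self.url, "children": []}

    def close(self) -> None:
        pass


waits: List[float] = []
demo = crawl(["example.com", "dead.example", "slow.example"],
             FakeBrowser, sleep=waits.append)
```

Passing `waits.append` as `sleep` keeps the demo instant and still records that only the successful visit waited the full 60 seconds. Assuming all 75,000 domains of each crawl were attempted, the same ratio for the reported results is 71,217/75,000 ≈ 95.0% for Alexa and 62,086/75,000 ≈ 82.8% for .COM.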
How Websites Use Libraries …
Overall, our study used static and dynamic signatures for 72 open source libraries. We found at least one library on the homepage of 87% of the Alexa sites and 65% of the .COM sites. Figure 4 shows the 12 most common libraries in Alexa. jQuery is by far the most popular, used by 84% of the Alexa sites and 61% of the .COM sites. In other words, nearly every website that uses a library uses jQuery. SWFObject, a library used to include Adobe Flash content, is ranked seventh (4%) in Alexa and 10th (2%) in .COM, despite having been discontinued in 2013. On the other hand, several relatively well-known libraries such as D3, Dojo, and Leaflet appear below the top 30 in both crawls, possibly because
tached to in the DOM: gray corresponds to resources attached to the main document, while each additional document loaded in a frame is assigned one of four colors. Document squares show both the color of their parent location in the DOM and their own assigned color. Resources created by a script in one frame can be attached to a document in another frame, as shown by the gray script that has a blue child in Figure 2 (that is, the blue script is a child of the blue document in the DOM).
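The distinction above between what caused a resource to load and where it sits in the DOM can be modeled with two separate links per node. The following Python sketch is an illustration under assumptions: the `Node` fields, the `include`, `home`, and `crosses_frames` helpers, and all URLs are hypothetical, not the authors' data model. It reproduces the Figure 2 case of a gray script whose child is attached to a blue frame document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity equality; avoids recursive comparisons
class Node:
    """A causality-tree node; field names are illustrative, not the authors' schema."""
    kind: str                          # "document", "script", "image", ...
    url: str
    # Causal parent: the node that made this resource load.
    creator: Optional["Node"] = field(default=None, repr=False)
    # DOM location: the document this resource is attached to.
    document: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def include(self, kind: str, url: str,
                document: Optional["Node"] = None) -> "Node":
        """Record that this node caused `url` to load, attached to `document`
        (by default, the document this node itself lives in)."""
        child = Node(kind, url, creator=self,
                     document=document if document is not None else home(self))
        self.children.append(child)
        return child


def home(node: Node) -> Optional[Node]:
    """The document a node executes in: itself for documents, else its DOM parent."""
    return node if node.kind == "document" else node.document


def crosses_frames(node: Node) -> bool:
    """True if the node is attached to a different document than its creator's."""
    return node.creator is not None and node.document is not home(node.creator)


# Figure 2's situation with hypothetical URLs: a script in the main ("gray")
# document creates a frame ("blue") and then a script attached to that frame.
main = Node("document", "https://site.example/")
gray_script = main.include("script", "https://site.example/app.js")
blue_frame = gray_script.include("document", "https://widget.example/frame.html")
blue_script = gray_script.include("script", "https://widget.example/w.js",
                                  document=blue_frame)
```

Keeping `creator` and `document` as separate links is what allows the blue script to be a causal child of a gray script while its DOM parent is the blue document.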
Figure 3a shows a LinkedIn widget as included in the causality tree of mercantil.com. (An interactive version is available online at https://seclab.ccs.neu.edu/static/projects/javascript-libraries/.) Note that the Web developer embedded code provided by the social network into the main document; this code in turn initializes the widget and creates several scripts in multiple frames.
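To attribute everything a third-party snippet pulls in, as with the widget in Figure 3a, one can collect the snippet's descendants in the causality tree. The minimal Python sketch below assumes that the tree is stored as an adjacency dictionary and that a separate mapping records which document each node is attached to. The node names are hypothetical and are not taken from the mercantil.com tree.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical causality tree: each node maps to the nodes it caused to load.
Tree = Dict[str, List[str]]


def included_by(tree: Tree, root: str) -> List[str]:
    """All nodes transitively included by `root`, in breadth-first order."""
    order: List[str] = []
    seen = {root}
    queue = deque(tree.get(root, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # defensive: ignore repeated entries
            continue
        seen.add(node)
        order.append(node)
        queue.extend(tree.get(node, []))
    return order


def frames_touched(nodes: List[str], frame_of: Dict[str, str]) -> Set[str]:
    """Documents (main page or frames) that the given nodes are attached to."""
    return {frame_of[n] for n in nodes if n in frame_of}


tree: Tree = {
    "main": ["widget-snippet", "site.js"],
    "widget-snippet": ["widget-loader.js"],  # inline code pasted by the developer
    "widget-loader.js": ["frame-1", "frame-2"],
    "frame-1": ["widget-a.js"],
    "frame-2": ["widget-b.js", "tracker.js"],
}
frame_of = {
    "widget-snippet": "main", "site.js": "main", "widget-loader.js": "main",
    "frame-1": "main", "frame-2": "main",  # frame elements live in the main DOM
    "widget-a.js": "frame-1", "widget-b.js": "frame-2", "tracker.js": "frame-2",
}

widget_nodes = included_by(tree, "widget-snippet")
```

Here the single snippet in the main document accounts for six further nodes spread over three documents, while the site's own `site.js` includes nothing.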
Web Crawl
Causality trees are generated using an instrumented version of the Chromium Web browser. The Chrome DevTools Protocol (https://chromedevtools.github.io/devtools-protocol/) allows detection of most resource-inclusion relationships; for some corner cases, we had to resort to source code modifications in the browser. We also link library detections to nodes in the causality tree and run a modified version of Adblock Plus to label (but not block) advertisement, tracking, and social media nodes in the causality trees. While visiting a page, the crawler scrolls downward to trigger loading of any dynamic content. Because page-load events proved unreliable, our crawler remains on each page for a fixed delay of 60 seconds before clearing its entire state, restarting, and then proceeding to the next site.
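Most inclusion edges can be read from the `initiator` field of the DevTools Protocol's `Network.requestWillBeSent` event. Parser-initiated requests carry the including document's `url`. Script-initiated requests carry a JavaScript stack trace whose call frames, possibly in an async `parent` trace, name the calling script. The Python sketch below processes hand-written events shaped like CDP messages, not a real capture. For brevity it keys edges by URL; a real tree would key nodes by `requestId`, since the same URL can be requested more than once. It also does not attempt the corner cases that the text says required browser modifications.

```python
from typing import Dict, Iterable, Optional


def initiator_url(initiator: dict) -> Optional[str]:
    """Best-effort URL of whatever caused a request, from a CDP Initiator object."""
    if initiator.get("type") == "parser":
        return initiator.get("url")  # document whose HTML parser made the request
    # Script-initiated: walk the stack trace, following async `parent` traces,
    # and return the first (most recent) call frame that carries a URL.
    stack = initiator.get("stack")
    while stack:
        for frame in stack.get("callFrames", []):
            if frame.get("url"):
                return frame["url"]
        stack = stack.get("parent")
    return initiator.get("url")  # may be None, e.g. for type "other"


def inclusion_edges(events: Iterable[dict]) -> Dict[str, Optional[str]]:
    """Map each requested URL to the URL that included it (None if unknown)."""
    edges: Dict[str, Optional[str]] = {}
    for event in events:
        if event.get("method") != "Network.requestWillBeSent":
            continue
        params = event["params"]
        edges[params["request"]["url"]] = initiator_url(params.get("initiator", {}))
    return edges


# Hand-written sample events with hypothetical URLs.
EVENTS = [
    {"method": "Network.requestWillBeSent",
     "params": {"request": {"url": "https://site.example/"},
                "initiator": {"type": "other"}}},
    {"method": "Network.requestWillBeSent",
     "params": {"request": {"url": "https://cdn.example/jquery.js"},
                "initiator": {"type": "parser", "url": "https://site.example/"}}},
    {"method": "Network.requestWillBeSent",
     "params": {"request": {"url": "https://ads.example/ad.js"},
                "initiator": {"type": "script", "stack": {"callFrames": [
                    {"functionName": "load", "url": "https://cdn.example/loader.js",
                     "scriptId": "7", "lineNumber": 0, "columnNumber": 12}]}}}},
    {"method": "Network.requestWillBeSent",
     "params": {"request": {"url": "https://ads.example/late.js"},
                "initiator": {"type": "script", "stack": {"callFrames": [],
                    "parent": {"description": "setTimeout", "callFrames": [
                        {"functionName": "", "url": "https://cdn.example/loader.js",
                         "scriptId": "7", "lineNumber": 3, "columnNumber": 4}]}}}}},
    {"method": "Page.frameNavigated", "params": {}},  # ignored
]

edges = inclusion_edges(EVENTS)
```

The async case (`late.js`) shows why the `parent` trace matters: the immediate stack is empty because the request fires from a timer, but the parent trace still identifies `loader.js` as the includer.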
To gain a representative view of JavaScript library usage on the Web, we collected two different datasets. First, we crawled Alexa's top 75,000 domains, which represent popular websites. Second, we crawled 75,000 domains randomly sampled from a snapshot of the .com zone (that is, a random sample of all websites with a .com address), which was expected to be dominated by less
Figure 3. Causality tree of Mercantile.com.
Panels (a) and (b); node labels include jquery, jquery-ui, modernizr, yepnope, and SWFObject.