Four Approaches to Content Delivery
Given these bottlenecks and scalability
challenges, how does one achieve the
levels of performance and reliability required for effective delivery of content
and applications over the Internet?
There are four main approaches to
distributing content servers in a content-delivery architecture: centralized
hosting, “big data center” CDNs (
content-delivery networks), highly distributed CDNs, and peer-to-peer networks.
Centralized Hosting. Traditionally architected Web sites use one or a small
number of colocation sites to host
content. Commercial-scale sites generally have at least two geographically
dispersed mirror locations to provide
additional performance (by being closer to different groups of end users), reliability (by providing redundancy), and
scalability (through greater capacity).
This approach is a good start, and
for small sites catering to a localized
audience it may be enough. The performance and reliability fall short of
expectations for commercial-grade
sites and applications, however, as the
end-user experience is at the mercy of
the unreliable Internet and its middle-mile bottlenecks.
There are other challenges as well:
site mirroring is complex and costly,
as is managing capacity. Traffic levels
fluctuate tremendously, so the need to
provision for peak traffic levels means
that expensive infrastructure will sit
underutilized most of the time. In addition, accurately predicting traffic
demand is extremely difficult, and a
centralized hosting model does not
provide the flexibility to handle unexpected surges.
“Big Data Center” CDNs.
Content-delivery networks offer improved
scalability by offloading the delivery
of cacheable content from the origin
server onto a larger, shared network.
One common CDN approach can be
described as “big data center” architecture—caching and delivering customer
content from perhaps a couple dozen
high-capacity data centers connected
to major backbones.
Although this approach offers some
performance benefit and economies
of scale over centralized hosting, the
potential improvements are limited
because the CDN’s servers are still far
away from most users and still deliver
content from the wrong side of the
middle-mile bottlenecks.
It may seem counterintuitive that
having a presence in a couple dozen major backbones isn’t enough to achieve
commercial-grade performance. In
fact, even the largest of those networks
controls very little end-user access traffic. For example, the top 30 networks
combined deliver only 50% of end-user
traffic, and the share drops off quickly from
there, following a very long tail distribution
across the Internet’s 13,000 networks.
Even with connectivity to all the biggest
backbones, data must travel through
the morass of the middle mile to reach
most of the Internet’s 1.4 billion users.
A quick back-of-the-envelope calculation shows that this type of architecture hits a wall in terms of scalability
as we move toward a video world. Consider a generous forward projection
on such an architecture—say, 50 high-capacity data centers, each with 30
outbound connections, 10Gbps each.
This gives an upper bound of 15Tbps
total capacity for this type of network,
far short of the 100Tbps needed to support video in the near term.
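The article's ceiling can be reproduced directly from the numbers it gives, a sketch of the same back-of-the-envelope arithmetic:

```python
# Capacity ceiling for a "big data center" CDN, using the article's
# generous forward projection: 50 data centers, each with 30 outbound
# connections of 10 Gbps apiece.
data_centers = 50
links_per_center = 30
gbps_per_link = 10

total_gbps = data_centers * links_per_center * gbps_per_link
total_tbps = total_gbps / 1000

print(f"Upper bound: {total_tbps:.0f} Tbps")  # prints "Upper bound: 15 Tbps"
```

Even under these optimistic assumptions, the architecture tops out at 15Tbps against a near-term need of roughly 100Tbps.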
Highly Distributed CDNs. Another approach to content delivery is to leverage
a very highly distributed network—one
with servers in thousands of networks,
rather than dozens. On the surface, this
architecture may appear quite similar
to the “big data center” CDN. In reality,
however, it is a fundamentally different
approach to content-server placement,
with a difference of two orders of magnitude in the degree of distribution.
By putting servers within end-user
ISPs, for example, a highly distributed
CDN delivers content from the right
side of the middle-mile bottlenecks,
eliminating peering, connectivity,
routing, and distance problems, and
reducing the number of Internet components depended on for success.
Moreover, this architecture scales. It
can achieve a capacity of 100Tbps, for
example, with deployments of 20 servers, each capable of delivering 1Gbps,
in each of 5,000 edge locations.
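The same arithmetic, applied to the highly distributed deployment described above, reaches the target:

```python
# Aggregate capacity of the highly distributed deployment sketched in
# the text: 20 servers per location, 1 Gbps per server, 5,000 edge
# locations.
servers_per_location = 20
gbps_per_server = 1
edge_locations = 5000

distributed_tbps = servers_per_location * gbps_per_server * edge_locations / 1000
print(f"Aggregate capacity: {distributed_tbps:.0f} Tbps")  # prints "Aggregate capacity: 100 Tbps"
```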
On the other hand, deploying a highly distributed CDN is costly and time-consuming, and it comes with its own
set of challenges. Fundamentally, the
network must be designed to scale efficiently from a deployment and management perspective. This necessitates
development of a number of technologies, including:
˲ Sophisticated global-scheduling,
mapping, and load-balancing algorithms
˲ Distributed control protocols and
reliable automated monitoring and
alerting systems
˲ Intelligent and automated failover
and recovery methods
˲ Colossal-scale data aggregation
and distribution technologies (designed to handle different trade-offs
between timeliness and accuracy or
completeness)
˲ Robust global software-deployment mechanisms
˲ Distributed content freshness, integrity, and management systems
˲ Sophisticated cache-management
protocols to ensure high cache-hit ratios
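The article does not detail how these systems work internally. As one generic illustration of the mapping and cache-management problems in the list above, a minimal consistent-hash ring, a common technique for assigning content to cache servers so that server churn remaps few objects and cache-hit ratios stay high, might look like the sketch below; all names and parameters here are hypothetical, not drawn from any particular CDN:

```python
import hashlib
from bisect import bisect

# Illustrative sketch only: a minimal consistent-hash ring, one generic
# ingredient of the mapping and cache-management challenges listed above.

def _hash(key: str) -> int:
    # Map any string to a point on the ring (a large integer).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        # Each server is placed at `vnodes` pseudo-random ring positions
        # ("virtual nodes"), smoothing load across servers.
        self._ring = sorted(
            (_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def server_for(self, url: str) -> str:
        # First ring position clockwise from the URL's hash (wrapping).
        idx = bisect(self._keys, _hash(url)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing([f"edge-{n}" for n in range(5)])
before = {u: ring.server_for(u) for u in (f"/asset/{i}" for i in range(1000))}

# Removing one server remaps only the objects it owned; every other
# cache keeps serving the same objects, so those caches stay warm.
smaller = ConsistentHashRing([f"edge-{n}" for n in range(4)])
moved = sum(1 for u, s in before.items() if smaller.server_for(u) != s)
print(f"{moved} of 1000 assets remapped")
```

The property being exercised at the end is the point of the technique: with naive modulo hashing, removing one of five servers would remap roughly 80% of objects and devastate hit ratios; here only the departed server's share moves.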
These are nontrivial challenges, and
we present some of our approaches
later on in this article.
Peer-to-Peer Networks. Because a
highly distributed architecture is critical to achieving scalability and perfor-
Figure 2. Effect of distance on throughput and download times.

Distance from server to user     | Latency | Typical packet loss | Throughput (quality)  | 4GB DVD download time
Local: <100 mi.                  | 1.6ms   | 0.6%                | 44Mbps (HDTV)         | 12 min.
Regional: 500–1,000 mi.          | 16ms    | 0.7%                | 4Mbps (not quite DVD) | 2.2 hrs.
Cross-continent: ~3,000 mi.      | 48ms    | 1.0%                | 1Mbps (not quite TV)  | 8.2 hrs.
Multi-continent: ~6,000 mi.      | 96ms    | 1.4%                | 0.4Mbps (poor)        | 20 hrs.
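The throughput column in Figure 2 is broadly what the standard TCP throughput approximation (rate ≈ MSS/RTT × √(3/2)/√p, where p is the packet-loss rate) predicts. A rough check, under two assumptions of ours rather than the article's, that the latencies shown are one-way (so RTT = 2 × latency) and that the MSS is 1,460 bytes:

```python
from math import sqrt

# Rough check of Figure 2's throughput column against the classic TCP
# throughput approximation: rate ≈ (MSS / RTT) * sqrt(3/2) / sqrt(loss).
# Assumptions (ours, not the article's): figure latencies are one-way,
# so RTT = 2 * latency; MSS = 1,460 bytes.
MSS_BITS = 1460 * 8

rows = [  # (label, one-way latency in seconds, packet-loss rate)
    ("local",           0.0016, 0.006),
    ("regional",        0.016,  0.007),
    ("cross-continent", 0.048,  0.010),
    ("multi-continent", 0.096,  0.014),
]

for label, latency, loss in rows:
    rtt = 2 * latency
    mbps = MSS_BITS / rtt * sqrt(1.5 / loss) / 1e6
    print(f"{label:>15}: ~{mbps:.1f} Mbps")
```

The model lands in the same range as the figure for each row (tens of Mbps locally, single-digit Mbps regionally, around 1Mbps and below across continents), which is the figure's underlying point: at fixed loss rates, throughput falls roughly in proportion to round-trip distance.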
Communications of the ACM, February 2009, Vol. 52, No. 2, p. 47