Four Approaches to Content Delivery

Given these bottlenecks and scalability challenges, how does one achieve the levels of performance and reliability required for effective delivery of content and applications over the Internet? There are four main approaches to distributing content servers in a content-delivery architecture: centralized hosting, “big data center” CDNs (content-delivery networks), highly distributed CDNs, and peer-to-peer networks.

Centralized Hosting. Traditionally architected Web sites use one or a small number of colocation sites to host content. Commercial-scale sites generally have at least two geographically dispersed mirror locations to provide additional performance (by being closer to different groups of end users), reliability (by providing redundancy), and scalability (through greater capacity).

This approach is a good start, and for small sites catering to a localized audience it may be enough. The performance and reliability fall short of expectations for commercial-grade sites and applications, however, as the end-user experience is at the mercy of the unreliable Internet and its middle-mile bottlenecks.

There are other challenges as well: site mirroring is complex and costly, as is managing capacity. Traffic levels fluctuate tremendously, so the need to provision for peak traffic levels means that expensive infrastructure will sit underutilized most of the time. In addition, accurately predicting traffic demand is extremely difficult, and a centralized hosting model does not provide the flexibility to handle unexpected surges.

“Big Data Center” CDNs. Content-delivery networks offer improved scalability by offloading the delivery of cacheable content from the origin server onto a larger, shared network. One common CDN approach can be described as “big data center” architecture—caching and delivering customer content from perhaps a couple dozen high-capacity data centers connected to major backbones.
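The benefit of offloading can be expressed as the share of requests a CDN cache answers without contacting the origin. The Python sketch below is a simplified illustration, not a model of any particular CDN: it assumes a single least-recently-used (LRU) cache with room for a fixed number of objects, and the class name and request stream are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical single edge cache; each miss is a fetch from the origin server."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # keys ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> None:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1              # origin must serve this request
            self._items[key] = True       # cache the fetched object
            if len(self._items) > self.capacity:
                self._items.popitem(last=False)  # evict the least recently used object

    @property
    def offload(self) -> float:
        """Fraction of requests served from cache rather than the origin."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = LRUCache(capacity=3)
for url in ["/a", "/b", "/a", "/c", "/a", "/b", "/d", "/a"]:
    cache.get(url)
print(cache.hits, cache.misses, f"{cache.offload:.0%}")  # 4 4 50%
```

Each miss corresponds to a request that still reaches the origin server, so a higher hit ratio translates directly into less origin load.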

Although this approach offers some performance benefit and economies of scale over centralized hosting, the potential improvements are limited because the CDN’s servers are still far away from most users and still deliver content from the wrong side of the middle-mile bottlenecks.

It may seem counterintuitive that having a presence in a couple dozen major backbones isn’t enough to achieve commercial-grade performance. In fact, even the largest of those networks controls very little end-user access traffic. For example, the top 30 networks combined deliver only 50% of end-user traffic, and the share per network drops off quickly beyond them, with a very long-tail distribution over the Internet’s 13,000 networks. Even with connectivity to all the biggest backbones, data must travel through the morass of the middle mile to reach most of the Internet’s 1.4 billion users.

A quick back-of-the-envelope calculation shows that this type of architecture hits a wall in terms of scalability as we move toward a video world. Consider a generous forward projection on such an architecture—say, 50 high-capacity data centers, each with 30 outbound connections, 10Gbps each. This gives an upper bound of 15Tbps total capacity for this type of network, far short of the 100Tbps needed to support video in the near term.
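The upper bound is the product of data centers, outbound connections per data center, and bandwidth per connection. A minimal Python sketch of that arithmetic using the article’s illustrative numbers (the helper name is hypothetical):

```python
def aggregate_capacity_tbps(sites: int, links_per_site: int, gbps_per_link: float) -> float:
    """Upper bound on outbound capacity, in Tbps: sites x links per site x Gbps per link."""
    return sites * links_per_site * gbps_per_link / 1000  # 1 Tbps = 1,000 Gbps


big_dc = aggregate_capacity_tbps(sites=50, links_per_site=30, gbps_per_link=10)
print(f"Upper bound: {big_dc:.0f} Tbps")                   # Upper bound: 15 Tbps
print(f"Shortfall vs. 100 Tbps: {100 - big_dc:.0f} Tbps")  # Shortfall vs. 100 Tbps: 85 Tbps
```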

Highly Distributed CDNs. Another approach to content delivery is to leverage a very highly distributed network—one with servers in thousands of networks, rather than dozens. On the surface, this architecture may appear quite similar to the “big data center” CDN. In reality, however, it is a fundamentally different approach to content-server placement, with a difference of two orders of magnitude in the degree of distribution.

By putting servers within end-user ISPs, for example, a highly distributed CDN delivers content from the right side of the middle-mile bottlenecks, eliminating peering, connectivity, routing, and distance problems, and reducing the number of Internet components on which successful delivery depends.

Moreover, this architecture scales. It can achieve a capacity of 100Tbps, for example, with 5,000 edge locations, each hosting 20 servers capable of delivering 1Gbps apiece.
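The capacity figure is again a straightforward product, and inverting it shows how many edge locations a given target requires. A minimal sketch using the article’s illustrative numbers; the helper names are hypothetical and every location is assumed to be identical:

```python
import math


def edge_capacity_tbps(locations: int, servers_per_location: int, gbps_per_server: float) -> float:
    """Aggregate delivery capacity, in Tbps, of a uniform edge deployment."""
    return locations * servers_per_location * gbps_per_server / 1000  # 1 Tbps = 1,000 Gbps


def locations_needed(target_tbps: float, servers_per_location: int, gbps_per_server: float) -> int:
    """Smallest number of identical edge locations whose combined capacity meets the target."""
    per_location_gbps = servers_per_location * gbps_per_server
    return math.ceil(target_tbps * 1000 / per_location_gbps)


print(edge_capacity_tbps(5000, 20, 1))  # 100.0
print(locations_needed(100, 20, 1))     # 5000
print(locations_needed(200, 20, 1))     # 10000
```

Because each location contributes a modest, fixed amount of capacity, total capacity grows linearly with the number of edge locations; doubling the target simply doubles the number of locations.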

On the other hand, deploying a highly distributed CDN is costly and time consuming, and comes with its own set of challenges. Fundamentally, the network must be designed to scale efficiently from a deployment and management perspective. This necessitates development of a number of technologies, including:

• Sophisticated global-scheduling, mapping, and load-balancing algorithms
• Distributed control protocols and reliable automated monitoring and alerting systems
• Intelligent and automated failover and recovery methods
• Colossal-scale data aggregation and distribution technologies (designed to handle different trade-offs between timeliness and accuracy or completeness)
• Robust global software-deployment mechanisms
• Distributed content freshness, integrity, and management systems
• Sophisticated cache-management protocols to ensure high cache-hit ratios

These are nontrivial challenges, and we present some of our approaches later on in this article.
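Most items on this list are large systems, but the core of content-to-server mapping can be shown in miniature. One widely used generic technique is consistent hashing: each server occupies many points on a hash ring, and each object goes to the server at the next point clockwise, so a server failure remaps only that server’s objects and leaves the other caches warm. The sketch below is a textbook version under stated assumptions (MD5 for hashing, 100 virtual nodes per server, hypothetical server and URL names); it is offered as an illustration, not as the approach the article describes later.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Assigns keys (e.g., URLs) to servers; removing a server remaps only its keys."""

    def __init__(self, servers, vnodes: int = 100):
        self.vnodes = vnodes  # virtual nodes per server spread its share around the ring
        self._points = []     # sorted hash positions on the ring
        self._owner = {}      # hash position -> server name
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key: str) -> str:
        """Return the server owning the first ring point clockwise from the key's hash."""
        if not self._points:
            raise LookupError("no servers on the ring")
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Hypothetical edge cluster of 10 servers and 10,000 cacheable objects.
ring = ConsistentHashRing([f"edge-{n}" for n in range(10)])
urls = [f"/video/segment-{i}.ts" for i in range(10_000)]
before = {u: ring.lookup(u) for u in urls}

ring.remove("edge-3")  # simulate a server failure
after = {u: ring.lookup(u) for u in urls}

moved = sum(before[u] != after[u] for u in urls)
print(f"{moved / len(urls):.1%} of objects remapped")  # only edge-3's share, roughly one tenth
```

A naive scheme such as hash(key) mod N would instead reassign most objects whenever N changes, forcing widespread cache misses at exactly the moment the cluster is degraded.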

Peer-to-Peer Networks. Because a highly distributed architecture is critical to achieving scalability and perfor-

Figure 2: Effect of distance on throughput and download times.

Distance from Server to User    Network Latency   Typical Packet Loss   Throughput (Quality)     4GB DVD Download Time
Local: <100 mi.                 1.6ms             0.6%                  44Mbps (HDTV)            12 min.
Regional: 500–1,000 mi.         16ms              0.7%                  4Mbps (not quite DVD)    2.2 hrs.
Cross-continent: ~3,000 mi.     48ms              1.0%                  1Mbps (not quite TV)     8.2 hrs.
Multi-continent: ~6,000 mi.     96ms              1.4%                  0.4Mbps (poor)           20 hrs.
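The last column of Figure 2 is file size divided by sustained throughput. A minimal Python sketch of that arithmetic, assuming decimal units (4GB = 4 × 10^9 bytes); the helper name is hypothetical:

```python
def download_hours(size_gb: float, throughput_mbps: float) -> float:
    """Transfer time in hours, assuming decimal units (1GB = 10**9 bytes, 1Mbps = 10**6 bit/s)."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (throughput_mbps * 1e6)
    return seconds / 3600


# Throughput values as listed (rounded) in Figure 2.
rows = [("Local", 44), ("Regional", 4), ("Cross-continent", 1), ("Multi-continent", 0.4)]
for label, mbps in rows:
    hours = download_hours(4, mbps)
    print(f"{label:<16} {mbps:>4} Mbps  {hours:5.2f} hrs  ({hours * 60:5.0f} min)")
```

With these rounded inputs the local and regional rows come out at about 12 minutes and 2.2 hours, matching the figure; the cross-continent and multi-continent rows come out at about 8.9 and 22 hours, somewhat longer than the figure’s 8.2 and 20 hours, presumably because the published throughput values are rounded.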

February 2009 | Vol. 52 | No. 2 | Communications of the ACM

