Figure 1. An example data center and warehouse-scale computer.
Each cluster is homogeneous in both processor type and speed. The thousands of hosts are orchestrated to exploit the thread-level parallelism central to many Internet workloads: they divide incoming requests into parallel subtasks and weave together the results from many subtasks across thousands of cores. In general, a request completes only when all of its parallel subtasks complete; as a result, the response time of the slowest subtask dictates the overall response time. Even in the presence of abundant thread-level parallelism, the communication overhead imposed by the network and protocol stack can ultimately limit application performance as the effects of Amdahl's Law2 creep in.
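To make this fan-out pattern concrete, here is a minimal scatter-gather sketch in Go; the shard count, latencies, and doWork function are illustrative assumptions, not details of any particular system. Because the gather step waits on every subtask, the request's response time is the latency of the slowest subtask, and the serial scatter and gather work is the fraction that Amdahl's Law (speedup at most 1/((1−p) + p/N) for parallel fraction p on N cores) says will eventually limit.

package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// doWork stands in for one parallel subtask of a user request; its
// latency varies from call to call, as it would under real load.
func doWork(shard int) time.Duration {
	latency := time.Duration(1+rand.Intn(20)) * time.Millisecond
	time.Sleep(latency)
	return latency
}

func main() {
	const shards = 16
	latencies := make([]time.Duration, shards)
	var wg sync.WaitGroup

	start := time.Now()
	// Scatter: divide the request into parallel subtasks.
	for i := 0; i < shards; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			latencies[i] = doWork(i)
		}(i)
	}
	// Gather: the request completes only when every subtask has.
	wg.Wait()
	elapsed := time.Since(start)

	var slowest time.Duration
	for _, l := range latencies {
		if l > slowest {
			slowest = l
		}
	}
	fmt.Printf("response time %v, slowest subtask %v\n",
		elapsed.Round(time.Millisecond), slowest)
}

Running the sketch shows the overall time tracking the maximum subtask latency rather than the average, which is why tail latency matters so much at this scale.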
The high-level system architecture
and programming model shape both
the programmer’s conceptual view
and application usage. The latency and bandwidth “costs” of local (DRAM) and remote (network) memory references are often baked into the application as programming trade-offs are made to optimize code for the underlying system architecture. In this way, an application organically grows within the confines of the system architecture.
Web applications such as search, email, and document collaboration are scheduled onto resources within a cluster.4,8 User-facing applications have soft real-time latency guarantees, or service-level agreements (SLAs), that they must meet. In this model, an application has on the order of tens of milliseconds to reply to the user's request, which is subdivided and dispatched to worker threads within the cluster. The worker threads generate replies that are aggregated and returned to the user. Unfortunately, if a portion of the workflow does not execute in a timely manner (as a result of network congestion, for example), it may exceed a specified timeout delay, and consequently some portion of the coalesced results will be unavailable and thus ignored. This needlessly wastes both CPU cycles and network bandwidth, and may adversely affect the computed result.
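Here is a minimal sketch of this deadline-driven workflow, assuming a hypothetical query function and an illustrative 50 ms budget: replies that beat the per-request deadline are aggregated, while late replies are dropped, so the CPU cycles and bandwidth spent producing them are wasted exactly as described above.

package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// query is a hypothetical worker call; its latency varies, and under
// congestion some calls will miss the request deadline.
func query(ctx context.Context, shard int) (string, error) {
	latency := time.Duration(rand.Intn(80)) * time.Millisecond
	select {
	case <-time.After(latency):
		return fmt.Sprintf("result-from-shard-%d", shard), nil
	case <-ctx.Done():
		return "", ctx.Err() // deadline exceeded; reply is discarded
	}
}

func main() {
	const shards = 16
	// Illustrative 50 ms soft real-time budget for the whole request.
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	replies := make(chan string, shards)
	for i := 0; i < shards; i++ {
		go func(i int) {
			if r, err := query(ctx, i); err == nil {
				replies <- r
			}
		}(i)
	}

	// Aggregate whatever arrives before the deadline; the work behind
	// any late subtask is wasted, and the result may be partial.
	var aggregated []string
	for len(aggregated) < shards {
		select {
		case r := <-replies:
			aggregated = append(aggregated, r)
		case <-ctx.Done():
			fmt.Printf("deadline hit: %d/%d replies aggregated\n",
				len(aggregated), shards)
			return
		}
	}
	fmt.Printf("all %d replies aggregated in time\n", shards)
}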
To reduce the likelihood of congestion, the network can be overprovisioned.