FEBRUARY 2019 | VOL. 62 | NO. 2 | COMMUNICATIONS OF THE ACM 99
to the number of simultaneous sprints as each sprinter contributes to the load above rated current. Higher currents
increase the probability of tripping the breaker.
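To make the link between sprinters and current concrete, the sketch below models a rack's total draw, normalized to the breaker's rated current, as rising linearly with the number of simultaneous sprints. The per-chip currents, the linear model, and the name `normalized_current` are assumptions for illustration; the article does not give these values.

```python
# Hypothetical per-chip currents in amps; illustrative values, not from the article.
NOMINAL_AMPS = 1.0  # draw of one chip in nominal operation (assumed)
SPRINT_AMPS = 2.0   # draw of one chip while sprinting (assumed)


def normalized_current(n_sprinters: int, n_chips: int, rated_amps: float,
                       nominal_amps: float = NOMINAL_AMPS,
                       sprint_amps: float = SPRINT_AMPS) -> float:
    """Return the rack's total current divided by the breaker's rated current.

    Every chip draws nominal_amps; each sprinter adds (sprint_amps - nominal_amps),
    so the load rises linearly with the number of simultaneous sprints.
    """
    if not 0 <= n_sprinters <= n_chips:
        raise ValueError("n_sprinters must lie in [0, n_chips]")
    total = n_chips * nominal_amps + n_sprinters * (sprint_amps - nominal_amps)
    return total / rated_amps


if __name__ == "__main__":
    # 100 chips behind a hypothetical 120 A breaker.
    for n in (0, 20, 50):
        print(n, round(normalized_current(n, 100, 120.0), 3))
```

With these assumed values, 20 sprinters bring the load exactly to rated current, and every additional sprinter pushes it above.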
Let $n_S$ denote the number of sprinters and let $P_{\text{trip}}$ denote
the probability of tripping the breaker. The breaker occupies
one of the following regions:
• Non-Tripped. $P_{\text{trip}}$ is zero when $n_S < N_{\min}$.
• Non-Deterministic. $P_{\text{trip}}$ is a non-decreasing function of
$n_S$ when $N_{\min} \le n_S < N_{\max}$.
• Tripped. $P_{\text{trip}}$ is one when $n_S \ge N_{\max}$.
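The three regions can be sketched as a piecewise function of the number of sprinters. The article only requires the trip probability to be non-decreasing in the non-deterministic region; the linear ramp below, and the names `breaker_region` and `trip_probability`, are assumptions chosen for illustration.

```python
def breaker_region(n_s: int, n_min: int, n_max: int) -> str:
    """Classify the breaker's region for n_s simultaneous sprinters."""
    if n_min >= n_max:
        raise ValueError("expected n_min < n_max")
    if n_s < n_min:
        return "non-tripped"
    if n_s < n_max:
        return "non-deterministic"
    return "tripped"


def trip_probability(n_s: int, n_min: int, n_max: int) -> float:
    """Return P_trip for n_s sprinters.

    Non-tripped: 0. Tripped: 1. Non-deterministic: an assumed linear ramp that
    stays strictly between 0 and 1 and never decreases as n_s grows.
    """
    region = breaker_region(n_s, n_min, n_max)
    if region == "non-tripped":
        return 0.0
    if region == "tripped":
        return 1.0
    # Hypothetical ramp: (n_s - n_min + 1) / (n_max - n_min + 1) lies in (0, 1).
    return (n_s - n_min + 1) / (n_max - n_min + 1)
```

For example, with hypothetical thresholds $N_{\min} = 10$ and $N_{\max} = 20$, `trip_probability(9, 10, 20)` is 0.0, `trip_probability(15, 10, 20)` is 6/11, and `trip_probability(20, 10, 20)` is 1.0. Any other non-decreasing curve between the two thresholds would satisfy the definition equally well.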
Note that $N_{\min}$ and $N_{\max}$ depend on the breaker’s trip curve and
the application’s demand for power when sprinting. For
Spark on chip multiprocessors, we find that the breaker does
and heat sink must absorb surplus heat during a sprint [14, 15].
Second, the datacenter rack must employ batteries to guard
against power emergencies caused by a surplus of sprinters
on a shared power supply. Third, the system must implement
management policies that determine which chips sprint.
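As a point of reference for what such a policy decides, the sketch below shows a deliberately naive admission rule: grant sprints only to chips whose thermal package has cooled, and cap the number of sprinters at $N_{\min} - 1$ so the breaker stays in its non-tripped region. This is a hypothetical baseline, not the article's policy; the `Chip` fields and the `choose_sprinters` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    chip_id: int
    wants_to_sprint: bool   # the application currently requests a sprint
    cooling_left_s: float   # seconds until the thermal package has released its heat


def choose_sprinters(chips: list[Chip], n_min: int) -> list[int]:
    """Naive, hypothetical policy: admit cooled chips that request a sprint,
    in chip_id order, but never n_min or more of them at once."""
    eligible = [
        c.chip_id
        for c in sorted(chips, key=lambda c: c.chip_id)
        if c.wants_to_sprint and c.cooling_left_s <= 0.0
    ]
    # Fewer than n_min sprinters keeps P_trip at zero (the non-tripped region).
    return eligible[: max(n_min - 1, 0)]


if __name__ == "__main__":
    chips = [Chip(0, True, 0.0), Chip(1, True, 120.0), Chip(2, False, 0.0),
             Chip(3, True, 0.0), Chip(4, True, 0.0)]
    print(choose_sprinters(chips, n_min=3))
```

Chip 1 is still cooling and chip 2 does not request a sprint, so with `n_min=3` only chips 0 and 3 are admitted. The rule is conservative by design: it forgoes sprints that the non-deterministic region might tolerate and ignores the rack's batteries entirely.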
2.1. System architecture
Chip multiprocessors and thermal packages. The quality
of the multiprocessor’s thermal package, measured by its
thermal capacitance and conductance, determines the
chip’s maximum power level and dictates the duration of a
sprint [13, 15]. More expensive heat sinks employ phase-change
materials (PCMs), which increase thermal capacitance and permit
sprint durations on the order of minutes if not hours. We estimate
that a chip with paraffin wax can sprint for durations on the order
of 150 s.
After a sprint, the thermal package must release its heat
before the chip can sprint again. The average cooling duration, denoted $\Delta t_{\text{cool}}$, is the time required for the PCM
to return to ambient temperature. The rate at which the PCM
dissipates heat depends on its melting point and on the thermal resistance between the material and the ambient environment. Both
factors can be engineered; with paraffin wax, we estimate a cooling duration on the order of 300 s, twice the
sprint’s duration.
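Taking these estimates at face value, and assuming a chip waits the full cooling duration after every sprint before sprinting again, a single chip can spend at most one third of its time sprinting, where $\Delta t_{\text{sprint}}$ denotes the sprint duration:

```latex
\frac{\Delta t_{\text{sprint}}}{\Delta t_{\text{sprint}} + \Delta t_{\text{cool}}}
  = \frac{150\ \text{s}}{150\ \text{s} + 300\ \text{s}}
  = \frac{1}{3}
```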
Power delivery and circuit breakers. Datacenter architects deploy servers and multiprocessors so that they oversubscribe
power distribution units, which improves efficiency. Oversubscription
uses a larger fraction of the facility’s provisioned power,
but it relies on power capping and on varied computational
load across servers to avoid tripping circuit breakers or violating contracts with utility providers [4]. Although sprints
can boost computation, the risk of a power emergency
increases with the number of sprinters in a power-capped
datacenter.
Figure 2 presents the circuit breaker’s trip curve, which
specifies how sprint duration and power combine to determine whether the breaker trips. The trip time corresponds
to the sprint’s duration. Longer sprints increase the probability of tripping the breaker. The current draw corresponds
[Figure 1 panels: normalized speedup, normalized power, and average temperature (°C) for the Naive, Decision, Gradient, SVM, Linear, Kmeans, ALS, Correlation, Pagerank, CC, and Triangle benchmarks; a legend distinguishes non-sprinting and sprinting operation.]
Figure 1. Normalized speedup, power, and temperature for varied Spark benchmarks when sprinting. Nominal operation supplies three cores
at 1.2 GHz. Sprint supplies twelve cores at 2.7 GHz.
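[Figure 2 plot: trip time (sec), with axis ticks from 0.1 to 3600, versus current normalized to rated current, with axis ticks from 1 to 20. Labels mark the Not tripped ($P_{\text{trip}} = 0$), Non-deterministic, and Tripped ($P_{\text{trip}} = 1$) regions, the tolerance band, the long-delay, conventional tripping, and short-circuit zones, and the sprint duration $\Delta t_{\text{sprint}}$.]
Figure 2. Typical trip curve of a circuit breaker.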