FEBRUARY2019 | VOL. 62 | NO. 2 | COMMUNICATIONS OF THE ACM 103
modes and we estimate speedups by comparing the two
traces, epoch by epoch. In a practical system, online profiling and heuristics would be required to estimate
Datacenter simulation. We simulate 1000 users and evaluate their performance in the sprinting game. The simulator uses server traces and models system dynamics as agents
sprint, cool, and recover. Simulations evaluate homogeneous agents who arrive randomly and launch the same type
of Spark application; randomized arrivals cause application
phases to overlap in diverse ways. Diverse phase behavior
exercises the sprinting game as agents optimize strategies
in response to varied competitors’.
Table 1 summarizes technology and system parameters.
Parameters Nmin and Nmax are set by the circuit breaker’s tripping curve. Parameters pc and pr are set by the chip’s cooling
mechanism and the system’s batteries. These probabilities
decrease as cooling efficiency and recharge speed increase.
We evaluate the sprinting game and its equilibrium threshold against several alternatives that represent broader perspectives on power management. First, greedy heuristics
focus on the present and neglect the future.
21 Second, control-theoretic heuristics are reactive rather than proactive.
Third, centralized heuristics focus on the system and neglect
individual users. Unlike these approaches, the sprinting
game anticipates the future and models strategic agents in a
Greedy (G) permits agents to sprint as long as the chip is
not cooling and the rack is not recovering. This mechanism
may frequently trip the breaker and require rack recovery.
Greedy produces a poor equilibrium—knowing that everyone is sprinting, an agent’s best response is to sprint as well.
Exponential Backoff (E-B) throttles the frequency at which
agents sprint. An agent sprints greedily until the breaker
trips. After the t-th trip, agents wait for some number of
epochs drawn randomly from [0, 2t − 1] before sprinting
again. The waiting interval contracts by half if the breaker
has not been tripped in the past 100 epochs.
Cooperative Threshold (C-T) assigns each agent the globally
optimal sprinting threshold. The coordinator identifies and
enforces thresholds that maximize system performance.
Although these thresholds provide an upper bound on performance, they do not produce an equilibrium because thresholds do not reflect agents’ best responses to system dynamics.
Equilibrium Threshold (E-T) assigns each agent her optimal threshold from the sprinting game. The coordinator
collects performance profiles and finds thresholds that
The analysis runs periodically to update sprinting strategies and the tripping probability as application mix and system conditions evolve. The analysis does not affect an
application’s critical path as agents use updated strategies
when they become available but need not wait for them. On
an Intel® Core™ i5 processor with 4GB of memory, the analysis completes in less than 10s, on average.
Online play. An agent decides whether to sprint at the
start of each epoch by estimating a sprint’s utility and comparing it against her threshold. Estimation could be implemented in several ways. An agent could use the first few
seconds of an epoch to profile her normal and sprinting performance. Alternatively, an agent could use heuristics to
estimate utility from additional cores and higher clock rates.
For example, task queue occupancy and cache misses are
associated with a sprint’s impact on task parallelism and
instruction throughput, respectively. Comparisons with a
threshold are trivial.
5. EXPERIMENTAL METHODOLOGY
Server measurements. The agent and its application are
pinned to a chip multiprocessor, an Intel® Xeon® E5-2697 v2.
In normal mode, the agent uses three 1.2GHz cores. In
sprinting mode, the agent uses twelve 2.7GHz cores. We turn
cores on and off with Linux sysfs. In principle, sprinting
represents any mechanism that performs better but consumes more power.
We evaluate Apache Spark workloads. The Spark run-time engine dynamically schedules tasks to use available
cores and maximize parallelism, adapting as sprints cause
the number of available cores to vary across epochs. We profile workloads by modifying Spark (v1.3.1) to log the IDs of
jobs, stages, and tasks as they complete. We profile system
and power temperature using the Intel® Performance
Counter Monitor 2. 8.
We measure workload performance in terms of tasks
completed per second (TPS). The total number of tasks in
a job is constant and independent of the available hardware resources such that TPS measures performance for a
fixed amount of work. In our experiments, we trace TPS
during application execution in normal and sprinting
Table 1. Experimental Parameters.
Description Symbol Value
Min sprinters Nmin 250
Max sprinters Nmax 750
Prob. of staying in cooling pc 0.50
Prob. of staying in recovery pr 0.88
Discount factor δ 0.99
Algorithm 1: Optimizing the Sprint Strategy
input : Density for sprinting utilities ( f (u) )
output: Optimal sprinting threshold (u T)
j ← 1
P0 lstrip ← 1
while Pj trip not converged do