At first glance, the workload-scaling problem may seem similar to the
well-known Amdahl’s Law, which describes workload speedup as a function
of parallel-compute resources. However, Amdahl’s Law cannot be applied
here because the analogy between its “parallelizability factor” and the
“scale factor” does not hold up: the former is independent of the
resource being scaled (that is, parallel processors), while the latter
is a function of the resource being scaled (frequency). Utilization
scalability over frequency has a cascade relation via the scale factor.
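For reference, Amdahl’s Law in its usual form gives the speedup on n processors as 1 / ((1 - p) + p/n), where p is the parallelizable fraction of the work. The following is a minimal sketch of that formula; the function name and the sample values are illustrative and are not taken from this article. Note that p is supplied as a constant that does not depend on n, which is the property the scale factor lacks with respect to frequency.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup predicted by Amdahl's Law for parallel fraction p on n processors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be within [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The serial part (1 - p) is unaffected; the parallel part p is divided by n.
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    # p stays fixed while the scaled resource (processor count) changes.
    for n in (1, 2, 4, 8):
        print(n, round(amdahl_speedup(0.9, n), 3))
```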
Terminology and Assumptions
This article uses the following terminology:
• Utilization or C0 activity percentage
refers to the active (non-idle) clocked
state3 within a time window. Load and
utilization, used interchangeably here,
refer to the same attribute.
• Scalable activity refers to the part
(or percentage) of the operation whose
execution time scales inversely with frequency.
• Stall in activity refers to the part
(or percentage) of the operation that
is spent stalled during its completion.
Although a sufficiently long delay causes C-state demotion, the instruction stalls
referred to here are much shorter and
occur while the CPU is in the C0 active state.
Consider a frequency governor on
a DVFS system, updating frequency at
every periodic time window T. Consider
the present instant at time t0 (“NOW”),
as depicted in Figure 1. Let the just-completed window T of workload have
tact (or ta) as part of its scalable activity
(that is, tact represents the cumulative
portion of activity that keeps the processor busy in execution without the need
for any dependent delay or stalls). Let
tstall (or ts) be the sub-duration covering
the effective stalls experienced
(inter-subsystem dependency stalls). Also let
toff (or to) be the cumulative duration
during which the DVFS subsystem is in a deeper C-state, which draws significantly less
power than the C0 state. The fundamental
difference between toff and tstall is that in
the latter case the subsystem is still in the active C0 state while experiencing very short dependency stalls, whereas in the former
case the delays are long enough to put
the subsystem into a momentary deeper C-state.
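The decomposition above can be sketched as a small data structure. This is a minimal illustration that assumes the window T is exactly the sum of tact, tstall, and toff, and that utilization is the C0 share (tact + tstall) of that window; the class name, field names, and sample durations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowBreakdown:
    """Durations, in seconds, observed within one governor window T."""
    t_act: float    # scalable activity: busy executing in C0
    t_stall: float  # short dependency stalls: still in C0
    t_off: float    # time spent in a deeper C-state

    @property
    def window(self) -> float:
        # Assumption of this sketch: T = tact + tstall + toff.
        return self.t_act + self.t_stall + self.t_off

    @property
    def c0_time(self) -> float:
        # Both scalable activity and stalls are spent in C0.
        return self.t_act + self.t_stall

    @property
    def utilization(self) -> float:
        if self.window <= 0:
            raise ValueError("window must have a positive duration")
        return self.c0_time / self.window


if __name__ == "__main__":
    w = WindowBreakdown(t_act=0.006, t_stall=0.002, t_off=0.002)
    print(w.window, w.c0_time, w.utilization)
```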
• Aperf. A running counter that
counts at the actual clock rate of execution at that instant. This actual clock
frequency may vary over time based
on governance and/or other algorithms. This register counts only during active state (C0).
• Mperf. A running counter for activity
that counts at a fixed TSC (time stamp
counter) clock rate. It counts only during active state (C0).
• Pperf. This counter is similar to Aperf,
except that it does not count when the
activity stalls as a result of some dependency, likely gated on another IP’s
clock domain (for example, memory).
The deltas of these counters over a given
time window are commonly interpreted as follows:
• Utilization. U = ΔMperf/ΔTSC
• Scale factor. S = ΔPperf/ΔAperf
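The two ratios can be computed from a pair of counter snapshots taken at the start and end of a window. The sketch below is illustrative only: the `CounterSnapshot` type and the function name are hypothetical, the TSC term is taken to be the TSC delta over the same window (an assumption of this sketch), and counter wraparound is ignored. How the raw values are read from hardware is outside its scope.

```python
from typing import NamedTuple, Optional, Tuple


class CounterSnapshot(NamedTuple):
    """Raw counter values sampled at one instant (hypothetical container)."""
    tsc: int
    mperf: int
    aperf: int
    pperf: int


def utilization_and_scale(
    start: CounterSnapshot, end: CounterSnapshot
) -> Tuple[float, Optional[float]]:
    """Return (U, S) for the window between two snapshots.

    S is None when Aperf did not advance (no C0 time in the window),
    because the ratio is undefined in that case.
    """
    d_tsc = end.tsc - start.tsc
    d_mperf = end.mperf - start.mperf
    d_aperf = end.aperf - start.aperf
    d_pperf = end.pperf - start.pperf
    if d_tsc <= 0:
        raise ValueError("snapshots must span a positive TSC interval")

    utilization = d_mperf / d_tsc                           # U = dMperf / dTSC
    scale = d_pperf / d_aperf if d_aperf > 0 else None      # S = dPperf / dAperf
    return utilization, scale


if __name__ == "__main__":
    before = CounterSnapshot(tsc=0, mperf=0, aperf=0, pperf=0)
    after = CounterSnapshot(tsc=1000, mperf=500, aperf=400, pperf=300)
    print(utilization_and_scale(before, after))
```

With these made-up sample values, the window is half spent in C0 and three quarters of the active cycles are free of stalls.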
Ideally, if an activity is free of stalls,
then ΔPperf = ΔAperf (that is, the scale factor S will be 1). In such a situation, the
time taken to complete an activity in
a given time window would simply be
inversely proportional to the actual frequency in
that window. In most real workloads
the scale factor varies and is often less
than 1. Establishing accurate equations relating utilization, scale factor,
and frequency allows DVFS governors
to make well-informed frequency-change decisions.
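The stall-free special case can be illustrated with a short sketch. It covers only S = 1, where active time is inversely proportional to frequency, and it assumes the same amount of work arrives in the next window of equal length. It is not the general relation for S below 1. The function name is hypothetical, and capping the result at 1 is a choice made for this sketch.

```python
def ideal_utilization_after_change(u_old: float, f_old: float, f_new: float) -> float:
    """Predicted utilization at f_new for a stall-free (S = 1) workload.

    Assumes the same work per window; active time scales as f_old / f_new.
    """
    if not 0.0 <= u_old <= 1.0:
        raise ValueError("u_old must be within [0, 1]")
    if f_old <= 0 or f_new <= 0:
        raise ValueError("frequencies must be positive")
    # Utilization cannot exceed a fully busy window, so cap at 1.0.
    return min(1.0, u_old * f_old / f_new)


if __name__ == "__main__":
    # Doubling the frequency halves the busy share of the window.
    print(ideal_utilization_after_change(0.5, 1.0e9, 2.0e9))
```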
Figure 1. Scalability-related definitions. (Timeline: past, t-1; now, t0; future, t1.)
Figure 2. Perf counter delta relating to reference time window. (Spans shown: ta + ts; ta + ts + toff.)
Figure 3. A browser-based workload from