economics to “monitor all the things.” In that context ...
Let’s assume we want to achieve an average latency for operations of one microsecond. Now let’s wrap some numbers around that. I will make some notes about certain aspects of hardware, but I’ll really focus only on hardware from the past several years. While we like to think in terms of seconds, computers don’t care about this concept of time. They care only about clock ticks.
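To put an early number on it: at the roughly 2.6 ticks per nanosecond reported below, a one-microsecond budget is only about 2,600 clock ticks, so every tick spent on measurement is a meaningful fraction of that budget.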
The TSC
Online CPUs are forever marching forward at some frequency, and the period of this frequency is called a tick. In an effort to save power, many computers can shift between different power-saving states that cause the frequency of the CPU to change. This could make it excruciatingly difficult to tell high-granularity time accurately, if the frequency of the CPU were used for timing. Each core on a modern CPU has a TSC (time-stamp counter) that counts the number of ticks that have transpired. You can read this counter with the cleverly named rdtsc assembly instruction. Also, modern CPUs have a feature called an invariant TSC, which guarantees that the frequency at which ticks occur will not change for any reason (in particular, not across power-saving state changes). My development box has an invariant TSC that ticks approximately 2.5999503 times per nanosecond. Other machines have different frequencies.
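A minimal sketch of reading the counter directly, assuming an x86-64 machine and a GCC- or Clang-style compiler that exposes the rdtsc instruction through the __rdtsc() intrinsic in x86intrin.h:

#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc() on GCC/Clang targeting x86 */

int main(void) {
  uint64_t start = __rdtsc(); /* ticks since an arbitrary point */
  uint64_t end = __rdtsc();
  /* Dividing a tick delta by the machine's tick rate (about
     2.5999503 ticks/ns on the box above) converts it to nanoseconds. */
  printf("back-to-back rdtsc delta: %llu ticks\n",
         (unsigned long long)(end - start));
  return 0;
}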
The standard tooling to figure out how long an operation takes on a Unix machine is either clock_gettime(CLOCK_MONOTONIC, ...) or gethrtime(). These calls return the number of nanoseconds since some arbitrary fixed point in the past. The examples shown here use gethrtime() because it is shorter to write.
hrtime_t start = gethrtime();
some_operation_worthy_of_measurement();
hrtime_t elapsed = gethrtime() - start;
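On systems without gethrtime() (Linux, for example), a small wrapper around the clock_gettime(CLOCK_MONOTONIC, ...) form serves the same purpose. A sketch, where monotonic_ns() is a name made up for illustration:

#include <time.h>
#include <stdint.h>

/* Monotonic nanoseconds since an arbitrary fixed point, like gethrtime(). */
static uint64_t monotonic_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}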
As these things are measured, the gethrtime() call itself will take some time. The question is: where does the time it returns sit relative to the beginning and end of the gethrtime() call itself? That can be answered with benchmarks. The bias introduced by measuring the start and finish is proportional to its contribution to the overall running time. In other words, if the “operation” being measured is made to take a long time over many iterations, the measurement bias can be reduced to near zero. Timing gethrtime() with gethrtime() would look like this:
#define LOTS 10000000
hrtime_t start = gethrtime();
for(int i = 0; i < LOTS; i++)
  (void)gethrtime();
hrtime_t elapsed = gethrtime() - start;
double avg_ns_per_op = (double)elapsed / (double)LOTS;
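Wrapped up so that it compiles and prints its result, the whole benchmark might read as follows (a sketch: gethrtime() is native to illumos/Solaris, and the monotonic_ns() stand-in above can be substituted elsewhere):

#include <stdio.h>
#include <sys/time.h> /* gethrtime() on illumos/Solaris */

#define LOTS 10000000

int main(void) {
  hrtime_t start = gethrtime();
  for(int i = 0; i < LOTS; i++)
    (void)gethrtime(); /* the operation being timed */
  hrtime_t elapsed = gethrtime() - start;
  printf("%.2f ns per gethrtime() call\n",
         (double)elapsed / (double)LOTS);
  return 0;
}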
Behold, a benchmark is born. Furthermore, you could actually measure the number of ticks elapsed in the