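Another nuance is that these two values together describe the translation between a CPU’s TSC and the system’s gethrtime(), and they are estimates. That means two important things. First, they need to be updated regularly to correct for errors in calibration and estimation. Second, they need to be set and read atomically, so that a reader never pairs one value from an old calibration with the other from a new one. This is where the cmpxchg16b instruction comes in: it performs an atomic compare-and-exchange on 16 bytes, enough to hold both 64-bit values at once.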
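libmtev uses cmpxchg16b so that both values change together as one 16-byte unit. Sixteen-byte atomics need CPU-specific support (cmpxchg16b on x86-64), so the sketch below illustrates the same requirement with a different, portable technique: a sequence lock built on C11 atomics. The struct and function names are hypothetical. A reader retries until it sees the same even sequence number before and after loading both values, which guarantees the pair it returns came from a single update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical calibration pair guarded by a sequence counter.
 * An odd sequence value means the writer is in the middle of an update. */
typedef struct {
  atomic_uint_fast64_t seq;
  _Atomic double nanos_per_tick; /* m in y = mx + b */
  _Atomic int64_t skew;          /* b in y = mx + b, in nanoseconds */
} calib_pair_t;

static void calib_init(calib_pair_t *c) {
  atomic_init(&c->seq, 0);
  atomic_init(&c->nanos_per_tick, 0.0);
  atomic_init(&c->skew, 0);
}

/* Called only by the single calibration thread. */
static void calib_publish(calib_pair_t *c, double nanos_per_tick, int64_t skew) {
  uint_fast64_t s = atomic_load_explicit(&c->seq, memory_order_relaxed);
  assert((s & 1) == 0); /* a single writer never starts mid-update */
  atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed); /* odd: busy */
  atomic_thread_fence(memory_order_release);
  atomic_store_explicit(&c->nanos_per_tick, nanos_per_tick, memory_order_relaxed);
  atomic_store_explicit(&c->skew, skew, memory_order_relaxed);
  atomic_store_explicit(&c->seq, s + 2, memory_order_release); /* even: done */
}

/* Readers retry until the sequence is even and unchanged across both
 * loads, so the returned pair was never torn by a concurrent update. */
static void calib_read(calib_pair_t *c, double *nanos_per_tick, int64_t *skew) {
  for (;;) {
    uint_fast64_t before = atomic_load_explicit(&c->seq, memory_order_acquire);
    if (before & 1) continue; /* writer in progress; try again */
    double m = atomic_load_explicit(&c->nanos_per_tick, memory_order_relaxed);
    int64_t b = atomic_load_explicit(&c->skew, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    uint_fast64_t after = atomic_load_explicit(&c->seq, memory_order_relaxed);
    if (before == after) {
      *nanos_per_tick = m;
      *skew = b;
      return;
    }
  }
}
```

Unlike a 16-byte compare-and-exchange, a sequence lock can make readers spin briefly while the writer is mid-update; with the writer running only once per calibration, that window is tiny.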
Additionally, this calibration work is performed every five seconds in a separate thread, and we attempt to make that thread high priority on a real-time scheduler. It turns out that this all works quite well, even without the ability to change priority or scheduling class.
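What each calibration pass computes can be sketched as a simple two-point fit of y = mx + b through paired readings of the TSC and the reference clock. This is a minimal sketch under that assumption: the types and `calibrate()` are hypothetical, and libmtev’s actual estimator may differ (it also has to contend with measurement noise, which is ignored here).

```c
#include <assert.h>
#include <stdint.h>

/* One paired observation: a TSC reading and the reference clock (ns),
 * taken as close together in time as possible. Hypothetical type. */
typedef struct {
  uint64_t ticks;
  int64_t ns;
} time_sample_t;

/* The two estimated values: slope m and offset b of y = mx + b. */
typedef struct {
  double nanos_per_tick;
  int64_t skew;
} calibration_t;

/* Two-point fit through an earlier and a later sample. Noise, outlier
 * rejection, and thread migration between CPUs are ignored here. */
static calibration_t calibrate(time_sample_t earlier, time_sample_t later) {
  assert(later.ticks > earlier.ticks);
  calibration_t c;
  c.nanos_per_tick = (double)(later.ns - earlier.ns) /
                     (double)(later.ticks - earlier.ticks);
  /* Choose b so the line passes exactly through the most recent sample. */
  c.skew = later.ns - (int64_t)(c.nanos_per_tick * (double)later.ticks);
  return c;
}
```

Anchoring the offset to the newest sample keeps the estimate closest to the present, which is where readers will be evaluating it until the next pass.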
Gravy
Since we already have to correct for skew to align with the system gethrtime(), and the point in the past to which gethrtime() is relative is arbitrary (according to the documentation), we’ve elected to make that “arbitrary” point the Unix epoch. This requires no additional instructions, and it means the replacement gethrtime() can also power gettimeofday(). Therefore, y = mx + b is actually implemented as:
nano_second_since_epoch = nanos_per_tick * mtev_rdtscp() + skew
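The formula, and the step from a nanosecond count to a gettimeofday()-style value, can be sketched in C as follows. The function names are hypothetical, and the tick count is passed in as a parameter instead of calling mtev_rdtscp(), so the sketch stays portable and testable.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* y = mx + b with nanos_per_tick (m), a raw tick count (x), and skew (b). */
static int64_t ns_since_epoch(double nanos_per_tick, uint64_t ticks, int64_t skew) {
  /* Only m * x is computed in floating point; adding the large epoch
   * offset b as an integer avoids rounding it to a double, whose
   * 53-bit mantissa cannot hold ~1.7e18 ns exactly. */
  return (int64_t)(nanos_per_tick * (double)ticks) + skew;
}

/* A gettimeofday()-style view of the same instant. */
static struct timeval ns_to_timeval(int64_t ns) {
  assert(ns >= 0); /* instants before 1970 are out of scope here */
  struct timeval tv;
  tv.tv_sec = (time_t)(ns / 1000000000);
  tv.tv_usec = (suseconds_t)((ns % 1000000000) / 1000);
  return tv;
}
```

For example, with nanos_per_tick = 0.5, 2,000,003,000 ticks, and a skew of 10^18 ns, the result is 1,000,000,001,000,001,500 ns, or 1,000,000,001 seconds and 1 microsecond.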
Naturally, we’ll pick up changes to the wall clock (via adjtime() et al.) only when we recalibrate.
Safety
Obviously, things can and do go wrong. A variety of fail-safe mechanisms are in place to ensure proper behavior when the optimizations become unsafe. By default, if the CPU is detected to lack an invariant TSC, faster time support is disabled. If a calibration loop fails for too long (15 seconds), the CPU is marked as bad and disabled. During rudimentary performance tests, if the system’s gethrtime() can beat the emulation, we disable the emulation. If all those tests pass, we still check whether the underlying system can perform gettimeofday() faster than we can emulate it; if so, we disable gettimeofday() emulation.
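The fail-safe cascade can be expressed as a single decision function. This is a simplified sketch that assumes the measurements have already been taken; the struct fields and function name are hypothetical, and where the article describes the calibration check per CPU, this sketch evaluates one CPU’s results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the checks described above, for one CPU. */
typedef struct {
  bool invariant_tsc;        /* CPU reports an invariant TSC */
  double secs_calib_failing; /* how long calibration has been failing */
  double emulated_hrtime_ns; /* measured cost per call of the emulation */
  double system_hrtime_ns;   /* measured cost per call of gethrtime() */
  double emulated_gtod_ns;   /* measured cost per call of the emulation */
  double system_gtod_ns;     /* measured cost per call of gettimeofday() */
} time_checks_t;

typedef struct {
  bool fast_hrtime;       /* use the TSC-based gethrtime() replacement */
  bool fast_gettimeofday; /* also use it to emulate gettimeofday() */
} time_mode_t;

#define CALIB_FAILURE_LIMIT_SECS 15.0

static time_mode_t choose_time_mode(const time_checks_t *c) {
  assert(c != NULL);
  time_mode_t off = { false, false };
  if (!c->invariant_tsc) return off;                                /* disabled by default */
  if (c->secs_calib_failing > CALIB_FAILURE_LIMIT_SECS) return off; /* CPU marked bad */
  if (c->system_hrtime_ns < c->emulated_hrtime_ns) return off;      /* system gethrtime() wins */
  /* gettimeofday() emulation is judged separately, and only if we got here. */
  time_mode_t on = { true, c->emulated_gtod_ns <= c->system_gtod_ns };
  return on;
}
```

Ordering the checks from cheapest and most fundamental (CPU capability) to most situational (benchmark results) means a disqualifying condition short-circuits everything after it.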
The goal is for mtev_gethrtime() to be as fast as or faster than gethrtime() and for mtev_gettimeofday() to be as fast as or faster than gettimeofday().
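Comparisons like “as fast as or faster than” need a per-call cost for each clock. A rough microbenchmark of that kind is sketched below, assuming each clock is wrapped as a function returning nanoseconds; `clock_fn`, `ns_per_call()`, and `emulation_keeps_up()` are hypothetical names, and libmtev’s own tests may measure differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Any clock under test, wrapped to return nanoseconds. Hypothetical type. */
typedef int64_t (*clock_fn)(void);

/* Reference clock used only to time the benchmark loop itself. */
static int64_t monotonic_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Average cost of one call to fn, in nanoseconds. Timing many calls and
 * dividing amortizes the cost of reading the reference clock twice. */
static double ns_per_call(clock_fn fn, int iters) {
  assert(iters > 0);
  volatile uint64_t sink = 0; /* keeps the calls from being optimized away */
  int64_t start = monotonic_ns();
  for (int i = 0; i < iters; i++) sink += (uint64_t)fn(); /* unsigned: wraps, no UB */
  int64_t elapsed = monotonic_ns() - start;
  (void)sink;
  return (double)elapsed / (double)iters;
}

/* "As fast as or faster than": a tie counts in favor of the emulation. */
static int emulation_keeps_up(clock_fn emulated, clock_fn system, int iters) {
  return ns_per_call(emulated, iters) <= ns_per_call(system, iters);
}
```

Both candidates pay the same function-pointer call overhead, so the comparison stays fair even though the absolute numbers are slightly inflated.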
Results
The overall results are better than expected. The original goal was simply to provide a way for our implementation on Illumos to match the performance of Linux. The value of ZFS is profound, and while Linux has some performance advantages in specific arenas, that doesn’t matter much if you have undetectable corruption of the data you store.
Further optimization of the implementation is possible, but we’ve stopped for now, having achieved the initial goals. Additionally, for the purposes of this test, we built the code portably. We can gain a couple of nanoseconds by compiling with -march=native on machines that support the AVX (Advanced Vector Extensions) instruction set.
It is true that a gethrtime() costing approximately 40ns is still slow enough, relative to microsecond-scale operations, that what to instrument must be chosen very prudently. It is also true that a 40ns gethrtime() opens up a new world of possibilities for user-space instrumentation. It has certainly opened our eyes to some startling things.
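A back-of-the-envelope calculation shows why selection still matters. Assume a hypothetical operation that takes 1 µs and is timed with two calls (one before, one after) at roughly 40ns each; the 1 µs figure is illustrative, not a measurement:

$$
2 \times 40\,\text{ns} = 80\,\text{ns}, \qquad \frac{80\,\text{ns}}{1\,\mu\text{s}} = \frac{80}{1000} = 8\%
$$

For an operation that takes 100 µs, the same pair of calls costs 80/100,000 = 0.08%, which is far easier to justify.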
Acknowledgment
This all comes for free with libmtev (see https://github.com/circonus-labs/libmtev/blob/master/src/utils/mtev_time.c for reference).
As of this writing, Linux and Illumos are supported platforms, and Darwin and FreeBSD do not have “faster time” support. The faster time support in libmtev was a collaborative effort between Riley Berton and Theo Schlossnagle.
Theo Schlossnagle is the founder and chief executive officer at Circonus, where he works on large-scale numerical data analysis. He is the author of Scalable Internet Architectures (Sams Publishing, 2006) and founder of OmniTI, an Internet consultancy.
Copyright held by owner/author.
Publication rights licensed to ACM. $15.00.