important details may appear washed
out. Latency deviating from the norm
is particularly important to examine,
especially occurrences of high latency.
Since these may represent only a small
fraction of the workload—perhaps less
than 1%—the color shade may be very
light and difficult to see. A false color
palette can be applied instead to highlight these subtle details, with the trade-off that the color shades then
cannot be used to gauge relative I/O
counts between pixels.
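That trade-off can be sketched with two hypothetical shading functions (illustrative only, not the implementation described here): a linear shade keeps brightness proportional to I/O count, while a log-scaled false-color-style mapping keeps rare pixels visible at the cost of that proportionality.

```python
import math

def linear_shade(count, max_count):
    # Shade proportional to I/O count: a pixel holding 0.1% of the
    # workload gets 0.1% brightness and is nearly invisible.
    return count / max_count if max_count else 0.0

def false_color_shade(count, max_count):
    # Nonlinear (log) mapping: rare-but-present pixels stay visible,
    # but shades no longer gauge relative I/O counts between pixels.
    if count <= 0 or max_count <= 0:
        return 0.0
    return math.log1p(count) / math.log1p(max_count)
```

With 10 I/Os in one pixel against a 10,000-count maximum, the linear shade is 0.001 (effectively invisible), while the log mapping yields roughly 0.26, a clearly visible shade.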
A particular advantage of heat-map
visualization is the ability to see outliers. For the latency heat map these
may be occasional I/O operations with
particularly high latency, which can
cause significant performance issues.
If the y-axis scale is automatically
picked to display all data, outliers are
easily identified as occasional pixels at
the top of the heat map. This also presents a problem: a single I/O with high
latency will rescale the y-axis, compressing the bulk of the data. When
desired, outliers can be eliminated so that the bulk of the I/O can be examined in detail. One automatic approach is to drop a small percentage (say, 0.1%) of the highest-latency I/O from the display.
To generate latency heat maps, data
is collected for each I/O event: the completion time and I/O latency. This data
is then grouped into the time/latency
pixels for the heat map, and the pixels
are shaded based on their I/O counts.
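That grouping step can be sketched as follows (a minimal illustration under assumed names, not the actual implementation); each (time, latency) event increments one (column, row) pixel counter:

```python
from collections import Counter

def bin_events(events, t0, time_res_s, latency_res_us):
    # events: iterable of (completion_time_s, latency_us) pairs.
    # Each event lands in one (time column, latency row) pixel;
    # the per-pixel count later drives that pixel's shade.
    pixels = Counter()
    for t, lat in events:
        col = int((t - t0) // time_res_s)
        row = int(lat // latency_res_us)
        pixels[(col, row)] += 1
    return pixels
```

With one-second columns and 333-µs rows, an event completing at t = 0.2 s with 400 µs latency lands in pixel (0, 1).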
If the original I/O event data is preserved, heat maps can be regenerated
for any time and latency range, and at different resolutions. A problem with
this is the size of the data: busy production systems may be serving hundreds
of thousands of I/O events per second.
Collecting this continually for long
intervals, such as days or weeks, may
become prohibitive—both for the storage required and the time to process
and generate the heat maps. One solution is to summarize this data to a sufficiently high time and latency resolution and to save the summarized data
instead. When displaying heat maps,
these summaries are resampled to the
resolution desired.
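Resampling such a summary to a coarser display resolution is then just merging groups of fine-grained bins; counts add, so no events are lost. A sketch, assuming a (column, row) → count summary as described above:

```python
from collections import Counter

def resample(pixels, time_factor, latency_factor):
    # pixels: (time_col, latency_row) -> I/O count at the stored
    # (high) resolution. Merge each time_factor-by-latency_factor
    # block of fine bins into one display pixel; counts simply add.
    out = Counter()
    for (col, row), n in pixels.items():
        out[(col // time_factor, row // latency_factor)] += n
    return out
```

The same stored summary can therefore back many different heat maps, zoomed or panned, as long as the requested resolution is no finer than the stored one.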
Heat Map Explained
Latency heat maps were implemented
as part of an Oracle system-observability tool called Analytics. The implementation allows them to be viewed in
real time and continually records data
with a one-second granularity for later
viewing. This is made possible and optimal by DTrace, which has the ability to trace and summarize data in-kernel to a sufficient resolution and to return these summaries every second to user-land. The user-land software then resamples the summarized data to produce the heat maps.

Figure 1. NFS latency when enabling SSD-based cache devices.

Figure 2. Synchronous writes to a striped pool of disks.
The heat map in Figure 1, an example screenshot from Analytics, shows
the latency distribution of an NFS read
workload and the effect on NFS latency when using an additional layer of
flash-memory-based cache. This cache
layer was enabled at 19:31: 38, which
has been centered on the x-axis in this
screenshot. Explaining this heat map
in detail will show how effective this
visualization is for understanding the
role of these system components and
their effect on delivered NFS latency.
In this screenshot, a panel is displayed to the left of the heat map to
show average IOPS counts. Above and
below the panel, the labels “Range average:” and “8494 ops per second” give the average NFS I/O per second for the visible
time range (x-axis). Within the panel
are averages for latency ranges, the first
showing an average of 2,006 NFS IOPS
between 0 and 333 µs. Each of these
latency ranges corresponds to a row of
pixels on the heat map.
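Those per-row averages follow directly from the binned counts: sum each latency row across the visible time columns and divide by the range's duration. A hypothetical sketch, not the Analytics code:

```python
def panel_averages(pixels, timespan_s):
    # pixels: (time_col, latency_row) -> I/O count for the visible
    # time range. Returns average IOPS per latency row, i.e. the
    # kind of numbers shown in the panel beside the heat map.
    totals = {}
    for (_col, row), n in pixels.items():
        totals[row] = totals.get(row, 0) + n
    return {row: total / timespan_s for row, total in totals.items()}
```

Summing all rows of the result gives the overall range average shown above the panel.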
For the time before 19:31:38, the
system served NFS reads from one of
two locations: a DRAM-based cache or
disk storage. If the requested I/O was
not in the DRAM cache, then it was retrieved from disk instead. In the heat
map, two levels of latency can be seen.
These correspond to:
˲ DRAM hits, shown as a dark line at the bottom of the heat map
˲ Disk hits, shown as a shaded cloud of latency from 2 ms and higher
This is as expected. DRAM hits have
very low latency and are shown in the
lowest-latency pixel. This pixel represents latencies between 0 and 333 µs, which is the resolution limit of the currently displayed heat map. Since the recorded data has a higher resolution, this heat map can be redrawn with different vertical scales to reveal finer details. By zooming to the lower latencies, the DRAM hits were found to be mostly in the range of 0 to 21 µs.2
The latency for disk hits has a wide
distribution, from about 2 ms to the top of the displayed heat map at 10 ms. The
returned latency for disk I/O includes
rotation, seek, and bus I/O transfer