that was eye-catching, however. We
named it the rainbow pterodactyl.
The disk I/O bytes graph (the rainbow) shows three features: the initial
rise, followed by a decreased slope,
and then decay. By aligning the
disk graph with the heat map, characteristics in the heat map can be seen to
occur at certain disk counts. The heat
map shows the following features.
The “beak” occurs from disk one to
disk eight. The reason for two levels
of latency is not fully understood, but
an experiment has provided a clue: if
the same data is read repeatedly to ensure disk-cache hits, then only one line
is seen with low latency. The two-line
pattern happens when these disks are
read sequentially, suggesting that the
second line is for disk-cache misses.
Analyzing this further is difficult with
standard tools: input to the disk and
its returned latency can be traced, but
there is no visibility into disk internals
such as the operation of the disk data
controller.
When the ninth disk is added, the
beak turns into the “head.” The disks
are attached using two x4 SAS cables (four ports
each), providing eight SAS ports in
total. Accessing the ninth disk may be
causing contention for those ports in
the SAS controller, producing the corresponding random latency pattern. When the
disks are attached using a single x4 SAS
cable, the beak-to-head transition occurs at the fifth disk.
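The port arithmetic behind this observation can be written out as a short sketch. It assumes, as the text only suspects, that the transition appears at the first disk beyond the available SAS ports; the function names are hypothetical and used purely for illustration.

```python
def sas_ports(cables: int, ports_per_cable: int = 4) -> int:
    """Total SAS ports provided by a number of xN cables (x4 by default)."""
    return cables * ports_per_cable


def first_contended_disk(cables: int, ports_per_cable: int = 4) -> int:
    """First disk count that exceeds the port count.

    Assumption: contention starts as soon as there are more disks than ports.
    """
    return sas_ports(cables, ports_per_cable) + 1


if __name__ == "__main__":
    for cables in (2, 1):
        print(f"{cables} x4 cable(s): {sas_ports(cables)} ports, "
              f"transition expected at disk {first_contended_disk(cables)}")
```

Under that assumption, two x4 cables give a transition at the ninth disk and a single x4 cable gives one at the fifth, which matches both observations.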
A “bulge” forms at the top of the
head between disks 9 and 12, showing
slightly increased latency. The reason
for this is not certain, though it may be
from increasing contention for the SAS
ports. Nor is the reason known for the
reduced latency that forms the “neck”
at disks 13 and 14.
Approximately between disks 15
and 20 is the “wing.” This sudden
increase in latency causes the knee
point in the disk-throughput graph.
The source of this contention is not
known, although another disk-scaling
experiment using a single x4 SAS cable
to a single JBOD produced a wingless
pterodactyl.
From about disk 20 onward, while
disks continue to be added, latency
continues to rise and becomes less
consistent. This is expected to be PCIe-gen1
bus contention on the SAS controller card.
All of these features are made visible
by the heat map, yet none of them is described by the individual I/O events that
form the input: each event provides only a completion time and an I/O latency, recorded while the
disk count is increased. The heat map
has imaged the I/O subsystem from
this data, showing components that
are suspected to be disk caches, SAS
ports, and the PCIe bus.
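How little input a heat map needs can be illustrated with a minimal sketch: each event is only a (completion time, latency) pair, and the map is a count of events per time column and latency row. The function name, bucket sizes, and sample events below are hypothetical, not the tooling used for the figures.

```python
from collections import Counter


def latency_heat_map(events, time_step, latency_step):
    """Count I/O events per (time column, latency row) bucket.

    events: iterable of (completion_time, latency) pairs.
    The count in each bucket is what a heat map renders as color depth.
    """
    counts = Counter()
    for completion_time, latency in events:
        col = int(completion_time // time_step)   # x-axis: time
        row = int(latency // latency_step)        # y-axis: latency
        counts[(col, row)] += 1
    return counts


# Hypothetical sample: (completion time in seconds, latency in microseconds)
events = [(0.2, 150), (0.7, 180), (0.9, 900), (1.1, 160), (1.8, 2400)]
heat = latency_heat_map(events, time_step=1.0, latency_step=500)

for (col, row), n in sorted(heat.items()):
    print(f"second {col}, {row * 500}-{(row + 1) * 500} us: {n} I/O")
```

Nothing in the input names a disk cache, a SAS port, or a bus; any such structure only emerges from how the counts are distributed across the buckets.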
To summarize the rainbow pterodactyl: little is known with certainty,
and much more investigation is needed. What this does show is how deep a
simple visualization can become.
Latency Levels
For the rainbow pterodactyl, I/O bus
throughput was tested by stepping the disk count of a
sequential disk-read workload. This was
repeated on a different system with a
more powerful I/O subsystem, and it
was found that sequential disk reads
from all available disks could not reach
I/O bus saturation (no knee point). To
see if a limit could be found, the workload
was changed to read the same
128KB from each disk repeatedly, so
that each disk could deliver more throughput
by serving the data from its cache.
The result is shown in Figure 8.
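The modified workload can be sketched roughly as follows. This is not the tool used for the experiment: the function name is hypothetical, and the sketch reads through the operating system's normal file interface, so on an ordinary file the repeats would be served from the OS cache rather than the disk's own cache. Reaching the disk cache would require reading a raw device while bypassing OS caching, which is not shown here.

```python
import os
import time

READ_SIZE = 128 * 1024  # the 128KB region named in the text


def reread_same_block(path, iterations, size=READ_SIZE):
    """Read the first `size` bytes of `path` repeatedly.

    Returns the latency of each read in seconds. Every read uses
    offset 0, so the same data is requested each time.
    """
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(iterations):
            start = time.perf_counter()
            data = os.pread(fd, size, 0)
            latencies.append(time.perf_counter() - start)
            if not data:  # empty file or device: nothing more to measure
                break
    finally:
        os.close(fd)
    return latencies
```

Running one such loop per disk, and adding disks step by step, would produce the (completion time, latency) events that the heat map is built from.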
Figure 7. Sequential disk reads, stepping disk count.