construct O(n) models that predict its fitness relative to all other
devices. As such, the computational resources for maintaining
the models can be made to scale with the number of devices.
Also, in large-scale environments, certain collections of devices
will be identical and can share models.
The motivation and advantages of relative fitness modeling
can be stated as four hypotheses:
Hypothesis 1. Workload characteristics can change across
storage devices (WC_i ≠ WC_j) and reduce the accuracy of
an absolute model.
Hypothesis 2. A relative model (Equation 2) can reduce the
inaccuracies that result from changing characteristics.
Hypothesis 3. Performance and resource utilization can
improve prediction accuracy (Equation 3).
Hypothesis 4. Performance ratios (Equation 4) can provide
better accuracy than raw performance values.
To test these hypotheses, the accuracy of various CART
models can be compared: absolute models (Equation 1), relative models (Equation 2), relative models with performance
(Equation 3), and relative fitness models (Equation 4).
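As a rough illustration of how these model types differ, the sketch below contrasts an absolute model with a relative fitness model on made-up fitness-test samples. A simple nearest-neighbor lookup stands in for the CART models; all numbers, feature choices, and names are hypothetical, not taken from the evaluation.

```python
# Sketch: absolute vs. relative fitness prediction (hypothetical numbers).
# A 1-nearest-neighbor lookup stands in for the CART models.

def nearest(train, x):
    """Return the training row whose features are closest to x (L1 distance)."""
    return min(train, key=lambda row: sum(abs(a - b) for a, b in zip(row[0], x)))

# Each sample: (workload characteristics as measured on device A,
#               bandwidth on device A, bandwidth on device B) -- made-up values.
samples = [
    ((0.5, 8, 4),   20.0, 15.0),   # (write fraction, request size KB, queue depth)
    ((0.9, 64, 16), 40.0, 22.0),
    ((0.1, 4, 1),   10.0, 12.0),
]

# Absolute model of device B: workload characteristics -> Perf_B (Equation 1).
abs_train = [(wc, perf_b) for wc, _, perf_b in samples]

# Relative fitness model A->B: characteristics plus Perf_A -> Perf_B / Perf_A
# (Equations 3 and 4: performance is a feature, a ratio is the target).
rel_train = [(wc + (perf_a,), perf_b / perf_a) for wc, perf_a, perf_b in samples]

# Predict B's bandwidth for a new workload observed running on device A.
wc_new, perf_a_new = (0.85, 56, 12), 38.0
pred_abs = nearest(abs_train, wc_new)[1]
pred_rel = nearest(rel_train, wc_new + (perf_a_new,))[1] * perf_a_new
```

The relative fitness prediction scales the performance observed on device A by a predicted ratio, which is how these models exploit Perf_A as a feature rather than relying on workload characteristics alone.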
4.1. Setup
Experiments are run on an IBM x345 server (dual 2.66GHz
Xeon, 1.5GB RAM, GbE, Linux 2.6.12) attached to three iSCSI
storage arrays. The arrays have different hardware platforms
and software stacks, and are configured with different
RAID levels.ᵃ More specifically,
Vendor A is a 14-disk RAID-50 array with 1GB of cache
(400GB 7200 RPM Hitachi Deskstar SATA).
Vendor B is a 6-disk RAID-0 array with 512MB of cache
(250GB 7200 RPM Seagate Barracuda SATA).
Vendor C is an 8-disk RAID-10 array with 512MB of cache
(250GB 7200 RPM Seagate Barracuda SATA).
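As a back-of-envelope aside (our own illustration, not a figure from the evaluation), these RAID levels imply quite different usable capacities, following the groupings given in footnote a: two RAID-5 sets for Vendor A and four mirrored pairs for Vendor C.

```python
# Back-of-envelope sketch (not from the paper): usable capacity implied by
# each array's RAID level, per the groupings in footnote a.

def usable_gb(disks, disk_gb, level, parity_groups=1):
    """Usable capacity for a few common RAID levels."""
    if level == 0:        # striping: every disk holds data
        data_disks = disks
    elif level == 10:     # striping over mirrored pairs: half the disks
        data_disks = disks // 2
    elif level == 50:     # striping over RAID-5 groups: one parity disk per group
        data_disks = disks - parity_groups
    else:
        raise ValueError("unhandled RAID level")
    return data_disks * disk_gb

vendor_a = usable_gb(14, 400, 50, parity_groups=2)  # two 7-disk RAID-5 sets
vendor_b = usable_gb(6, 250, 0)
vendor_c = usable_gb(8, 250, 10)                    # four mirrored pairs
```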
The server attaches to each array using an iSCSI device
driver⁶ that contains counters (below the file system and page
cache) for characterizing workloads and measuring their
performance. A synthetic workload generator⁶ is used to generate
numerous workload samples, which we refer to as a fitness
test. These samples are used to train and test the CART models.
Similar results from other workloads (e.g., Postmark, TPC-C),
as well as details on the CART algorithm (e.g., tree construction
and pruning), can be found in our conference paper.
4.2. Fitness test results
The fitness test compares the performance of the storage
arrays across a wide range of workloads (various runs of the
workload generator). The measured workload characteristics
(Wc) of each sample include the write percent, the write and
read request sizes, the write and read randomness (average
seek distance, in blocks, per I/O), and the queue depth (average
number of outstanding I/Os). The performance (Perf) of
each sample run is the average bandwidth (MB/s), throughput
(IO/s), and latency (ms). Resource utilization (Util) is not
used in this evaluation, as it would require modifying storage
device software to which we did not have access. A total of
3000 samples are generated.
Over all 3000 samples, Vendor A is the fastest array, with
an average bandwidth of 25 MB/s, an average throughput
of 624 IO/s, and an average latency of 37ms. Vendor B is the
second fastest (17 MB/s, 349 IO/s, and 45ms). Vendor C is
the third (14 MB/s, 341 IO/s, and 84ms). Although Vendor A
is the fastest on average, it is not necessarily the fastest for
all sample workloads in the fitness test. There are samples
where Vendors B and C do better than A (relative fitness values
greater than 1) and cases where they do worse (values
less than 1). In short, the relative fitness of a device can vary
with the workload characteristics.
Figure 3: Device similarity. The performance of each array changes
similarly, indicating that the performance of one array is a good
predictor of another. (Plotted: bandwidth in MB/s.)
As an example of how devices can behave similarly, Figure 3
illustrates how the sequential write bandwidth for each array
varies for different request sizes and queue depths. From the
3000 samples, we show only the sequential write workloads.
There are 120 such samples, sorted by the performance of
Vendor A. The graph illustrates the similar performance of
the arrays. In particular, the prominent discontinuity in the
graph is shared by all arrays (a drop in performance when
there are only one or two outstanding requests). Also note
how Vendor B is faster than Vendor C to the left of the discontinuity,
but slower to the right. Such piecewise functions are
ideally suited for CART models.
In support of Hypothesis 1, Table 3 contains averages for
the workload characteristics of each sample. Note the variance
across devices (WC_i ≠ WC_j), most notably the average spatial
randomness of writes, which varies by as much as 38%. In
particular, Vendor A experiences the most randomness (an average
seek distance of 321MB per write), Vendor B the second most
ᵃ RAID level 0 is striping, 1 is mirroring, 5 is striping with parity, 10 is striping over mirrored pairs (4 in this case), and 50 is striping over RAID-5 parity
arrays (2 in this case).
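The discontinuous, piecewise behavior noted in the Figure 3 discussion is exactly what a regression tree captures with axis-aligned splits. As a minimal sketch (using made-up bandwidth numbers, not the measured samples), a single CART-style split chosen to minimize squared error lands right at the queue-depth discontinuity:

```python
# Sketch: why piecewise performance curves suit regression trees.
# A single CART-style split, chosen to minimize squared error, recovers
# the queue-depth discontinuity from hypothetical bandwidth measurements.

def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]

# Hypothetical sequential-write bandwidth (MB/s) by queue depth: a sharp
# drop with only one or two outstanding requests, roughly flat above.
depths    = [1,  2,  4,  8, 16, 32]
bandwidth = [21, 23, 44, 46, 45, 47]

threshold, low_mean, high_mean = best_split(depths, bandwidth)
# The chosen split isolates the discontinuity between queue depths 2 and 4.
```

A full CART model repeats this split search recursively in each half, so curves that are flat on either side of a discontinuity are fit almost exactly.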
94 Communications of the ACM | April 2009 | Vol. 52 | No. 4