disk, and—rarely nowadays!—disk to
off-line storage.
On typical server hardware today,
completely random memory access
on a range much larger than cache
size can be an order of magnitude or
more slower than purely sequential
access, but completely random disk
access can be five orders of magnitude slower than sequential access
(see Figure 3). Even state-of-the-art
solid-state (flash) disks, although they
have much lower seek latency than
magnetic disks, can differ in speed
by roughly four orders of magnitude
between random and sequential access patterns. The results for the test
shown in Figure 3 are the number of
four-byte integer values read per second from a 1-billion-long (4GB) array
on disk or in memory; random disk
reads are for 10,000 indices chosen at
random between one and one billion.
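To make the shape of such a test concrete, the following sketch measures the same contrast in miniature. The file name, chunk size, and sample count are illustrative choices, not the parameters of the test reported in Figure 3, and operating-system page caching will flatter the random numbers unless it is sidestepped (the Figure 3 disk tests were run on a freshly booted machine for exactly that reason).

    import os
    import random
    import struct
    import time

    PATH = "values.bin"   # a file of four-byte integers (name is illustrative)
    N = os.path.getsize(PATH) // 4
    SAMPLES = 10_000

    with open(PATH, "rb", buffering=0) as f:
        # Sequential pass: stream the whole file in large chunks.
        start = time.time()
        count = 0
        while chunk := f.read(1 << 20):
            count += len(chunk) // 4
        seq_rate = count / (time.time() - start)

        # Random pass: seek to SAMPLES randomly chosen indices, reading one value each.
        start = time.time()
        for _ in range(SAMPLES):
            f.seek(4 * random.randrange(N))
            struct.unpack("<i", f.read(4))
        rand_rate = SAMPLES / (time.time() - start)

    print(f"sequential: {seq_rate:,.0f} values/sec; random: {rand_rate:,.0f} values/sec")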
A further point that’s widely underappreciated: in modern systems,
as demonstrated in the figure, random access to memory is typically
slower than sequential access to disk.
Note that random reads from disk are
more than 150,000 times slower than
sequential access; SSD improves on
this ratio by less than one order of
magnitude. In a very real sense, all of
the modern forms of storage improve
only in degree, not in their essential
nature, upon that most venerable and
sequential of storage media: the tape.
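The ratios quoted above follow directly from the Figure 3 measurements; a quick back-of-the-envelope check:

    # Ratios implied by the measurements in Figure 3.
    disk_ratio = 53.2e6 / 316      # sequential vs. random disk: ~168,000x
    ssd_ratio  = 42.2e6 / 1924     # sequential vs. random SSD:  ~22,000x
    print(disk_ratio / ssd_ratio)  # ~7.7x, i.e. less than one order of magnitude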
The huge cost of random access
has major implications for analysis of
large datasets (whereas it is typically
mitigated by various kinds of caching
when data sizes are small). Consider,
for example, joining large tables that
are not both stored and sorted by the
join key—say, a series of Web transactions and a list of user/account
information. The transaction table
has been stored in time order, both
because that is the way the data was
gathered and because the analysis of
interest (tracking navigation paths,
say) is inherently temporal. The user
table, of course, has no temporal dimension.
As records from the transaction table are consumed in temporal order,
accesses to the joined user table will
be effectively random—at great cost if
the table is large and stored on disk. If
sufficient memory is available to hold
the user table, performance will be
improved by keeping it there. Because
random access in RAM is itself expensive, and RAM is a scarce resource
that may simply not be available for
caching large tables, the best solution
when constructing a large database
for analytical purposes (for example,
in a data warehouse) may, surprisingly, be to build a fully denormalized
table—that is, a table including each
transaction along with all user information that is relevant to the analysis
(as shown in Figure 4).
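As a sketch of the transformation being described (not of how one would execute it at billion-row scale), a join keyed on userid yields the denormalized table of Figure 4. The file names here are assumptions; the column names follow the figure.

    import pandas as pd

    # Toy illustration of building the denormalized table of Figure 4.
    transactions = pd.read_csv("transactions.csv")  # transid, timestamp, page, userid
    users = pd.read_csv("users.csv")                # userid, age, sex, country, ...

    # Attach the user attributes to every transaction row, so that a later
    # scan in timestamp order never has to look up the user table at random.
    denormalized = transactions.merge(users, on="userid", how="left")
    denormalized = denormalized.sort_values("timestamp")

In a production data warehouse this step would of course be carried out by the database or a batch job rather than in memory; the point is only the shape of the resulting table.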
Denormalizing a 10-million-row,
10-column user information table
onto a 1-billion-row, four-column
transaction table adds substantially
to the size of data that must be stored
(the denormalized table is more than
three times the size of the original
tables combined). If data analysis is
carried out in timestamp order but requires information from both tables,
Figure 3. Comparing random and sequential access in disk and memory.
Disk: 316 values/sec random, 53.2M values/sec sequential. SSD: 1,924 values/sec random, 42.2M values/sec sequential. Memory: 36.7M values/sec random, 358.2M values/sec sequential.
Disk tests were carried out on a freshly booted machine (a Windows 2003 server with 64GB RAM and eight 15,000RPM SAS disks in a RAID5 configuration) to eliminate the effect of operating-system disk caching. The SSD test used a latest-generation Intel high-performance SATA SSD.
Figure 4. Denormalizing a user information table.
A 10-million-row, 10-column user table (userid, age, sex, country, ...) is joined onto a 1-billion-row, four-column transaction table (transid, timestamp, page, userid), producing a 1-billion-row, 13-column denormalized table (transid, timestamp, page, userid, age, sex, country, ...).
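As a rough check on the "more than three times" figure quoted above, counting stored values (rows times columns) and assuming uniform column widths:

    transaction_values  = 1_000_000_000 * 4    # 1-billion-row, 4-column table
    user_values         = 10_000_000 * 10      # 10-million-row, 10-column table
    denormalized_values = 1_000_000_000 * 13   # 1-billion-row, 13-column table
    print(denormalized_values / (transaction_values + user_values))  # ~3.2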