The order-of-magnitude difference obviously originates in the number of heap levels inside each VM page, so the ultimate speedup will be on machines with small pointer sizes and big page sizes. This is a pertinent observation, as operating-system kernels are starting to use super-pages to keep up with increased I/O throughput.
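To put rough numbers on that, here is a back-of-the-envelope sketch of my own (an illustration, not a measurement from this article): a complete binary heap packs roughly log2(page size / pointer size) levels into one page, and every level that fits in the same page is a page fault a traversal does not have to pay for.

```c
/*
 * Illustrative sketch: heap levels per VM page for common pointer
 * and page sizes. More levels per page means fewer distinct pages
 * touched per heap operation.
 */
#include <stddef.h>
#include <stdio.h>

int main(void)
{
    const size_t ptr[]  = { 4, 8 };                  /* pointer sizes, bytes */
    const size_t page[] = { 4096, 2 * 1024 * 1024 }; /* 4KB page, 2MB super-page */

    for (size_t i = 0; i < 2; i++) {
        for (size_t j = 0; j < 2; j++) {
            size_t items = page[j] / ptr[i];
            int levels = 0;
            while (items > 1) {       /* floor(log2(items)) */
                items /= 2;
                levels++;
            }
            printf("%zu-byte pointers, %7zu-byte pages: ~%d levels per page\n",
                   ptr[i], page[j], levels);
        }
    }
    return 0;
}
```

With 4KB pages and 64-bit pointers you get about nine levels per page; a 2MB super-page doubles that, which is exactly where the super-page observation bites.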
So Why Are You, and I, Still Doing It Wrong?
An (in)famous debate, “Quicksort vs.
Heapsort,” centered on the fact that
the worst-case behavior of the former is terrible, whereas the latter has
worse average performance but no
such “bad spots.” Depending on your
application, that can be a very important difference.
We lack a similar inquiry into algorithm selection in the face of the
anisotropic memory access delay
caused by virtual memory, CPU caches, write buffers, and other facts of
modern hardware.
Whatever book you learned programming from, it probably had a
figure within the first five pages diagramming a computer much like the
one shown in Figure 7. That is where
it all went pear-shaped: that model is
totally bogus today.
Amazingly, it is the only conceptual model used in computer education, despite the fact that it has next to
nothing to do with the execution environment on a modern computer. And
just for the record: by modern, I mean
VAX 11/780 or later.
The past 30 or 40 years of hardware and operating-systems development seems to have only marginally impinged on the agenda in CS departments' algorithmic analysis sections, and as far as my anecdotal evidence goes, it has totally failed to register in the education they provide.
The speed disparity between primary and secondary storage on the Atlas Computer was on the order of 1:1,000. The Atlas drum took two milliseconds to deliver a sector; instructions took approximately two microseconds to execute. You lost around 1,000 instructions for each VM page fault.
Most CS-educated professionals still ignore VM as a matter of routine. This is an embarrassment for CS as a discipline and profession, not to mention wasting enormous amounts of hardware and electricity.

On a modern multi-issue CPU, running at some gigahertz clock frequency, the worst-case loss is almost 10 million instructions per VM page fault. If you are running with a rotating disk, the number is more like 100 million instructions.f
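The arithmetic behind those numbers, under illustrative assumptions of my own choosing (a 3GHz CPU retiring roughly three instructions per cycle, about 1 ms to service a fault from fast solid-state storage, about 10 ms from a rotating disk): the CPU retires on the order of 10¹⁰ instructions per second, so 10¹⁰ instr/s × 10⁻³ s ≈ 10⁷ instructions per fault, and 10¹⁰ instr/s × 10⁻² s ≈ 10⁸ instructions.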
What good is an O(log₂(n)) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n²) algorithm, which avoids page faults, will run circles around it.
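A minimal sketch of the mechanism (entirely my own construction; the array size, lookup count, and timing method are arbitrary assumptions): binary search probes pages scattered across the whole array, while a sequential scan touches consecutive pages that prefetchers and VM read-ahead can stream. It will not crown a winner by itself, but it makes the access-pattern difference, which the big-O notation hides, measurable.

```c
/*
 * Hypothetical micro-benchmark: O(log2(n)) binary search with
 * scattered page accesses vs. an O(n) early-exit sequential scan
 * that prefetchers and VM read-ahead can stream. Sized here to fit
 * RAM (showing the cache effect); grow N past RAM to see faults.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N ((size_t)1 << 26)   /* 64M ints = 256MB */
#define LOOKUPS 100

static size_t binary_search(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {                     /* each probe may touch a new page */
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

static size_t linear_search(const int *a, size_t n, int key)
{
    size_t i;
    for (i = 0; i < n && a[i] < key; i++) /* sequential, early exit */
        ;
    return i;
}

int main(void)
{
    int *a = malloc(N * sizeof *a);
    if (a == NULL)
        return 1;
    for (size_t i = 0; i < N; i++)        /* sorted input: a[i] = 2*i */
        a[i] = (int)(2 * i);

    clock_t t0 = clock();
    for (int k = 0; k < LOOKUPS; k++)
        (void)binary_search(a, N, rand() % (int)(2 * N));
    clock_t t1 = clock();
    for (int k = 0; k < LOOKUPS; k++)
        (void)linear_search(a, N, rand() % (int)(2 * N));
    clock_t t2 = clock();

    /* clock() counts CPU time; wall-clock timing would also capture
       the disk waits that dominate once the array exceeds RAM. */
    printf("binary: %.2fs  linear: %.2fs\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(a);
    return 0;
}
```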
Performance analysis of algorithms
will always be a cornerstone achievement of computer science, and like
all of you, I really cherish the foldout
chart with the tape sorts in Volume 3
of The Art of Computer Programming.
But the results coming out of the CS
department would be so much more
interesting and useful if they applied
to real computers and not just toys
like ZX81, C64, and TRS-80.
f And below the waterline there are the flushing of pipelines, now-useless and in-the-way cache content, page-table updates, lookaside-buffer invalidations, page-table loads, etc. It is not atypical to find instructions in the "for operating system programmers" section of the CPU data book that take hundreds or even thousands of clock cycles before everything is said and done.
Related articles
on queue.acm.org
Thread Scheduling in FreeBSD 5.2
Marshall Kirk McKusick and
George V. Neville-Neil
http://queue.acm.org/detail.cfm?id=1035622
Flash Storage Today
Adam Leventhal
http://queue.acm.org/detail.cfm?id=1413262
High Performance Web Sites
Steve Souders
http://queue.acm.org/detail.cfm?id=1466450
References
1. Fotheringham, J. Dynamic storage allocation in the Atlas Computer, including an automatic use of a backing store. Commun. ACM 4, 10 (Oct. 1961), 435–436.
2. Williams, J.W.J. Algorithm 232—Heapsort. Commun. ACM 7, 6 (June 1964), 347–348.
Poul-Henning Kamp (phk@FreeBSD.org) has programmed computers for 26 years and is the inspiration behind bikeshed.org. His software has been widely adopted as "under the hood" building blocks in both open source and commercial products. His most recent project is the Varnish HTTP accelerator, which is used to speed up large Web sites such as Facebook.