cation’s history with parallelism over
the past 15 years.
COLE: You’ve been writing software for
a long time, and for the past 11 years
you’ve been working with Photoshop
and have become increasingly engaged
with its parallel aspects. Which parts of
that have proved to be easy and what has
turned out to be surprisingly hard?
WILLIAMS: The easy part is that Photoshop has actually had quite a bit of
parallelism for a long time. At a very simplistic level, it had some worker threads
to handle stuff like asynchronous cursor
tracking while also managing asynchronous I/O on another thread. Making
that sort of thing work has been pretty
straightforward because the problem
is so simple. There’s little data shared
across that boundary, and the goal is not
to get compute scaling; it’s just to get an
asynchronous task going.
I should note, however, that even
with that incredibly simple task of queuing disk I/O requests so they could be
handled asynchronously by another
thread, the single longest-lived bug I
know of in Photoshop ended up being
nestled in that code. It hid out in there
for about 10 years. We would turn on
the asynchronous I/O and end up hitting that bug. We would search for it for
weeks, but then just have to give up and
ship the app without the asynchronous
I/O being turned on. Every couple of
versions we would turn it back on so we
could set off looking for the bug again.
COLE: I think it was Butler Lampson
who said the wonderful thing about serial machines is you can halt them and
look at everything. When we’re working
in parallel, there’s always something
else going on, so the concept of stopping everything to examine it is really
hard. I’m actually not shocked your bug
was able to hide in the I/O system for
that long.
WILLIAMS: It turned out to be a very
simple problem. Like so many other
aspects of Photoshop, it had to do with
the fact that the app was written first for
the Macintosh and then moved over to
Windows. On the Macintosh, the set file
position call is atomic—a single call—
whereas on Windows, it’s a pair of calls.
The person who put that in there didn’t
think about the fact that the pair of calls
has to be made atomic whenever you’re
sharing the file position across threads.
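The failure mode described here can be illustrated with a minimal sketch. This is not Photoshop's code: the `SharedFile` class and `read_at` method are hypothetical names, and Python's `seek` followed by `read` stands in for any API where setting the file position and using it are separate calls. The lock is what turns the pair into one atomic step when several threads share a single file position.

```python
import os
import tempfile
import threading


class SharedFile:
    """Hypothetical wrapper: one file handle shared by several threads."""

    def __init__(self, path):
        self._f = open(path, "rb")
        self._lock = threading.Lock()

    def read_at(self, offset, size):
        # seek() and read() are two separate calls. Holding the lock across
        # both keeps another thread from moving the position in between.
        with self._lock:
            self._f.seek(offset)
            return self._f.read(size)

    def close(self):
        self._f.close()


def demo():
    """Read known records from four threads; return indices that came back wrong."""
    # Build a file of 256 four-byte records, each filled with its own index.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        for i in range(256):
            tmp.write(bytes([i]) * 4)
        path = tmp.name

    shared = SharedFile(path)
    errors = []

    def worker():
        for i in range(256):
            if shared.read_at(i * 4, 4) != bytes([i]) * 4:
                errors.append(i)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    shared.close()
    os.remove(path)
    return errors
```

Without the lock, one thread's `seek` can land between another thread's `seek` and `read`, so a read may return bytes from the wrong offset. A bug of that kind shows up only under particular timing, which fits the account of it staying hidden for years.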
COLE: Now, of course, you can look
back and say that’s obvious.
WILLIAMS: In fact, the person who
originally put that bug in the code was
walking down the hallway one of the
many times we set off looking for that
thing, smacked his forehead, and realized what the problem was—10 years
after the fact.
Anyway, the other big area in Photoshop where we’ve had success with
parallelism involves the basic image-processing routines. Whenever you run
a filter or an adjustment inside Photoshop, it’s broken down into a number
of basic image-processing operations,
and those are implemented in a library
that’s accessed through a jump table.
Early on, that allowed us to ship accelerated versions of these “bottleneck routines,” as they’re called. In the 1990s,
when companies were selling dedicated
DSP (digital signal processor) cards for
accelerating Photoshop, we could patch
those bottlenecks, execute our routine
on the accelerator card, and then return
control to the 68K processor.
That gave us an excellent opportunity to put parallelism into the app in a
way that didn’t complicate the implementations for our bottleneck-routine
algorithms. When one of those routines
was called, it would be passed a pointer—or two or three pointers—to bytes
in memory. It couldn’t access Photoshop’s software-based virtual memory
and it couldn’t call the operating system; it was just a math routine down at
the bottom. That gave us a very simple
way—prior to getting down to the math
routine—of inserting something that
would slice up the piece of memory we
wanted to process across multiple CPUs
and then hand separate chunks of that
off to threads on each CPU.
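As a rough sketch of that slicing step, and not Photoshop's actual implementation, the Python below splits a source and destination buffer into contiguous chunks and hands each chunk to a worker thread. The names `invert_bytes` and `run_sliced` are hypothetical. The routine at the bottom sees only the bytes it is given and does no locking of its own. In CPython the global interpreter lock keeps this particular example from running faster, so it shows the structure rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def invert_bytes(src, dst):
    """Hypothetical bottleneck routine: pure math on the bytes it is handed."""
    for i in range(len(src)):
        dst[i] = 255 - src[i]


def run_sliced(routine, src, dst, workers=4):
    """Split src/dst into contiguous chunks and run routine on each in a thread."""
    n = len(src)
    chunk = max(1, (n + workers - 1) // workers)  # ceiling division
    # memoryview slices share memory with the original buffers, so each
    # worker writes directly into its own region of dst without copying.
    src_view, dst_view = memoryview(src), memoryview(dst)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(routine, src_view[s:s + chunk], dst_view[s:s + chunk])
            for s in range(0, n, chunk)
        ]
        for f in futures:
            f.result()  # wait for completion and re-raise any worker error
```

A call might look like `run_sliced(invert_bytes, pixels, out)`, where `pixels` is a `bytes` object and `out` is a `bytearray` of the same length. Because the chunks do not overlap, the workers never touch the same destination bytes, and the only synchronization is the final wait.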
COLE: The key there is you had an
object that could be broken apart into
smaller objects without the upper-level
piece needing to worry about it. It also
helps that you had a nice, clean place to
make that split.
WILLIAMS: The other nice aspect is that
the thing on the bottom didn’t need to
know about synchronization. It was still
nothing more than a math routine that
was being passed a source pointer—or
maybe a couple of source pointers and
counts—along with a destination pointer.
All the synchronization lived in that
multiprocessor plug-in that inserted
itself into the jump table for the bottleneck
routines. That architecture was put
into Photoshop in about 1994. It allowed
us to take advantage of Windows NT’s
symmetric multiprocessing architecture
for either two or four CPUs, which
was what you could get at the time on a
very high-end machine. It also allowed
us to take advantage of the DayStar
multiprocessing API on the Macintosh. You
could buy multiprocessor machines
from DayStar Digital in the mid- to
late-1990s that the rest of the Mac operating
system had no way of taking advantage
of—but Photoshop could.
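The plug-in arrangement described above can be pictured as a wrapper installed over each jump-table entry. The sketch below is an assumption-laden illustration, not the Photoshop or DayStar API: `JUMP_TABLE`, `add_constant`, and `install_mp_plugin` are hypothetical names, and a Python dict stands in for the jump table. Callers keep looking routines up by name, the math routine stays unchanged, and all thread handling sits in the wrapper.

```python
import threading


def add_constant(src, dst, value):
    """Hypothetical bottleneck routine: add a constant to every byte (mod 256)."""
    for i in range(len(src)):
        dst[i] = (src[i] + value) & 0xFF


# Stand-in for the jump table: callers look routines up by name.
JUMP_TABLE = {"add_constant": add_constant}


def _make_parallel(routine, workers):
    """Wrap a routine so each call is split into chunks run on separate threads."""
    def wrapper(src, dst, *args):
        n = len(src)
        chunk = max(1, (n + workers - 1) // workers)
        src_view, dst_view = memoryview(src), memoryview(dst)
        threads = [
            threading.Thread(
                target=routine,
                args=(src_view[s:s + chunk], dst_view[s:s + chunk], *args),
            )
            for s in range(0, n, chunk)
        ]
        for t in threads:
            t.start()
        # Joining here is the only synchronization; the routine has none.
        for t in threads:
            t.join()
    return wrapper


def install_mp_plugin(table, workers=2):
    """Replace every entry in the table with its multithreaded wrapper."""
    for name, routine in list(table.items()):
        table[name] = _make_parallel(routine, workers)
```

After `install_mp_plugin(JUMP_TABLE)`, a call such as `JUMP_TABLE["add_constant"](src, dst, 5)` looks the same to the caller as before. This sketch does not propagate exceptions raised inside worker threads, which a real implementation would need to handle.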