Let’s talk about PTRAN. Two papers
came out in 1988: your “Overview of
the PTRAN Analysis System” and “IBM
Parallel FORTRAN”. It’s important to
distinguish these two projects. IBM
Parallel FORTRAN was a product, a
FORTRAN augmented with constructs
such as PARALLEL LOOP and PARALLEL CASE and ORIGINATE TASK. So
the FORTRAN product is FORTRAN
with extra statements of various kinds,
whereas with PTRAN, you were working with raw FORTRAN and doing the
analysis to get parallelism.
What was the relationship between the
two projects? The IBM Parallel FORTRAN paper cites your group as having
provided some discussion.
The PTRAN group was formed in the
early 1980s, to look first at automatic
vectorization. IBM was very late in getting into parallelism. The machines
had concurrency, but getting into explicit parallelization, the first step was
vectorization of programs. I was asked
to form a compiler group to do parallel work, and I knew of David Kuck’s
work, which started in the late 1960s at
the University of Illinois around the ILLIAC project. I visited Kuck and hired
some of his students. Kuck and I had a
very good arrangement over the years.
He set up his own company—KAI.
Kuck and Associates, Inc.
Right. IBM, at one point later on,
had them subcontracted to do some
of the parallelism. They were very open
about their techniques, with one exception, and they were the leaders early
on. They had a system called Parafrase,
which enabled students to try various
kinds of parallelizing transformations on FORTRAN input, hooked to a timing-simulator back end. So they could
get real results of how effective a particular set of transformations would be.
It was marvelous for learning how to
do parallelism, what worked and what
didn’t work, and a whole set of great
students came out of that program.
In setting up my group, I mostly hired
from Illinois and NYU. The NYU people were involved with the Ultracomputer, and we had a variant of it here,
a project called RP3, Research Parallel
Processor Prototype, which was an instantiation of their Ultracomputer.
The Ultracomputer was perhaps the
first to champion fetch-and-add as a
synchronization primitive.
Yes. A little history: The Ultracomputer had 256 processors, with
shared distributed memory, accessible through an elaborate switching
system. Getting data from memory is
costly, so they had a combining switch,
one of the big inventions that the NYU
people had developed. The fetch-and-add primitive could be done in the switch itself.
Doing fetch-and-add in the switch
helped avoid the hot-spot problem of
having many processors go for a single
shared counter. Very clever idea.
Very, very clever. So IBM and NYU
together were partners, and supported
by DARPA to build a smaller machine.
The number of processors got cut back
to 64 and the combining switch was no
longer needed, and the project kind of
dragged on. But my group supplied the
compiler for that. The project eventually got canceled.
So that was the background, in IBM
Research and at the Courant Institute.
But then the main server line, the 370s,
3090s, were going to have vector processors.
Multiple vector processors as well as
multiple scalar processors.
Yes. And the one that we initially
worked on was a six-way vector processor. We launched a parallel translation
group, PTRAN. Jean Ferrante played a
key role. Michael Burke was involved;
NYU guy. Ron Cytron was the Illinois
guy. Wilson Hsieh was a co-op student.
Vivek Sarkar was from Stanford, Dave
Shields and Philippe Charles from
NYU. All of these people have gone on
to have some really wonderful careers.
Mark Wegman and Kenny Zadeck were
not in the PTRAN group but were doing related work. We focused on taking
dusty decks and producing good parallel code for the machines—continuing
the theme of language-independent,
machine-independent, and do it automatically.
“Dusty decks” refers to old programs
punched on decks of Hollerith cards.
Nowadays we’ve got students who have
never seen a punched card.
We also went a long way in working with product groups. There was
a marvelous and very insightful programmer, Randy Scarborough, who
worked in our Palo Alto lab at the time.
He was able to take the existing FORTRAN compiler and add a little bit or
a piece into the optimizer that could
do pretty much everything that we
could do. It didn’t have the future that
we were hoping to achieve in terms of
building a base for extending the work
and applying it to other situations, but
it certainly solved the immediate problem very inexpensively and well at the
time. That really helped IBM quickly
move into the marketplace with a very
parallel system that was familiar to the
customers and solved the problem.
Disappointing for us, but it was the
right thing to have happen.
Did PTRAN survive the introduction of
that product?
Yes, it survived. The product just
did automatic vectorization. What we
were looking at was more parallelism.
One particular thing in PTRAN was
looking at the data distribution problem, because, as you remarked in your
paper, the very data layouts that improve sequential execution can actually harm parallel execution, because you
get cache conflicts and things like that.
That doesn’t seem to be addressed at
all by the “IBM Parallel FORTRAN” paper. What kinds of analysis were you
doing in PTRAN? What issues were you