In 2018, you joined another consortium
of industry and academia, Toronto’s
Vector Institute, as president and CEO.
If I compare it to the Parallel Data
Lab, what I say to people at CMU is, it’s
kind of like the Parallel Data Lab, except it’s moved outside the university,
and it’s dealing with 10 times as many
people, and 10 times the amount of
money. And a lot more partners.
How have machine learning, deep
learning, and the other areas the Vector
Institute is looking at impacted systems?
Around 2010, Carlos Guestrin at
Carnegie Mellon and Joseph Hellerstein at Berkeley began to think about
solving machine learning problems. And that meant distributed
processing. What happens when we
try to do an iterative convergent solution of an equation using distributed
systems? And how are we going to deal
with all of that communication? And
that became the big way that machine
learning impacted systems design. We
are going to do so much communication that the algorithms are going to
need to explicitly take into consideration the cost of communication.
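The communication cost described above can be made concrete with a toy sketch (not from the interview; all names are illustrative): a bulk-synchronous distributed gradient step in which every iteration ends with an all-reduce, so communication grows with both worker count and iteration count.

```python
# Illustrative sketch: each worker computes a gradient on its own data
# shard, then all gradients are exchanged and averaged -- that exchange
# (an all-reduce) is the communication cost the algorithm must budget for.
import numpy as np

def distributed_step(shards, w, lr=0.1):
    """One synchronous iteration of distributed least-squares descent."""
    # Local compute: each shard's gradient of mean squared error.
    grads = [2 * X.T @ (X @ w - y) / len(y) for X, y in shards]
    # Communication: all-reduce (here simulated by averaging in-process).
    avg = np.mean(grads, axis=0)
    return w - lr * avg

# Toy noiseless least-squares problem split across two "workers."
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
shards = [(X[:50], y[:50]), (X[50:], y[50:])]

w = np.zeros(3)
for _ in range(200):
    w = distributed_step(shards, w)   # one all-reduce per iteration
```

After 200 iterations (and 200 rounds of communication), `w` recovers `true_w`; the point of the research above is that this per-iteration synchronization is exactly what becomes expensive at scale.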
You’ve also done some work in that
area with Eric Xing and Greg Ganger.
In 2012, we started working on stale
synchronous parallel, or SSP, as opposed to bulk synchronous parallel,
which is the basis for the model of parallel computing
that Leslie Valiant developed in his Turing Award-winning
work. The idea in stale synchronous is
that when you are searching for an approximate solution—which is what a convergent algorithm is, because you’re going to assume that it’s close enough—you
can allow error as long as you can bound
it. In particular, you can allow some signals to arrive later than others. In the traditional computing world, we would have
called this relaxed consistency.
My students since then have said that,
while it’s true that staleness can be tolerated, it isn’t necessarily fast. So, the work
has been more along the lines of, how can
we allow staleness if it increases the speed
of the system, while maximizing freshness so that we converge quickly?
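The bounded-staleness idea can be sketched in a few lines (a hedged illustration, not the actual PDL/Petuum implementation; the class and method names are assumptions): each worker keeps a logical clock, and a worker may run ahead of the slowest worker only up to a fixed staleness bound.

```python
# Minimal sketch of the stale synchronous parallel (SSP) consistency rule:
# a worker may proceed only if it is at most `staleness` iterations ahead
# of the slowest worker. staleness=0 degenerates to bulk synchronous.
class SSPClock:
    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers   # per-worker iteration counters
        self.staleness = staleness        # bound on allowed error/lag

    def can_proceed(self, worker):
        """True if this worker's lead over the slowest worker is bounded."""
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def tick(self, worker):
        """Worker finished an iteration; advance its logical clock."""
        self.clocks[worker] += 1

clock = SSPClock(num_workers=3, staleness=2)
for _ in range(3):
    clock.tick(0)   # worker 0 races 3 iterations ahead of workers 1 and 2
```

At this point worker 0 has exceeded the staleness bound and must block until a slower worker ticks — the "relaxed consistency" framing from the answer above, with the error bounded by construction.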
Leah Hoffmann is a technology writer based in Piermont,
© 2019 ACM 0001-0782/19/12 $15.00
Can you tell me about
the Parallel Data Lab, which you
founded in 1993, two years after you
left Berkeley and moved to Carnegie Mellon?
At Berkeley, I loved the benefits
of interacting with industry—
engaging with smart people and real-world problems and trends. I wanted
to build the same infrastructure at
Carnegie Mellon. We created what we
called the Parallel Data Consortium,
which was a vehicle to give companies access to and interaction with the
people in the lab. Initially, we held
annual retreats in which the research
results of the lab were shared and discussed with industry collaborators.
That structure has evolved over time,
as have the industry participants. But
the number of companies involved in
it grew from an initial six to about 20
now, plus strong engagements with
government funding agencies.
One of the more impactful projects to
emerge from the lab was Network-Attached Secure Disks (NASD) technology, which moved magnetic storage
disks out of the host computers and
communicated with them via networking protocols in the interests of providing more scalable storage.
We did a few things that ended
up having influence on the architecture. The first was to lift the
interface abstraction from the magnetic disk layer upward in the stack,
but not all the way to the file system
abstraction. That’s still the way it’s
done today. Almost all of the large
file systems have an interface layer,
whether it’s an AWS object or a file
system object. It’s usually separate
from the file system above it, and it
is the scalability component.
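The layering described above can be sketched as follows (a hypothetical illustration, not NASD's actual interface): a flat object store sits below the file system, exposing read/write operations on objects addressed by ID, while the file system keeps the pathname-to-object mapping and policy above it.

```python
# Illustrative sketch of the separation described above: a flat object
# layer (the scalability component) beneath a file system that maps
# names to object IDs. All interfaces here are assumptions.
class ObjectStore:
    """Flat namespace: variable-length objects addressed by integer ID."""
    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def create(self, data=b""):
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = data
        return oid

    def read(self, oid):
        return self._objects[oid]

    def write(self, oid, data):
        self._objects[oid] = data

class FileSystem:
    """Maps pathnames to object IDs; naming policy lives here, not below."""
    def __init__(self, store):
        self.store = store
        self.names = {}

    def create(self, path, data):
        self.names[path] = self.store.create(data)

    def read(self, path):
        return self.store.read(self.names[path])

fs = FileSystem(ObjectStore())
fs.create("/etc/motd", b"hello")
```

Because the object layer knows nothing about pathnames, it can be scaled out or relocated independently of the file system above it — the property the answer attributes to today's large file systems.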
The security aspect is also interesting, because we separated policy
from implementation. The storage
object would implement security and
access control using Merkle tree
protection, a lot like blockchain is
used to protect the integrity of ledgers. That
gave us the ability to move storage
across the network in a way where the
storage components didn’t have to
understand the file systems’ policies.
It was a big influence on the distributed systems community.
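The Merkle tree integrity idea mentioned above can be shown in a minimal sketch (illustrative only, not NASD's protocol): each interior hash covers its children, so the root commits to every data block and any tampering changes the root.

```python
# Minimal Merkle tree sketch: the root hash commits to all leaves,
# so an integrity check needs only the root, much as in ledgers.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash leaves, then repeatedly hash pairs until one root remains."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node if odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block0", b"block1", b"block2", b"block3"]
root = merkle_root(blocks)
# Tampering with any one block yields a different root:
tampered = merkle_root([b"block0", b"block1", b"blockX", b"block3"])
```

A storage component can verify data against the root without understanding any file system policy above it, which is the separation the answer describes.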