STORAGE
CREEGER What can people who have to manage storage
for a living take from this conversation? What recommendations can we make? What technologies do you see
on the horizon that would help them?
KLEIMAN Storage administrators today have tremendous
problems that are not adequately solved by any tools.
They have home directories, databases, LUNs (logical
unit numbers). It’s not just one set of bits on one set of
drives; they’re all over the place. They’ve got replicas and
perhaps have to manage mirroring relationships between
them. They have to manage a disaster-recovery scenario
and the server infrastructure on the other site if the whole
thing fails. They have all these mechanisms for all these
data sets that they must process day in and day out, and
they have to monitor the whole thing to see if it’s working correctly. Just being able to manage that mess—the
thousands of data sets they have to deal with—is a big
problem that isn’t solved yet.
CREEGER Is nobody in the business of providing enterprise-level storage infrastructure management?
KLEIMAN Those who have solved it best in the past have
been the backup people. They actually give you a data-transfer mechanism that manages everything in the background, and they give you a GUI that allows you to say,
“I want to look for this particular data set, I want to see
how many copies of it I have, and I want to restore that
particular thing”; or “I want to know that this many
copies have been made across this much time.”
Of course, the problem is that it’s all getting blown
up. So now, it’s not just, “What copies do I have on
tape?” It’s “What copies do I have in various locations
spread around the world? What mirroring relationships
do I have?” The trouble is that today it’s all managed in
someone’s head. I call it “death by mirroring.” It’s hard.
We’ll sort it all out eventually.
McKUSICK What do you see as a possible solution?
KLEIMAN People are building outrageous ad hoc system
scripts—Perl scripts and other types. My company is
working on this, as are lots of other people in the storage
industry, but it’s more than a single-box problem. It’s
managing across boxes, even managing heterogeneously.
We have to understand that we’re solving the convergence of QoS (quality of service), replication, disaster
recovery, archive, and backup. What we need is a unified
UI for handling all these functions, each of which used to
be handled for different reasons by different mechanisms.
BREWER That is a core issue. How many copies do you
have and why do you have them? Every copy is serving
some purpose, whether as a backup, or a replication for
read throughput, or a cache copy in Flash. Because they
are automatically distributed, you can’t keep track of all
these things. I think you actually can manage the file
system—broadly speaking, storage system—whereby you
proactively assign how many copies you have of
SELTZER Users make copies outside the scope of the storage administrator all the time.
RIEDEL Because the amount of data and what it’s used for
both increase constantly, you have to get the machines to
help the users tag content with metadata—to help them
know what the data is, what the copy is for, where it
came from, why they have it, and what it represents.
SELTZER With the data provenance, you can identify
copies, whether they were made intentionally or unintentionally. That’s a start. Answering the other semantic
questions, however, such as “Why was the copy made?”,
will still require user intervention, which historically has
been very difficult to get.
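As a rough illustration of the point about identifying copies, the sketch below groups stored objects by a content digest. Content hashing is a simple stand-in here, not the provenance mechanism the panelists have in mind, and the names `content_digest`, `group_copies`, and `CopyRecord` are hypothetical. The `purpose` field starts out empty because, as noted above, the reason a copy exists still has to come from a person.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyRecord:
    """One stored copy of some content (hypothetical record type)."""
    location: str
    digest: str
    purpose: Optional[str] = None  # unknown until a user supplies it


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying the content."""
    return hashlib.sha256(data).hexdigest()


def group_copies(objects: dict[str, bytes]) -> dict[str, list[CopyRecord]]:
    """Group locations holding byte-identical content.

    Only digests that occur at more than one location are returned,
    with the records for each digest sorted by location.
    """
    groups: dict[str, list[CopyRecord]] = defaultdict(list)
    for location, data in objects.items():
        digest = content_digest(data)
        groups[digest].append(CopyRecord(location, digest))
    return {
        digest: sorted(records, key=lambda r: r.location)
        for digest, records in groups.items()
        if len(records) > 1
    }
```

This finds copies whether they were made intentionally or not, but it cannot say why any of them exist; every `purpose` stays `None` until someone tags it.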
KLEIMAN Each set of data—a database, a user’s home
directory—has certain properties associated with it. With
a database you want to make sure it has a certain quality
of service, a disaster-recovery strategy, and a certain number of archival copies so that you can go back a number
of years. You may also want to have a certain number of
backup checkpoints to go back to in case of corruption.
Those are all properties of the data set that can be
predefined. Once set, the system can do the right thing,
including making as many copies as is relevant. It’s not
that people are making copies for the sake of making
copies; they’re trying to accomplish this higher-level goal
and not telling the system what that goal is.
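The idea of predefined data-set properties can be sketched as a declarative policy that a system compares against what actually exists. This is a minimal sketch under stated assumptions, not any vendor's product or API: `DataSetPolicy`, `ObservedState`, `plan_actions`, and the field names are all hypothetical, and the QoS tier is carried as a label only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSetPolicy:
    """Desired properties of a data set (hypothetical field names)."""
    qos_tier: str            # e.g. "gold"; a label only in this sketch
    dr_mirrors: int          # replicas kept for disaster recovery
    archive_copies: int      # long-term archival copies
    backup_checkpoints: int  # restore points kept against corruption


@dataclass
class ObservedState:
    """What currently exists for the data set."""
    dr_mirrors: int = 0
    archive_copies: int = 0
    backup_checkpoints: int = 0


def plan_actions(policy: DataSetPolicy, observed: ObservedState) -> list[str]:
    """List the steps needed to bring the observed state to the policy."""
    actions = []
    for kind in ("dr_mirrors", "archive_copies", "backup_checkpoints"):
        want = getattr(policy, kind)
        have = getattr(observed, kind)
        if have < want:
            actions.append(f"create {want - have} {kind}")
        elif have > want:
            actions.append(f"retire {have - want} {kind}")
    return actions
```

With the goal stated once per data set, the copy counts become an output of the system instead of something an administrator tracks by hand.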
36 November/December 2008 ACM QUEUE