CTO Roundtable

STORAGE

CREEGER What can people who have to manage storage for a living take from this conversation? What recommendations can we make? What technologies do you see on the horizon that would help them?

KLEIMAN Storage administrators today have tremendous problems that are not adequately solved by any tools. They have home directories, databases, LUNs (logical unit numbers). It’s not just one set of bits on one set of drives; they’re all over the place. They’ve got replicas and perhaps have to manage mirroring relationships between them. They have to manage a disaster-recovery scenario and the server infrastructure on the other site if the whole thing fails. They have all these mechanisms for all these data sets that they must process day in and day out, and they have to monitor the whole thing to see if it’s working correctly. Just being able to manage that mess—the thousands of data sets they have to deal with—is a big problem that isn’t solved yet.

CREEGER Is nobody in the business of providing enterprise-level storage infrastructure management?

KLEIMAN Those who have solved it best in the past have been the backup people. They actually give you a data-transfer mechanism that manages everything in the background, and they give you a GUI that allows you to say, “I want to look for this particular data set, I want to see how many copies of it I have, and I want to restore that particular thing”; or “I want to know that these many copies have been made across this much time.”

Of course, the problem is that it’s all getting blown up. So now, it’s not just, “What copies do I have on tape?” It’s “What copies do I have in various locations spread around the world? What mirroring relationships do I have?” The trouble is that today it’s all managed in someone’s head. I call it “death by mirroring.” It’s hard. We’ll sort it all out eventually.

McKUSICK What do you see as a possible solution?

KLEIMAN People are building outrageous ad hoc system scripts—Perl scripts and other types. My company is working on this as are lots of other people in the storage industry, but it’s more than a single-box problem. It’s managing across boxes, even managing heterogeneously.

We have to understand that we’re solving the convergence of QoS (quality of service), replication, disaster recovery, archive, and backup. What we need is a unified UI for handling all these functions, each of which used to be handled for different reasons by different mechanisms.

BREWER That is a core issue. How many copies do you have and why do you have them? Every copy is serving some purpose, whether as a backup, or a replication for read throughput, or a cache copy in Flash. Because they are automatically distributed, you can’t keep track of all these things. I think you actually can manage the file system—broadly speaking, storage system—whereby you proactively assign how many copies you have of something.

SELTZER Users make copies outside the scope of the storage administrator all the time.

RIEDEL Because the amount of data and what it’s used for both increase constantly, you have to get the machines to help the users tag content with metadata—to help them know what the data is, what the copy is for, where it came from, why they have it, and what it represents.

SELTZER With the data provenance, you can identify copies, whether they were made intentionally or unintentionally. That’s a start. Answering the other semantic questions, however, such as “Why was the copy made?”, will still require user intervention, which historically has been very difficult to get.
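The provenance idea can be illustrated with a minimal sketch. It assumes a hypothetical catalog in which every copy records the copy it was made from and, optionally, a purpose. The names `CopyRecord`, `root_of`, `copies_of`, and `unexplained` are invented for illustration and do not come from any product discussed here. Lineage alone finds every copy of an original; a missing purpose marks the copies for which someone still has to answer "Why was the copy made?"

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyRecord:
    """One entry in a hypothetical copy catalog."""
    copy_id: str
    location: str
    source: Optional[str]   # copy_id this copy was made from; None for an original
    purpose: Optional[str]  # e.g. "backup", "read-replica", "cache"; None if never stated


def root_of(copy_id: str, catalog: dict[str, CopyRecord]) -> str:
    """Follow source links back to the original (assumes the lineage has no cycles)."""
    current = catalog[copy_id]
    while current.source is not None:
        current = catalog[current.source]
    return current.copy_id


def copies_of(original_id: str, catalog: dict[str, CopyRecord]) -> list[CopyRecord]:
    """Every copy, direct or indirect, that descends from the given original."""
    return [
        record
        for record in catalog.values()
        if record.copy_id != original_id
        and root_of(record.copy_id, catalog) == original_id
    ]


def unexplained(records: list[CopyRecord]) -> list[str]:
    """Copies whose purpose nobody recorded; these still need a human answer."""
    return [record.copy_id for record in records if record.purpose is None]
```

Provenance answers "which copies exist and where did they come from" mechanically, while the `purpose` field stays empty until a user or a policy fills it in.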

KLEIMAN Each set of data—a database, a user’s home directory—has certain properties associated with it. With a database you want to make sure it has a certain quality of service, a disaster-recovery strategy, and a certain number of archival copies so that you can go back a number of years. You may also want to have a certain number of backup checkpoints to go back to in case of corruption.

Those are all properties of the data set that can be predefined. Once set, the system can do the right thing, including making as many copies as is relevant. It’s not that people are making copies for the sake of making copies; they’re trying to accomplish this higher-level goal and not telling the system what that goal is.
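As a minimal sketch of what predefined data-set properties might look like, the goals can be declared once and the copy counts derived from them. Every name here (`DataSetPolicy`, `required_copies`, `copies_to_create`, and the individual fields) is a hypothetical chosen for illustration, not an interface from any vendor mentioned in this discussion.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataSetPolicy:
    """Declared goals for one data set (all field names are hypothetical)."""
    name: str
    qos_tier: str            # e.g. "gold" for a latency-sensitive database
    dr_sites: int            # remote mirrors kept for disaster recovery
    archive_years: int       # how many years back archival copies must reach
    archives_per_year: int   # archival copies retained per year
    backup_checkpoints: int  # recent checkpoints kept in case of corruption


def required_copies(policy: DataSetPolicy) -> dict[str, int]:
    """Derive how many copies each purpose needs from the declared goals."""
    return {
        "primary": 1,
        "dr_mirror": policy.dr_sites,
        "archive": policy.archive_years * policy.archives_per_year,
        "backup_checkpoint": policy.backup_checkpoints,
    }


def copies_to_create(policy: DataSetPolicy, existing: dict[str, int]) -> dict[str, int]:
    """Compare what exists with what the policy requires; report only shortfalls."""
    needed = required_copies(policy)
    return {
        purpose: count - existing.get(purpose, 0)
        for purpose, count in needed.items()
        if count > existing.get(purpose, 0)
    }


if __name__ == "__main__":
    orders = DataSetPolicy("orders-db", "gold", dr_sites=1, archive_years=7,
                           archives_per_year=1, backup_checkpoints=24)
    have = {"primary": 1, "archive": 5, "backup_checkpoint": 24}
    print(copies_to_create(orders, have))
```

The administrator states the higher-level goal, and the number of copies falls out of it rather than being tracked by hand.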

36 November/December 2008 ACM QUEUE

rants: feedback@acmqueue.com
