Viewpoints
DOI: 10.1145/2001269.2001282
Article development led by
queue.acm.org
Kode Vicious
File-System Litter
Cleaning up your storage space quickly and efficiently.
Dear KV,
We recently ran out of storage space on
a very large file server—one with many
terabytes of space—and upon closer
inspection we found that it was just
one employee who had used it all up.
The space was taken up almost exclusively by small files that were the result
of running some data-analysis scripts.
These files were completely unnecessary after they had been read once.
The code that generated the files had
no good way of cleaning them up once
they had been created; it just went on
believing that storage was infinite. Now
we’ve had to put quotas on our file servers and, of course, deal with weekly
cries for more disk space. Surely there
is a better way of dealing with this
problem than clamping down on everyone for fear that one of them will do the
wrong thing.
Caught Between a Block
and a Lack of Space
Dear Caught,
Photograph by Gencho Petkov / Shutterstock.com
Yes, there are better ways of handling
this problem. You have now discovered
one of the drawbacks of cheap storage
(and yes, that old adage is true): files
will always expand to fill the available
storage space, just as programs expand
to fill all available memory and spawn
more threads until all of your CPU is
utilized as well.
Shared storage, such as you are
dealing with, presents the thorniest
problem because it is shared, and, it
would seem—as regular readers of
this column are, I’m sure, aware—
people simply cannot be trusted to police
themselves. In reality most people can,
but it takes just one, as you found out,
to “ruin it for everybody,” as our teachers used to say.
The point you make about the
scripts not having a way of cleaning up
after themselves is a good one. When
you build programs out of many small
source files, your tools also generate
intermediate files: the objects that
then get linked into a final executable.
All build systems worthy of the name,
however, have some form of “clean”
target. Although this target was originally
created so that you could start
a new build from scratch, it is also a
handy way of shrinking down the size
of your work area when a project is either
complete or on hold. Having a program
that would do the same work with
intermediate data files is a good start,
but there are other things that can be
done to improve the situation.
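To make the idea of a "clean" target for data files concrete, here is a minimal sketch of what such a program might look like. It assumes the analysis scripts write their intermediate files under a single work directory and that those files can be recognized by a filename pattern; the function name clean_intermediates, its parameters, and the "*.tmp" default pattern are all hypothetical, chosen only for illustration. A dry-run mode reports what would be removed before anything is actually deleted.

```python
from pathlib import Path


def clean_intermediates(workdir, patterns=("*.tmp",), dry_run=False):
    """Remove intermediate files under workdir that match any glob pattern.

    Hypothetical helper: plays the role of a build system's "clean" target,
    but for data files. The default "*.tmp" pattern is an assumption.
    Returns (files_removed, bytes_freed). With dry_run=True nothing is
    deleted; the totals describe what would have been removed.
    """
    root = Path(workdir)
    seen = set()  # avoid counting a file twice when patterns overlap
    count = 0
    freed = 0
    for pattern in patterns:
        # rglob searches workdir and every subdirectory beneath it.
        for path in root.rglob(pattern):
            # Skip duplicates, directories, and symlinks (never follow a
            # link out of the work area and count someone else's data).
            if path in seen or path.is_symlink() or not path.is_file():
                continue
            seen.add(path)
            size = path.stat().st_size
            if not dry_run:
                path.unlink()
            count += 1
            freed += size
    return count, freed
```

Running it first with dry_run=True, for example clean_intermediates("/scratch/analysis", dry_run=True) on a hypothetical scratch path, gives a quick report of how much space a cleanup would reclaim, which is a gentler first step than deleting outright.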