and indexing system. The second wave
came when our quality team and research
groups started using GFS rather
aggressively—and basically, they were
all looking to use GFS to store large data
sets. And then, before long, we had 50
users, all of whom required a little support
from time to time so they’d all keep
playing nicely with each other.”
One thing that helped tremendously
was that Google built not only the file
system but also all of the applications
running on top of it. While adjustments
were continually made in GFS to make
it more accommodating to all the new
use cases, the applications themselves
were also developed with the various
strengths and weaknesses of GFS in
mind. “Because we built everything, we
were free to cheat whenever we wanted
to,” Gobioff neatly summarized. “We
could push problems back and forth
between the application space and the
file-system space, and then work out
accommodations between the two.”
The matter of sheer scale, however,
called for some more substantial adjustments.
One coping strategy had
to do with the use of multiple “cells”
across the network, functioning essentially
as related but distinct file systems.
Besides helping to deal with the immediate
problem of scale, this proved
to be a more efficient arrangement for
the operations of widely dispersed data
centers.
Rapid growth also put pressure on
another key parameter of the original
GFS design: the choice to establish
64MB as the standard chunk size. That,
of course, was much larger than the
typical file-system block size, but only
because the files generated by Google’s
crawling and indexing system were unusually large. As the application mix
changed over time, however, ways had
to be found to let the system deal efficiently with large numbers of files
requiring far less than 64MB (think in
terms of Gmail, for example). The problem was not so much with the number
of files itself, but rather with the memory demands all of those files made on
the centralized master, thus exposing
one of the bottleneck risks inherent in
the original GFS design.
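
To make the memory pressure concrete, here is a rough back-of-the-envelope sketch in Python. It is an illustration only, not Google's accounting: the per-chunk and per-file metadata sizes (64 bytes each) are assumed round figures, the 1PiB data volume and the two file sizes are made up for the comparison, and the helper names are hypothetical. It shows how the same volume of data costs the master far more memory when it arrives as many small files, since each file needs at least one chunk record no matter how little of the 64MB it uses.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64MB chunk size, as stated in the text
BYTES_PER_CHUNK_RECORD = 64     # ASSUMPTION: master metadata held per chunk
BYTES_PER_FILE_RECORD = 64      # ASSUMPTION: master metadata held per file


def chunks_for(file_size: int) -> int:
    """Chunk records needed for one file (simplification: at least one)."""
    return max(1, -(-file_size // CHUNK_SIZE))  # ceiling division


def master_bytes(num_files: int, file_size: int) -> int:
    """Estimated master memory for num_files files of identical size."""
    per_file = BYTES_PER_FILE_RECORD + chunks_for(file_size) * BYTES_PER_CHUNK_RECORD
    return num_files * per_file


if __name__ == "__main__":
    total_data = 1024 ** 5  # hypothetical 1PiB of stored data
    for label, size in [("1GB files", 1024 ** 3), ("100KB files", 100 * 1024)]:
        count = total_data // size
        gib = master_bytes(count, size) / 1024 ** 3
        print(f"{label}: {count:,} files, about {gib:,.1f} GiB of master metadata")
```

Under these assumptions, the small-file case needs on the order of a thousand times more master memory than the large-file case for the same amount of stored data, which is the bottleneck the paragraph above describes.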
MCKUSICK: I gather from the original GFS
paper [in Proceedings of the 2003 ACM
Symposium on Operating Systems Principles]
that file counts have been a significant
issue for you right along. Can you
go into that a little bit?