handle things, because they are pretty
smart at managing data and keeping
the fresh data near at hand.
When you build a cache, pick a
key that is easy to search for. Here is
a hint: strings are not easy to search
for, but hashes of strings are. When
you add new object types to the cache,
do not do it by adding a pointer from
one object to another, unless those
two objects must be treated as one
thing. Trying to track down whether
or not a sub-object has changed by
walking all the parent objects is inefficient. When using timestamps
to indicate freshness, look at the
seconds—computers are fast; many
things happen in a computer minute.
If the cache contains files, put them
all under a common root directory, as
this will make flushing them easier
for the user.
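
As a minimal sketch of two of these suggestions (hashed keys and a single root directory), here is a toy file cache in Python. The class name `FileCache`, the choice of SHA-256, and the example key are assumptions made for illustration, not anything the column prescribes.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class FileCache:
    """Toy file cache: hashed keys, every entry under one root directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the string key so every entry has a fixed-length name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.root / digest

    def put(self, key: str, data: bytes) -> None:
        self._path(key).write_bytes(data)

    def get(self, key: str) -> Optional[bytes]:
        path = self._path(key)
        return path.read_bytes() if path.exists() else None

    def flush(self) -> None:
        # One root means one directory to remove, fresh and stale alike.
        shutil.rmtree(self.root, ignore_errors=True)
        self.root.mkdir(parents=True, exist_ok=True)


# Hypothetical usage: the cache lives in a throwaway temporary directory.
cache = FileCache(Path(tempfile.mkdtemp()) / "cache")
cache.put("user:42:profile", b"cached bytes")
```

Because every entry sits under `cache.root`, a user who distrusts the cache can also flush it by hand by deleting that one directory.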
Much of what I have discussed is
local caching, and KV has left out the
distributed case, for reasons of personal sanity. The advice here is a good
start to building a distributed cache,
because if you get this wrong in the
local case, your chances of getting it
right in the distributed case are nil.
If KV gets a good letter about the distributed case, he might take leave of
his senses long enough to try to answer it, or he might burn the letter—
and since sending letters is also a distributed system, there is no guarantee
that you will know if the message was
delivered, lost, or is in a stale cache.
George V. Neville-Neil (email@example.com) is the proprietor of
Neville-Neil Consulting and co-chair of the ACM Queue
editorial board. He works on networking and operating
systems code for fun and profit, teaches courses on
various programming-related subjects, and encourages
your comments, quips, and code snips pertaining to his
Copyright held by author.
“deploy,” your system will not, in fact,
think that the data is stale. That is, if
you do deploy-change-deploy in the
same minute, which is quite possible
for a touch typist, the system will not
think that the old data is out of date.
Another possibility is that the settings
you’re changing are in a file that’s not
being watched by the system, and that
the thing that it cares about is some file
that points to your file.
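
The same-minute failure can be reproduced in a few lines. This is only a sketch, assuming a system that compares freshness stamps truncated to the minute; the function names and the example times are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def minute_stamp(t: datetime) -> str:
    """Freshness stamp truncated to the minute (the failure mode)."""
    return t.strftime("%Y-%m-%d %H:%M")


def second_stamp(t: datetime) -> str:
    """Freshness stamp that keeps the seconds."""
    return t.strftime("%Y-%m-%d %H:%M:%S")


def is_stale(cached_stamp: str, current_stamp: str) -> bool:
    # The cached copy is stale if the source's stamp has moved on.
    return cached_stamp != current_stamp


# Hypothetical timeline: deploy, then change a setting 20 seconds later.
first_deploy = datetime(2024, 1, 1, 12, 0, 10, tzinfo=timezone.utc)
change = first_deploy + timedelta(seconds=20)

stale_by_minute = is_stale(minute_stamp(first_deploy), minute_stamp(change))
stale_by_second = is_stale(second_stamp(first_deploy), second_stamp(change))
print(stale_by_minute, stale_by_second)  # False True
```

With minute granularity the change goes unnoticed; with seconds it is caught. For real files, `os.stat(path).st_mtime_ns` gives a finer-grained modification time still.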
A correctly implemented cache
tracks all the things being cached,
not just the thing that the programmer assumes other people will modify.
The problem usually comes as more
types of data are added to the cache.
A file here, a database entry there, and
eventually what you have is not really a
well-organized cache, but instead, the
data-storage equivalent of spaghetti
code, where files point to files, and the
file with the dragon has the pellet with
the poison, but the directory with the
dragon has the file that is true.
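
One way to track all the things, sketched under the assumption that the cached result depends on a known set of files: record a stamp for every input, not only for the file the system happens to watch. The file names below are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def snapshot(paths):
    """Record (mtime in ns, size) for every file the cached result depends on."""
    result = {}
    for p in paths:
        st = os.stat(p)
        result[str(p)] = (st.st_mtime_ns, st.st_size)
    return result


def is_stale(saved, paths):
    # Stale if any tracked input no longer matches its recorded stamp.
    return snapshot(paths) != saved


workdir = Path(tempfile.mkdtemp())
settings = workdir / "settings.conf"  # hypothetical file the user edits
pointer = workdir / "main.conf"       # hypothetical file the system watches
settings.write_text("color=red\n")
pointer.write_text("include settings.conf\n")

watched_only = snapshot([pointer])
everything = snapshot([pointer, settings])

settings.write_text("color=purple\n")  # the user changes a setting

missed = is_stale(watched_only, [pointer])          # change goes unnoticed
caught = is_stale(everything, [pointer, settings])  # change is detected
```

Including the file size alongside the modification time is a hedge against coarse filesystem timestamps, which is the same granularity trap in another form.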
One way for you, the user, to deal
with this is to be a tad brutal and flush
the cache. Whereas invalidation is the
careful, even surgical, removal of stale
data, a flush is just what it sounds like:
it sends all the crap right down the
drain, fresh and stale. It sounds like
whoever implemented your system has
made this difficult for you by scattering
the requisite files all over the system,
but once you have your complete list,
it is probably time to write a flush.sh
shell script to clear out all the cruft you
can think of.
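
The same brute-force idea as a flush.sh, sketched here in Python rather than shell. The list of locations is hypothetical and built in a temporary directory; on a real system it would be the complete list you assembled by hand.

```python
import shutil
import tempfile
from pathlib import Path


def flush(paths):
    """Remove every listed cache file or directory; return what was removed."""
    removed = []
    for p in map(Path, paths):
        if p.is_dir() and not p.is_symlink():
            shutil.rmtree(p)   # a whole cache directory
        elif p.exists() or p.is_symlink():
            p.unlink()         # a single cache file (or a symlink to one)
        else:
            continue           # already gone, nothing to flush
        removed.append(p)
    return removed


# Hypothetical scattered cache locations, created here only for the demo.
base = Path(tempfile.mkdtemp())
(base / "build-cache").mkdir()
(base / "build-cache" / "obj.bin").write_bytes(b"stale")
(base / "settings.cache").write_text("stale")
targets = [base / "build-cache", base / "settings.cache", base / "never-existed"]

removed = flush(targets)
```

Missing paths are skipped rather than treated as errors, so the flush can be run repeatedly without complaint.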
Coming back up for air, let’s think
about how a proper cache is managed,
so that the next system any of us come
across does not require the use of system-call tracing to find out why the
system is misbehaving. Caches make
sense only if they speed up access to
frequently used data; otherwise, let
the operating system or your database