determine whether the improvement was needed, how much improvement was needed, and how they
would know if the optimization was
achieved? Did they determine how
much time and money were worth
expending on the optimization? Optimizations that require an infinite budget are of little practical use. I would look to see whether they benchmarked the system both before and after the change, not just one or the other (or not at all).
I would like to see that they identified
a specific problem, rather than just
randomly tuning parts until they got
better results. I would like to see that
they determined the theoretical optimum as a yardstick against which all
results were measured.
I would pay careful attention to
the size of the improvement. Was
the improvement measured, or did it simply “feel faster”? Did the candidates enhance performance greatly or just squeeze a few additional percentage points out of the existing system? I would be impressed if they researched academic papers to find relevant prior work.
I would be most impressed, however, if they looked at the bigger picture
and found a way to avoid doing the calculation entirely. In operations, the best improvements often come not from adding complexity but from eliminating it.
Thomas A. Limoncelli is a site reliability engineer at Stack Overflow Inc. in New York City. His books include The Practice of Cloud System Administration (http://the-cloud-book.com), The Practice of System and Network Administration (http://the-sysadmin-book.com), and Time Management for System Administrators. He blogs at EverythingSysadmin.com and tweets at @YesThatTom.

Copyright held by author. Publication rights licensed to ACM. $15.00
operation was very slow. His solution
was to demand a faster machine. The
sysadmin who investigated the issue
found that the code was downloading
millions of data points from a database on another continent. The network between the two hosts was very
slow. A faster computer would not improve performance.
The solution, however, was not to
build a faster network, either. Instead,
we moved the calculation to be closer
to the data. Rather than download the
data and do the calculation, the sysadmin recommended changing the SQL
query to perform the calculation at the
database server. Instead of downloading millions of data points, now we
were downloading the single answer.
This solution seems obvious but
eluded the otherwise smart developer.
How did that happen? Originally, the
data was downloaded because it was
processed and manipulated many
different ways for many different purposes. Over time, however, these other
purposes were eliminated until only
one purpose remained. In this case
the issue was not calculating the max
value, but simply counting the number of data points, which SQL is very
good at doing for you.
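The rewrite can be sketched with Python's built-in sqlite3 module standing in for the remote database (the `points` table and `value` column are made up for illustration):

```python
import sqlite3

# In-memory database standing in for the server on another continent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (value REAL)")
conn.executemany("INSERT INTO points VALUES (?)",
                 [(float(i),) for i in range(100_000)])

# Slow approach: ship every row across the network, then compute locally.
rows = conn.execute("SELECT value FROM points").fetchall()
count_local = len(rows)

# Fast approach: let the database do the counting and ship back one number.
count_remote = conn.execute("SELECT COUNT(*) FROM points").fetchone()[0]

assert count_local == count_remote  # same answer, a fraction of the traffic
```

The answer is identical either way; what changes is that one approach moves millions of rows over a slow link and the other moves a single integer.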
9. Seek Inspiration from the Downstream Processes
Another solution is to look at what is
done with the data later in the process.
Does some other processing step sort
the data? If so, the max value doesn’t
need to be calculated. You can simply
sort the data earlier in the process and
take the last value.
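A minimal sketch of the idea: if a later stage sorts the data anyway, the max falls out for free as the last element, with no separate pass required.

```python
data = [17, 3, 42, 8, 25]

# A downstream stage needs the data sorted anyway.
data_sorted = sorted(data)

# The max no longer needs its own computation; it is the last element.
maximum = data_sorted[-1]
assert maximum == max(data)
```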
You wouldn’t know this was possible unless you took the time to talk
with people and understand the end-to-end flow of the system.
Once I was on a project where data
flowed through five different stages,
controlled by five different teams.
Each stage took the original data and
sorted it. The data didn’t change between stages, but each team made a
private copy of the entire dataset so
they could sort it. Because they had not looked outside their silos, they didn’t realize how much effort was being wasted.

Sorting the data earlier in the flow made the entire process much faster: one sort is faster than five.
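The before-and-after can be sketched like this (the five stages are reduced to list copies for illustration); one upstream sort produces exactly the data each stage would have computed for itself:

```python
data = [9, 1, 7, 3, 5]

# Before: each of the five stages makes a private copy and sorts it.
wasteful = [sorted(list(data)) for _stage in range(5)]

# After: sort once upstream and share the result with every stage.
shared = sorted(data)
assert all(stage_copy == shared for stage_copy in wasteful)
```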
10. Question the Question
When preparing this column I walked
around the New York office of Stack
Overflow and asked my coworkers if
they had ever been in a situation where
calculating the max value was a bottleneck worth optimizing.
The answer I got was a resounding no.
One developer pointed out that calculating the max is usually something done infrequently, often once per program run. Optimization effort should be spent on tasks done many times.

A developer with a statistics background stated that the max is useless: for most datasets it is an outlier and should be ignored. What is useful to him is the top N items, which presents an entirely different algorithmic problem.
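The top-N problem the statistician describes is typically solved with a bounded heap rather than a full sort; in Python, `heapq.nlargest` does exactly this (the sample values are made up):

```python
import heapq

samples = [12.1, 99.9, 14.2, 13.8, 101.5, 13.5, 12.9]

# One pass with an N-sized heap: O(n log N), versus O(n log n)
# for sorting everything just to read off the tail.
top3 = heapq.nlargest(3, samples)
assert top3 == [101.5, 99.9, 14.2]
```

Here the two largest values stand well apart from the rest, which is the statistician's point: the single max is often just the most extreme outlier.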
Another developer pointed out that
anyone dealing with large amounts of
data usually stores it in a database, and
databases can find the max value very
efficiently. In fact, he asserted, maintaining such data in a homegrown system is a waste of effort at best and negligent at worst. Thinking you can maintain a large dataset safely with a homegrown database is hubris.
Most database systems can determine the max value very quickly because of the indexes they maintain.
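With SQLite again standing in, the point about indexes can be sketched as follows (the `points` table, `value` column, and index name are hypothetical): given an index on the column, the engine can answer `MAX` by reading one entry off the end of the index rather than scanning every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (value REAL)")
conn.executemany("INSERT INTO points VALUES (?)",
                 [(float(i),) for i in range(10_000)])

# With an index on the column, the engine satisfies MAX() from the
# end of the index instead of scanning all 10,000 rows.
conn.execute("CREATE INDEX idx_points_value ON points (value)")
max_value = conn.execute("SELECT MAX(value) FROM points").fetchone()[0]
assert max_value == 9999.0
```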
If the system cannot, it isn’t the system administrator’s responsibility to rewrite the database software, but to understand the situation well enough to facilitate a discussion among the developers, vendors, and whoever else is required to find a solution.
Conclusion: Find Another Question
This brings me to my final point. Maybe the interview question posed at the
beginning of this column should be
retired. It might be a good logic problem for a beginning programmer, but
it is not a good question to use when
interviewing system administrators because it is not a realistic situation.
A better question would be to ask
job candidates to describe a situation
where they optimized an algorithm.
You can then listen to their story for
signs of operational brilliance.
I would like to know that the candidates determined ahead of time what
would be considered good enough.
Did they talk with stakeholders to