Latency and Livelocks
kode vicious
Sometimes data just doesn’t travel as fast as it should.
Sometimes a program appears to be running fine,
but is quietly failing behind the scenes. If you’ve
experienced these problems, you may have struggled
for a while and then become baffled and/or tired. Kode
Vicious knows your frustration, and this month serves up
some instructive words on how to deal with both of these
annoying problems. Fatigued or mystified by other quandaries? E-mail your problem to KV@acmqueue.com.
Dear KV,
My company has a very large database with all of our
customer information. The database is replicated to several locations around the world to improve performance
locally, so that when customers in Asia want to look at
their data, they don’t have to wait for it to come from the
United States, where my company is based.
A few months ago the company upgraded its software,
which required all of the records in the customer database
to be updated as well. When we tested the upgrade program it took only a few minutes to update a large number
of records, but when we had to update the Asian customers, a process that had to run in Asia and that touched
data in the U.S., the process began to take a lot longer.
Since the company has a very fast network connecting
the U.S. and the Asian offices, it’s hard to understand how
the distance could matter. There must be another reason
for the time it is taking to run these programs.
Baffled with Bandwidth
Dear Baffled,
There is a big difference between having a big pipe and
knowing how to use it. Two things matter in networking: bandwidth and latency. Unfortunately, most people
think only of the former, not the latter. Latency is the
time a message takes to get
from point A to point B
(e.g., a client and a server).
If I were to guess, I would
figure that you tested the
conversion program on a local network, probably 100-Mbit
Ethernet, where latencies are generally less than 1
ms. You then ran the program remotely across your very
fast network and found that it ran much more slowly.
I bet it ran 100 times as slow. How did I pick 100 times?
Easy: the average round-trip time across the Pacific
is about 100 ms. What you forgot was a very important
constant, and then you forgot how networks work.
The very fast network you speak of was probably sold
on a bit-per-second basis, so maybe you have a 1-Gbps
link between Japan and the U.S., but that’s a measure of
bandwidth, not latency. It’s the latency that matters in
your case. Why? Because your conversion process very
likely takes one record at a time, packs it up, makes a
request to the server, and then waits for a response.
When the underlying network has very low latency,
like a local network, this packing up of a single request
and waiting for a response is barely noticed; but if
you move to a higher-latency network, it crushes your
system’s performance under its iron heel.
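To put rough numbers on that, here is a minimal sketch in Python. It assumes the conversion really does one request/response per record with nothing overlapped, which is a guess about your program. The record count is made up (your letter doesn't give one), and the function name is purely illustrative.

```python
def serial_conversion_seconds(records: int, round_trip_ms: float) -> float:
    """Time spent waiting on the network when every record costs one
    full request/response round trip and no requests are overlapped."""
    return records * round_trip_ms / 1000.0


# Hypothetical workload: the letter does not say how many records there are.
RECORDS = 100_000

# Round-trip times taken from the column: about 1 ms on a LAN,
# about 100 ms across the Pacific.
lan_seconds = serial_conversion_seconds(RECORDS, round_trip_ms=1.0)
wan_seconds = serial_conversion_seconds(RECORDS, round_trip_ms=100.0)

print(f"LAN: {lan_seconds / 60:.1f} minutes")
print(f"WAN: {wan_seconds / 3600:.1f} hours")
print(f"Slowdown: {wan_seconds / lan_seconds:.0f}x")
```

Notice that bandwidth appears nowhere in the calculation. When each record waits for its own round trip, a fatter pipe changes nothing; only the round-trip time and the number of round trips matter.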
The thing you forgot is c, the speed of light. Let’s say
you were running the conversion process in Tokyo and
it was talking to a database in California. It’s about 5,000
miles from Tokyo to California, and the speed of light
is 186,000 miles per second, so a beam of light should
be able to make it from Tokyo to California in about
0.027 seconds. That’s 27 ms for the absolute fastest time
between those two points, a round trip of 54 ms. That’s
already a factor of about 50 slower than your LAN.
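The arithmetic is easy to check for yourself. This is a small sketch using only the two round figures quoted above (5,000 miles and 186,000 miles per second); the variable names are my own, and real signals in fiber or copper travel slower than light in a vacuum, so treat the result as a floor, not an estimate.

```python
SPEED_OF_LIGHT_MILES_PER_SEC = 186_000  # figure used in the column
TOKYO_TO_CALIFORNIA_MILES = 5_000       # rough distance used in the column

# Best-case one-way time in milliseconds, ignoring every router on the path.
one_way_ms = TOKYO_TO_CALIFORNIA_MILES / SPEED_OF_LIGHT_MILES_PER_SEC * 1000
round_trip_ms = 2 * one_way_ms

print(f"one way:    {one_way_ms:.1f} ms")
print(f"round trip: {round_trip_ms:.1f} ms")
```

That works out to roughly 27 ms one way and 54 ms for the round trip, before a single router has touched the packet.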
Of course, packets don’t travel point to point. They are
stored and forwarded at various points along their journey—that’s how the Internet works. Each waypoint (the
technical term is router) introduces its own bit of delay to
the packet’s journey, until what we have is an average of
50 ms each way between Japan and California. Now you
have a difference of a factor of 100 between your LAN
and the real network. Reality bites, and in this case it bit