...recovery attempts, and on and on. Code
that looks sane on first glance ends
up creating zero VMs, or three VMs,
or more. With multiple simultaneous
clients, you must deal with timing,
locking problems, crossed messages,
and a nest of heisenbugs.
Putting this logic in the client library ensures the client will need more
frequent updating. Requiring the user
to implement the recovery logic is delightfully evil: how would they even
know what they should implement?
These problems are reduced or eliminated when the API is idempotent.
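To make the idea concrete, here is a minimal sketch of an idempotent create call. It assumes a hypothetical in-memory `VMService` and a client-chosen `request_id`; neither name comes from this article or from any real cloud API. Repeating the same request returns the VM created the first time rather than creating another.

```python
import uuid


class VMService:
    """Hypothetical in-memory stand-in for a VM-creation API (illustration only)."""

    def __init__(self):
        self.vms = {}          # vm_id -> VM name
        self._by_request = {}  # request_id -> vm_id of the VM it created

    def create_vm(self, request_id, name):
        # A repeated request_id means the client is retrying: return the
        # VM created the first time instead of creating a second one.
        if request_id in self._by_request:
            return self._by_request[request_id]
        vm_id = f"vm-{len(self.vms) + 1}"
        self.vms[vm_id] = name
        self._by_request[request_id] = vm_id
        return vm_id


service = VMService()
request_id = str(uuid.uuid4())  # chosen once by the client, reused on every retry

first = service.create_vm(request_id, "web-1")
retry = service.create_vm(request_id, "web-1")  # e.g. the first reply was lost
```

Because the client picks the identifier before the first attempt, it can retry blindly after a timeout. It never has to work out whether the earlier attempt reached the server.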
Why not simply use a more reliable
network? Oh, that’s just adorable. Networks are never reliable. They can’t be.
Thinking that networks are reliable is
the first fallacy of distributed computing ( https://en.wikipedia.org/wiki/Fal-
If networks are unreliable, then a
network API is inherently unreliable,
too. A request can be lost on its way to
the server and never executed. Execution may complete, but the reply
back to us gets lost. The server may
reboot during the operation. The client might reboot while sending the
request, while waiting for the reply,
or after receiving the reply but before local state is written to stable
storage. So many edge cases!
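The lost-reply case is easy to simulate. The sketch below is an illustration under stated assumptions, not any vendor's implementation: `Server`, `create`, `create_once`, and `call_with_retries` are all hypothetical names. The simulated network drops the first two replies even though the server executed each request.

```python
class ReplyLost(Exception):
    """The request was executed, but no reply ever reached the client."""


class Server:
    """Hypothetical VM server; names are illustrative, not a real API."""

    def __init__(self):
        self.vm_count = 0
        self._seen = {}  # request_id -> number of the VM already created

    def create(self):
        # Non-idempotent: every call creates another VM.
        self.vm_count += 1
        return self.vm_count

    def create_once(self, request_id):
        # Idempotent: a repeated request_id returns the earlier result.
        if request_id not in self._seen:
            self.vm_count += 1
            self._seen[request_id] = self.vm_count
        return self._seen[request_id]


def call_with_retries(operation, lost_replies, max_attempts=5):
    """Run operation, simulating `lost_replies` replies dropped by the network."""
    for attempt in range(max_attempts):
        result = operation()      # the server always executes the request
        if attempt < lost_replies:
            continue              # reply lost: the client times out and retries
        return result
    raise ReplyLost("gave up after retries")


naive = Server()
call_with_retries(naive.create, lost_replies=2)

safe = Server()
call_with_retries(lambda: safe.create_once("req-42"), lost_replies=2)
```

After these calls the naive server holds three VMs and the idempotent one holds exactly one, although both clients ran the same retry loop.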
In distributed computing everything can fail. If you hate your customers, you can make sure that dealing with failure is burdensome, error-prone,
and just plain impossible to
get 100% right. Customers will always
be fixing edge cases instead of doing
Don’t spoil the fun. Show your disdain for customers with non-idempotent APIs.
Summary. The accompanying table
includes a summary of these techniques along with ways that companies
may accidentally provide good service
to their API customers.
Getting buy-in. Your coworkers may
resist some of these techniques. How
do you get them on board?
You could have them read this article, although that could backfire. If the
wrong person reads it, he or she might
push back and do the opposite.
If that happens, you might end up
with a great API that is easy to get started
with, is easy to use, has great documentation
that is easy to access, and
helps people write code that works the
first time and every time.
Shirley, You Can’t Be Serious!
This article is written in jest to make
a point. Although some companies
do the bad things set forth here, they
don’t do them to hurt customers. In
my experience, engineers take pride
in doing good work and impressing
customers with well-made systems.
I trust that when companies do the
naughty things in this article, it is out
of ignorance, lack of resources, or an
Luckily, in some cases the good
practice is easier to implement than
the bad practice. Creating an authentication system to restrict access to documentation is more difficult than making the documentation freely available.
Putting all documentation on one long
page so that it can be searched using
Ctrl-F is easier than putting each API
call on a separate page.
Sadly, some of these good practices
do require a lot of work. Creating a self-service onboarding system is not easy.
It requires usability testing and revisions. Ease of use is never achieved on
the first guess.
Justifying the resources required
for all these good practices may be
difficult, especially when an API isn’t
used by many of your customers.
“What’s the ROI when hardly anyone
uses our API?” your management may
ask. I look at it differently: Maybe usage is low because you haven’t done
Thomas A. Limoncelli is the SRE manager at Stack
Overflow Inc. in New York City. His books include The
Practice of System and Network Administration, The
Practice of Cloud System Administration, and Time
Management for System Administrators. He blogs at
EverythingSysadmin.com and tweets at @YesThatTom.
Copyright held by author/owner.
Publication rights licensed to ACM.