Synchronous/Asynchronous Programming Models
Synchronous APIs maintain the model of sequential execution, and a thread’s state can be stored on the stack without explicit management from the application. This leads to code that is shorter, easier to understand, more maintainable, and potentially even more efficient. In contrast, with an asynchronous model, whenever an operation is triggered, the code must typically be split up and moved into different functions. The control flow becomes obfuscated, and it becomes the application’s responsibility to manage the state between asynchronous event-handling functions that run to completion.
Consider a simple example: a function returns the number of words that appear in the document named by a given document identifier, where the implementation may store the cached documents on a microsecond-scale device. The example in the first code portion makes two synchronous accesses to microsecond-scale devices: the first retrieves an index location for the document, and the second uses that location to retrieve the actual document content for counting words. With a synchronous programming model, the code is not only straightforward but can also abstract away the device accesses altogether, providing a natural, synchronous CountWords function with familiar syntax.
// Returns the number of words found in the document specified by 'doc_id'.
int CountWords(const string& doc_id) {
  Index index;
  bool status = ReadDocumentIndex(doc_id, &index);
  if (!status) return -1;
  string doc;
  status = ReadDocument(index.location, &doc);
  if (!status) return -1;
  return CountWordsInString(doc);
}
The C++11 example in the second code portion shows an asynchronous model in which a “callback” (commonly known as a “continuation”) is used. When the asynchronous operation completes, the event-handling loop invokes the callback to continue the computation. Since two asynchronous operations are made, two callbacks are needed. Moreover, the CountWords API must now be asynchronous itself. And unlike the synchronous example, state between asynchronous operations must be explicitly managed and tracked on the heap rather than on a thread’s stack.
// Heap-allocated state tracked between asynchronous operations.
struct AsyncCountWordsState {
  bool status;
  std::function<void(int)> done_callback;
  Index index;
  string doc;
};

// Forward declarations of the callback functions used below.
void ReadDocumentIndexDone(AsyncCountWordsState* state);
void ReadDocumentDone(AsyncCountWordsState* state);

// Invokes the 'done' callback, passing the number of words found in the
// document specified by 'doc_id'.
void AsyncCountWords(const string& doc_id, std::function<void(int)> done) {
  // Kick off the first asynchronous operation, and invoke
  // ReadDocumentIndexDone when it finishes. State between asynchronous
  // operations is tracked in a heap-allocated 'state' object.
  auto state = new AsyncCountWordsState();
  state->done_callback = done;
  AsyncReadDocumentIndex(doc_id, &state->status, &state->index,
                         std::bind(&ReadDocumentIndexDone, state));
}

// First callback function.
void ReadDocumentIndexDone(AsyncCountWordsState* state) {
  if (!state->status) {
    state->done_callback(-1);
    delete state;
  } else {
    // Kick off the second asynchronous operation, and invoke the
    // ReadDocumentDone function when it finishes. The 'state' object
    // is passed to the second callback for final cleanup.
    AsyncReadDocument(state->index.location, &state->status,
                      &state->doc, std::bind(&ReadDocumentDone, state));
  }
}

// Second callback function.
void ReadDocumentDone(AsyncCountWordsState* state) {
  if (!state->status) {
    state->done_callback(-1);
  } else {
    state->done_callback(CountWordsInString(state->doc));
  }
  delete state;
}
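For illustration only, a call site might look like the following sketch; the surrounding function and the lambda are hypothetical (they do not appear in the listing above), and some event loop is assumed to dispatch the device completions.

// Hypothetical caller: it supplies a lambda as the completion callback and
// returns immediately; the count arrives later, when the event-handling
// loop runs the callback.
void HandleRequest(const string& doc_id) {
  AsyncCountWords(doc_id, [](int word_count) {
    if (word_count < 0) {
      // A read failed; report an error.
    } else {
      // Use word_count.
    }
  });
  // Control reaches here before the count is available.
}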
Callbacks make the flow of control explicit, rather than simply invoking a function and waiting for it to complete. While languages and libraries can make callbacks and continuations somewhat easier to use (such as through support for async/await, lambdas, tasks, and futures), the result remains code that is arguably messier and more difficult to understand than a simple synchronous function-calling model.

In this asynchronous example, wrapping the code inside a synchronous CountWords() function, in order to abstract away the presence of an underlying asynchronous microsecond-scale device, requires support for a wait primitive. The existing approaches for waiting on a condition invoke operating-system support for thread scheduling, thereby incurring significant overhead when wait times are at the microsecond scale.
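To make the point about futures concrete, the following is a minimal sketch (not from the article) of how a future-returning API can restore the sequential structure of CountWords. The Read* helpers here are hypothetical stand-ins run via std::async; unlike the listing above, they return results directly rather than through out-parameters.

#include <future>
#include <sstream>
#include <string>

struct Index { std::string location; };

// Hypothetical stand-ins for the device operations, stubbed so the
// example is self-contained.
Index ReadDocumentIndex(const std::string& doc_id) { return Index{doc_id}; }
std::string ReadDocument(const std::string& location) { return "two words"; }

int CountWordsInString(const std::string& doc) {
  std::istringstream in(doc);
  std::string word;
  int count = 0;
  while (in >> word) ++count;
  return count;
}

// The control flow reads sequentially again; each get() is the wait
// primitive, blocking the calling thread until the operation completes.
int CountWords(const std::string& doc_id) {
  std::future<Index> index_future =
      std::async(std::launch::async, ReadDocumentIndex, doc_id);
  Index index = index_future.get();
  std::future<std::string> doc_future =
      std::async(std::launch::async, ReadDocument, index.location);
  return CountWordsInString(doc_future.get());
}

The catch is that each get() still relies on operating-system support to block and wake the thread, which is exactly the overhead that dominates at microsecond scale.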
Solution Directions

This evidence indicates several major classes of opportunities ahead in the upcoming “era of the killer microsecond.” First, and more near-term, it is relatively easy to squander all the benefits from microsecond devices by progressively adding suboptimal supporting software not tuned for such small latencies. Computer scientists thus need to design “microsecond-aware” systems stacks. They also need to build on related work from the past five years in this area (such as Caulfield et al.,3 Nanavati et al.,12 and Ousterhout et al.14) to continue redesigning traditional low-level system optimizations (reduced lock contention and synchronization, lower-overhead interrupt handling, efficient resource utilization during spin-polling, improved job scheduling, and hardware offloading) but for the microsecond regime.
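To make the spin-polling and wait-primitive point concrete, the following is a minimal sketch of one common pattern for a microsecond-aware wait: spin briefly in user space before falling back to an OS-level block. This is an illustration under stated assumptions, not a design from the article; the 5-microsecond spin budget is arbitrary.

#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>

class Event {
 public:
  void Signal() {
    ready_.store(true, std::memory_order_release);
    // Notify under the lock so a waiter cannot miss the wakeup between
    // checking the predicate and blocking.
    std::lock_guard<std::mutex> lock(mu_);
    cv_.notify_all();
  }

  void Wait() {
    auto deadline = std::chrono::steady_clock::now() +
                    std::chrono::microseconds(5);
    // Fast path: poll in user space, never entering the scheduler.
    while (std::chrono::steady_clock::now() < deadline) {
      if (ready_.load(std::memory_order_acquire)) return;
    }
    // Slow path: block in the kernel for longer waits.
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return ready_.load(std::memory_order_acquire); });
  }

 private:
  std::atomic<bool> ready_{false};
  std::mutex mu_;
  std::condition_variable cv_;
};

The design choice is the tradeoff named above: spinning wastes cycles but avoids scheduler overhead for microsecond waits, while blocking frees the core but costs far more than the wait itself at this scale.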
The high-performance computing industry has long dealt with low-latency networking. Nevertheless, the techniques used in supercomputers are not directly applicable to warehouse-scale computers (see Table 3). For one, high-performance systems have slower-changing workloads, and fewer programmers need to touch the code. Code simplicity and greatest programmer productivity are thus not as critical as they are in deployments like Amazon or Google, where key software products are released multiple times per week. High-performance workloads also tend to have simpler and static data structures that lend themselves to simpler, faster networking. Second, the emphasis is primarily on performance […] engines can be used. However, given fine-grain and closely coupled overheads, new hardware optimizations are needed at the processor microarchitectural level to rethink the implementation and scheduling of such core functions that comprise future microsecond-scale I/O events.