world. Rather than splitting the game
up into regions or shards at compile
time, virtual worlds or games based
on the Darkstar stack can move load
around the network of server machines
at runtime. Although a participant
might see a brief increase in latency
during a move, overall latency
will be lower afterward. By
moving tasks, we can not only balance
the load on the machines involved but
also try to collocate tasks that access the same set of data or that
communicate with each other. All of
these mechanisms allow us to determine, while the game is being played,
which tasks (and which users) should
be placed on the same server.
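To make the idea of collocation concrete, here is a minimal sketch of a greedy placement rule. It is not the algorithm used by the Darkstar stack, which the text does not describe. The function name `place_tasks`, the `capacity` limit, and the task and data names are all hypothetical. Each task goes to the server that already holds the most data in common with it, with ties broken in favor of the less loaded server.

```python
def place_tasks(task_data, num_servers, capacity):
    """Greedily assign tasks to servers (illustrative only).

    task_data: dict mapping task name -> set of data-object ids it touches.
    Returns a dict mapping task name -> server index.
    """
    server_data = [set() for _ in range(num_servers)]  # data already in use on each server
    server_load = [0] * num_servers                    # number of tasks on each server
    placement = {}
    for task, data in task_data.items():
        candidates = [s for s in range(num_servers) if server_load[s] < capacity]
        if not candidates:
            raise ValueError("not enough capacity for all tasks")
        # Prefer the most shared data; break ties by lighter load, then lower index.
        best = max(
            candidates,
            key=lambda s: (len(data & server_data[s]), -server_load[s], -s),
        )
        placement[task] = best
        server_data[best] |= data
        server_load[best] += 1
    return placement


# Hypothetical workload: two rooms, with tasks touching room and player state.
tasks = {
    "alice_move": {"room1", "alice"},
    "bob_move": {"room2", "bob"},
    "alice_chat": {"room1", "alice"},
    "carol_move": {"room2", "carol"},
}
placement = place_tasks(tasks, num_servers=2, capacity=2)
print(placement)
```

In this toy workload the two tasks that touch room1 end up on one server and the two that touch room2 on the other. A real system would also have to re-evaluate placement as access patterns change during play, which this static sketch does not attempt.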
The project is in its early stages of
development and deployment. It is
based on an open-source licensing
model and community, so we are relying on our users to educate us about
the needs of the community that will
build the games and worlds that will
run on the infrastructure. The research
is part computer science and part anthropology, but each of the cultures
has an opportunity to learn much from
the other.
Even at this early stage, it is clear
that this is going to be a complex venture. While early experience with the
code has shown that the programming model does relieve the game or
world server programmer from thinking about threads and locking, it has
also shown that there are places where
they do have to understand something
about the underlying concurrency of
the system. The most obvious of these
is in the design of the data structures.
One of the earliest users of our code
was getting terrible performance from
the system. When we looked at the
code, we discovered that a single object
was written to on every task, updating
a global piece of game state. By designing the server in this way, this user effectively serialized all of the tasks that
were running in the system, making it
impossible for the server to get any advantage from the inherent parallelism
in the game. Some minor redesign,
breaking the single object into many
(much smaller) objects, removed this
particular bottleneck, with resulting
gains in overall performance. This experience also taught us that we need
to educate users of the system in the
design of independent data structures
that can be accessed in parallel.
Nor has our own implementation
been without some excitement. When
we moved from a multithreaded server that ran on a single machine to an
implementation that runs on multiple
machines, we expected some degradation in the performance of the single-machine
system. We were delighted to
find that the degradation of the single-node system was not nearly as large as we
thought it would be, but we found that
additional machines lowered the capacity of the overall system. Once we looked at the measurements, this was
not all that surprising: the possibility for contention on multiple machines is greater than that on
a single machine, and discovering and
recovering from such contention takes
longer. We are working on removing
the choke points so that adding equipment actually adds capacity.
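The effect of contention on a shared object, as in the case of the single object written by every task, can be illustrated with a toy model. This is a sketch under stated assumptions, not the conflict-detection mechanism of the actual system. It assumes that two tasks writing the same object cannot complete in the same scheduling round, and the names `rounds_to_finish`, `world_stats`, and `stats_<player>` are hypothetical.

```python
def rounds_to_finish(write_sets):
    """Count the rounds needed when tasks that write the same object
    cannot complete in the same round (a conflicting task waits)."""
    pending = list(write_sets)
    rounds = 0
    while pending:
        locked = set()   # objects written by tasks that completed in this round
        deferred = []
        for ws in pending:
            if locked & ws:
                deferred.append(ws)  # conflict: try again in a later round
            else:
                locked |= ws         # no conflict: the task completes
        pending = deferred
        rounds += 1
    return rounds


players = [f"player{i}" for i in range(8)]

# Design A: every task updates its own player plus one shared global object.
global_design = [{p, "world_stats"} for p in players]

# Design B: the global state is split into one small object per player.
split_design = [{p, f"stats_{p}"} for p in players]

print(rounds_to_finish(global_design))  # 8: the tasks run one at a time
print(rounds_to_finish(split_design))   # 1: all tasks can run in parallel
```

With the shared object, eight tasks take eight rounds no matter how many processors are available. Once the state is split, they finish in one. The model ignores the cost of detecting and recovering from a conflict, which only makes the shared-object design look worse in practice.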
Measuring the performance of the
system is made especially challenging
by the lack of any clear notion of what
the requirements of the target servers
are. Game developers are notoriously
secretive, and the notion of a characteristic load for a game or virtual world is
not something that is well documented. We have some examples that have
been written by the team or by people
we know in the game world, but we
cannot be sure that these are accurate
reflections of what is being written by
the industry. Our hope is that the open-source community that is beginning to
form around the project will aid in the
production of useful performance and
stress tests.
Seen in a broader light, the project
has been and continues to be an interesting experiment in building levels of
abstraction for the world of multithreaded, distributed systems. The
problems we are tackling are not new.
Large Web-serving farms have many
of the same problems with highly variable demand. Scientific grids have similar problems of scaling over multiple
machines. Search grids have similar
issues in dealing with large-scale environments solving embarrassingly but
not completely parallel problems.
What makes online games and virtual worlds interestingly different is
the set of requirements they
bring to the table compared with these
other domains. The interactive, low-latency environment is very different
from that of grids, Web services, or search.
Growing out of the entertainment
industry also makes the engineering disciplines far different from those of the other domains.
Solving these problems in this
new environment is challenging, and
adds to our general knowledge of how
to write software for the emerging class
of multithreaded, multicore, distributed systems.
And best of all, it’s fun.
Jim Waldo is a Distinguished Engineer with Sun
Microsystems Laboratories, Burlington, MA, where he
conducts research on large-scale distributed systems.
© 2008 ACM 0001-0782/08/0800 $5.00
SYNCRETIA IN SECOND LIFE, BY ALPHA AUER, A.K.A. ELIF AYITER