SCALING
in games & virtual worlds
anthropology, but each of the cultures has an opportunity to learn much from the other.
Even at this early stage, it is clear that this is going to be a complex venture. Early experience with the code has shown that the programming model does relieve game and world server programmers of thinking about threads and locking. It has also shown that there are places where they still need to understand something about the underlying concurrency of the system. The most obvious of these is the design of data structures. One of the earliest users of our code was getting terrible performance from the system. When we looked at the code, we discovered that every task wrote to a single object holding a global piece of game state. Tasks that write the same object cannot proceed in parallel, so by designing the server this way, this user effectively serialized all of the tasks running in the system. That made it impossible for the server to gain any advantage from the inherent parallelism in the game. A minor redesign that broke the single object into many much smaller objects removed this bottleneck and improved overall performance. This experience also taught us that we need to educate users of the system in designing independent data structures that can be accessed in parallel.
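The hot-object problem can be illustrated with a minimal Python sketch. It assumes, as a simplification, that each task writes exactly one object and that tasks writing the same object must not overlap. Under those assumptions, the minimum number of serial rounds is simply the largest number of tasks that write any one object. The function names and object names (`global_state`, `state_shard_N`, `player_N`) are hypothetical, chosen only for illustration.

```python
from collections import Counter


def serial_rounds(write_targets):
    """Minimum number of rounds needed to run every task, assuming each task
    writes exactly one object and tasks writing the same object must not overlap."""
    counts = Counter(write_targets)
    # Tasks on the busiest object must run one after another; tasks on
    # other objects can be packed into those same rounds.
    return max(counts.values(), default=0)


def hot_object_design(player_ids):
    # Every task updates one shared global-state object (hypothetical name).
    return ["global_state" for _ in player_ids]


def sharded_design(player_ids, shards):
    # Each task updates only the small object for its player's shard.
    return [f"state_shard_{pid % shards}" for pid in player_ids]


def per_player_design(player_ids):
    # Each task updates an object owned by a single player.
    return [f"player_{pid}" for pid in player_ids]


tasks = range(1000)
print(serial_rounds(hot_object_design(tasks)))    # 1000
print(serial_rounds(sharded_design(tasks, 100)))  # 10
print(serial_rounds(per_player_design(tasks)))    # 1
```

With one hot object, a thousand tasks need a thousand rounds no matter how many threads are available. Splitting the state into a hundred shards cuts that to ten, and per-player objects let every task run at once.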
Our own implementation has not been without some excitement. When we moved from a multithreaded server running on a single machine to an implementation that runs on multiple machines, we expected some degradation in single-machine performance. We were delighted to find that the single-node degradation was far smaller than we had feared. However, adding machines actually lowered the capacity of the overall system. In hindsight, the measurements were not that surprising. The potential for contention across multiple machines is greater than on a single machine, and detecting and recovering from such contention takes longer. We are working on removing the choke points so that adding equipment actually adds capacity.
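The effect described above can be made concrete with a toy model. This is a minimal Python sketch with two assumptions: conflicting tasks are aborted and retried until they commit, and each abort costs extra time to detect and roll back. The function name, its parameters, and every number below are hypothetical and illustrative, not measurements of this system.

```python
def commits_per_second(machines, threads_per_machine, task_ms,
                       conflict_prob, abort_cost_ms):
    """Toy estimate of committed tasks per second (hypothetical model).

    Assumptions (illustrative only, not measurements of any real system):
    - every attempt at a task does task_ms of work;
    - each attempt independently fails with probability conflict_prob
      and is retried until it commits, so attempts are geometric;
    - each failed attempt also costs abort_cost_ms to detect and roll back.
    """
    if not 0 <= conflict_prob < 1:
        raise ValueError("conflict_prob must be in [0, 1)")
    expected_attempts = 1 / (1 - conflict_prob)
    expected_failures = expected_attempts - 1
    ms_per_commit = (expected_attempts * task_ms
                     + expected_failures * abort_cost_ms)
    workers = machines * threads_per_machine
    return workers * 1000 / ms_per_commit


# Hypothetical inputs: one node with rare, cheap local conflicts ...
one_node = commits_per_second(1, 8, 10, conflict_prob=0.05, abort_cost_ms=1)
# ... versus two nodes where conflicts are likelier and slower to resolve.
two_nodes = commits_per_second(2, 8, 10, conflict_prob=0.6, abort_cost_ms=20)
# Two nodes that keep the one-node conflict profile (choke points removed).
two_nodes_fixed = commits_per_second(2, 8, 10, conflict_prob=0.05,
                                     abort_cost_ms=1)

print(round(one_node), round(two_nodes), round(two_nodes_fixed))  # 756 291 1512
```

The point is only qualitative. Doubling the worker count does not help if the conflict rate and the cost of each abort rise far enough, because committed work per second falls. In this model, removing choke points means keeping `conflict_prob` and `abort_cost_ms` low as machines are added, which is what lets the added hardware turn into added capacity.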
Measuring the performance of the system is made especially challenging by the lack of any clear notion of what the target servers require. Game developers are notoriously secretive, and what constitutes a characteristic load for a game or virtual world is not well documented. We have some examples written by the team or by people we know in the game world, but we cannot be sure that they accurately reflect what the industry is actually writing. We hope that the open-source community beginning to form around the project will help produce useful performance and stress tests.
November/December 2008 ACM QUEUE
Seen in a broader light, the project has been and continues to be an interesting experiment in building levels of abstraction for the world of multithreaded, distributed systems. The problems we are tackling are not new. Large Web-serving farms have many of the same problems with highly variable demand. Scientific grids have similar problems of scaling over multiple machines. Search grids have similar issues in dealing with large-scale environments solving embarrassingly, but not completely, parallel problems.
What sets online games and virtual worlds apart is the very different set of requirements they bring to the table compared with these other domains. Their interactive, low-latency environment is very different from grids, Web services, or search. Because the field grew out of the entertainment industry, its engineering disciplines also differ sharply from those of the other domains. Solving these problems in this new environment is challenging. Doing so adds to our general knowledge of how to write software for the emerging class of multithreaded, multicore, distributed systems.
And best of all, it’s fun.
LOVE IT, HATE IT? LET US KNOW
feedback@acmqueue.com or www.acmqueue.com/forums
JIM WALDO is a Distinguished Engineer with Sun Microsystems Laboratories, where he conducts research on large-scale distributed systems. Prior to (re)joining Sun Labs, he was the lead architect for Jini, a distributed programming system based on Java. He spent eight years at Apollo Computer and Hewlett-Packard, where he led the design and development of the first object request broker and was instrumental in getting that technology incorporated into the first OMG CORBA specification. Waldo is an adjunct faculty member at Harvard University, where he teaches distributed computing in the department of computer science. He has a Ph.D. in philosophy, holds M.A. degrees in both linguistics and philosophy, and has never taken a real computer science course.
© 2008 ACM 1542-7730/08/1100 $5.00
This article appeared in print in the August 2008 issue of Communications of the ACM.