7. 2. Validation of real-time bounds
Because the design and analysis of our concurrent collector
is intended to be cycle-accurate, we can validate the time
and space bounds of the collector with expectation that
they will be fully met. Figure 8 shows the actual time spent
in garbage collection and the analytic upper bound ( Tmax
from equation 3). Note that we show both average as well as
the maximum time spent in garbage collections. The heap
size is chosen to be much tighter than in earlier graphs
as our focus here is how the collector behaves when it is
under stress (near Nmin from equation 4). For convenience,
we express heap size as a fraction (N/m) of the maximum
amount of live data since our bounds are almost linear
when considered in terms of m and N.
Average time spent in collection is always less than the
predicted worst case with an actual difference of about
10% for both programs. We also show the amount of stalls
experienced by the benchmark as a fraction of total time.
At larger heap sizes, there are no stalls. As the heap size is
reduced, there will come a point when the collector cannot
keep up and the mutator’s allocation request will fail. For
the allocation-intensive Deque benchmarks, the failure
point occurs at 1. 43 × m. Our predicted Nmin value of 1.457
is correctly above the actual failure points.
Because the average collection time includes multiple
phases of a program, it can be significantly lower than the
maximum collection time. We see that the gap between
Tmax and collection time shrinks from 10% to about 2% and
6% when one considers maximum rather than average collection time. For space, Nmin has only a worst-case flavor as
there is adequate heap space only if the heap is sufficient at
every collection. The space bound is within 3% of when stalls
begin. Our time and space bounds are not only empirically
validated but are tight.
In general, time spent for a single collection falls as the
heap size is decreased since the sweep phase will take less
time. It may seem surprising that this happens even when the
heap size is taken below Nmin. However, falling below this
safe point causes mutator stalls but does not penalize the
collector at all. In fact, because the mutator is stalled, it can
no longer interfere with the collector which will additionally, though very slightly, speed up collection. Of course,
since the overall goal is to avoid mutator stalls, operating in
this regime is inadvisable.
at N = 1.1m. By contrast RTGC consumes slightly fewer
cycles than Malloc until it begins to experience stall cycles
(non-real-time behavior) at N = 1.4m because it falls behind
The Deque benchmark has a very simple logic so any
limitations on frequency introduced by the collector is
magnified. The effect is seen clearly in Figure 7(b): Malloc
synthesizes at a higher frequency, allowing it to make up
RTGC’s slight advantage in cycles and consume 25% less
time on an average. STW suffers even more from the combined effect of a lower clock frequency and additional cycles
due to synchronous collection. On average, RTGC is faster
than STW by 14% and never interrupts the application.
These measurements reveal some surprising trends
that are completely contrary to the expected trade-offs
for software collectors: RTGC is actually faster, more
deterministic, and requires less heap space than STW!
There seems to be no reason to use STW because the
natural advantage of implementing concurrency in
hardware completely supersedes the traditional latency
versus bandwidth trade-off.
Furthermore, RTGC allows applications to run at far
lower multiples of the maximum live set m than possible
for either real-time or stop-the-world collectors in software.
RTGC is also only moderately slower than Malloc, meaning
that the cost of abstraction is quite palatable. We do not
show the results for other applications but note that, as predicted, this performance gap decreases as the application
becomes more complex.
Heap Size (Objects)
16384 14336 12288 10240 8192
Heap Size (Objects)
14336 12288 10240 8192
figure 7. throughput measurements for Deque. (a) execution duration in cycles of Deque; (b) execution time in milliseconds of Deque.
1024 2048 4096 8192
Heap Size (Objects)
16384 32768 65536
figure 6. Synthesized clock frequency.