before the mutator could overwrite them, which could then
be sequenced onto the Root to Add port.
As will be seen in Section 4. 4, collection is triggered
(only) by an allocation that causes free space to drop below
a threshold. Therefore the generation of root snapshot logic
only needs to consider those hardware states in which this
might occur. Any register or stack not live in those states can
be safely ignored.
4. 3. tracing
The tracing engine, along with a single pointer memory
(corresponding to a single pointer field in an object) is
shown in Figure 3. It provides the same mutator interface as
the malloc/free style memory manager of Figure 1: Addr to
Read/Write, Pointer to Write, and Pointer Value—except that
the external interface Addr to Free is replaced by the internal
interface (denoted in red) Addr to Clear, which is generated
by the Sweep module (described in Section 4. 4).
The only additional interface is the Root to Add port which
takes its inputs from the output port of the same name of the
Root Engine in Figure 2.
As it executes, there are three sources of pointers for the
engine to trace: externally added roots from the snapshot,
internally traced roots from the pointer memory, and overwritten pointers from the pointer memory (captured with
a Yuasa-style barrier to maintain the snapshot property).
The different pointer sources flow through a MUX, and on
each cycle a pointer can be presented to the Mark Map, which
contains one bit for each of the N memory locations.
Using the BRAM read-before-write mode, the old mark
value is read, and then the mark value is unconditionally set
to 1. If the old mark value is 0, this pointer has not yet been
traversed, so the negation of the old mark value (indicated
by the bubble) is used to control whether the pointer is added
to the Mark Queue (note that this means that all values in the
Mark Queue have been filtered, so at most N − 1 values can
flow through the queue).
Pointers from the Mark Queue are presented as a read
address on port B of the Pointer Memory, and if the fetched
root snapshot module, simplified to show a single stack and
a single register.
The snapshot is controlled by the GC input signal, which
goes high for one clock cycle at the beginning of collection.
The snapshot is defined as the state of the memory at the
beginning of the next cycle after the GC signal goes high.
This allows some setup time and reduces synchronization
requirements. Note that if the GC signal is asserted while a
collection is already under way, then the trigger is ignored.
The register snapshot is obtained by using a shadow register.
In the cycle after the GC signal goes high, the value of the register is copied into the shadow register. This can happen even
if the register is also written by the mutator in the same cycle,
since the new value will not be latched until the end of the cycle.
The stack snapshot is obtained by having another register in addition to the Stack Top register, called the Scan
Pointer. In the same cycle that the GC signal goes high, the
value of the Stack Top pointer minus one is written into
the Scan Pointer (because the Stack Top points to the entry
above the actual top value). Beginning in the following
cycle, the Scan Pointer is used as the source address to port
B of the BRAM set containing the stack, and the pointer is
read out, going through the MUX and emerging on the Root
to Add port from the snapshot module. The Scan Pointer is
also decremented in preparation for the following cycle.
Note that the mutator can continue to use the stack via
port A of the BRAM set, while the snapshot uses port B. And
since the mutator cannot pop values off the stack faster than
the collector can read them out, the property is preserved
that the snapshot contains exactly those roots that existed
in the cycle following the GC signal.
A detail omitted from the diagram is that a state machine
is required to sequence the values from the stack and the
shadow register(s) through the MUX to the Root to Add port.
Note that the values from the stack must be processed first,
because the stack snapshot technique relies on staying
ahead of the mutator without any explicit synchronization.
If multiple stacks were desired, then a “shadow” stack
would be required to hold values as they were read out
figure 2. atomic root snapshot engine.
1 Mark Queue
Pointer to Trace
figure 3. tracing engine and one-pointer memory.