flushes (process of discarding instructions in the middle of
execution to enforce a change in control flow). There can be
multiple instructions with the same ID in a processor at any
given time, and they may execute out of program order, making
them very difficult, if not impossible, to distinguish.
The PC (program counter) value cannot be used as an
instruction ID for processors supporting out-of-order
execution, because programs with loops may produce multiple instances of the same instruction with the same PC value.
These multiple instances may execute out of program order.
It is difficult to use time-stamps or other global synchronization mechanisms as instruction IDs for processors
supporting multiple clock domains and/or DVFS (dynamic
voltage and frequency scaling) for power management.
Our special ID assignment scheme, described below,
uses log₂ 4n bits, where n is the maximum number of instructions in a processor at any one time (e.g., n = 64 for Alpha
21264). The first two rules assign consecutive numbers to
incoming instructions and the third rule allows the scheme
to work [18] under all the aforementioned circumstances,
i.e., for processors supporting out-of-order execution, pipeline flushes, multiple clock domains and DVFS.
Instruction IDs are assigned to individual instructions
as they exit the fetch stage and enter the decode stage. Since
multiple instructions may exit the fetch stage in parallel at
any given clock cycle, multiple IDs are assigned in parallel.
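As an illustration, the three rules of the assignment scheme stated next can be sketched in a few lines of Python. This is a software model only, not the hardware implementation; the class name `InstructionIdAssigner` and its method names are hypothetical and chosen for this example.

```python
class InstructionIdAssigner:
    """Software sketch of the three ID assignment rules (names are hypothetical)."""

    def __init__(self, n):
        self.n = n                # maximum number of instructions in the processor
        self.modulus = 4 * n      # IDs range over 0 .. 4n - 1
        self.last_id = None       # the ID X of rule 2; None before the first fetch

    def id_bits(self):
        # Bits needed to encode 4n distinct IDs (equals log2(4n) for a power of two).
        return (self.modulus - 1).bit_length()

    def assign(self, count):
        """Assign IDs to `count` instructions leaving the fetch stage in one cycle."""
        if self.last_id is None:
            # Rule 1: the first p instructions get IDs 0, 1, ..., p - 1.
            ids = [k % self.modulus for k in range(count)]
        else:
            # Rule 2: continue from the last assigned ID X, modulo 4n.
            ids = [(self.last_id + k) % self.modulus
                   for k in range(1, count + 1)]
        if ids:
            self.last_id = ids[-1]
        return ids

    def flush(self, flushing_id):
        # Rule 3: an instruction with ID Y caused a flush; overwrite X with Y + 2n.
        self.last_id = (flushing_id + 2 * self.n) % self.modulus


assigner = InstructionIdAssigner(n=64)   # n = 64, as in the Alpha 21264 example
print(assigner.id_bits())                # 8
print(assigner.assign(4))                # [0, 1, 2, 3]
print(assigner.assign(3))                # [4, 5, 6]
assigner.flush(5)                        # instruction with ID 5 causes a flush
print(assigner.assign(2))                # [134, 135]
```

With n = 64 the IDs need 8 bits, and the first instruction fetched after the flush caused by ID 5 receives 5 + 2n + 1 = 134.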
Instruction ID Assignment Scheme used by IFRA:
Rule 1: The first p instructions that exit the fetch stage
in parallel are assigned IDs 0, 1, 2, …, p − 1.
Rule 2: Let ID X be the last ID that was assigned.
If there are q instructions that exit the fetch stage in
the current cycle in parallel, then q IDs, X + 1 (mod 4n),
X + 2 (mod 4n), …, X + q (mod 4n), are assigned to the q
instructions.
Rule 3: If an instruction with ID Y causes a pipeline
flush, then the ID X in rule 2 is overwritten with the value
of Y + 2n (mod 4n). As a result, an ID of Y + 2n + 1 (mod 4n)
is assigned to the first instruction that is fetched after the
flush. The flush is caused either by a mispredicted branch
or an exception.
2.2. Post-trigger generators
Suppose that a test program has been executing for billions
of cycles and an electrical bug is exercised 5 billion
cycles after the start. Moreover, suppose that the electrical
bug causes a system crash after another 1 billion cycles (i.e.,
6 billion cycles from the start). With limited storage, we are
only interested in capturing the information around the
time when the electrical bug is exercised. Hence, the 5 billion
cycles' worth of information before the bug occurrence
may not be necessary. On the other hand, if we stop recording only after the system crashes, all the useful recorded
information will be overwritten. Thus, we must incorporate
mechanisms, referred to as post-triggers, for reducing error
detection latency, the length of time between the appearance
of an error caused by a bug and visible system failure.
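The effect of error detection latency on a limited recorder can be illustrated with a scaled-down model. The sketch below assumes a circular buffer that stores one entry per cycle, which is a simplification made for this example; the constants are small stand-ins for the billions of cycles in the scenario above, and `CAPACITY` is a hypothetical recorder depth.

```python
from collections import deque


def record_until(stop_cycle, capacity):
    """Record one entry per cycle into a circular buffer and stop at stop_cycle.

    Returns the cycles whose entries are still in the buffer when recording stops.
    """
    buffer = deque(maxlen=capacity)   # the oldest entries are overwritten first
    for cycle in range(stop_cycle + 1):
        buffer.append(cycle)
    return list(buffer)


BUG_CYCLE = 50     # scaled-down stand-in for "5 billion cycles"
CRASH_CYCLE = 60   # scaled-down stand-in for "6 billion cycles"
CAPACITY = 8       # hypothetical recorder depth, in entries

# Recording stops only at the crash: the bug's entry has been overwritten.
late = record_until(CRASH_CYCLE, CAPACITY)
# A post-trigger fires 3 cycles after the bug: the bug's entry survives.
early = record_until(BUG_CYCLE + 3, CAPACITY)

print(BUG_CYCLE in late)    # False
print(BUG_CYCLE in early)   # True
```

The entry for the bug survives only if recording stops within `CAPACITY` cycles of the bug, which is why the latency must be short relative to the recorder depth.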
Post-triggers targeting five different failure scenarios
are listed in Table 3. A hard post-trigger fires when there is
an evident sign of failure, and causes the processor operation to terminate. Classical hardware error detection
techniques, such as parity bits for arrays and residue codes
for arithmetic units [20], as well as in-built exceptions, such
as unimplemented instruction exceptions and arithmetic
exceptions, belong to this category.
However, hard post-trigger mechanisms alone are not
sufficient, e.g., for the two tricky scenarios described in the last
two rows of Table 3. These two failure scenarios may be
detected several million cycles after an error occurs,
causing useful recorded information to be overwritten even with the existing error detection mechanisms.
Hence, we introduce the notion of soft post-triggers.
A soft post-trigger fires when there is an early symptom
of a possible failure. It causes the recording in all recorders to pause, but allows the processor to keep running. If
a hard post-trigger for the failure corresponding to the
symptom occurs within a pre-specified amount of time,
the processor stops. If a hard post-trigger does not fire
within the specified time, the recording resumes assuming that the symptom was false.
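The interplay between soft and hard post-triggers described above can be modeled as a small state machine. This is a minimal sketch, assuming that the pre-specified amount of time is counted in cycles; the class `RecorderControl`, its state names and its methods are hypothetical.

```python
class RecorderControl:
    """Toy model of recording control under soft and hard post-triggers."""

    def __init__(self, timeout):
        self.timeout = timeout      # pre-specified wait, in cycles (assumed unit)
        self.state = "recording"    # "recording", "paused" or "stopped"
        self.waited = 0

    def soft_trigger(self):
        # Early symptom: pause all recorders, the processor keeps running.
        if self.state == "recording":
            self.state = "paused"
            self.waited = 0

    def hard_trigger(self):
        # Evident failure: the processor stops and recorder contents are kept.
        self.state = "stopped"

    def tick(self):
        # One cycle passes; resume recording if the symptom was not confirmed.
        if self.state == "paused":
            self.waited += 1
            if self.waited >= self.timeout:
                self.state = "recording"


ctl = RecorderControl(timeout=3)
ctl.soft_trigger()
for _ in range(3):
    ctl.tick()
print(ctl.state)    # recording (the symptom was false)

ctl.soft_trigger()
ctl.tick()
ctl.hard_trigger()
print(ctl.state)    # stopped (the symptom was confirmed in time)
```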
A segmentation fault (or segfault) requires OS handling
and, hence, may take several million cycles to resolve.
Null-pointer dereference is detected by adding simple
hardware in the Load/Store unit. For other illegal memory
accesses, TLB-miss is used as the soft post-trigger. If a segfault is not declared by the OS while servicing the TLB-miss,
the recording is resumed on TLB-refill. On the other hand, if
a segfault is returned, then a hard post-trigger is activated.
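The segfault handling just described amounts to a small decision procedure per memory access. The sketch below is illustrative only: the function name, its parameters and the action labels are hypothetical, and treating an access to address 0 as a hard post-trigger is an assumption made for this example.

```python
def memory_access_triggers(address, tlb_hit, os_reports_segfault):
    """Return the post-trigger actions for one memory access, in order."""
    if address == 0:
        # Null-pointer dereference, caught by the check in the Load/Store unit
        # (assumed here to act as a hard post-trigger).
        return ["hard"]
    if tlb_hit:
        return []                   # ordinary access, no trigger
    actions = ["soft"]              # TLB miss: pause recording
    if os_reports_segfault:
        actions.append("hard")      # the OS declared a segfault
    else:
        actions.append("resume")    # TLB refill: resume recording
    return actions


print(memory_access_triggers(0x0, True, False))       # ['hard']
print(memory_access_triggers(0x1000, False, False))   # ['soft', 'resume']
print(memory_access_triggers(0x1000, False, True))    # ['soft', 'hard']
```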
3. Post-analysis techniques
Once recorder contents are scanned out, footprints belonging to the same instruction (but in multiple recorders) are identified and linked together using a technique called footprint
linking (Section 3.1). The linked footprints are also mapped
to the corresponding instruction in the test-program binary
using the program counter value stored in the fetch-stage
recorder (Table 2).
As shown in Figure 3, after the footprint linking, four
high-level post-analysis techniques (Section 3.2) that
are independent of microarchitecture are run. After which,
Table 3. Failure scenarios and post-triggers.
long ( 2 secs)
Segfault Segfault from OS;
Address equals 0
Short ( 2 mem