descriptions of these tests, e.g. “stores are not reordered
with other stores,” but note that “not reordered with” is not
defined there and is open to misinterpretation. 27, § 3. 2
Example 8–1. Stores Are not Reordered with other
Stores.
Proc 0 Proc 1
Mov [x]← 1
Mov [y]← 1
Mov eAX←[y]
Mov ebX←[x]
Forbidden Final State: Proc 1:eAX= 1 ∧ Proc 1:ebX=0
This test implies that the writes by Proc 0 are seen in order by
Proc 1’s reads, which also execute in order. x86-TSO forbids
the final state because Proc 0’s store buffer is FIFO, and Proc
0 communicates with Proc 1 only through shared memory.
Example 8–2. Stores Are Not Reordered with Older
Loads.
Proc 0 Proc 1
Mov eAX←[x]
Mov [y]← 1
Mov ebX←[y]
Mov [x]← 1
Forbidden Final State: Proc 0:eAX= 1 ∧ Proc 1:ebX= 1
x86-TSO forbids the final state because reads are never
delayed.
Example 8–3. Loads May Be Reordered with Older
Stores. This test is just the SB example from Section 1,
which x86-TSO permits. The third AMD test (amd3) is similar but with additional writes inserted in the middle of each
thread, of 2 to x and y respectively.
Example 8–4. Loads Are not Reordered with Older
Stores to the Same Location.
Proc 0
Mov [x]← 1
Mov eAX←[x]
Required Final State: Proc 0:eAX= 1
x86-TSO requires the specified result because reads must
check the local store buffer.
Example 8–5. Intra-Processor For warding Is Allowed.
This test is similar to Example 8–3.
Example 8–6. Stores Are Transitively Visible.
Proc 0 Proc 1 Proc 2
Mov [x]← 1 Mov eAX←[x]
Mov [y]← 1
Mov ebX←[y]
Mov eCX←[x]
Forbidden Final State: Proc 1:eAX= 1 ∧ Proc 2:ebX= 1 ∧ Proc 2:eCX=0
x86-TSO forbids the given final state because otherwise
the Proc 2 constraints imply that y was written to shared
memory before x. Hence the write to x must be in Proc 0’s
store buffer (or the instruction has not executed), when the
write to y is initiated. Note that this test contains the only
mention of “transitive visibility” in the Intel SDM, leaving
its meaning unclear.
Example 8–7. Stores Are Seen in a Consistent Order by
Other Processors. This test rules out the IRIW behavior as
described in Section 2. 2. x86-TSO forbids the given final state
because the Proc 2 constraints imply that x was written to
shared memory before y whereas the Proc 3 constraints imply
that y was written to shared memory before x.
Example 8–8. Locked Instructions Have a Total
Order. This is the same as the IRIW Example 8–7 but with
LOCK’d instructions for the writes; x86-TSO forbids the final
state for the same reason as above.
Example 8–9. Loads Are not Reordered with Locks.
Proc 0 Proc 1
XChG [x]←eAX
Mov ebX←[y]
XChG [y]←eCX
Mov edX←[x]
Initial state: Proc 0:eAX= 1 ∧ Proc 1:eCX= 1 (elsewhere 0)
Forbidden Final State: Proc 0:ebX=0 ∧ Proc 1:edX=0
This test indicates that locking both writes in Example 8–3
would forbid the nonsequentially consistent result. x86-
TSO forbids the final state because LOCK’d instructions
flush the local store buffer. If only one write were LOCK’d
(say the write to x), the Example 8–3 final state would be
allowed as follows: on Proc 1, buffer the write to y and execute the read x, then on Proc 0 write to x in shared memory
then read from y.
Proc 0 Proc 1
XChG [x]←eAX
Mov [y]← 1
Mov ebX←[y]
Mov eCX←[x]
Initial state: Proc 0:eAX= 1 (elsewhere 0)
Forbidden Final State: Proc 1:ebX= 1 ∧ Proc 1:eCX=0
Example 8–10. Stores Are not Reordered with Locks.
This is implied by Example 8–1, as we treat the memory writes
of LOCK’d instructions as stores.
Proc 0
Proc 1
Test amd5.
Mov [x]← 1
MFenCe
Mov eAX←[y]
Mov [y]← 1
MFenCe
Mov ebX←[x]
Forbidden Final State: Proc 0:eAX=0 ∧ Proc 1:ebX=0
For x86-TSO, this test has the same force as Example 8. 8, but
using MFenCe instructions to flush the buffers instead of
LOCK’d instructions. The tenth AMD test is similar. None of
the Intel litmus tests include fence instructions.
In x86-TSO adding MFenCe between every instruction
would clearly suffice to regain sequential consistency (though
obviously in practice one would insert fewer barriers), in contrast to IWP/x86-CC/AMD3.14.
3. 3. empirical testing
To build confidence that we have a sound model of the
behavior of actual x86 processors we have tested the