Commonly, a cache line is 64 B, but the write atomicity of non-volatile memory is only 8 B.
For example:
x[1]=100;
x[2]=100;
clflush(x);
x is cache-line aligned and all of its elements are initially 0.
The system crashes during clflush().
Is it possible that x[1]=0 and x[2]=100 after reboot?
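The question's sequence can be written in C roughly as below. This is a sketch that assumes an x86-64 target, GCC or Clang, 8-byte elements and 64-byte cache lines. The names question_sequence, same_cache_line and CACHE_LINE_SIZE are hypothetical, and mapping x to persistent memory is not shown; on ordinary DRAM the flush runs but says nothing about durability. It mainly makes the premise concrete: with x aligned to 64 bytes, x[1] (offset 8) and x[2] (offset 16) sit in the same cache line but are two separate 8-byte atomic units.

```c
#include <immintrin.h>   /* _mm_clflush */
#include <stdalign.h>    /* alignas */
#include <stdatomic.h>   /* atomic_signal_fence */
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* assumed line size, as in the question */

/* Hypothetical helper: nonzero if a and b fall in the same 64-byte line. */
static int same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE_SIZE == (uintptr_t)b / CACHE_LINE_SIZE;
}

/* The question's sequence. x must be 64-byte aligned and, for the flush to
   matter for durability, mapped to persistent memory (not shown here). */
static void question_sequence(uint64_t *x)
{
    x[1] = 100;
    x[2] = 100;
    /* Compiler-only barrier: keeps both stores before the flush in the
       generated code. It emits no instruction. */
    atomic_signal_fence(memory_order_seq_cst);
    _mm_clflush(x);  /* flushes the whole line holding x[0]..x[7] */
}
```

For example, with alignas(64) uint64_t x[16], same_cache_line(&x[1], &x[2]) is true while same_cache_line(&x[7], &x[8]) is false.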
Under the following assumptions:
The global observability (GO) order of stores may differ from the persist order on Intel x86 processors. This is referred to as relaxed persistency. The only case in which the two orders are guaranteed to be the same is a sequence of WB stores to the same cache line (but a store reaching GO doesn't necessarily mean it has become durable). This is because CLFLUSH is atomic and WB stores cannot be reordered in global observability. See: On x86-64, is the "movnti" or "movntdq" instruction atomic when system crash?
The x86-TSO memory model doesn't allow stores to be reordered with other stores, so it's impossible for another agent to observe x[2] == 100 and x[1] != 100 during normal operation (i.e., in the volatile state without a crash). However, if the system crashed and rebooted, it's possible for the persistent state to be x[2] == 100 and x[1] != 100. This is possible even if the system crashed after retiring clflush, because the retirement of clflush doesn't necessarily mean that the flushed cache line has reached the persistence domain.
If you want to eliminate that possibility, you can move clflush between the two stores as follows:
x[1]=100;
clflush(x);
x[2]=100;
clflush on Intel processors is ordered with respect to all writes, meaning that the line is guaranteed to reach the persistence domain before any later store becomes persistent. See: Persistent Memory Programming Primer (PDF) and the Intel SDM Volume 2. The second store could be to the same line or to any other line.
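The reordered sequence (store, clflush, store) might look like this in C. This is a sketch that assumes x86-64, GCC or Clang, and a 64-byte-aligned x backed by persistent memory (the mapping is not shown); store_flush_store is a hypothetical name. The compiler barriers only stop the compiler from moving the stores across the flush; the hardware ordering is the one the paragraph above describes.

```c
#include <immintrin.h>   /* _mm_clflush */
#include <stdatomic.h>   /* atomic_signal_fence */
#include <stdint.h>

/* Hypothetical helper: store x[1], flush its line, then store x[2]. */
static void store_flush_store(uint64_t *x)
{
    x[1] = 100;
    atomic_signal_fence(memory_order_seq_cst); /* x[1] store stays before clflush */
    _mm_clflush(x);
    atomic_signal_fence(memory_order_seq_cst); /* x[2] store stays after clflush */
    x[2] = 100;
}
```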
If you want x[1]=100 to become persistent before x[2]=100 becomes globally observable, add sfence after clflush on Intel CSX, or mfence on AMD processors (on AMD, clflush is only ordered by mfence). clflush by itself is sufficient to control persist order.
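A sketch of the fenced variant, picking sfence or mfence according to the vendor rule stated above. The vendor choice is left to the caller through a hypothetical is_amd flag (no CPU detection is shown), and the function name is also hypothetical. It assumes x86-64, GCC or Clang, and a 64-byte-aligned x on persistent memory that is not set up here.

```c
#include <immintrin.h>   /* _mm_clflush, _mm_sfence, _mm_mfence */
#include <stdatomic.h>   /* atomic_signal_fence */
#include <stdint.h>

/* Hypothetical helper: aims to make x[1] durable before x[2] becomes
   globally observable. is_amd selects the fence per vendor. */
static void store_flush_fence_store(uint64_t *x, int is_amd)
{
    x[1] = 100;
    atomic_signal_fence(memory_order_seq_cst); /* compiler-only barrier */
    _mm_clflush(x);
    if (is_amd)
        _mm_mfence();  /* on AMD, clflush is only ordered by mfence */
    else
        _mm_sfence();  /* sfence after clflush on Intel */
    atomic_signal_fence(memory_order_seq_cst); /* compiler-only barrier */
    x[2] = 100;
}
```

Passing the vendor in keeps the sketch small; a real program would decide it once (for example from the CPUID vendor string) rather than per call.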
Alternatively, use the sequence clflushopt+sfence (or clwb+sfence) as follows:
x[1]=100;
clflushopt(x);
sfence;
x[2]=100;
In this case, if a crash happens and x[2] == 100 in the persistent state, then it's guaranteed that x[1] == 100. clflushopt by itself doesn't impose any persist ordering.
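The clflushopt+sfence and clwb+sfence sequences can be sketched in C as below, assuming x86-64 with GCC or Clang. The target attributes let the file compile without -mclflushopt or -mclwb, and the CPUID check (leaf 7, subleaf 0, EBX bit 23 for CLFLUSHOPT and bit 24 for CLWB) lets a caller avoid executing an unsupported instruction. Function names are hypothetical, and x is assumed to be 64-byte aligned and backed by persistent memory that is not set up here.

```c
#include <cpuid.h>       /* __get_cpuid_count (GCC/Clang) */
#include <immintrin.h>   /* _mm_clflushopt, _mm_clwb, _mm_sfence */
#include <stdatomic.h>   /* atomic_signal_fence */
#include <stdint.h>

/* Returns bit `bit` of CPUID.(EAX=07H,ECX=0):EBX, or 0 if leaf 7 is absent.
   Bit 23 reports CLFLUSHOPT, bit 24 reports CLWB. */
static int cpu_has_leaf7_ebx_bit(unsigned bit)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> bit) & 1u);
}

/* Hypothetical helper: clflushopt alone imposes no persist ordering,
   so the sfence is what orders x[1]'s flush before the x[2] store. */
__attribute__((target("clflushopt")))
static void store_clflushopt_sfence_store(uint64_t *x)
{
    x[1] = 100;
    atomic_signal_fence(memory_order_seq_cst); /* compiler-only barrier */
    _mm_clflushopt(x);
    _mm_sfence();
    atomic_signal_fence(memory_order_seq_cst); /* compiler-only barrier */
    x[2] = 100;
}

/* Same pattern with clwb, which writes the line back and may keep it cached. */
__attribute__((target("clwb")))
static void store_clwb_sfence_store(uint64_t *x)
{
    x[1] = 100;
    atomic_signal_fence(memory_order_seq_cst); /* compiler-only barrier */
    _mm_clwb(x);
    _mm_sfence();
    atomic_signal_fence(memory_order_seq_cst); /* compiler-only barrier */
    x[2] = 100;
}
```

A caller would typically test cpu_has_leaf7_ebx_bit(23) or cpu_has_leaf7_ebx_bit(24) once at startup and only then use the matching helper.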