I've read this; my question is similar but somewhat different.
Note: I know C++0x does not guarantee this, but I'm asking specifically about a multi-core machine like x86-64.
Let's say we have 2 threads (pinned to 2 physical cores) running the following code:
// I know people may declare volatile useless, but here I do NOT care about memory reordering or synchronization.
// I just want to stop the compiler from keeping n in a register.
#include <cstdio>

volatile int n;

void thread1() {
    for (;;) {
        n = 0xABCD1234;
        // NOTE: I know ++n is not atomic,
        // but I do NOT care here.
        // What I care about is whether n can be 0x00001234, i.e. in the middle of the update from core 1's cache line to main memory,
        // will core 2 see an incomplete value (like the first 2 bytes lost)?
        ++n;
    }
}

void thread2() {
    while (true) {
        printf("%d\n", n);
    }
}
Is it possible for thread 2 to see n as something like 0x00001234? That is, in the middle of the update from core 1's cache line to main memory, can core 2 see an incomplete value?
I know a single 4-byte int definitely fits into a typical 64-byte cache line, and if that int is stored entirely within one cache line then I believe there is no issue. But what if it crosses a cache-line boundary? That is, could some char already sit in that cache line, so that the first part of n lands in one cache line and the rest in the next? If so, core 2 might see an incomplete value, right?
Also, I think that unless every char, short, or other type smaller than 4 bytes is padded to 4 bytes, one can never guarantee that a single int does not cross a cache-line boundary, can one?
If so, does that suggest that, in general, even setting a single int is not guaranteed to be atomic on an x86-64 multi-core machine?
I ask because, as I researched this topic, various people in various posts seemed to agree that as long as the machine architecture is suitable (e.g. x86-64), setting an int should be atomic. But as I argued above, that does not hold, right?
I'd like to give some background for my question. I'm dealing with a real-time system that samples a signal and puts the result into a global int; this is of course done in one thread. In another thread I read this value and process it. I do not care about the ordering of the set and the get; all I need is a complete value, not a corrupted one.
The other question talks about variables being "properly aligned". If a variable crosses a cache line, it is not properly aligned. An int will not do that unless you specifically ask the compiler to pack a struct, for example.
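To make the alignment point concrete, here is a rough sketch. The 64-byte line size, the helper crosses_cache_line, and the two structs are my own illustration and not something from the question. `#pragma pack` is a compiler extension (supported by GCC, Clang and MSVC), not standard C++.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, which is typical for current x86-64 CPUs.
constexpr std::size_t kCacheLine = 64;

// Hypothetical helper: true if `size` bytes starting at `addr` span two cache lines.
inline bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
    return (addr / kCacheLine) != ((addr + size - 1) / kCacheLine);
}

// Default layout: the compiler pads after `tag` so `value` is aligned to alignof(int).
struct Natural {
    char tag;
    int value;
};

// Packed layout: `value` sits at offset 1, so it can straddle a cache-line boundary.
#pragma pack(push, 1)
struct Packed {
    char tag;
    int value;
};
#pragma pack(pop)
```

With a 4-byte int aligned to 4 bytes, every possible start address keeps all four bytes inside one 64-byte line, because 64 is a multiple of 4. Only the packed variant can place the int at an address such as 61, where it would spill into the next line.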
You also assume that using volatile int is better than atomic<int>. If volatile int is the perfect way to sync variables on your platform, surely the library implementer would also know that and store a volatile x inside atomic<x>.
There is no requirement that atomic<int> has to be extra slow just because it is standard. :-)
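For the sampler/reader setup described in the question, a minimal sketch with std::atomic<int> could look like the following. The names latest_sample, publish_sample, read_sample and count_torn_reads are made up for illustration. I use memory_order_relaxed because the question says ordering does not matter and only a complete value is needed. On x86-64 I would expect a relaxed load or store of an int to compile to an ordinary mov, but check your own compiler's output rather than taking my word for it.

```cpp
#include <atomic>
#include <thread>

// Hypothetical shared variable holding the most recent sample.
std::atomic<int> latest_sample{0};

// Sampling thread: publish a new value; relaxed means no ordering, only atomicity.
inline void publish_sample(int v) {
    latest_sample.store(v, std::memory_order_relaxed);
}

// Processing thread: read a complete value, never a half-written one.
inline int read_sample() {
    return latest_sample.load(std::memory_order_relaxed);
}

// Demo: the writer alternates two bit patterns while the caller reads.
// Any other observed value would be a torn read; with std::atomic none can occur.
inline int count_torn_reads(int iterations) {
    constexpr int kA = 0x12345678;
    constexpr int kB = 0x0BADF00D;
    std::atomic<bool> done{false};
    publish_sample(kA);
    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i) {
            publish_sample((i & 1) ? kA : kB);
        }
        done.store(true, std::memory_order_release);
    });
    int torn = 0;
    while (!done.load(std::memory_order_acquire)) {
        const int v = read_sample();
        if (v != kA && v != kB) {
            ++torn;
        }
    }
    writer.join();
    return torn;
}
```

If you want to confirm that no lock is involved on your platform, latest_sample.is_lock_free() reports it at run time.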