c++ multithreading memory optimization atomic

Does writing to a region of memory from multiple threads cause contention?

Let's say I have a contiguous section of memory from addresses 0 to 128, and divide it so that 6 threads each work on every sixth byte: thread 1 gets 0, 6, 12, 18, ..., thread 2 gets 1, 7, 13, 19, ..., etc.

If these threads write to these bytes, will it cause the CPU to try to synchronize the caches across cores because the bytes are adjacent to one another? What if each byte is accessed as a std::atomic<uint8_t>?

Solution

I can't speak for all CPUs, as I'm most familiar with 64-bit Intel. In general, though, I would say YES, as long as at least one thread is writing to the memory.

This all has to do with cache lines. On my PC, a cache line is 64 bytes (not bits). Since C++17, the standard library exposes a compile-time estimate of this number as std::hardware_destructive_interference_size (declared in <new>).
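As a minimal sketch of how that constant could be used: the names kCacheLineSize and cache_line_of are my own, and the fallback of 64 is an assumption for standard libraries that do not provide the constant. Offsets are taken relative to a buffer that starts on a cache-line boundary.

```cpp
#include <cstddef>
#include <new>  // std::hardware_destructive_interference_size (C++17)

// Hypothetical constant: use the library's value when it is available,
// otherwise assume 64 bytes (typical for x86-64, but not guaranteed).
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kCacheLineSize =
    std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kCacheLineSize = 64;
#endif

// Hypothetical helper: which cache line a byte offset falls into, assuming
// the buffer itself begins on a cache-line boundary.
constexpr std::size_t cache_line_of(std::size_t offset) {
    return offset / kCacheLineSize;
}
```

With the layout from the question, cache_line_of(0) through cache_line_of(5) all give the same line, even though those six bytes belong to six different threads.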

By ignoring this, you fall into a trap called false sharing: a write to an unrelated value on the same cache line invalidates the copy of that line held by the other cores, so they have to fetch it again even though the data they actually use did not change.
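One common way to avoid this when each thread only needs a small piece of state is to pad that state out to a full cache line. The sketch below assumes a 64-byte line and C++17 (for over-aligned allocation inside std::vector); PaddedCounter and count_in_parallel are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines, as on the machine described above.
constexpr std::size_t kAssumedLineSize = 64;

// alignas rounds both the alignment and the size up to a full line, so two
// counters never end up on the same cache line.
struct alignas(kAssumedLineSize) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Every thread increments only its own counter; returns the sum of all counters.
std::uint64_t count_in_parallel(unsigned thread_count, std::uint64_t increments) {
    std::vector<PaddedCounter> counters(thread_count);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&counters, t, increments] {
            for (std::uint64_t i = 0; i < increments; ++i) {
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    std::uint64_t total = 0;
    for (const auto& counter : counters) {
        total += counter.value.load(std::memory_order_relaxed);
    }
    return total;
}
```

Removing the alignas gives the same result, but the counters then sit next to each other in memory and the threads compete for the same cache line. How much slower that is depends on the hardware, so measure it on your own machine.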

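Choosing a weaker std::memory_order does not prevent this: the cache line still has to move between the cores, whatever ordering you ask for. The memory order is only a minimum requirement. On 64-bit Intel, ordinary loads and stores already have acquire/release semantics, so the weaker orders often compile to the same instructions, while a std::memory_order_seq_cst store needs a stronger instruction. It might still affect optimization (for the few optimization passes that can deal with atomics).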

To conclude: whenever possible, give each thread a contiguous region of memory instead of scattered elements from it.
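A minimal sketch of such a split, assuming the 128-byte buffer and 6 threads from the question; chunk_range and fill_by_owner are hypothetical names. Each thread writes its own index into its own contiguous range.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

// Half-open range [begin, end) owned by chunk `index` when `total` elements
// are split into `parts` contiguous chunks. The first (total % parts) chunks
// get one extra element.
constexpr std::pair<std::size_t, std::size_t>
chunk_range(std::size_t total, std::size_t parts, std::size_t index) {
    const std::size_t base = total / parts;
    const std::size_t extra = total % parts;
    const std::size_t begin = index * base + (index < extra ? index : extra);
    const std::size_t end = begin + base + (index < extra ? 1 : 0);
    return {begin, end};
}

// Each thread fills only its own contiguous range with its thread index.
std::vector<std::uint8_t> fill_by_owner(std::size_t total, std::size_t thread_count) {
    std::vector<std::uint8_t> buffer(total);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < thread_count; ++t) {
        workers.emplace_back([&buffer, total, thread_count, t] {
            const auto range = chunk_range(total, thread_count, t);
            for (std::size_t i = range.first; i < range.second; ++i) {
                buffer[i] = static_cast<std::uint8_t>(t);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return buffer;
}
```

For fill_by_owner(128, 6), thread 0 owns bytes 0 to 21, thread 1 owns 22 to 43, and so on. Neighbouring chunks can still share a cache line at their boundary. Rounding the chunk size up to a multiple of the cache line size (and aligning the buffer) removes that as well, at the cost of uneven chunks.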