Tags: multithreading, concurrency, atomic

Why are loads required when using atomics


When using atomics in Go (and other languages like C++), it's advised to use an atomic load operation for reading a concurrently written value.

If the definition (as I understand it) of an atomic write (be it a store or an integer increment) is that no thread can view a partial write, why is an atomic load required?

Would a plain load of the memory address always be safe from a torn view, if only atomic stores are used on that memory address?
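
For concreteness, here is a minimal C++ sketch of the pattern being asked about (names are illustrative, not from any particular codebase); the question is whether the reader's .load() could instead be a plain read of a non-atomic variable that is only ever written with atomic stores:

    #include <atomic>
    #include <thread>

    std::atomic<int> value{0};   // illustrative shared variable

    void writer() {
        value.store(42, std::memory_order_relaxed);    // atomic store
    }

    void reader() {
        // Atomic load, as advised. The question: if 'value' were a plain int
        // written only with atomic stores, could this be an ordinary read?
        int v = value.load(std::memory_order_relaxed);
        (void)v;
    }

    int main() {
        std::thread w(writer), r(reader);
        w.join();
        r.join();
    }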


Solution

  • This answer is mainly for C and C++ as I am not directly familiar with atomics in many other languages, but I suspect they are similar.

    It's true that many actual machines work this way, in some cases. For instance, on x86-64, ordinary load instructions are atomic with respect to ordinary stores or locked read-modify-write instructions. So for types that can be loaded with a single instruction, you could in principle use ordinary assignment and avoid tearing.
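
    For what it's worth, a small sketch of that point (my example, not part of the original answer): for a type that fits in one register, a relaxed atomic load and a plain load typically compile to the same single mov on x86-64, so requiring the atomic load costs nothing in that case.

        #include <atomic>
        #include <cstdint>

        std::atomic<std::int32_t> shared{0};
        std::int32_t plain = 0;

        // On x86-64 both functions typically compile to the same single 32-bit mov;
        // the atomic version just also carries the language-level guarantee.
        std::int32_t load_atomic() { return shared.load(std::memory_order_relaxed); }
        std::int32_t load_plain()  { return plain; }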

    But there are cases where this would not work. For instance:

    • Types which are not lock-free (e.g. structs of more than a couple of words). In this case, several instructions are needed to load or store the value, so a lock must be taken around them; otherwise tearing is entirely possible. The atomic load function knows to take the lock; an ordinary assignment wouldn't (see the sketch after this list).

    • Types which can be lock-free but need special handling. For example, 64-bit long long int on x86-32. An ordinary load would execute two 32-bit integer load instructions (which are individually atomic), so even if the store is atomic, it could land between them and the reader would see a torn value. But the atomic load function can emit a 64-bit floating-point or SIMD load, which is less efficient but does it in one atomic instruction. Example on godbolt.
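
    A hedged way to see which of these cases applies on a given target (my sketch, not from the original answer) is std::atomic<T>::is_lock_free(); with GCC this may need -latomic at link time:

        #include <atomic>
        #include <cstdint>
        #include <iostream>

        struct Big {                       // several words: usually not lock-free
            std::int64_t a, b, c, d;
        };

        int main() {
            std::atomic<Big>          big{};
            std::atomic<std::int64_t> wide{0};

            // false means load()/store() go through an internal lock, which a plain
            // read of the underlying bytes would silently bypass.
            std::cout << "Big lock-free?     " << big.is_lock_free() << '\n';

            // On x86-32 this is typically still lock-free, but only because the
            // library uses a special 8-byte instruction that a plain 'long long'
            // read would not use.
            std::cout << "int64_t lock-free? " << wide.is_lock_free() << '\n';
        }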

    As such, the language promises atomicity only when both the store and the load use the provided atomic functions; your "definition" is not accurate for C or C++. By requiring the programmer to always use an atomic load, the language provides a "hook" where implementations can take appropriate action if needed. In cases where an ordinary load would suffice, the implementation can optimize accordingly and nothing is lost.

    Another point is that the atomic load provides a place to put a memory barrier when one is wanted (any ordering except relaxed). Some architectures include load instructions with a built-in barrier (e.g. ARM64's ldar), and making the barrier part of the load at the language level makes it easier for the compiler to take advantage of this. If you had to do a regular assignment followed by a call to a barrier function, it would be harder for the compiler to figure out that it could optimize them into ldar.
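
    To make that last point concrete, here is a minimal sketch (my example) of the usual publish pattern with a release store and an acquire load; on ARM64 a compiler can turn the acquire load into ldar rather than a plain load followed by a separate barrier:

        #include <atomic>
        #include <thread>

        int               payload = 0;     // ordinary, non-atomic data being published
        std::atomic<bool> ready{false};

        void producer() {
            payload = 42;                                  // plain write
            ready.store(true, std::memory_order_release);  // release: publishes payload
        }

        void consumer() {
            // Acquire load: on ARM64 this can compile to a single ldar, which is
            // both the load and the barrier in one instruction.
            while (!ready.load(std::memory_order_acquire)) { }
            // The acquire/release pairing guarantees payload == 42 is visible here.
        }

        int main() {
            std::thread c(consumer);
            std::thread p(producer);
            p.join();
            c.join();
        }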