Search code examples
c++c++11x86atomicstdatomic

Where is the lock for a std::atomic?


If a data structure has multiple elements in it, the atomic version of it cannot (always) be lock-free. I was told that this is true for larger types because the CPU can not atomically change the data without using some sort of lock.

for example:

#include <iostream>
#include <atomic>

struct foo {
    double a;
    double b;
};

std::atomic<foo> var;

int main()
{
    std::cout << var.is_lock_free() << std::endl;
    std::cout << sizeof(foo) << std::endl;
    std::cout << sizeof(var) << std::endl;
}

the output (Linux/gcc) is:

0
16
16

Since the atomic and foo are the same size, I don't think a lock is stored in the atomic.

My question is:
If an atomic variable uses a lock, where is it stored and what does that mean for multiple instances of that variable ?


Solution

  • The easiest way to answer such questions is generally to just look at the resulting assembly and take it from there.

    Compiling the following (I made your struct larger to dodge crafty compiler shenanigans):

    #include <atomic>
    
    struct foo {
        double a;
        double b;
        double c;
        double d;
        double e;
    };
    
    std::atomic<foo> var;
    
    void bar()
    {
        var.store(foo{1.0,2.0,1.0,2.0,1.0});
    }
    

    In clang 5.0.0 yields the following under -O3: see on godbolt

    bar(): # @bar()
      sub rsp, 40
      movaps xmm0, xmmword ptr [rip + .LCPI0_0] # xmm0 = [1.000000e+00,2.000000e+00]
      movaps xmmword ptr [rsp], xmm0
      movaps xmmword ptr [rsp + 16], xmm0
      movabs rax, 4607182418800017408
      mov qword ptr [rsp + 32], rax
      mov rdx, rsp
      mov edi, 40
      mov esi, var
      mov ecx, 5
      call __atomic_store
    

    Great, the compiler delegates to an intrinsic (__atomic_store), that's not telling us what's really going on here. However, since the compiler is open source, we can easily find the implementation of the intrinsic (I found it in https://github.com/llvm-mirror/compiler-rt/blob/master/lib/builtins/atomic.c):

    void __atomic_store_c(int size, void *dest, void *src, int model) {
    #define LOCK_FREE_ACTION(type) \
        __c11_atomic_store((_Atomic(type)*)dest, *(type*)dest, model);\
        return;
      LOCK_FREE_CASES();
    #undef LOCK_FREE_ACTION
      Lock *l = lock_for_pointer(dest);
      lock(l);
      memcpy(dest, src, size);
      unlock(l);
    }
    

    It seems like the magic happens in lock_for_pointer(), so let's have a look at it:

    static __inline Lock *lock_for_pointer(void *ptr) {
      intptr_t hash = (intptr_t)ptr;
      // Disregard the lowest 4 bits.  We want all values that may be part of the
      // same memory operation to hash to the same value and therefore use the same
      // lock.  
      hash >>= 4;
      // Use the next bits as the basis for the hash
      intptr_t low = hash & SPINLOCK_MASK;
      // Now use the high(er) set of bits to perturb the hash, so that we don't
      // get collisions from atomic fields in a single object
      hash >>= 16;
      hash ^= low;
      // Return a pointer to the word to use
      return locks + (hash & SPINLOCK_MASK);
    }
    

    And here's our explanation: The address of the atomic is used to generate a hash-key to select a pre-alocated lock.