
Double-checked locking pattern issue in C


Like everybody who has looked into this, I've read the paper http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf

I have a question about the barriers when DCLP is implemented on a C structure. Here is the code:

typedef struct _singleton_object {
    int x;
    int y;
} sobject;

static sobject *singleton_object = NULL;

sobject *get_singleton_instance()
{
    sobject *tmp = singleton_object;

    /* Insert barrier here - compiler or cpu specific or both? */
    if (tmp == NULL) {
        mutex_lock(&lock); /* assume lock is declared and initialized properly*/
        tmp = singleton_object;
        if (tmp == NULL) {
            tmp = (sobject *)malloc(sizeof(sobject)); /* assume malloc succeeds */
            tmp->x = 5;
            tmp->y = 7;

            /* Insert barrier here - compiler or cpu specific or both ?*/
            singleton_object = tmp;
        }
        mutex_unlock(&lock);
    }
    return tmp;
}

The first question is the one in the comments: when the paper says to insert barriers, does it mean compiler barriers, CPU barriers, or both? I assume both.

My second question is: what prevents the compiler from replacing tmp with singleton_object in the code? What forces the load of singleton_object into tmp, which could live in a register or on the stack in the compiler-generated code? What if the compiler, at every reference to tmp, actually does a fresh load into a register from &singleton_object and then discards that value? It seems that the solution in the paper referenced above depends on the fact that we are using the local variable. If the compiler does not load the value of the pointer variable into the local variable tmp, we are back to the original problem described in the paper.

My third question is: assuming the compiler does copy the value of singleton_object into a register or onto the stack (i.e. into the variable tmp), why do we need the first barrier? There should be no reordering of tmp = singleton_object and if (tmp == NULL) at the beginning of the function, since the test has an implicit read-after-write dependency on tmp. Also, even if the first load into tmp reads a stale value from the CPU's cache, that value should be NULL. If it is not NULL, then the object construction should be complete, since the thread/CPU that constructs it executes the barrier, which ensures that the stores to x and y are visible to all CPUs before singleton_object has a non-NULL value.


Solution

    1. When the paper says to insert barriers, does it mean just the compiler, the CPU, or both?

      Both barriers should be CPU barriers (which imply compiler barriers).

    2. What prevents the compiler from replacing tmp with singleton_object in the code?

      The barrier after the assignment

      sobject *tmp = singleton_object;

      means, among other things (for both the CPU and the compiler):

      ** All read accesses issued before the barrier must be completed before the barrier.

      Because of that, the compiler is not allowed to re-read the singleton_object variable in place of tmp after the barrier.

    3. If it (singleton_object) is not NULL, then the object construction should be complete, since the thread/CPU that constructs it executes the barrier, which ensures that the stores to x and y are visible to all CPUs before singleton_object has a non-NULL value.

      The reading thread also needs a barrier before it actually uses these "visible" variables x and y. Without that barrier, nothing on the reading side orders the load of the pointer before the loads of the fields, so the reading thread may still use stale values.

      As a rule, every synchronization between different threads requires some sort of "barrier" on both sides: the reading side and the writing side.
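
For illustration, here is a minimal sketch of how the two barriers from the question might be written with C11 atomics. It is one possible rendering, not the paper's own code, and it rests on several assumptions: a C11 compiler with <stdatomic.h>, a POSIX pthread_mutex_t standing in for the unspecified lock and mutex_lock/mutex_unlock from the question, and a singleton_object pointer redeclared as _Atomic. The struct tag is also renamed, because identifiers that start with an underscore at file scope are reserved in C.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct singleton_object_s {
    int x;
    int y;
} sobject;

/* Assumption: the pointer is made atomic, so each atomic load below is a
 * single real load whose result stays in tmp; the compiler cannot silently
 * substitute a second read of singleton_object for tmp. */
static _Atomic(sobject *) singleton_object = NULL;

/* Assumption: a pthread mutex stands in for the question's "lock". */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

sobject *get_singleton_instance(void)
{
    sobject *tmp = atomic_load_explicit(&singleton_object,
                                        memory_order_relaxed);

    /* First barrier (read side): pairs with the release fence below, so a
     * non-NULL tmp implies the stores to x and y are visible to this thread. */
    atomic_thread_fence(memory_order_acquire);
    if (tmp == NULL) {
        pthread_mutex_lock(&lock);
        /* The mutex already orders this load against the other writers. */
        tmp = atomic_load_explicit(&singleton_object, memory_order_relaxed);
        if (tmp == NULL) {
            tmp = malloc(sizeof *tmp);
            if (tmp != NULL) {
                tmp->x = 5;
                tmp->y = 7;

                /* Second barrier (write side): the stores to x and y may
                 * not be reordered after the publication of the pointer. */
                atomic_thread_fence(memory_order_release);
                atomic_store_explicit(&singleton_object, tmp,
                                      memory_order_relaxed);
            }
        }
        pthread_mutex_unlock(&lock);
    }
    return tmp;
}
```

The two fences sit exactly where the comments in the question ask for barriers, and each acts as both a compiler and a CPU barrier. The same effect can usually be had more compactly by dropping the fences and using memory_order_acquire on the first load and memory_order_release on the store. If malloc fails, this sketch returns NULL and leaves the singleton unpublished, so a later call will try again.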