c multithreading concurrency stdatomic data-race

Does Non-atomic struct modification via _Atomic pointer produce data-race?

I'm trying to understand how C11 memory model works and wrote two functions containing expressions that conflict (in the sense of 5.1.2.4(p4)):

struct my_struct{
    uint64_t first;
    int64_t second;
} * _Atomic instance;

void* set_first(void *ignored){
    uint64_t i = 0;
    while(1){
        struct my_struct *ms = atomic_load_explicit(&instance, memory_order_acquire);
        ms -> first = i++;
        atomic_store_explicit(&instance, ms, memory_order_release);
        sleep(1);
    }
}

void* print_first(void *ignored){
    while(1){
        struct my_struct *ms = atomic_load_explicit(&instance, memory_order_acquire);
        uint64_t current = ms -> first;
        char buf[100];
        memset(buf, '\0', sizeof(buf));
        sprintf(buf, "%" PRIu64 "\n", current);
        fputs_unlocked(buf, stdout);
        sleep(2);
    }
}

And the main function:

int main(void){
    struct my_struct tmp = {.first = 0, .second = 0};
    atomic_init(&instance, &tmp);
    printf("main\n");
    pthread_t set_thread;
    pthread_create(&set_thread, NULL, &set_first, NULL);

    pthread_t print_thread;
    pthread_create(&print_thread, NULL, &print_first, NULL);
    while(1){
        sleep(100);
    }
}

So I tried to prove if the program contains no data-races. Here are my thoughts:

We know that a release operation on an atomic object synchronizes with an acquire operation on the object. So atomic_store_explicit(&instance, ms, memory_order_release); in the set_first synchronizes with atomic_load_explicit(&instance, memory_order_acquire) in the print_first.
Since the side effect ms -> first = i++ in the set_first function appeared before the atomic_store_explicit(&instance, ms, memory_order_release); in the program text I assumed it is sequenced-before it (This is I'm not sure about, could not find any normative references).
Combining bullets 1. and 2. implies that ms -> first = i++ inter-thread happens before atomic_load_explicit(&instance, memory_order_acquire); so they are in the happens before relationships.
Applying the data-race definition we conclude that confliting actions uint64_t current = ms -> first; in the print_first function and ms -> first = i++; in the set_first function do not produce data-race.

So the behavior seems to be well-defined.

The doubtful thing is assumed that ms -> first = i++; sequenced before atomic_store_explicit(&instance, ms, memory_order_release); because they occurred one before another in the program text.

Is it correct or the program contains data races?

Solution

Modifying non-atomic objects through a non-_Atomic pointer is not inherently data-race UB. (e.g. you could have an algorithm that did int *p = shared_ptr++; to have each thread grab its own slot in a non-atomic array.)

But in this case, you have a clear case of UB, because you have 2 threads accessing main's tmp.first, and they're not both reads.

A store with mo_release is sequenced after any previous stores (and loads), including non-atomic stores like ms->first = .... That's the point of release-stores vs. relaxed.

But the flaw in your reasoning is in step 1: atomic_store_explicit(&instance, ms, memory_order_release) in set_first only synchronizes-with acquire loads that see the value stored! Acquire loads in other threads don't magically wait for a release store that hasn't happened yet. The guarantee is that if/when you do load the value stored by a release store¹, you also see all earlier things from that thread.

If the acquire load happens before the release store (in the global order, if there is one), then there's no synchronization with it.

The acquire-load in both threads can happen simultaneously, and then the fox is in the henhouse: ms -> first = i++; and uint64_t current = ms -> first; are running with no synchronization.

It's totally irrelevant that the writing thread will later do a release-store to store the same value back into instance.

Footnote 1: (The "release-sequence" language in the standard extends this to seeing the result of a RMW operation that modified the result of the initial release-store, and so on.)

As far as other threads are concerned, the atomic_load_explicit in set_first is basically irrelevant. You might as well hoist it out of the loop.