c pthreads posix cpu-cache memory-model

pthread_create(3) and memory synchronization guarantee in SMP architectures


I am looking at section 4.11 of The Open Group Base Specifications Issue 7 (IEEE Std 1003.1, 2013 Edition), which spells out the memory synchronization rules. It is the most specific description of the POSIX/C memory model that I have managed to find in the POSIX standard.

Here's a quote:

4.11 Memory Synchronization

Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. The following functions synchronize memory with respect to other threads:

fork() pthread_barrier_wait() pthread_cond_broadcast() pthread_cond_signal() pthread_cond_timedwait() pthread_cond_wait() pthread_create() pthread_join() pthread_mutex_lock() pthread_mutex_timedlock()

pthread_mutex_trylock() pthread_mutex_unlock() pthread_spin_lock() pthread_spin_trylock() pthread_spin_unlock() pthread_rwlock_rdlock() pthread_rwlock_timedrdlock() pthread_rwlock_timedwrlock() pthread_rwlock_tryrdlock() pthread_rwlock_trywrlock()

pthread_rwlock_unlock() pthread_rwlock_wrlock() sem_post() sem_timedwait() sem_trywait() sem_wait() semctl() semop() wait() waitpid()

(exceptions to the requirement omitted).

Paraphrasing the above: an application must never read or modify a memory location while another thread or process may be modifying it, and it enforces this by calling one of the listed functions, which synchronize both thread execution and memory with respect to other threads. pthread_create(3) is listed among the functions that provide this memory synchronization.

I understand that this basically means each of the functions must imply some sort of memory barrier (although the standard does not seem to use that concept). So, for example, on return from pthread_create() we are guaranteed that the memory modifications made by the calling thread before the call are visible to other threads (possibly running on a different CPU/core) once they also synchronize memory. But what about the newly created thread: is there an implied memory barrier before the thread starts running the thread function so that it unfailingly sees the memory modifications synchronized by pthread_create()? Is this specified by the standard? Or should we provide memory synchronization explicitly to be able to trust the correctness of any data we read, according to the POSIX standard?

A special case (answering it would also answer the question above): does a context switch provide memory synchronization? That is, when the execution of a process or thread is started or resumed, is its memory synchronized with respect to any memory synchronization performed by other threads of execution?

Example:

Thread #1 creates a constant object allocated from the heap. Thread #1 then creates a new thread #2 that reads the data from the object. If we can assume the new thread #2 starts with memory synchronized, then everything is fine. However, if the CPU core running the new thread has a copy of previously allocated but since discarded data in its cache instead of the new value, then it might have a wrong view of the state and the application may function incorrectly.
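A minimal sketch of this scenario, for reference. The `struct config` type, its fields, and the function names are made up for illustration. Whether the unlocked read in `reader()` is guaranteed to see the values stored before pthread_create() is exactly what is being asked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical object that is never modified once it has been filled in. */
struct config {
    int port;
    int retries;
};

/* Thread #2: reads the object without taking any lock. */
static void *reader(void *arg)
{
    const struct config *cfg = arg;
    return (void *)(intptr_t)(cfg->port + cfg->retries);
}

/* Thread #1: builds the object on the heap, then creates thread #2. */
int sum_read_by_new_thread(void)
{
    struct config *cfg = malloc(sizeof *cfg);
    if (cfg == NULL)
        return -1;
    cfg->port = 8080;      /* plain stores, completed before pthread_create() */
    cfg->retries = 3;

    pthread_t tid;
    void *result = NULL;
    if (pthread_create(&tid, NULL, reader, cfg) != 0) {
        free(cfg);
        return -1;
    }
    int rc = pthread_join(tid, &result);   /* join is also in the 4.11 list */
    assert(rc == 0);
    free(cfg);
    return (int)(intptr_t)result;
}
```

Called from main() and built with -pthread, `sum_read_by_new_thread()` returns 8083 only if thread #2 saw both fields as they were written by thread #1.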

More concretely...

  1. Previously in the program (this is the value in CPU #1 cache memory)

     int i = 0;        
    
  2. Thread T0 running in CPU #0:

     pthread_mutex_lock(...);
     int tmp = i;
     pthread_mutex_unlock(...);
    
  3. Thread T1 running in CPU #1:

     i = 42;
     pthread_create(...);
    
  4. Newly created thread T2 running in CPU #0:

     printf("i=%d\n", i);    /* First step in the thread function */
    

Without a memory barrier, that is, without synchronizing memory for thread T2, the output could be

     i=0

(previously cached, unsynchronized value).

Update: A lot of applications using the POSIX thread library would not be thread safe if such implementation craziness were allowed.


Solution

  • is there an implied memory barrier before the thread starts running the thread function so that it unfailingly sees the memory modifications synchronized by pthread_create()?

    Yes. Otherwise there would be no point in pthread_create acting as a memory synchronization point (barrier).

    (As far as I know, POSIX does not state this explicitly, nor does it define a standard memory model, so you will have to decide whether you trust your implementation to do the only sane thing it possibly could: ensure synchronization before the new thread runs. I would not worry particularly about it.)

    A special case (answering it would also answer the question above): does a context switch provide memory synchronization? That is, when the execution of a process or thread is started or resumed, is its memory synchronized with respect to any memory synchronization performed by other threads of execution?

    No, a context switch does not act as a barrier.

    Thread #1 creates a constant object allocated from the heap. Thread #1 then creates a new thread #2 that reads the data from the object. If we can assume the new thread #2 starts with memory synchronized, then everything is fine. However, if the CPU core running the new thread has a copy of previously allocated but since discarded data in its cache instead of the new value, then it might have a wrong view of the state and the application may function incorrectly.

    Since pthread_create must perform memory synchronization, this cannot happen. Any stale data residing in a CPU cache on another core must be invalidated. (Luckily, the commonly used platforms are cache coherent, so the hardware takes care of that.)

    Now, if you change your object after you've created your second thread, you need memory synchronization again so that all parties see the changes and race conditions are avoided. pthread mutexes are commonly used to achieve that.
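As a rough illustration of that last point, here is a minimal sketch in which both threads modify shared data after the second thread has been created. The names `shared`, `shared_lock`, `worker` and `count_with_two_threads` are invented for this example. Every access made while both threads are running goes through the same mutex.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared = 0;   /* modified while both threads are running */

/* Increments the shared counter 100000 times, always under the mutex. */
static void *worker(void *arg)
{
    (void)arg;
    for (int n = 0; n < 100000; n++) {
        pthread_mutex_lock(&shared_lock);
        shared++;
        pthread_mutex_unlock(&shared_lock);
    }
    return NULL;
}

int count_with_two_threads(void)
{
    pthread_t tid;

    shared = 0;                 /* no other thread exists yet */
    int rc = pthread_create(&tid, NULL, worker, NULL);
    assert(rc == 0);
    worker(NULL);               /* the creating thread updates concurrently */
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    return shared;              /* read after pthread_join(), so no lock needed */
}
```

Called from main() and built with -pthread, `count_with_two_threads()` returns 200000. Without the lock/unlock pair around `shared++`, the two threads would race and the result would be unpredictable.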