c multithreading memory runtime

How do variables behave when accessed among different threads in C?


Suppose I have two threads that run in parallel. One of them keeps a static array of x elements, which it updates every 100 ms. The other thread has access to an 'IsIn' function, which returns 1 if a certain element is present in that array.

My question is: is there a risk that, when the IsIn function is called, the array is being updated at the same time by the update function, so that IsIn gives an erroneous result? How does the management of the memory allocated to that array actually work? Is there a way to prevent this problem?


Solution

  • I would like to...access the array in Thread 1 from through the IsIn function or even a simple Get function, from Thread 2, in a way that I can be sure that at the time that I call one of these functions in Thread 2, the array...is not being updated and changed mid-call, which could cause some [un]wanted results.

    Unwanted Results

    In C, the only possible "unwanted results" are the results that you don't want. As far as the C runtime is concerned, an array is just a chunk of memory, and the runtime does not care at all what you do to it.*

    The kind of "result" that most of us don't want is when there is some kind of structure to the data in the array such that some possible configurations are "valid" and other configurations are "invalid." It often is not possible for a thread to change the data from one valid state to another without temporarily putting it into an invalid state.
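    To make "valid" and "invalid" concrete, here is a minimal single-threaded sketch. The struct, the "ascending order" rule, and the names table_is_valid and set_item are all hypothetical, chosen only for illustration; they are not from the question.

```c
#include <stddef.h>

#define TABLE_SIZE 2

/* Hypothetical shared data. Assumed invariant: items are in ascending order. */
struct table {
    int items[TABLE_SIZE];
};

/* Returns 1 if the (assumed) invariant holds, 0 otherwise. */
int table_is_valid(const struct table *t)
{
    for (size_t i = 1; i < TABLE_SIZE; i++) {
        if (t->items[i - 1] > t->items[i])
            return 0;
    }
    return 1;
}

/* One step of an update: overwrite a single element. */
void set_item(struct table *t, size_t index, int value)
{
    t->items[index] = value;
}
```

    Starting from {1, 2} and moving to {5, 6} takes two calls to set_item. After the first call the array holds {5, 2}, which breaks the invariant, and only after the second call is it valid again. A second thread that looked at the array between those two calls would see the invalid state.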

    In that case, the basic remedy is to use a mutex to protect the data. A mutex does two things:

    1. It has two states, "locked" and "unlocked," and it never allows more than one thread to have it locked at the same time, and
    2. It synchronizes the threads that lock and unlock it. That is to say, the lock and unlock actions define moments in time that can be compared between threads.
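    The first property can be observed directly with pthread_mutex_trylock(), which returns EBUSY instead of blocking when the mutex is already locked. This is only a small sketch: demo_mutex, try_acquire, and release are hypothetical names, and it assumes a default (non-recursive) mutex.

```c
#include <errno.h>
#include <pthread.h>

/* Statically initialized default mutex (assumed non-recursive). */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Tries to lock without blocking.
   Returns 1 if the lock was acquired, 0 if the mutex was already
   locked (EBUSY), and -1 on any other error. */
int try_acquire(void)
{
    int rc = pthread_mutex_trylock(&demo_mutex);
    if (rc == 0)
        return 1;
    if (rc == EBUSY)
        return 0;
    return -1;
}

/* Unlocks the mutex; call only after a successful try_acquire(). */
void release(void)
{
    pthread_mutex_unlock(&demo_mutex);
}
```

    A first try_acquire() succeeds, a second one fails while the mutex is still held, and it succeeds again after release().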

    Synchronization

    Synchronization is a Big Deal. You asked,

    is there a risk that when calling the IsIn function the array is being updated at the same time by the update function and could give an erroneous result?

    If event A happens in thread 1 and event B happens in thread 2 when there is no synchronization between the two threads, then there is no meaningful answer to the question "which event happened first?" There is no meaningful answer to, "did they happen at the same time?" But, if thread 1 only causes event A while it has some mutex locked, and thread 2 only causes B while it has the same mutex locked, then we know that the events could not possibly have happened at the same time. We then can reason about, or write code to prove, which event happened first.

    Using a Mutex

    So, if thread 1 updates structured data, and we want to ensure that thread 2 will never see the data in an invalid state, we do this:

    #include <pthread.h>
    #include <stdio.h>   // fprintf
    #include <stdlib.h>  // exit
    
    pthread_mutex_t mutex;
    
    // During program startup, before creating the threads:
    if (pthread_mutex_init(&mutex, NULL) != 0) {
        fprintf(stderr, "mutex init failed\n");
        exit(1);
    }
    
    // In thread 1:
    pthread_mutex_lock(&mutex);
    // ...update the shared data...
    // ...don't forget to always leave the data in a valid state...
    pthread_mutex_unlock(&mutex);
    
    // In thread 2:
    pthread_mutex_lock(&mutex);
    // ...use the shared data, confident that it *must* be valid...
    pthread_mutex_unlock(&mutex);
    

    This guarantees that thread 2 will never use the data while thread 1 is part-way done updating it. If thread 2 tries to lock the mutex while thread 1 already has it locked, the pthread_mutex_lock() call in thread 2 simply will not return until after thread 1 unlocks the mutex.
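    Applied to the array from the question, the pattern might look roughly like the sketch below. The array size N_ELEMENTS, the names shared_array, array_mutex, and update_array, and the use of PTHREAD_MUTEX_INITIALIZER in place of pthread_mutex_init() are assumptions made for illustration. Only IsIn and its "returns 1 if present" behavior come from the question.

```c
#include <pthread.h>
#include <stddef.h>

#define N_ELEMENTS 8  /* assumed size; the question only says "x elements" */

static int shared_array[N_ELEMENTS];
static pthread_mutex_t array_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called by the updating thread (for example, every 100 ms).
   The whole update happens while the mutex is held, so no other
   thread that uses the mutex can see a half-updated array. */
void update_array(const int *new_values)
{
    pthread_mutex_lock(&array_mutex);
    for (size_t i = 0; i < N_ELEMENTS; i++)
        shared_array[i] = new_values[i];
    pthread_mutex_unlock(&array_mutex);
}

/* Called by the other thread.
   Returns 1 if value is present in the array, 0 otherwise. */
int IsIn(int value)
{
    int found = 0;

    pthread_mutex_lock(&array_mutex);
    for (size_t i = 0; i < N_ELEMENTS; i++) {
        if (shared_array[i] == value) {
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&array_mutex);

    return found;
}
```

    The result is copied into a local variable before unlocking, so nothing in IsIn touches the shared array once the mutex has been released. Every access to shared_array has to go through the same mutex; a single unprotected read or write elsewhere would bring the original problem back.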

    Who Goes First?

    My example DOES NOT prevent thread 2 from grabbing the mutex before thread 1 does its update. The mutex, when used in this way, only guarantees that thread 2 will see valid data. But thread 2 could see either the new valid state, or the older valid state. You cannot use a mutex to control the order in which the threads do their thing.†

    If you need threads to "meet up," "take turns," "wait for each other," or in any other way to control the order in which they do stuff, that's a higher level of synchronization, and it should be the topic of a different question.
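    For orientation only, one common tool for "wait for each other" is a POSIX condition variable paired with a mutex. The sketch below is a hedged illustration rather than a full treatment: update_count, publish_update, and wait_for_update are hypothetical names, and the design (a counter of published updates) is an assumption.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t updated = PTHREAD_COND_INITIALIZER;
static int update_count = 0;  /* number of updates published so far */

/* Called by the updating thread after it has finished an update. */
void publish_update(void)
{
    pthread_mutex_lock(&lock);
    update_count++;
    pthread_cond_broadcast(&updated);  /* wake any waiting threads */
    pthread_mutex_unlock(&lock);
}

/* Blocks until at least `wanted` updates have been published.
   Returns the number of updates seen at that moment. */
int wait_for_update(int wanted)
{
    pthread_mutex_lock(&lock);
    /* Loop, because pthread_cond_wait() may wake up spuriously. */
    while (update_count < wanted)
        pthread_cond_wait(&updated, &lock);
    int seen = update_count;
    pthread_mutex_unlock(&lock);
    return seen;
}
```

    Here the ordering comes from the condition (update_count reaching the wanted value), not from the mutex itself. The mutex only protects the counter while it is read and changed.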


    * The same is not true in some programming languages (e.g., Java) where an "array" is a structured object that can grow at run-time or be moved by a garbage collector. In such languages, even if you don't care what is in the array, you still can mess up the run-time system if threads access it without adequate synchronization.

    † There are lots of questions on this site from newbies who thought they could use mutexes to control some sequence of operations between threads. Sometimes it looks inviting to try, but it never works.