Search code examples
multithreadingconcurrencyoperating-systemproducer-consumermemory-barriers

Memory barrier in the implementation of single producer single consumer


The following implementation from Wikipedia:

volatile unsigned int produceCount = 0, consumeCount = 0;
TokenType buffer[BUFFER_SIZE];

void producer(void) {
    while (1) {
        while (produceCount - consumeCount == BUFFER_SIZE)
            sched_yield(); // buffer is full

        buffer[produceCount % BUFFER_SIZE] = produceToken();
        // a memory_barrier should go here, see the explanation above
        ++produceCount;
     }
}

void consumer(void) {
    while (1) {
        while (produceCount - consumeCount == 0)
            sched_yield(); // buffer is empty

        consumeToken(buffer[consumeCount % BUFFER_SIZE]);
        // a memory_barrier should go here, the explanation above still applies
        ++consumeCount;
     }
}

says that a memory barrier must be used between the line that accesses the buffer and the line that updates the Count variable.

This is done to prevent the CPU from reordering the instructions above the fence along-with that below it. The Count variable shouldn't be incremented before it is used to index into the buffer.

If a fence is not used, won't this kind of reordering violate the correctness of code? The CPU shouldn't perform increment of Count before it is used to index into buffer. Does the CPU not take care of data dependency while instruction reordering?

Thanks


Solution

  • It seems, that your question is "can incrementing Count and assigment to buffer be reordered without changing code behavior?".

    Consider following code tansformation:

    int count1 = produceCount++;
    buffer[count1 % BUFFER_SIZE] = produceToken();
    

    Notice that code behaves exactly as original one: one read from volatile variable, one write to volatile, read happens before write, state of program is the same. However, other threads will see different picture regarding order of produceCount increment and buffer modifications.

    Both compiler and CPU can do that transformation without memory fences, so you need to force those two operations to be in correct order.