Tags: c++, caching, shared-memory, race-condition, mesi

Why does a race condition occur when the hardware has ensured coherency?


#include <iostream>
#include <thread>
#include <vector>
#include <chrono>
#include <mutex>

using namespace std::chrono;

const int nthreads = 4;
const int64_t ndata = 1600000;

int64_t total_sum = 0;  // shared accumulator, written by every thread

void compute_sum(int tid, std::vector<int>& d)
{
  int st = (ndata / nthreads) * tid;
  int en = (ndata / nthreads) * (tid + 1);
  for (int i = st; i < en; i++) {
    total_sum += d[i];  // unsynchronized read-modify-write on shared data
  }
}

int main(int argc, char **argv)
{
  std::vector<std::thread> threads;
  
  std::vector<int> data;
  for (int i = 0; i < ndata; i++) {
    data.push_back(i);
  }

  for (int i = 0; i < nthreads; i++) {
    threads.push_back(std::thread(compute_sum, i, std::ref(data)));
  }
  
  for (auto &th : threads) {
    th.join();
  }
  return 0;
}

Here total_sum is shared between all the threads, so a data race occurs and the result is wrong (unless atomics or locks are used).
But when hardware-level coherency is implemented, such as the MESI protocol, shouldn't the shared memory be handled properly by the hardware itself, so that no race condition occurs?


Solution

  • The MESI protocol (Wikipedia) is a mechanism for efficiently maintaining the coherence of a CPU Cache.

    A CPU Cache (Wikipedia) is a hardware component which lies between the CPU and the main memory of a computer.

    The coherence of the CPU cache is completely unrelated to the thread-safety of:

    • your code, or
    • collections such as std::vector<> that your code might use, or
    • anything whatsoever that you might do in C++, or
    • anything whatsoever that you might even do in machine code, which is what C++ compiles to.

    In other words, one has absolutely nothing to do with the other.
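    To see why coherence is not enough: coherence only guarantees that each individual load or store observes a consistent value across cores. But total_sum += d[i] compiles to at least three machine operations (load, add, store), and two threads can interleave between the load and the store, so both write back a sum that is missing the other's contribution. The fix has to make the update itself indivisible. The sketch below (not the original poster's code; parallel_sum is an illustrative name) accumulates into a thread-private local sum and folds it into a std::atomic total with one atomic read-modify-write per thread:

    ```cpp
    #include <atomic>
    #include <cstdint>
    #include <thread>
    #include <vector>

    // Sketch: each thread sums its own range privately, then publishes
    // the partial sum with a single atomic fetch_add. No two threads
    // ever perform an unsynchronized read-modify-write on shared state.
    int64_t parallel_sum(const std::vector<int>& d, int nthreads)
    {
      std::atomic<int64_t> total{0};
      std::vector<std::thread> threads;
      const int64_t chunk = static_cast<int64_t>(d.size()) / nthreads;
      for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back([&, t] {
          int64_t local = 0;  // private partial sum: no sharing, no race
          const int64_t st = chunk * t;
          const int64_t en = (t == nthreads - 1)
                                 ? static_cast<int64_t>(d.size())
                                 : chunk * (t + 1);
          for (int64_t i = st; i < en; ++i)
            local += d[i];
          total.fetch_add(local);  // one indivisible RMW per thread
        });
      }
      for (auto& th : threads)
        th.join();
      return total.load();
    }
    ```

    Accumulating locally first is also much faster than making every += atomic, since an atomic fetch_add per element would force the cache line holding the counter to bounce between cores.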

    Knowledge of the existence of the CPU cache and of its inner workings is only helpful to expert programmers as a means of writing code that will perform more optimally by taking into account peculiarities about the way the cache works. (For example, when iterating over all elements of a two-dimensional array declared as Array[Y][X], you will usually get better performance if your outer loop iterates along Y, and your inner loop iterates along X.)
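    The loop-order remark can be sketched as follows (illustrative code, not from the question; the function names are made up). A row-major Array[Y][X] is stored row by row, so iterating Y in the outer loop touches consecutive addresses, while iterating X in the outer loop strides a whole row's worth of bytes on every access:

    ```cpp
    #include <cstdint>
    #include <vector>

    // Y-outer / X-inner: the inner loop walks consecutive addresses,
    // streaming through cache lines (the fast order for row-major data).
    int64_t sum_y_outer(const std::vector<int>& a, int Y, int X)
    {
      int64_t s = 0;
      for (int y = 0; y < Y; ++y)
        for (int x = 0; x < X; ++x)
          s += a[y * X + x];  // contiguous accesses
      return s;
    }

    // X-outer / Y-inner: each inner-loop step jumps X * sizeof(int)
    // bytes, so most accesses land on a cold cache line.
    int64_t sum_x_outer(const std::vector<int>& a, int Y, int X)
    {
      int64_t s = 0;
      for (int x = 0; x < X; ++x)
        for (int y = 0; y < Y; ++y)
          s += a[y * X + x];  // strided accesses
      return s;
    }
    ```

    Both functions compute the same total; only the memory access pattern, and therefore the cache behavior, differs.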