c++multithreading parallel-processing openmp mutex

How do I avoid large number of locks?

Let arr be declared as follows.

std::vector<std::vector<int>> arr(10,000,000);

My code in serial is something like

for (const auto [x, y] : XY) {
  arr[x].push_back(y);
}

I use openmp and define an array of locks as follows.

std::vector<omp_lock_t> locks(10,000,000);

With locks, I use

#pragma omp parallel for schedule(dynamic)
for (const auto [x, y] : XY) {
  omp_set_lock(&locks[x]);
  arr[x].push_back(y);
  omp_unset_lock(&locks[x]);
}

Such an approach works in my machine (windows linux subsystem). But then I find the following post

c/c++ maximum number of mutexes allowed in Linux

which raises the concern whether I have used too many locks, and my program may not work for other platforms (those with a limit on the number of locks allowed).

I wonder if there is another way in which I still have the same control granularity as above and it does not have upper limit on the number allowed. Can I use something like compare and swap?

Solution

According to the OpenMP documentation, omp_lock_t represents "a simple lock". I suppose it is some kind of a spinlock. You, therefore, don't need to care about limits for a number of mutexes. (Mutexes require some interaction with a kernel / scheduler, which is likely a reason for restricting their count.)

omp_lock_t uses a single memory location, which is fine for a spinlock. For instance, this live demo shows that omp_lock_t takes only 4 bytes, while std::mutex 40 bytes: https://godbolt.org/z/sr845h8hx.

Note that a spinlock can be implemented with a single byte, even with a single bit (BTS on x86_64). Therefore, you can compress your locks even much more in memory. However, I'd be careful with such an approach since it can bring significant false sharing issues.

I think your solution is pretty fine. Since the operation in the critical section should be very fast, and there is likely a low probability that multiple threads will access the same element at the same moment, a spinlock is a suitable solution in my opinion.

EDIT

As pointed out in comments, the term simple lock may not mean that the lock is actually a spinlock. It's up to the OpenMP implementation to decide which type of lock it will use. Then, if you want to be sure that you are using a true spinlock (which I still believe is suitable), you can either write your own or use some library that provides one. (Note that the efficient spinlock implementation is not that simple. For instance, it needs to spin on a load/read operation instead of on exchange/test-and-set, which often naive implementations do.)