Tags: c++, multithreading, atomic

Using atomics to reduce the number of mutex locks in multithreaded C++ code


My multithreaded code has 3 different sections that can be described like this:

(1) Unprotected by any mutex: this part only works on local and thread_local variables, as well as global variables that are guaranteed not to change while multiple threads are running.

(2) Protected by a shared_lock: this part can read but not modify a number of global data structures.

(3) Protected by a unique_lock: this part can read and modify (sometimes destructively) those global data structures.

I have a single global shared_mutex that both kinds of locks operate on.

Everything works fine, but I noticed that I have a number of places that look like this:

struct MyStruct {
  uint64_t counter;
  //some other fields
};

unordered_map<uint64_t, MyStruct> map;
shared_mutex mutex;

void someFunction(uint64_t id) {
  shared_lock lock(mutex);

  auto it = map.find(id);
  if(it == map.end()) return;
  MyStruct &s = it->second;
  
  uint64_t counterValue = s.counter;
  doSomeWork(counterValue);//old value of counter is used

  lock.unlock();  // release shared ownership via the shared_lock, not the raw mutex
  mutex.lock();
  //if we don't find id in map, it means that s was deleted during lock transition
  //id is never reused for a different struct
  if(map.find(id) == map.end()) { mutex.unlock(); return; }
  s.counter += 1;
  mutex.unlock();
  lock.lock();    // reacquire shared ownership so the shared_lock destructor is correct

  if(map.find(id) == map.end()) return;

  doSomeMoreWork();//counter is not used anymore
}

It seems wasteful to do all those relocks just to increment a single counter, so I want to make counter atomic and write something like:

struct MyStruct {
  atomic<uint64_t> counter;
  //some other fields
};

unordered_map<uint64_t, MyStruct> map;
shared_mutex mutex;

void someFunction(uint64_t id) {
  shared_lock lock(mutex);

  auto it = map.find(id);
  if(it == map.end()) return;
  MyStruct &s = it->second;
  
  uint64_t counterValue = s.counter.fetch_add(1);
  doSomeWork(counterValue);//old value of counter is used
  doSomeMoreWork();//counter is not used anymore
}

I have no experience with atomics. Is this going to work properly? Are there any potential problems to watch out for? Also, there are several different memory orders I can pass to fetch_add:

memory_order_relaxed
memory_order_consume
memory_order_acquire
memory_order_release
memory_order_acq_rel
memory_order_seq_cst

Which one is the right choice here?


Solution

  • Yes, you can use atomics for this purpose. In fact, you could even keep the old s.counter += 1 syntax with a std::atomic<std::uint64_t>. However, that would use std::memory_order::seq_cst, which may be stronger than you need.

    It's overall a better solution because the code is simpler and likely much faster. Many architectures (e.g. x86_64, ARM64) have hardware support for atomic read-modify-write operations. On x86_64, an atomic fetch-add is simply an add instruction with a lock prefix, which is much cheaper than locking a mutex.
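    As a minimal sketch of that replacement (the counter and function name here are illustrative, not from the question's map-based code): fetch_add returns the value the counter held before the increment, which matches the "old value of counter is used" pattern in the question.

    ```cpp
    #include <atomic>
    #include <cstdint>

    std::atomic<std::uint64_t> counter{0};

    // Atomically increments the counter and returns its previous value.
    // With no memory order argument, fetch_add defaults to seq_cst.
    std::uint64_t bump() {
        return counter.fetch_add(1);
    }
    ```

    No mutex is needed for the increment itself; the whole read-modify-write happens as one atomic operation.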

    Which memory order to choose

    If the only thing you need is a counter that is incremented atomically, std::memory_order::relaxed is sufficient. However, when in doubt, just go with seq_cst. Which memory order you need depends on what other operations are performed on the atomic: it's always about the big picture, and you cannot decide it for one operation in isolation.
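    To illustrate why relaxed ordering is enough for a pure counter, here is a hedged sketch (function name is illustrative): several threads increment with memory_order_relaxed, and atomicity alone guarantees that no increment is lost, regardless of ordering.

    ```cpp
    #include <atomic>
    #include <cstdint>
    #include <thread>
    #include <vector>

    // Increments a shared counter `iters` times from each of `nthreads`
    // threads using relaxed ordering, then returns the final value.
    std::uint64_t relaxed_count(unsigned nthreads, std::uint64_t iters) {
        std::atomic<std::uint64_t> c{0};
        std::vector<std::thread> threads;
        for (unsigned t = 0; t < nthreads; ++t)
            threads.emplace_back([&] {
                for (std::uint64_t i = 0; i < iters; ++i)
                    c.fetch_add(1, std::memory_order_relaxed);
            });
        for (auto &th : threads) th.join();
        return c.load();
    }
    ```

    Relaxed ordering only drops synchronization guarantees for *other* memory locations; the counter itself is still updated atomically.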

    What about more complex operations

    On a side note, since this was mentioned in the comments, you can perform operations more complex than += using a compare-and-swap (CAS) operation (in C++: the compare_exchange_xxx family).

    For example, to perform if ((counter%5)==0) counter++; atomically:

    std::uint64_t expected = counter.load();
    while (expected % 5 == 0 && !counter.compare_exchange_weak(expected, expected + 1)) {}
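    Wrapped as a self-contained sketch (function name is illustrative): compare_exchange_weak updates expected with the current value when it fails, so the predicate is re-checked before every retry, and the loop exits either after a successful increment or as soon as the predicate no longer holds.

    ```cpp
    #include <atomic>
    #include <cstdint>

    // Atomically performs `if (counter % 5 == 0) ++counter;`.
    // On CAS failure, `expected` is reloaded with the current value,
    // so the condition is evaluated against fresh data each iteration.
    void increment_if_multiple_of_5(std::atomic<std::uint64_t> &counter) {
        std::uint64_t expected = counter.load();
        while (expected % 5 == 0 &&
               !counter.compare_exchange_weak(expected, expected + 1)) {
            // another thread changed the counter; retry with the new value
        }
    }
    ```

    Note the negation on compare_exchange_weak: the loop must retry while the CAS *fails*, not while it succeeds.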