c++ multithreading c++11 atomic lock-free

Multithreaded Atomic Store/Load of multiple values in C++

Suppose I have a structure and class in C++:

#include <atomic>

struct Vec {
   double x;
   double y;
   double z;
};

class VecTracker {
   Vec latest_vec;
   std::atomic<double> highest_x;
   std::atomic<double> highest_y;
   std::atomic<double> highest_z;

public:
   //updates highest_x, highest_y, highest_z atomically
   void push_vec(const Vec& v);
   double get_high_x() const;
   double get_high_y() const;
   double get_high_z() const;
   //returns Vec consisting of snapshot of highest_x, highest_y, highest_z
   Vec get_highs() const;
};

I'll have R reader threads and one writer thread. The writer thread will update zero or more of the highest_* members. If the reader thread calls get_highs() I need all the writes from the current invocation of the writer thread's push_vec() function to be visible to the reader thread before the reader thread reads highest_x, highest_y, etc. to produce a vector.

Now, I know that if Vec is sufficiently small, I could just use a std::atomic<Vec>. Problem is, if it's too big, native CPU instructions for these store/loads can't be used. Is there any way to use std::atomic_thread_fence to guarantee that multiple atomic writes are committed by the writer thread before the reader thread picks them up? That is, a guarantee that all writes by the writer thread are committed before a reader thread sees any of them? Or does std::atomic_thread_fence only provide reordering guarantees within a thread? Currently, just using the .store(std::memory_order_release) for each member doesn't seem to guarantee that all three stores happen before any reads.

Obviously, I could use a lock here, but ideally I want to find a way to make this data structure lockfree.

I know that I could put highest_x, highest_y, and highest_z in a single struct and allocate two copies of it on the heap, swapping pointers atomically after each write. Is this the only way to do it?

Solution

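The devil is in this comment: //updates highest_x, highest_y, highest_z atomically. How do you guarantee that the update is, indeed, atomic? Three doubles occupy 24 bytes, which does not fit into 16 bytes (the largest atomic operation I know of on the x86-64 platform), so no single native atomic instruction can cover all three. Without one, the straightforward way to ensure atomicity is to use a mutex.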
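The size argument can be checked directly. This is only a sketch: the 16-byte limit is the figure assumed above for x86-64, and kWidestNativeAtomic and fits_native_atomic are names made up for illustration.

```cpp
#include <cstddef>

struct Vec {
    double x;
    double y;
    double z;
};

// Assumption: 16 bytes is the widest native atomic operation on the target
// (the figure quoted for x86-64 above). Hypothetical constant name.
constexpr std::size_t kWidestNativeAtomic = 16;

// True only if a whole Vec could be covered by one native atomic operation.
constexpr bool fits_native_atomic() {
    return sizeof(Vec) <= kWidestNativeAtomic;
}
```

With 8-byte doubles, sizeof(Vec) is 24, so fits_native_atomic() is false on such a platform.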

Your problem is not with the fence. Issuing the fence guarantees that all previous updates are visible to a reader that synchronizes with it. What it cannot guarantee is that those updates were not already visible before the fence. As a result, a reader may see the newer value of one member alongside older values of the others.
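To make the mixed read concrete, here is a minimal sketch that replays one possible interleaving on a single thread, so the outcome is deterministic. PerMemberHighs and torn_read_demo are hypothetical names, and the starting values of 0 are an assumption; the struct only mirrors the three atomics from the question.

```cpp
#include <atomic>

struct Vec {
    double x;
    double y;
    double z;
};

// Hypothetical stand-in for the question's class: three independent atomics.
struct PerMemberHighs {
    std::atomic<double> x{0.0};
    std::atomic<double> y{0.0};
    std::atomic<double> z{0.0};

    // Reader side: three separate acquire loads, not one atomic snapshot.
    Vec snapshot() const {
        return Vec{x.load(std::memory_order_acquire),
                   y.load(std::memory_order_acquire),
                   z.load(std::memory_order_acquire)};
    }
};

// Replays, in program order, a schedule a real reader thread could hit:
// the reader runs after the writer's first store but before the other two.
inline Vec torn_read_demo() {
    PerMemberHighs h;                           // highs start at (0, 0, 0)
    h.x.store(1.0, std::memory_order_release);  // writer: first of three stores
    Vec seen = h.snapshot();                    // reader runs in between
    h.y.store(2.0, std::memory_order_release);  // writer: remaining stores
    h.z.store(3.0, std::memory_order_release);
    return seen;                                // mixes the new x with the old y and z
}
```

The returned vector is neither the state before the update nor the state after it, and no choice of memory order on the individual stores rules that schedule out.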

To solve your issue, either use a mutex (they are quite efficient when uncontended) or, if you are allergic to mutexes, use the pointer-swap solution you described yourself.
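A minimal sketch of the mutex option follows. LockedVecTracker is a hypothetical name, and it assumes that "highest" means a running per-component maximum, which the question implies but does not spell out.

```cpp
#include <algorithm>
#include <limits>
#include <mutex>

struct Vec {
    double x;
    double y;
    double z;
};

// Hypothetical mutex-based variant of the question's VecTracker.
class LockedVecTracker {
public:
    // Assumption: each high is the maximum of that component seen so far.
    void push_vec(const Vec& v) {
        std::lock_guard<std::mutex> lock(mutex_);
        highs_.x = std::max(highs_.x, v.x);
        highs_.y = std::max(highs_.y, v.y);
        highs_.z = std::max(highs_.z, v.z);
    }

    // Copies all three highs under the same lock, so the result always
    // corresponds to a state between two complete push_vec() calls.
    Vec get_highs() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return highs_;
    }

private:
    mutable std::mutex mutex_;  // mutable so const readers can lock it
    Vec highs_{std::numeric_limits<double>::lowest(),
               std::numeric_limits<double>::lowest(),
               std::numeric_limits<double>::lowest()};
};
```

The lock is held only for three comparisons or one 24-byte copy, so contention should stay low with a single writer. The pointer-swap alternative avoids the lock, but with only two buffers it also needs some way to stop the writer from reusing a buffer that a reader is still copying.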