Tags: c++, c, atomic, stdatomic

Best way to atomically bitwise AND a byte in C/C++?


I'm currently looking at atomic operations in C/C++ using GCC and have discovered that naturally aligned global variables in memory have atomic reads and writes.

However, I was trying to bitwise AND a global variable and noticed that it boils down to a read-modify-write sequence, which is troublesome if multiple threads operate on that byte value.

After some research, I've settled on these two examples:

C Example - GCC extension __sync_fetch_and_and

#include <stdio.h>
#include <stdint.h>

uint8_t byteC = 0xFF;

int main() {
    __sync_fetch_and_and(&byteC, 0xF0);
    printf("Value of byteC: 0x%X\n", byteC);
    return 0;
}

C++ Example - C++11 using atomic fetch_and

#include <iostream>
#include <atomic>
#include <cstdint>

std::atomic<uint8_t> byteCpp(0xFF);

int main() {
    byteCpp.fetch_and(0xF0);
    std::cout << "Value of byteCpp: 0x" << std::hex << static_cast<int>(byteCpp.load()) << std::endl;
    return 0;
}

Other examples follow but they seemed less intuitive and more computationally expensive.

Using a pthread_mutex_lock

#include <pthread.h>
#include <stdint.h>

uint8_t byte = 0xFF;
pthread_mutex_t byte_mutex = PTHREAD_MUTEX_INITIALIZER;

void atomic_and(void) {
    pthread_mutex_lock(&byte_mutex);
    byte &= 0xF0;
    pthread_mutex_unlock(&byte_mutex);
}

Using a mutex lock_guard

#include <cstdint>
#include <mutex>

uint8_t byte;
std::mutex byte_mutex;

void atomic_and() {
    std::lock_guard<std::mutex> lock(byte_mutex);
    byte &= 0xF0;
}

Using a compare_exchange_weak

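#include <atomic>
#include <cstdint>

std::atomic<uint8_t> byte;

void atomic_and() {
    uint8_t old_val = byte.load();
    // On failure, compare_exchange_weak writes the current value into old_val,
    // so the loop only has to recompute the desired value and retry.
    while (!byte.compare_exchange_weak(old_val, old_val & 0xF0)) {
    }
}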

Question

What's the best atomic method for a read-modify-write sequence in a multithreaded C/C++ program?


Solution

  • [I have] discovered that naturally aligned global variables in memory have atomic reads and writes.

    This is not correct in a C/C++ sense, only in an x86_64 sense. Naturally aligned loads and stores of up to 8 bytes are indeed atomic on x86_64, but the C and C++ abstract machine gives no such guarantee. Concurrent access to a non-atomic object where at least one access is a write is always a data race, which is undefined behavior, and thread sanitizers might catch the mistake even if the architecture theoretically makes it safe.
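
To stay within the language rules instead of leaning on x86_64 behavior, the shared byte can be declared as std::atomic. The following is a minimal sketch, not code from the question: the names shared_byte, publish, observe, and run_writer_and_reader are made up for illustration, and relaxed ordering is assumed to be enough because only atomicity of the byte itself is needed here.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical shared byte. Declaring it std::atomic makes concurrent
// access well-defined instead of a data race.
std::atomic<std::uint8_t> shared_byte{0xFF};

// Writer side: a plain atomic store (illustrative helper name).
void publish(std::uint8_t value) {
    shared_byte.store(value, std::memory_order_relaxed);
}

// Reader side: a plain atomic load (illustrative helper name).
std::uint8_t observe() {
    return shared_byte.load(std::memory_order_relaxed);
}

// Runs one writer and one reader concurrently. The reader sees either the
// old value (0xFF) or the new value (0xF0), depending on scheduling.
std::uint8_t run_writer_and_reader() {
    std::uint8_t seen = 0;
    std::thread writer(publish, std::uint8_t{0xF0});
    std::thread reader([&seen] { seen = observe(); });
    writer.join();
    reader.join();
    return seen;
}
```

On x86_64, relaxed atomic loads and stores of a byte typically compile to the same ordinary mov instructions as the non-atomic version, so the well-defined form usually costs nothing extra there.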

    Furthermore, the best way to do byte &= 0xf0 atomically is very similar in C and C++:

    // C++
    #include <atomic>
    #include <cstdint>
    std::atomic_uint8_t byte; // or std::atomic<std::uint8_t>
    // ...
    std::uint8_t old = byte.fetch_and(0xf0); /* optionally specify memory order */
    // or
    std::uint8_t old = std::atomic_fetch_and(&byte, 0xf0);
    
    // C (no compiler extensions/intrinsics needed)
    #include <stdatomic.h>
    #include <stdint.h>
    _Atomic uint8_t byte; // or atomic_uint_least8_t; C11 <stdatomic.h> defines no atomic_uint8_t
    // ...
    uint8_t old = atomic_fetch_and(&byte, 0xf0); /* optionally atomic_fetch_and_explicit */
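
As a sketch of the optional memory-order argument and of the returned value, here is a small C++ wrapper. The names flags and and_with are hypothetical, and the default of std::memory_order_seq_cst simply mirrors what fetch_and uses when no order is given.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical flag byte used only for this illustration.
std::atomic<std::uint8_t> flags{0xFF};

// Atomically performs `flags &= mask` and returns the value that was
// stored immediately before the operation.
std::uint8_t and_with(std::uint8_t mask,
                      std::memory_order order = std::memory_order_seq_cst) {
    return flags.fetch_and(mask, order);
}
```

Calling and_with(0xF0) on the initial value returns 0xFF and leaves 0xF0 in flags. A following and_with(0x3C, std::memory_order_relaxed) returns 0xF0 and leaves 0x30.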
    

    The other methods (POSIX threads, std::mutex, compare_exchange retry loop) are almost certainly worse than the built-in fetch_and functions. If the architecture doesn't directly provide an atomic fetch-AND instruction, the implementation falls back to whatever is best for the target, typically a compare-and-swap or LL/SC retry loop. It's not something you have to worry about.
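
To illustrate why the single fetch_and call is sufficient under contention, here is a small sketch in which eight threads each clear a different bit of the same byte. The function name clear_all_bits_concurrently is made up for this example, and relaxed ordering is assumed to be enough because the final value is only read after every thread has been joined.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Each of 8 threads clears a different bit of the same byte. fetch_and is
// one atomic read-modify-write, so no thread's update can be lost.
std::uint8_t clear_all_bits_concurrently() {
    std::atomic<std::uint8_t> byte{0xFF};
    std::vector<std::thread> workers;
    for (unsigned bit = 0; bit < 8; ++bit) {
        workers.emplace_back([&byte, bit] {
            // Mask with every bit set except the one this thread clears.
            const auto mask = static_cast<std::uint8_t>(~(1u << bit));
            byte.fetch_and(mask, std::memory_order_relaxed);
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return byte.load();  // every bit has been cleared by now
}
```

With a plain uint8_t and `byte &= mask` in place of fetch_and, the same program would contain a data race and could lose updates.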


    See Also

    Thanks to @PeterCordes for sharing these links.