Best way to atomically bitwise AND a byte in C/C++?

I am currently looking at atomic operations in C/C++ using GCC and have discovered that naturally aligned global variables in memory have atomic reads and writes.

However, I was trying to bitwise AND a global variable and noticed it boils down to a read-modify-write sequence which is troublesome if there are multiple threads operating on that byte value.
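To make the hazard concrete, here is a minimal sketch that replays one bad interleaving by hand on a single thread, so it is deterministic and contains no real data race. The names simulate_lost_update, a_tmp, and b_tmp are made up for illustration; they stand for two hypothetical threads A and B that each want to clear a different bit of the same byte.

```cpp
#include <cstdint>

// Replays, sequentially, one possible interleaving of two non-atomic
// "byte &= mask" operations. Each operation is split into the three
// steps it boils down to: load, AND, store.
std::uint8_t simulate_lost_update() {
    std::uint8_t byte = 0xFF;

    std::uint8_t a_tmp = byte;   // thread A loads 0xFF
    std::uint8_t b_tmp = byte;   // thread B loads 0xFF before A stores

    a_tmp &= 0xFE;               // A clears bit 0
    b_tmp &= 0xFD;               // B clears bit 1

    byte = a_tmp;                // A stores 0xFE
    byte = b_tmp;                // B stores 0xFD, overwriting A's update

    return byte;                 // 0xFD rather than the intended 0xFC
}
```

With a truly atomic AND, the two operations cannot overlap like this, and the byte would end up as 0xFC regardless of which thread ran first.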

After some research, I've settled on these two examples:

C Example - GCC extension __sync_fetch_and_and

#include <stdio.h>
#include <stdint.h>

uint8_t byteC = 0xFF;

int main(void) {
    /* GCC builtin: atomically ANDs byteC with 0xF0 and returns the old value. */
    __sync_fetch_and_and(&byteC, 0xF0);
    printf("Value of byteC: 0x%X\n", byteC);
    return 0;
}

C++ Example - C++11 using atomic fetch_and

#include <atomic>
#include <cstdint>
#include <iostream>

std::atomic<std::uint8_t> byteCpp(0xFF);

int main() {
    byteCpp.fetch_and(0xF0);
    std::cout << "Value of byteCpp: 0x" << std::hex << static_cast<int>(byteCpp.load()) << std::endl;
    return 0;
}

Other examples follow but they seemed less intuitive and more computationally expensive.

Using a pthread_mutex_lock

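#include <pthread.h>
#include <stdint.h>

uint8_t byte = 0xFF;
pthread_mutex_t byte_mutex = PTHREAD_MUTEX_INITIALIZER;

void atomic_and(void) {
    /* Every access to byte, reads included, must hold byte_mutex. */
    pthread_mutex_lock(&byte_mutex);
    byte &= 0xF0;
    pthread_mutex_unlock(&byte_mutex);
}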

Using a mutex lock_guard

#include <cstdint>
#include <mutex>

std::uint8_t byte = 0xFF;
std::mutex byte_mutex;

void atomic_and() {
    // The mutex is released automatically when lock goes out of scope.
    std::lock_guard<std::mutex> lock(byte_mutex);
    byte &= 0xF0;
}

Using a compare_exchange_weak

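#include <atomic>
#include <cstdint>

std::atomic<std::uint8_t> byte(0xFF);

void atomic_and() {
    std::uint8_t old_val = byte.load();
    // On failure, compare_exchange_weak writes the current value back into
    // old_val, so the desired value is recomputed on every retry.
    while (!byte.compare_exchange_weak(old_val, old_val & 0xF0)) {
    }
}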

Question

What's the best atomic method for a read-modify-write sequence in a multithreaded C/C++ program?

Solution

"[I have] discovered that naturally aligned global variables in memory have atomic reads and writes."

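This is not correct in a C/C++ sense, only in an x86_64 sense. It is true that naturally aligned loads and stores of up to 8 bytes are atomic on x86_64, but the C and C++ abstract machines make no such guarantee. If two threads access the same non-atomic object concurrently without synchronization and at least one of the accesses is a write, that is a data race and the behavior is undefined. Thread sanitizers might catch the mistake, and the compiler is free to optimize on the assumption that no race occurs, even if the architecture would make the individual accesses safe.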
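A minimal sketch of the portable alternative, assuming a hypothetical shared variable named shared_byte and helper names chosen only for illustration: routing every access through std::atomic makes plain loads and stores well-defined on any conforming implementation. On x86_64 relaxed atomic loads and stores typically compile to ordinary move instructions, so there is usually no cost to being correct here.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical shared byte. Every access goes through std::atomic, so
// concurrent readers and writers do not constitute a data race.
std::atomic<std::uint8_t> shared_byte{0};

void store_byte(std::uint8_t value) {
    // Relaxed ordering: atomicity only, no ordering of surrounding accesses.
    shared_byte.store(value, std::memory_order_relaxed);
}

std::uint8_t load_byte() {
    return shared_byte.load(std::memory_order_relaxed);
}

bool byte_is_lock_free() {
    // Reports whether the implementation can do this without a lock.
    return shared_byte.is_lock_free();
}
```

The default memory order (sequentially consistent) is the safer choice when the byte is used to publish other data; relaxed is shown only to illustrate the atomicity guarantee in isolation.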

As for the question itself, the best way to do byte &= 0xf0 atomically is very similar in C and C++:

// C++
#include <atomic>
#include <cstdint>
std::atomic_uint8_t byte; // or std::atomic<std::uint8_t>
// ...
std::uint8_t old = byte.fetch_and(0xf0); /* optionally specify memory order */
// or
std::uint8_t old = std::atomic_fetch_and(&byte, 0xf0);

// C (no compiler extensions/intrinsics needed)
#include <stdatomic.h>
#include <stdint.h>
_Atomic uint8_t byte; // or atomic_uint_least8_t; <stdatomic.h> has no atomic_uint8_t typedef
// ...
uint8_t old = atomic_fetch_and(&byte, 0xf0); /* optionally atomic_fetch_and_explicit */
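The value returned by fetch_and is the byte as it was before the AND, which is often the useful part. As a small sketch, assuming a hypothetical helper named test_and_clear that is not part of any library, this clears a set of flag bits and reports whether any of them had been set, all in one atomic step:

```cpp
#include <atomic>
#include <cstdint>

// Atomically clears the bits in mask and reports whether any of them
// was set beforehand. fetch_and returns the value held before the AND.
bool test_and_clear(std::atomic<std::uint8_t>& flags, std::uint8_t mask) {
    // ~mask promotes to int, so cast back to keep only the low 8 bits.
    const std::uint8_t keep = static_cast<std::uint8_t>(~mask);
    const std::uint8_t old = flags.fetch_and(keep);
    return (old & mask) != 0;
}
```

If several threads call test_and_clear with the same mask at once, at most one of them sees true for a given set bit, which is what makes the pattern usable for handing off a pending-work flag.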

The other methods (POSIX threads, std::mutex, compare_exchange retry loop) are almost certainly worse than the built-in fetch_and functions. If the architecture doesn't directly provide an atomic fetch-AND instruction, the compiler picks the best available fallback for you, typically a compare-exchange or load-linked/store-conditional loop. It's not something you have to worry about.
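For a quick sanity check of the fetch_and approach under real contention, a sketch along these lines can help. The function name clear_bits_concurrently and the choice of eight threads are assumptions made for the example; each thread clears one distinct bit of a shared byte.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Starts eight threads; thread i atomically clears bit i of a shared byte.
// Each fetch_and is one indivisible read-modify-write, so no update can
// be lost, whatever the scheduling.
std::uint8_t clear_bits_concurrently() {
    std::atomic<std::uint8_t> byte{0xFF};
    std::vector<std::thread> workers;

    for (int bit = 0; bit < 8; ++bit) {
        workers.emplace_back([&byte, bit] {
            const std::uint8_t mask = static_cast<std::uint8_t>(~(1u << bit));
            byte.fetch_and(mask, std::memory_order_relaxed);
        });
    }
    for (std::thread& worker : workers) {
        worker.join();  // join makes each worker's update visible here
    }
    return byte.load();  // every bit has been cleared, so this is 0
}
```

Replacing the fetch_and with a plain load, AND, and store on a non-atomic byte would turn this into a data race, and some of the bits could survive. With GCC, building code that uses std::thread may require the -pthread flag.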
