Search code examples
linuxassemblyoperating-systemcpu-architecturefutex

Is test-and-set (or other atomic RMW operation) a privileged instruction on any architecture?


Hardware provides atomic instructions like test-and-set, compare-and-swap, load-linked-store-conditional. Are these privileged instructions? That is, can only the OS execute them (and thus requiring a system call)?

I thought they are not privileged and can be called in user space. But http://faculty.salina.k-state.edu/tim/ossg/IPC_sync/ts.html seems to suggest otherwise. But then, futex(7), under certain conditions, can achieve locking without a system call, which means it must execute the instruction (like test-and-set) without privilege.

Contradiction? If so, which is right?


Solution

  • That page is wrong. It seems to be claiming that lock-free atomic operations are privileged on ISAs in general, but that is not the case. I've never heard of one where atomic test-and-set or any other lock-free operation required kernel mode.

    If that was the case, it would require C++11 lock-free atomic read-modify-write operations to compile to system calls, but they don't on x86, ARM, AArch64, MIPS, PowerPC, or any other normal CPU. (try it on https://godbolt.org/).

    It would also make "light-weight" locking (which tries to take the lock without a system call) impossible. (http://preshing.com/20111124/always-use-a-lightweight-mutex/)

    Normal ISAs allow user-space to do atomic RMW operations, on memory shared between threads or even between separate processes. I'm not aware of a mechanism for disabling atomic RMW for user-space on x86. Even if there is such a thing on any ISA, it's not the normal mode of operation.

    Read-only or write-only accesses are usually naturally atomic on aligned locations on all ISAs, up to a certain width (Why is integer assignment on a naturally aligned variable atomic on x86?), but atomic RMW does need hardware support.


    On x86, TAS is xchg (a whole byte) or lock bts (a single bit), both unprivileged. (Documentation for the lock prefix). x86 has a wide selection of other atomic operations, like lock add [mem], reg/immediate, lock cmpxchg [mem], reg, and even lock xadd [mem], reg which implements fetch_add when the return value is needed. (See also Is incrementing an int effectively atomic in specific cases? for how hardware implements these, and the C++ rules.)

    Most RISCs have LL/SC, including ARM, MIPS, and PowerPC, as well as all the older no longer common RISC ISAs.


    futex(2) is a system call. If you call it, everything it does is in kernel mode.

    It's the fallback mechanism used by light-weight locking in case there is contention, which provides OS-assisted sleep/wake. So it's not futex itself that does anything in user-space, but rather lock implementations built around futex can avoid making system calls in the uncontended or low-contention case.

    (e.g. spin in user-space a few times in case the lock becomes available.)

    This is what the futex(7) man page is describing. But I find it a bit weird to call it a "futex operation" if you don't actually make the system call. I guess it's operating on memory that kernel code might look at on behalf of other threads that are waiting, so the necessary semantics for modifying memory locations in user-space code depends on futex.