Is there a performance penalty with memory_order_relaxed?

On typical x86 and ARM systems, is there ever any performance penalty for making a variable std::atomic and only performing std::memory_order_relaxed operations? (Compared to making it a normal value and doing non-atomic operations.)

I'm aware that stronger memory ordering guarantees may result in performance penalties; I am specifically wondering if std::memory_order_relaxed imposes any penalty.

Also, let's assume the operations are performed on objects whose type satisfies std::atomic<T>::is_always_lock_free. Otherwise, there would surely be a penalty from the lock the implementation has to use.
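For reference, the lock-free assumption can be checked at compile time. This is a minimal sketch, assuming C++17; the helper always_lock_free is a hypothetical name used only for illustration.

```cpp
#include <atomic>

// Compile-time check (C++17): the build fails if std::atomic<int>
// would need a lock on the target platform.
static_assert(std::atomic<int>::is_always_lock_free,
              "std::atomic<int> is expected to be lock-free on this target");

// Hypothetical helper: reports the same property for an arbitrary type T.
template <typename T>
constexpr bool always_lock_free() {
    return std::atomic<T>::is_always_lock_free;
}
```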

Solution

Yes, there is generally a performance penalty. This is not necessarily because of the underlying CPU architecture, but because of the semantic constraints that using atomics imposes on the compiler.

If you replace an atomic variable that is only accessed with relaxed operations by a non-atomic variable, then either the set of permitted observable behaviors of the program stays the same or it grows (e.g. because code paths become UB once data races are introduced).

So, with relaxed atomics, the compiler is more constrained in which optimizations it can apply, because it must guarantee a narrower set of observable behaviors.
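As a rough illustration of this point, here is a sketch comparing an accumulation into a plain global with the same accumulation done via relaxed loads and stores. The names plain_total, atomic_total, add_plain and add_relaxed are made up for this example, and the comments describe what mainstream compilers tend to do in practice, not what the standard requires.

```cpp
#include <atomic>
#include <cstddef>

long plain_total = 0;
std::atomic<long> atomic_total{0};

// Plain variable: no other thread may legally observe intermediate values,
// so an optimizing compiler is free to accumulate in a register (and
// possibly vectorize the loop) and write plain_total back once at the end.
void add_plain(const int* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        plain_total += data[i];
    }
}

// Relaxed atomic: each iteration is a separate relaxed load and relaxed
// store (deliberately not an atomic read-modify-write). In practice,
// compilers such as GCC and Clang keep every one of these accesses
// instead of hoisting them out of the loop.
void add_relaxed(const int* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        long current = atomic_total.load(std::memory_order_relaxed);
        atomic_total.store(current + data[i], std::memory_order_relaxed);
    }
}
```

Both functions compute the same result when run on a single thread; the difference only shows up in the generated code, which can be compared by compiling with optimizations enabled and inspecting the assembly.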

The list of affected optimizations is long, as some comments under the question already hint.

Additionally, as mentioned in the question comments, some relaxed operations, e.g. read-modify-write (RMW) operations such as fetch_add, still differ from their non-atomic equivalents even at the ISA level on x86 and ARM.
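To make the RMW case concrete, here is a small sketch with hypothetical names (plain_counter, atomic_counter, bump_plain, bump_relaxed). The instruction names in the comments are what compilers typically emit for these targets; the exact output depends on the compiler and flags.

```cpp
#include <atomic>

int plain_counter = 0;
std::atomic<int> atomic_counter{0};

// Non-atomic increment: an ordinary load, add and store
// (on x86 often a single add to memory without a lock prefix).
int bump_plain() {
    return plain_counter++;
}

// Relaxed atomic increment: still has to be a single indivisible RMW.
// x86 typically uses a lock-prefixed instruction (e.g. lock xadd);
// AArch64 uses an ldxr/stxr retry loop, or ldadd when LSE is available.
int bump_relaxed() {
    return atomic_counter.fetch_add(1, std::memory_order_relaxed);
}
```

Both functions return the previous value, so they are interchangeable in single-threaded code; only the atomic version remains well defined when several threads call it concurrently.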