On Windows, std::mutex is implemented with critical sections, which is why it's much faster than an OS mutex. However, it's not as fast as a raw Windows CRITICAL_SECTION.
Timings, measured in a tight loop on a single thread:
423.76ns ATL CMutex
41.74ns std::mutex
16.61ns win32 Critical Section
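For reference, numbers like the ones above can be gathered with something along these lines. This is only a sketch of one way to time the uncontended case, not the code that produced the figures; the helper name `average_lock_unlock_ns` is made up for illustration.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical helper: average cost, in nanoseconds, of one uncontended
// lock()/unlock() pair on any type that provides those two members.
template <typename Lockable>
double average_lock_unlock_ns(Lockable& m, std::uint64_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        m.lock();    // never blocks: only this thread touches m
        m.unlock();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling `average_lock_unlock_ns(m, 10000000)` with a `std::mutex m` gives the std::mutex figure. To time a CRITICAL_SECTION with the same helper, one option is a small wrapper struct whose `lock()` calls `EnterCriticalSection` and whose `unlock()` calls `LeaveCriticalSection`. A single-threaded loop like this measures only the fast path, so it says nothing about behaviour under contention.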
My question is: what else is std::mutex doing? I looked at the source but couldn't follow it. However, there were extra steps before it defers to the critical section. Are these extra steps accomplishing anything useful? That is, what are the extra steps for, and what would I miss out on by using CRITICAL_SECTION directly?
Also, why did they call it a mutex if it's not implemented with a Mutex?
A std::mutex provides non-recursive ownership semantics. A CRITICAL_SECTION provides recursive semantics. So I assume the extra layer in the std::mutex implementation is (at least in part) to resolve this difference.
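To make the difference in semantics concrete, here is a small sketch using std::recursive_mutex, which has the same re-entrant behaviour as a CRITICAL_SECTION. The function name `nested_acquisitions` is made up for illustration.

```cpp
#include <mutex>

// Hypothetical helper: re-acquires the same recursive mutex `depth` times
// on one thread and returns how many nested acquisitions were made.
int nested_acquisitions(std::recursive_mutex& m, int depth) {
    if (depth <= 0) {
        return 0;
    }
    // Legal for a recursive mutex: the owning thread may lock it again.
    std::lock_guard<std::recursive_mutex> guard(m);
    return 1 + nested_acquisitions(m, depth - 1);
    // Each guard releases one level of ownership as the recursion unwinds.
}
```

Writing the same function against std::mutex would be undefined behaviour, because the standard does not allow a thread to lock a std::mutex it already owns.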
Update: Stepping through the code, it looks like std::mutex is implemented in terms of a queue and Interlocked* operations rather than a classical Win32 CRITICAL_SECTION. Even though std::mutex is non-recursive, the underlying code in the runtime library can optionally handle recursive and even timed locks.
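To give a feel for what a lock built on interlocked operations looks like, here is a deliberately simplified sketch using std::atomic. It is not the runtime library's code: it has no wait queue, no recursion and no timed locking, and the class name `SpinMutex` is made up for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical, simplified lock built on a single atomic exchange
// (the portable counterpart of an Interlocked* call).
class SpinMutex {
public:
    void lock() {
        // exchange() returns the previous value; true means another
        // thread holds the lock, so give up the time slice and retry.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }

    bool try_lock() {
        // Succeeds only if the flag was previously clear.
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

Because it provides `lock()`, `try_lock()` and `unlock()`, this class works with `std::lock_guard<SpinMutex>`. A real implementation would put waiting threads to sleep instead of spinning, which is presumably where the queue comes in.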