windows winapi system mutex critical-section

Does Mutex call a system call?

CRITICAL_SECTION locking (enter) and unlocking (leave) are efficient because CS testing is performed in user space without making the kernel system call that a mutex makes. Unlocking is performed entirely in user space, whereas ReleaseMutex requires a system call.

I just read these sentences in this book.
What the kernel system call mean? Could you give me the function's name?

I'm a English newbie. I interpreted them like this.

CS testing doesn't use a system call.
Mutex testing uses a system call.(But I don't know the function name. Let me know)
CS unlocking doesn't call a system call.
Mutex unlocking requires a system call.(But I don't know the function name. Let me know)

Another question.

I think CRITICAL_SECTION might call WaitForSingleObject or family functions. Don't these functions require a system call? I guess they do. So CS testing doesn't use a system call is very weird to me.

Solution

The implementation of critical sections in Windows has changed over the years, but it has always been a combination of user-mode and kernel calls.

The CRITICAL_SECTION is a structure that contains a user-mode updated values, a handle to a kernel-mode object - EVENT or something like that, and debug information.

EnterCriticalSection uses an interlocked test-and-set operation to acquire the lock. If successful, this is all that is required (almost, it also updates the owner thread). If the test-and-set operation fails to aquire, a longer path is used which usually requires waiting on a kernel object with WaitForSignleObject. If you initialized with InitializeCriticalSectionAndSpinCount then EnterCriticalSection may spin an retry to acquire using interlocked operation in user-mode.

Below is a diassembly of the "fast" / uncontended path of EnterCriticialSection in Windows 7 (64-bit) with some comments inline

0:000> u rtlentercriticalsection rtlentercriticalsection+35
ntdll!RtlEnterCriticalSection:
00000000`77ae2fc0 fff3            push    rbx
00000000`77ae2fc2 4883ec20        sub     rsp,20h
; RCX points to the critical section rcx+8 is the LockCount
00000000`77ae2fc6 f00fba710800    lock btr dword ptr [rcx+8],0
00000000`77ae2fcc 488bd9          mov     rbx,rcx
00000000`77ae2fcf 0f83e9b1ffff    jae     ntdll!RtlEnterCriticalSection+0x31 (00000000`77ade1be)
; got the critical section - update the owner thread and recursion count
00000000`77ae2fd5 65488b042530000000 mov   rax,qword ptr gs:[30h]
00000000`77ae2fde 488b4848        mov     rcx,qword ptr [rax+48h]
00000000`77ae2fe2 c7430c01000000  mov     dword ptr [rbx+0Ch],1
00000000`77ae2fe9 33c0            xor     eax,eax
00000000`77ae2feb 48894b10        mov     qword ptr [rbx+10h],rcx
00000000`77ae2fef 4883c420        add     rsp,20h
00000000`77ae2ff3 5b              pop     rbx
00000000`77ae2ff4 c3              ret

So the bottom line is that if the thread does not need to block it will not use a system call, just an interlocked test-and-set operation. If blocking is required, there will be a system call. The release path also uses an interlocked test-and-set and may require a system call if other threads are blocked.

Compare this to Mutex which always requires a system call NtWaitForSingleObject and NtReleaseMutant