c++ debugging assembly winapi visual-c++

Mysterious behavior with NtWaitForSingleObject in C++ - returns invalid handle error without std::cout


I've run into a strange issue with my program that uses NtWaitForSingleObject and NtDelayExecution in a loop. The problem is that the function NtWaitForSingleObject occasionally returns the error 0xC0000008 (STATUS_INVALID_HANDLE), but only when I remove std::cout statements from my code. This behavior is driving me crazy, and I can't figure out what's wrong.

Here’s what’s happening:

If I have two std::cout statements after the system calls (NtDelayExecution_Syscall and NtWaitForSingleObject_Syscall), everything works as expected.

If I remove the std::cout statements (or leave just one of them), NtWaitForSingleObject_Syscall returns 0xC0000008 (invalid handle).

I've tested the values in registers and variables, and they seem correct before calling NtWaitForSingleObject.

The handle passed to the function is the result of GetCurrentProcess(), which should be valid.

Here’s my code:

Assembly code (.asm):

.code 

; func NtDelayExecution
NtDelayExecution_Syscall proc
    mov rax, 34h
    syscall
    ret
NtDelayExecution_Syscall endp

; func NtWaitForSingleObject
NtWaitForSingleObject_Syscall proc
    mov rax, 04h
    syscall
    ret
NtWaitForSingleObject_Syscall endp
end

C++ code (.cpp):

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <iostream>

extern "C" LONG NtDelayExecution_Syscall(
    BOOLEAN Alertable,
    PLARGE_INTEGER DelayInterval
);

extern "C" LONG NtWaitForSingleObject_Syscall(
    HANDLE hProcess,
    BOOLEAN Alertable,
    PLARGE_INTEGER DelayInterval
);

void StartMonitor(HANDLE hProcessToMonitor) {
    // Set up a zero timeout
    LARGE_INTEGER integer;
    integer.QuadPart = -10000 * 1000;
    LARGE_INTEGER timeout;
    timeout.QuadPart = 0;
    LONG result;
    while (true) {
        result = NtDelayExecution_Syscall(FALSE, &integer);
        std::cout << result << std::endl;  // PROBLEM: Without this line, the error appears
        result = NtWaitForSingleObject_Syscall(hProcessToMonitor, FALSE, &timeout);
        std::cout << result << std::endl;  // PROBLEM: Without this line, the error appears
    }
    return;
}

int main() {
    HANDLE hProcess = GetCurrentProcess();  // Using GetCurrentProcess() handle
    StartMonitor(hProcess);
    return 0;
}

What I've Tried:

Disabling optimizations: I tried disabling compiler optimizations (/Od in MSVC), but the behavior doesn’t change.

Buffering std::cout: Tried disabling synchronization with C stdio using std::ios_base::sync_with_stdio(false), no change.

Added artificial variables and padding to stack: I added variables like volatile int padding[10]; to see if it’s a stack issue — no effect.

Tried using alignas(16) on variables: Didn't help.

Checked registers in the debugger: When the error occurs, RCX (which holds the handle) is FFFFFFFFFFFFFFFF.

My Questions:

Why would the presence of std::cout affect the outcome of NtWaitForSingleObject_Syscall?

How does the output stream influence the behavior of these system calls?

What other debugging steps can I take to isolate the root cause?

Could this be related to memory alignment, stack management, or something specific to MSVC?

Are there known issues with NtWaitForSingleObject handling process handles from GetCurrentProcess() in low-level syscalls?

I've spent hours trying to diagnose this, and any insights would be greatly appreciated!


Solution

  • If you look at the actual Zw/Nt API implementations in ntdll.dll, you can see that every one of them begins with the instruction mov r10,rcx, so the first argument is moved into the r10 register. If, under a debugger, you deliberately change r10 inside NtWaitForSingleObject (after mov r10,rcx) to, say, 0, you get exactly 0xC0000008 (invalid handle).

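    So the error is in the incorrect implementation of NtWaitForSingleObject_Syscall (and NtDelayExecution_Syscall): the stubs never execute mov r10,rcx. The syscall instruction overwrites rcx with the return address, so the kernel reads the first argument from r10 instead. In your stubs r10 holds whatever value was left there by earlier code, and that leftover value is affected by std::cout << result << std::endl;. Adding mov r10,rcx before mov rax,... in both stubs passes the handle correctly.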

    It is unclear why you need to implement the Zw/Nt functions yourself at all instead of importing them from ntdll.dll. Even if you do, you should not hardcode the SSNs (system service numbers), which can differ between Windows versions, but obtain them at run time: if you build a table of all Zw exports in ntdll.dll and sort it by function address, the index of a Zw entry in that address-sorted table is exactly its SSN.
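
The sort-by-address idea from the last paragraph can be sketched roughly as follows. This is only an illustration of the indexing step, not a complete resolver: ZwExport and BuildSsnTable are hypothetical names, and the code assumes you have already collected the names and stub addresses of all Zw exports (for example by walking the export directory of ntdll.dll, which is not shown here).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record for one Zw* export: its name and the address of its
// stub inside ntdll.dll (filled in by code that is not shown here).
struct ZwExport {
    std::string name;
    std::uintptr_t address;
};

// Sorts the Zw* exports by stub address and treats the position of each
// entry in the sorted table as its system service number (SSN).
// Assumption: the input contains every Zw* export exactly once; a missing
// or duplicated entry would shift the indices after it.
std::unordered_map<std::string, std::uint32_t>
BuildSsnTable(std::vector<ZwExport> exports) {
    std::sort(exports.begin(), exports.end(),
              [](const ZwExport& a, const ZwExport& b) {
                  return a.address < b.address;
              });

    std::unordered_map<std::string, std::uint32_t> ssn;
    for (std::size_t i = 0; i < exports.size(); ++i) {
        ssn.emplace(exports[i].name, static_cast<std::uint32_t>(i));
    }
    return ssn;
}
```

With such a table, the stub would load the looked-up value into eax instead of a hardcoded constant like 04h or 34h. Keep in mind that this relies on the layout of ntdll.dll as observed in practice, so it is worth checking the result against the numbers in the real stubs on the systems you target.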