I understand so far that thread start is defined from a point of view: e.g., in Windows, it's always ntdll.RtlUserThreadStart+21 (User), but at the program library level it can be any function. But the thread start is not called before the thread is created by ntdll.NtCreateThreadEx+14 (System).
The thread entry is the (library, i.e., exported, or private) function given as argument to the thread create function.
An example of a callstack with threads (threadID, Address, to, from, size, comment, party) made with x64dbg:
4200
00000076EBDFF9A8 00007FFEC900A34E 00007FFECB4EC034 A0 ntdll.NtWaitForSingleObject+14 System
00000076EBDFFA48 00007FF7987B48A1 00007FFEC900A34E 30 kernelbase.WaitForSingleObjectEx+8E User
00000076EBDFFA78 00007FF7988961A0 00007FF7987B48A1 30 mylibrarydll0.00007FF7987B48A1 User
00000076EBDFFAA8 00007FF7987B13DF 00007FF7988961A0 30 mylibrarydll0.00007FF7988961A0 User
00000076EBDFFAD8 00007FF798B4A175 00007FF7987B13DF 30 mylibrarydll0.00007FF7987B13DF User
00000076EBDFFB08 00007FFECA637034 00007FF798B4A175 30 mylibrarydll0.sub_7FF798B4A0B4+C1 System
00000076EBDFFB38 00007FFECB49D0D1 00007FFECA637034 80 kernel32.BaseThreadInitThunk+14 System
00000076EBDFFBB8 0000000000000000 00007FFECB49D0D1 ntdll.RtlUserThreadStart+21 User
2736
00000076EB5FF648 00007FFECB4623D7 00007FFECB4EFA04 300 ntdll.NtWaitForWorkViaWorkerFactory+14 System
00000076EB5FF948 00007FFECA637034 00007FFECB4623D7 30 ntdll.TppWorkerThread+2F7 System
00000076EB5FF978 00007FFECB49D0D1 00007FFECA637034 80 kernel32.BaseThreadInitThunk+14 System
00000076EB5FF9F8 0000000000000000 00007FFECB49D0D1 ntdll.RtlUserThreadStart+21 User
2468
00000076EBBFFB78 00007FFEC900A34E 00007FFECB4EC034 A0 ntdll.NtWaitForSingleObject+14 System
00000076EBBFFC18 00007FF7987B48A1 00007FFEC900A34E 30 kernelbase.WaitForSingleObjectEx+8E User
00000076EBBFFC48 00007FF7988961A0 00007FF7987B48A1 30 mylibrarydll0.00007FF7987B48A1 User
00000076EBBFFC78 00007FF7987B13DF 00007FF7988961A0 30 mylibrarydll0.00007FF7988961A0 User
00000076EBBFFCA8 00007FF798B4A175 00007FF7987B13DF 30 mylibrarydll0.00007FF7987B13DF User
00000076EBBFFCD8 00007FFECA637034 00007FF798B4A175 30 mylibrarydll0.sub_7FF798B4A0B4+C1 System
00000076EBBFFD08 00007FFECB49D0D1 00007FFECA637034 80 kernel32.BaseThreadInitThunk+14 System
00000076EBBFFD88 0000000000000000 00007FFECB49D0D1 ntdll.RtlUserThreadStart+21 User
3784
00000076EB6FFB88 00007FFECB4623D7 00007FFECB4EFA04 300 ntdll.NtWaitForWorkViaWorkerFactory+14 System
00000076EB6FFE88 00007FFECA637034 00007FFECB4623D7 30 ntdll.TppWorkerThread+2F7 System
00000076EB6FFEB8 00007FFECB49D0D1 00007FFECA637034 80 kernel32.BaseThreadInitThunk+14 System
00000076EB6FFF38 0000000000000000 00007FFECB49D0D1 ntdll.RtlUserThreadStart+21 User
1928
00000076EB7FFA48 00007FFEC900A34E 00007FFECB4EC034 A0 ntdll.NtWaitForSingleObject+14 System
00000076EB7FFAE8 00007FF7987B48A1 00007FFEC900A34E 30 kernelbase.WaitForSingleObjectEx+8E User
00000076EB7FFB18 00007FF7988961A0 00007FF7987B48A1 30 mylibrarydll0.00007FF7987B48A1 User
00000076EB7FFB48 00007FF7987B13DF 00007FF7988961A0 30 mylibrarydll0.00007FF7988961A0 User
00000076EB7FFB78 00007FF798B4A175 00007FF7987B13DF 30 mylibrarydll0.00007FF7987B13DF User
00000076EB7FFBA8 00007FFECA637034 00007FF798B4A175 30 mylibrarydll0.sub_7FF798B4A0B4+C1 System
00000076EB7FFBD8 00007FFECB49D0D1 00007FFECA637034 80 kernel32.BaseThreadInitThunk+14 System
00000076EB7FFC58 0000000000000000 00007FFECB49D0D1 ntdll.RtlUserThreadStart+21 User
2276
00000076EB8FF7C8 00007FFEC900A34E 00007FFECB4EC034 A0 ntdll.NtWaitForSingleObject+14 System
00000076EB8FF868 00007FF7987B48A1 00007FFEC900A34E 30 kernelbase.WaitForSingleObjectEx+8E User
00000076EB8FF898 00007FF7988961A0 00007FF7987B48A1 30 mylibrarydll0.00007FF7987B48A1 User
00000076EB8FF8C8 00007FF7987B13DF 00007FF7988961A0 30 mylibrarydll0.00007FF7988961A0 User
00000076EB8FF8F8 00007FF798B4A175 00007FF7987B13DF 30 mylibrarydll0.00007FF7987B13DF User
00000076EB8FF928 00007FFECA637034 00007FF798B4A175 30 mylibrarydll0.sub_7FF798B4A0B4+C1 System
00000076EB8FF958 00007FFECB49D0D1 00007FFECA637034 80 kernel32.BaseThreadInitThunk+14 System
00000076EB8FF9D8 0000000000000000 00007FFECB49D0D1 ntdll.RtlUserThreadStart+21 User
12168
00000076EB9FF6E8 00007FFECB4623D7 00007FFECB4EFA04 300 ntdll.NtWaitForWorkViaWorkerFactory+14 System
00000076EB9FF9E8 00007FFECA637034 00007FFECB4623D7 30 ntdll.TppWorkerThread+2F7 System
00000076EB9FFA18 00007FFECB49D0D1 00007FFECA637034 80 kernel32.BaseThreadInitThunk+14 System
00000076EB9FFA98 0000000000000000 00007FFECB49D0D1 ntdll.RtlUserThreadStart+21 User
2428
00000076EBAFF5D8 00007FFECB4623D7 00007FFECB4EFA04 300 ntdll.NtWaitForWorkViaWorkerFactory+14 System
00000076EBAFF8D8 00007FFECA637034 00007FFECB4623D7 30 ntdll.TppWorkerThread+2F7 System
00000076EBAFF908 00007FFECB49D0D1 00007FFECA637034 80 kernel32.BaseThreadInitThunk+14 System
00000076EBAFF988 0000000000000000 00007FFECB49D0D1 ntdll.RtlUserThreadStart+21 User
Windows sends the debugger a specific set of events; you can find them in the documentation of WaitForDebugEvent.
One of these events is CREATE_THREAD_DEBUG_EVENT (described by a CREATE_THREAD_DEBUG_INFO structure), which is sent when Windows has created but not yet started the thread.
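To make the shape of that event concrete, here is a minimal sketch that mirrors the 64-bit layout of DEBUG_EVENT with ctypes and prints the start address carried by a thread-creation event. It is only an illustration: the class names, the `describe` helper and the sample values (pid 1234 is made up; the thread ID and address are borrowed from the call stack in the question) are my own assumptions, and in a real debugger the structure would be filled in by WaitForDebugEvent rather than by hand.

```python
import ctypes

# Debug event codes as documented for WaitForDebugEvent.
DEBUG_EVENT_NAMES = {
    1: "EXCEPTION_DEBUG_EVENT",
    2: "CREATE_THREAD_DEBUG_EVENT",
    3: "CREATE_PROCESS_DEBUG_EVENT",
    4: "EXIT_THREAD_DEBUG_EVENT",
    5: "EXIT_PROCESS_DEBUG_EVENT",
    6: "LOAD_DLL_DEBUG_EVENT",
    7: "UNLOAD_DLL_DEBUG_EVENT",
    8: "OUTPUT_DEBUG_STRING_EVENT",
    9: "RIP_EVENT",
}
CREATE_THREAD_DEBUG_EVENT = 2


class CreateThreadDebugInfo(ctypes.Structure):
    # Mirrors CREATE_THREAD_DEBUG_INFO on 64-bit Windows (pointers as 64-bit integers).
    _fields_ = [
        ("hThread", ctypes.c_uint64),
        ("lpThreadLocalBase", ctypes.c_uint64),
        ("lpStartAddress", ctypes.c_uint64),
    ]


class DebugEventUnion(ctypes.Union):
    # Only the member needed here is spelled out; "raw" pads the union to the
    # size of its largest member on x64 (assumed to be 160 bytes).
    _fields_ = [
        ("CreateThread", CreateThreadDebugInfo),
        ("raw", ctypes.c_ubyte * 160),
    ]


class DebugEvent(ctypes.Structure):
    # Mirrors DEBUG_EVENT: three DWORDs followed by the per-event union.
    _fields_ = [
        ("dwDebugEventCode", ctypes.c_uint32),
        ("dwProcessId", ctypes.c_uint32),
        ("dwThreadId", ctypes.c_uint32),
        ("u", DebugEventUnion),
    ]


def describe(event: DebugEvent) -> str:
    """Render an event the way a debugger log might (hypothetical helper)."""
    name = DEBUG_EVENT_NAMES.get(event.dwDebugEventCode, "UNKNOWN")
    text = f"{name} pid={event.dwProcessId} tid={event.dwThreadId}"
    if event.dwDebugEventCode == CREATE_THREAD_DEBUG_EVENT:
        text += f" start=0x{event.u.CreateThread.lpStartAddress:016X}"
    return text


# Hand-built sample event standing in for what WaitForDebugEvent would return.
event = DebugEvent()
event.dwDebugEventCode = CREATE_THREAD_DEBUG_EVENT
event.dwProcessId = 1234  # made-up process ID
event.dwThreadId = 4200
event.u.CreateThread.lpStartAddress = 0x7FF798B4A0B4
print(describe(event))
```

The point is simply that the event already tells the debugger where the caller wanted the thread to begin, before the thread has run a single instruction.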
In Windows, process and thread creation happens in the kernel, but the final initialization steps happen in userspace (unless it's a picoprocess, which we won't address here). The DLL ntdll.dll is mapped into the process, and just after the thread has been created its context's RIP is set to point to one of this DLL's functions.
This function performs the necessary initialization and then hands control to the address given to CreateThread or similar. It is a kind of wrapper for threads.
It is natural to say that thread start happens when the first instruction of this initialization function is about to execute (think of it as if Windows had set a breakpoint there).
The thread entry is, instead, just the address given to the thread creation API. It is important because it is the actual code the caller intended to be executed. In fact, for debugging or RE purposes, you can almost always (if not always) ignore the thread start event.
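You can see both notions in the call stack from the question: every thread bottoms out in ntdll.RtlUserThreadStart and kernel32.BaseThreadInitThunk, and the frame called by BaseThreadInitThunk sits in the thread entry. A rough sketch that pulls that frame out of two of the pasted x64dbg stacks follows. The function names are mine, and I'm assuming that the comment column names the function containing the "from" address and that the function called by BaseThreadInitThunk is the thread entry (it may well be a runtime wrapper rather than the programmer's own function).

```python
# Frames copied from the question: Address, To, From, [Size], Comment, Party.
THREAD_4200 = """\
00000076EBDFF9A8 00007FFEC900A34E 00007FFECB4EC034 A0 ntdll.NtWaitForSingleObject+14 System
00000076EBDFFA48 00007FF7987B48A1 00007FFEC900A34E 30 kernelbase.WaitForSingleObjectEx+8E User
00000076EBDFFA78 00007FF7988961A0 00007FF7987B48A1 30 mylibrarydll0.00007FF7987B48A1 User
00000076EBDFFAA8 00007FF7987B13DF 00007FF7988961A0 30 mylibrarydll0.00007FF7988961A0 User
00000076EBDFFAD8 00007FF798B4A175 00007FF7987B13DF 30 mylibrarydll0.00007FF7987B13DF User
00000076EBDFFB08 00007FFECA637034 00007FF798B4A175 30 mylibrarydll0.sub_7FF798B4A0B4+C1 System
00000076EBDFFB38 00007FFECB49D0D1 00007FFECA637034 80 kernel32.BaseThreadInitThunk+14 System
00000076EBDFFBB8 0000000000000000 00007FFECB49D0D1 ntdll.RtlUserThreadStart+21 User
"""

THREAD_2736 = """\
00000076EB5FF648 00007FFECB4623D7 00007FFECB4EFA04 300 ntdll.NtWaitForWorkViaWorkerFactory+14 System
00000076EB5FF948 00007FFECA637034 00007FFECB4623D7 30 ntdll.TppWorkerThread+2F7 System
00000076EB5FF978 00007FFECB49D0D1 00007FFECA637034 80 kernel32.BaseThreadInitThunk+14 System
00000076EB5FF9F8 0000000000000000 00007FFECB49D0D1 ntdll.RtlUserThreadStart+21 User
"""


def parse_frames(text):
    """Turn the pasted rows into dicts; the size column is optional."""
    frames = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip thread-ID lines and blank lines
        frames.append({"from": int(parts[2], 16), "comment": parts[-2]})
    return frames


def thread_entry(frames):
    """Return (label, start address) of the function called by BaseThreadInitThunk."""
    for index, frame in enumerate(frames):
        if index > 0 and frame["comment"].startswith("kernel32.BaseThreadInitThunk"):
            callee = frames[index - 1]
            label, plus, offset = callee["comment"].rpartition("+")
            if not plus:
                return callee["comment"], None  # no symbol+offset to work with
            # "from" lies inside the callee, so subtracting the offset gives its start.
            return label, callee["from"] - int(offset, 16)
    return None


for dump in (THREAD_4200, THREAD_2736):
    label, start = thread_entry(parse_frames(dump))
    print(label, hex(start))
```

For the worker threads this lands on ntdll.TppWorkerThread, and for the others on a function inside mylibrarydll0, which matches the idea that thread start is fixed by the system while the thread entry is whatever the creator asked for.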
Let's do an example. Consider this simple 64-bit program.
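BITS 64
EXTERN CreateThread
GLOBAL _start
SECTION .text
_start:
    and rsp, -16                ; align the stack to 16 bytes
    push 0                      ; 6th argument: lpThreadId = NULL
    push 0                      ; 5th argument: dwCreationFlags = 0 (run immediately)
    sub rsp, 20h                ; shadow space for the four register arguments
    xor r9, r9                  ; lpParameter = NULL
    lea r8, [REL _thread1]      ; lpStartAddress = _thread1 (the thread entry)
    xor edx, edx                ; dwStackSize = 0 (default)
    xor ecx, ecx                ; lpThreadAttributes = NULL
    call CreateThread
.loop:
    TIMES 1000 pause
    jmp .loop                   ; the main thread spins here
_thread1:
    TIMES 1000 pause
    jmp _thread1                ; the new thread spins here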
All it does is create a thread pointing to a sled of pause instructions executed in a loop. The main thread also executes a similar, but distinct, loop.
The purpose of the loops is to have the RIP of the threads change while still not being inside a Windows API. Any instruction in the loop, provided it doesn't fault, will be fine. I picked pause, because :)
Assemble and link the program.
Open x64dbg, open the program, and then set the Thread start and Thread entry events.
Now press F9 to reach the program entry point and press F9 again to let it go. The debugger will be notified of the new thread creation.
Note that the execution stopped at the beginning of RtlUserThreadStart. This is always the case for my version of Windows (Windows 7 something). It makes sense, given the introduction at the beginning of this answer.
Also note that the thread entry point is in rcx, meaning it is the first parameter for RtlUserThreadStart.
Now, this was the event that Windows sent to the debugger, so it's natural the execution stopped here.
But Windows has no thread entry event, so what is x64dbg doing here?
You can unveil this mystery by looking at the breakpoints tab.
You'll see that the debugger set a one-time breakpoint (i.e. one that the debugger itself removes automatically) at the thread entry point.
So, while Windows doesn't offer support for generating a debug event when a thread first starts executing its user code, a debugger can easily emulate it by putting a breakpoint there before the thread actually starts.
Note that this means the debugger always reacts to thread start events; when they are disabled in the options it simply won't stop, show the event and wait for you to do something.
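The emulation can be modelled in a few lines. This is a toy sketch, not x64dbg's actual code: a bytearray stands in for the debuggee's memory (a real debugger would go through ReadProcessMemory/WriteProcessMemory), the base address 0x1000 is made up, and the class and function names are hypothetical.

```python
INT3 = 0xCC  # x86 software breakpoint opcode


class Breakpoints:
    """Software breakpoints over a bytearray standing in for debuggee memory."""

    def __init__(self, memory: bytearray, base: int):
        self.memory = memory
        self.base = base
        self.active = {}  # address -> (original byte, single_shot)

    def set(self, address: int, single_shot: bool = False) -> None:
        offset = address - self.base
        self.active[address] = (self.memory[offset], single_shot)
        self.memory[offset] = INT3  # patch in the breakpoint

    def on_hit(self, address: int) -> bool:
        """Handle a breakpoint exception; return True if the breakpoint was ours."""
        if address not in self.active:
            return False
        original, single_shot = self.active[address]
        # Restore the original byte so the instruction can execute.
        self.memory[address - self.base] = original
        if single_shot:
            del self.active[address]  # a one-time breakpoint removes itself
        # A persistent breakpoint would be re-armed after a single step (not modelled).
        return True


def on_thread_start(breakpoints: Breakpoints, entry_address: int) -> None:
    """React to the real thread start event by arming a one-time breakpoint
    at the thread entry, which is what produces the emulated 'thread entry' stop."""
    breakpoints.set(entry_address, single_shot=True)


# Fake code for the new thread: four pause instructions and a short jump back.
memory = bytearray(b"\xF3\x90" * 4 + b"\xEB\xF6")
breakpoints = Breakpoints(memory, base=0x1000)

on_thread_start(breakpoints, 0x1000)         # thread start event received
print(hex(memory[0]))                        # entry is now patched with INT3
print(breakpoints.on_hit(0x1000))            # the thread reaches its entry
print(hex(memory[0]), breakpoints.active)    # byte restored, breakpoint gone
```

This also shows why the debugger cannot ignore thread start events: without reacting to them it would have no chance to arm the breakpoint before the thread runs.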
Pausing and resuming the thread doesn't change the thread entry point, which is fixed at thread creation.
x64dbg has a threads tab that allows the user to suspend and resume the threads. Playing with it doesn't change the thread entry point, just the RIPs, which still point somewhere in the two loops (which exist to ease this test).
If the thread is created with the suspend flag, the thread start event won't fire until the thread is resumed.
But if, before resuming the thread, a pair of calls to Get/SetThreadContext is made to change the thread's RIP, then RtlUserThreadStart will never be executed (IDK what this function does exactly, but a thread can do without it) and the thread start event will never fire.
The execution will go straight to the altered RIP.
I'm not sure if this is a legacy bug of Windows' debugging interface; the thread start event could be generated by setting the TF (trap flag) before the thread is first scheduled (and immediately clearing it upon catching the relevant exception).
When debugging/REing a thread, what I usually do is put a breakpoint at the thread entry point (which is easy to get) or at the hijacked RIP (which is also easy to get: this kind of thread is created suspended, so you know something is fishy).
If the program is being nasty and the code at the thread's RIP is not yet in the clear (e.g. it is still obfuscated), use a hardware breakpoint.
Note: exactly the same thing happens for process creation too (only with the PE entry point instead of a thread entry point).