Tags: c, loops, winapi, x86, stack-overflow

Program with while loop causes stack overflow, but only in x86 and only when injected into another process


I have an unfortunately convoluted problem that I am hopeful someone might be able to help me with.

I have written a reasonably large program that I have converted into position-independent code (see here for reference: https://bruteratel.com/research/feature-update/2021/01/30/OBJEXEC/). In practice this means that the resulting exe (compiled using mingw) contains data only in the .text section, and thus can be injected into and run from an arbitrary place in memory. I have successfully ported the program to this format and can compile it for both x86 and x64.

I created two "helper" executables to run the PIC program: a local injector and a remote injector. The local injector runs the program by calling VirtualAlloc, memcpy, and CreateThread. The remote injector runs the program by calling CreateProcess (suspended), VirtualAllocEx, WriteProcessMemory, QueueAPCThread, and ResumeThread (the last two APIs being called on pi.hThread, which was returned from CreateProcess).

I am experiencing inconsistent results in the program depending on the architecture and method of execution.

x64 local: works
x64 inject: works
x86 local: works
x86 inject: fails; stack overflow

I have determined that my program is crashing in a while loop in a particular function. This function is used to format data contained in buffers (heap allocated) that are passed in as function args. The raw data buffer (IOBuf) contains a ~325k long string containing Base64 characters with spaces randomly placed throughout. The while loop in question iterates over this buffer and copies non-space characters to a second buffer (IntermedBuf), with the end goal being that IntermedBuf contains the full Base64 string in IOBuf minus the random spaces.
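To illustrate the intended transformation only (this is not the code from the program, which uses strstr and memcpy), a single-pass version could look like the sketch below. The name strip_spaces is hypothetical, and the sketch assumes the destination buffer is at least as large as the source, including the terminator. It makes no library calls at all.

```c
#include <stddef.h>

/* Hypothetical helper: copies src into dst while skipping ' ' characters.
 * dst is NUL-terminated; the return value is the number of characters
 * written, not counting the terminator.
 * Assumption: dst has room for at least as many bytes as src occupies,
 * including the terminating NUL. */
static size_t strip_spaces(const char *src, char *dst)
{
    size_t written = 0;

    for (; *src != '\0'; ++src) {
        if (*src != ' ')
            dst[written++] = *src;   /* keep every non-space character */
    }
    dst[written] = '\0';
    return written;
}
```

Called as strip_spaces(IOBuf, IntermedBuf), this would leave the Base64 text without spaces in IntermedBuf and return its length.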

A few notes about the following code snippet:

  1. Because the code is written to be position independent, all APIs must be manually resolved, which is why you see things like (SPRINTF)(Apis.sprintfFunc). I have resolved the address of each API in its respective DLL and have created a typedef for each API that is called. While odd, I do not believe this is causing the issue in itself, as the code works fine in 3 of the 4 situations.

  2. Because this program is failing when injected, I cannot use print statements to debug, so I have added calls to MessageBoxA to pop up at certain places to determine contents of variables and/or if execution is reaching that part of the code.

The relevant code snippet is as follows:

        char inter[] = {'I','n','t',' ',0};
        char tools[100] = {0};
        if (((STRCMP)Apis.strcmpFunc)(IntermedBuf, StringVars->b64Null) != 0)
        {
            int i = 0, j = 0, strLen = 0, lenIOBuf = ((STRLEN)Apis.strlenFunc)(IOBuf);
            ((SPRINTF)Apis.sprintfFunc)(tools, StringVars->poi, IOBuf);
            ((MESSAGEBOXA)Apis.MessageBoxAFunc)(NULL, tools, NULL, NULL);
            ((MEMSET)Apis.memsetFunc)(tools, 0, 100 * sizeof(char));
            ((SPRINTF)Apis.sprintfFunc)(tools, StringVars->poi, IntermedBuf);
            ((MESSAGEBOXA)Apis.MessageBoxAFunc)(NULL, tools, NULL, NULL);
            
            char* locSpace;
            while (j < lenIOBuf)
            {
                locSpace = ((STRSTR)Apis.strstrFunc)(IOBuf + j, StringVars->space);
                if (locSpace == 0)
                    locSpace = IOBuf + lenIOBuf;

                strLen = locSpace - IOBuf - j;

                ((MEMCPY)Apis.memcpyFunc)(IntermedBuf + i, IOBuf + j, strLen);
                i += strLen, j += strLen + 1;
            }
            ((MESSAGEBOXA)Apis.MessageBoxAFunc)(NULL, StringVars->here, NULL, NULL);
            ((MEMSET)Apis.memsetFunc)(IOBuf, 0, BUFFSIZE * sizeof(char));  

The first two MessageBoxA calls successfully execute, each containing the address of IOBuf and IntermedBuf respectively. The last call to MessageBoxA, after the while loop, never comes, meaning the program is crashing in the while loop as it copies data from IOBuf to IntermedBuf.

I ran remote.exe which spawned a new WerFault.exe (I have tried with calc, notepad, several other processes with the same result) containing the PIC program, and stuck it into Windbg to try and get a better sense of what was happening. I found that after receiving the first two message boxes and clicking through them, WerFault crashes with a stack overflow caused by a call to strstr:

[Screenshot: stackoverflow]

Examining the contents of the stack at crash time shows this:

[Screenshot: stack contents]

Looking at the contents of IntermedBuf (which is one of the arguments passed to the strstr call) I can see that the program IS copying data from IOBuf to IntermedBuf and removing spaces as intended, however the program crashes after copying ~80k.

IOBuf (raw data):

[Screenshot: IOBuf]

IntermedBuf (after removing spaces):

[Screenshot: IntermedBuf]

My preliminary understanding of what is happening here is that strstr (and potentially memcpy) are pushing data to the stack with each call, and given the length of the loop (lenIOBuf is ~325K, with spaces occurring randomly every 2-11 characters throughout) the stack is overflowing before the while loop finishes and the stack unwinds. However, this doesn't explain why this succeeds in x64 in both cases, and in x86 when the PIC program is running in a user-made program as opposed to injected into a legitimate process.

I have run the x86 PIC program in the local injector, where it succeeds, and also attached Windbg to it in order to examine what is happening differently there. The stack contains the same sort of pattern of characters as seen in the above screenshot; however, later in the loop (because again the program succeeds), the stack appears to... jump? I examined the contents of the stack early in the while loop (having set a breakpoint on strstr) and saw that it contains much the same pattern seen in the stack in the remote injector session:

[Screenshot: localStack]

I also added another MessageBox this time inside the while loop, set to pop when j > lenIOBuf - 500 in order to catch the program as it neared completion of the while loop.

            char* locSpace;
            while (j < lenIOBuf)
            {
                if (j > lenIOBuf - 500)
                {
                    ((MEMSET)Apis.memsetFunc)(tools, 0, 100 * sizeof(char));
                    ((SPRINTF)Apis.sprintfFunc)(tools, StringVars->poi, IntermedBuf);
                    ((MESSAGEBOXA)Apis.MessageBoxAFunc)(NULL, tools, NULL, NULL);
                }
                locSpace = ((STRSTR)Apis.strstrFunc)(IOBuf + j, StringVars->space);
                if (locSpace == 0)
                    locSpace = IOBuf + lenIOBuf;

                strLen = locSpace - IOBuf - j;

                ((MEMCPY)Apis.memcpyFunc)(IntermedBuf + i, IOBuf + j, strLen);
                i += strLen, j += strLen + 1;
            }

When this MessageBox popped, I paused execution and found that ESP was now 649fd80, whereas previously it was around 13beb24:

[Screenshot: bpMessageBox]

So it appears that the stack relocated, or the local injector added more memory to the stack, or something similar (I am embarrassingly naive about this stuff). Looking at the "original" stack location at this stage in execution shows that the data that was there previously is still there when the loop is near completion:

[Screenshot: lateStack]

So, bottom line: this code, which runs successfully by all accounts in x64 local/remote and x86 local, is crashing when run in another process in x86. It appears that in the local injector case the stack fills in a similar fashion as in the remote injector where it crashes; however, the local injector is relocating the stack or adding more stack space, which isn't happening in the remote injector. Does anyone have any idea why, or more importantly, how I could alter the code to remove spaces from a large, arbitrary buffer in a different way that avoids the overflow I am currently hitting?

Thanks for any help


Solution

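  • These declarations are wrong:

    typedef void*(WINAPI* MEMCPY)(void * destination, const void * source, size_t num); 
    
    typedef char*(WINAPI* STRSTR)(const char *haystack, const char *needle);
    

    Both of these functions use the __cdecl calling convention, which means that the caller must clean up the stack (add esp, 4*param_count) after the call. Because you declared them as __stdcall (== WINAPI), the compiler does not generate the add esp, 4*param_count instruction, so the pushes for the parameters are never balanced. In your loop that leaves 8 bytes on the stack for every strstr call and 12 bytes for every memcpy call, i.e. 20 bytes per iteration, until the stack overflows. On x64 there is only one calling convention and these keywords are ignored, which is why the x64 builds are not affected.

    You need to use: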

    typedef void *  (__cdecl * MEMCPY)(void * _Dst, const void * _Src, _In_ size_t _MaxCount);
    typedef char* (__cdecl* STRSTR)(_In_z_ char* const _String, _In_z_ char const* const _SubString);
    

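    and so on for the other C runtime functions you call (strlen, strcmp, sprintf, memset). Real Win32 APIs such as MessageBoxA are __stdcall and should keep WINAPI.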
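As a rough sketch of how the corrected declarations might be organised, the block below gives every CRT pointer type an explicit __cdecl convention and calls strstr, strlen and memcpy through a small table. CRT_CALL, struct CrtApis and copy_first_chunk are hypothetical names introduced for illustration and are not from the original program; the macro expands to nothing outside Windows so that the sketch also compiles elsewhere.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: CRT_CALL is a made-up helper macro. On Windows it selects
 * __cdecl, the convention used by the C runtime; on other targets it is
 * empty because the default convention already applies. */
#if defined(_WIN32)
#  define CRT_CALL __cdecl
#else
#  define CRT_CALL
#endif

typedef void  *(CRT_CALL *MEMCPY)(void *dst, const void *src, size_t n);
typedef void  *(CRT_CALL *MEMSET)(void *dst, int value, size_t n);
typedef char  *(CRT_CALL *STRSTR)(const char *haystack, const char *needle);
typedef size_t (CRT_CALL *STRLEN)(const char *s);
typedef int    (CRT_CALL *STRCMP)(const char *a, const char *b);

/* Hypothetical, cut-down stand-in for the question's Apis struct. The
 * fields carry their real pointer types, so no cast is needed at the
 * call site. */
struct CrtApis {
    STRSTR strstrFunc;
    MEMCPY memcpyFunc;
    STRLEN strlenFunc;
};

/* Copies the text that precedes the first space in src into dst,
 * NUL-terminates it and returns its length. If src contains no space,
 * the whole string is copied. dst must hold at least strlen(src) + 1
 * bytes. */
static size_t copy_first_chunk(const struct CrtApis *apis,
                               const char *src, char *dst)
{
    const char *space = apis->strstrFunc(src, " ");
    size_t len = space ? (size_t)(space - src) : apis->strlenFunc(src);

    apis->memcpyFunc(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Storing the pointers with their real types rather than as untyped addresses is a design choice made here: on an x86 compiler, assigning a __cdecl function to a pointer declared WINAPI should then produce an incompatible-pointer diagnostic instead of silently unbalancing the stack.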