Tags: c++, gcc, assembly, x86, calling-convention

Using thunks to go from cdecl to thiscall (Linux x86)


I've been trying to use 'thunking' so that I can pass member functions to legacy APIs which expect a C function. I'm trying to use a solution similar to this. This is my thunk structure so far:

struct Thunk
{
    byte mov;   // ↓
    uint value; // mov esp, 'value' <-- replace the return address with 'this' (since this thunk was called with 'call', we can replace the 'pushed' return address with 'this')

    byte call;  // ↓
    int offset; // call 'offset' <-- we want to return here for ESP alignment, so we use call instead of 'jmp'

    byte sub;   // ↓
    byte esp;   // ↓
    byte num;   // sub esp, 4 <-- pop the 'this' pointer from the stack

    //perhaps I should use 'ret' here as well/instead?
} __attribute__((packed));

The following code is a test of mine which uses this thunk structure (but it does not yet work):

#include <iostream>
#include <sys/mman.h>
#include <cstdio>

typedef unsigned char byte;
typedef unsigned short ushort;
typedef unsigned int uint;
typedef unsigned long ulong;

#include "thunk.h"

template<typename Target, typename Source>
inline Target brute_cast(const Source s)
{
    static_assert(sizeof(Source) == sizeof(Target));

    union { Target t; Source s; } u;
    u.s = s;
    return u.t;
}

void Callback(void (*cb)(int, int))
{
    std::cout << "Calling...\n";
    cb(34, 71);
    std::cout << "Called!\n";
}

struct Test
{
    int m_x = 15;

    void Hi(int x, int y)
    {
        printf("X: %d | Y: %d | M: %d\n", x, y, m_x);
    }
};

int main(int argc, char * argv[])
{
    std::cout << "Begin Execution...\n";

    Test test;

    Thunk * thunk = static_cast<Thunk*>(mmap(nullptr, sizeof(Thunk),
        PROT_EXEC | PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0));

    thunk->mov = 0xBC; // mov esp
    thunk->value = reinterpret_cast<uint>(&test);

    thunk->call = 0xE8; // call
    thunk->offset = brute_cast<uint>(&Test::Hi) - reinterpret_cast<uint>(thunk);
    thunk->offset -= 10; // Adjust the relative call

    thunk->sub = 0x83; // sub
    thunk->esp = 0xEC; // esp
    thunk->num = 0x04; // 'num'

    // Call the function
    Callback(reinterpret_cast<void (*)(int, int)>(thunk));
    std::cout << "End execution\n";
}

If I use that code, I get a segmentation fault within the Test::Hi function. The reason is obvious (once you analyze the stack in GDB), but I do not know how to fix it: the stack is not aligned properly.

The x argument contains garbage, but the y argument contains the this pointer (see the Thunk code). That means the stack is misaligned by 8 bytes, but I still don't know why this is the case. Can anyone tell me why this is happening? x and y should contain 34 and 71 respectively.

NOTE: I'm aware that this does not work in all scenarios (such as MI and the VC++ thiscall convention), but I want to see if I can get this to work, since I would benefit from it a lot!

EDIT: Obviously I also know that I can use static functions, but I see this more as a challenge...


Solution

  • Suppose you have a standalone (non-member, or maybe static) cdecl function:

    void Hi_cdecl(int x, int y)
    {
        printf("X: %d | Y: %d\n", x, y);
    }
    

    Another function calls it this way:

    push 71
    push 34
    push (return-address)    ; performed by the call itself, not a separate instruction
    call (address-of-hi)
    add esp, 8               ; stack cleanup
    

    You want to replace this by the following:

    push 71
    push 34
    push this
    push (return-address)    ; performed by the call itself, not a separate instruction
    call (address-of-hi)
    add esp, 4               ; cleanup of this from stack
    add esp, 8               ; stack cleanup
    

    For this, the thunk has to read the return-address from the stack, push this, and then push a return-address again. For the cleanup, add 4 to esp (not subtract): the stack grows downwards, so adding to esp is what discards the pushed this.
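
To make the rearrangement concrete, here is a small model of it in plain C++. It is only an illustration, not machine code: the stack is a std::vector whose back() plays the role of [esp], and the names Stack, insert_this, this_ptr and thunk_ret are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Model of the stack: back() is the top, i.e. the dword at [esp].
using Stack = std::vector<std::uint32_t>;

// On thunk entry the stack holds (bottom to top): y, x, caller's return address.
// Afterwards it holds: y, x, this, the return address the callee will see.
// Returns the caller's return address, which the thunk has to keep somewhere.
inline std::uint32_t insert_this(Stack& stack,
                                 std::uint32_t this_ptr,
                                 std::uint32_t thunk_ret)
{
    assert(!stack.empty());                    // there must be a return address to read
    const std::uint32_t caller_ret = stack.back();
    stack.pop_back();                          // read (pop) the original return address
    stack.push_back(this_ptr);                 // push this
    stack.push_back(thunk_ret);                // push a return address again
    return caller_ret;
}
```

After the callee returns, the model stack would hold y, x, this, so one more pop (the add esp, 4) restores exactly what the cdecl caller expects to clean up.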

    Regarding the return address - since the thunk must do some cleanup after the callee returns, it must store the original return-address somewhere, and push the return-address of the cleanup part of the thunk. So, where to store the original return-address?

    • In a global variable - might be an acceptable hack (since you probably don't need your solution to be reentrant)
    • On the stack - requires moving the whole block of parameters (using a machine-language equivalent of memmove), whose length is pretty much unknown
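
The global-variable option could be encoded roughly as follows. This is a sketch under assumptions, not a drop-in replacement for the question's Thunk: the names encode_thunk, put32, kThunkSize and all four address parameters are hypothetical. It only builds the instruction bytes, so it compiles on any host, but executing them only makes sense in a 32-bit x86 process. Compared with the question, it uses push imm32 (68) instead of mov esp, imm32 (BC), which overwrites the stack pointer itself, and add esp, 4 (83 C4 04) instead of sub esp, 4 (83 EC 04).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Total size in bytes of the five instructions emitted below.
constexpr std::size_t kThunkSize = 25;
using ThunkBytes = std::array<std::uint8_t, kThunkSize>;

// Append a 32-bit value in little-endian byte order.
inline void put32(ThunkBytes& code, std::size_t& pos, std::uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        code[pos++] = static_cast<std::uint8_t>(v >> (8 * i));
}

// All parameters are 32-bit addresses in the target process (hypothetical names):
//   thunk_addr  - where the thunk bytes will be placed
//   this_addr   - the object to pass as 'this'
//   target_addr - entry point of the member function
//   slot_addr   - a global 4-byte slot for the caller's return address
inline ThunkBytes encode_thunk(std::uint32_t thunk_addr, std::uint32_t this_addr,
                               std::uint32_t target_addr, std::uint32_t slot_addr)
{
    ThunkBytes code{};
    std::size_t pos = 0;

    code[pos++] = 0x8F; code[pos++] = 0x05;   // pop dword ptr [slot_addr]
    put32(code, pos, slot_addr);

    code[pos++] = 0x68;                       // push this_addr
    put32(code, pos, this_addr);

    code[pos++] = 0xE8;                       // call target_addr (rel32)
    // rel32 is measured from the instruction after the call, at thunk offset 16.
    const std::uint32_t after_call = thunk_addr + 16;
    put32(code, pos, target_addr - after_call);

    code[pos++] = 0x83; code[pos++] = 0xC4;   // add esp, 4 (discard 'this')
    code[pos++] = 0x04;

    code[pos++] = 0xFF; code[pos++] = 0x25;   // jmp dword ptr [slot_addr]
    put32(code, pos, slot_addr);

    assert(pos == kThunkSize);
    return code;
}
```

In a 32-bit build, the bytes would be copied into the executable page from mmap, with that page's address passed as thunk_addr and the address of a global 4-byte variable passed as slot_addr. Because there is a single slot, such a thunk is neither reentrant nor thread-safe.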

    Please also note that the resulting stack is not 16-byte-aligned; this can lead to crashes if the function uses certain types (those that require 8-byte and 16-byte alignment - the SSE ones, for example; also maybe double).