c++ casting shared-libraries move-semantics alloca

Allocating structs of arbitrary constant size on the stack

I've written a small working plugin server. The plugins are implemented using .so shared objects, which are manually loaded during runtime in the "server" by calls to dlopen (header <dlfcn.h>).

All of the shared object plugins have the same interface:

extern "C" void* do_something() {
    return SharedAllocator<T>{}.allocate(...); // new T
}
extern "C" size_t id = ...; // unique

Basically, do_something returns a pointer to heap memory that the caller is expected to free.
id is simply an identifier unique per .so.
T is a struct specific to each .so. Some of them share the same return type, some of them don't. The point here is, sizeof(T) is .so specific.

The server is in charge of dynamically loading and reading the symbols of the .so binaries. All .so plugins can call each other through a method do_something_proxy defined in the server binary, which acts as the glue between the callers and the callees:

extern "C" void* do_something_proxy(size_t id) {
    // find the requested handle
    auto handle = some_so_map.find(id)->second;

    // call the handle's `do_something`
    void* something_done = handle.do_something();

    // forward the result
    return something_done;
}

To simplify things a bit, let's say that some_so_map is a plain std::unordered_map<size_t, so_handle_t> filled using a bunch of calls to dlopen when the proxy is executed.

My issue is that every caller of do_something_proxy knows T at compile time. As I said earlier, T can vary from call site to call site; however, T never changes for a given call site.

For reference, here's the definition all callers use:

template <typename T, size_t id>
T* typed_do_something_proxy() {
    // simple cast of the proxy
    return reinterpret_cast<T*>(do_something_proxy(id));
}

In other words, do_something_proxy for a given plugin id always has the same return type.

If it weren't for the proxy, I could just template do_something_proxy and pass T or a std::array<int8_t, N> with sizeof(T) == N, and the heap allocation that currently exists only to keep T from being sliced when calling do_something_proxy could be moved to the stack. However, the proxy cannot know all possible return types at compile time and export a zillion versions of do_something_proxy.

So my question is: is there any way for do_something_proxy to allocate the effective size of T on its stack (i.e. using alloca or some other form of stack allocation)?

As far as I can tell, alloca doesn't work here, because do_something_proxy can only receive a single value from the do_something function of the requested plugin. It would receive both the size to allocate and the bytes to copy into the allocated memory at the same time. If only alloca could be "squished" in between...

I know I could allocate a fixed amount of memory on the stack using a std::array<int8_t, N> with N set to 256 or even 1024. However, this solution is a bit dirty. It unnecessarily copies data from one stack frame to another, and it limits the amount of data that a plugin can return. To top it off (I haven't benchmarked this yet), unless compilers can elide copies across dynamic library boundaries, I'd assume copying 1024 bytes is more work than copying, e.g., sizeof(std::string) bytes.
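For reference, a minimal sketch of that fixed-size approach could look like the code below. Everything in it is hypothetical and only for illustration: the names kMaxResultSize, fixed_result, pack, unpack and small_payload are not part of my server, and the sketch assumes T is trivially copyable so that a byte-wise copy is valid (it would not be valid for something like std::string).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical upper bound on the size of anything a plugin may return.
constexpr std::size_t kMaxResultSize = 256;

// Hypothetical fixed-size carrier that would be returned by value
// through the proxy instead of a heap pointer.
struct fixed_result {
    alignas(std::max_align_t) std::array<unsigned char, kMaxResultSize> bytes;
    std::size_t used;  // number of meaningful bytes in `bytes`
};

// Plugin side: copy a trivially copyable T into the carrier.
template <typename T>
fixed_result pack(const T& value) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "byte-wise copy is only valid for trivially copyable types");
    static_assert(sizeof(T) <= kMaxResultSize,
                  "T does not fit into the fixed-size carrier");
    fixed_result result{};
    std::memcpy(result.bytes.data(), &value, sizeof(T));
    result.used = sizeof(T);
    return result;
}

// Caller side: copy the bytes back out into a T.
template <typename T>
T unpack(const fixed_result& result) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "byte-wise copy is only valid for trivially copyable types");
    static_assert(std::is_default_constructible_v<T>,
                  "this sketch needs a default-constructed T to copy into");
    assert(result.used == sizeof(T) && "plugin and caller disagree on sizeof(T)");
    T value{};
    std::memcpy(&value, result.bytes.data(), sizeof(T));
    return value;
}

// Hypothetical small payload used to exercise the sketch.
struct small_payload {
    int code;
    double value;
};
```

Note that the whole fixed_result (at least kMaxResultSize bytes) travels by value however small T is, and any T larger than kMaxResultSize is rejected at compile time. These are exactly the two drawbacks described above.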

In an ideal world, I believe do_something_proxy should return a struct that handles this with RAII. A const std::any that is stack-allocated, if you will. Is this even possible?

If this is not possible at all within C++, would it be possible to achieve this behavior in a portable manner in assembly, i.e. by hijacking the stack or base pointers manually?

Thanks.

Solution

Actually, I just found a solution. It boils down to inverting the direction in which the memory location for the allocation of T is passed around.

Is there any way for do_something_proxy to allocate the effective size of T on its stack?

Maybe. But what the code actually needs is an allocation of the effective size of T at the caller's location, not inside the proxy. And since the caller knows sizeof(T), all you have to do is allocate the space for T on the caller's stack and pass the address of that buffer to do_something_proxy, which forwards it to the plugin's do_something:

For the caller:

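template <typename T, size_t id>
T typed_do_something_proxy() {
    // Uninitialized stack storage with the size and alignment of T.
    std::aligned_storage_t<sizeof(T), alignof(T)> return_buffer;
    do_something_proxy(id, &return_buffer);
    // The callee constructed a T in the buffer: move it out, then destroy
    // the original so that a non-trivial T does not leak its resources.
    T* constructed = std::launder(reinterpret_cast<T*>(&return_buffer));
    T result = std::move(*constructed);
    constructed->~T();
    return result;
}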

For the proxy:

extern "C" void do_something_proxy(size_t id, void* return_buffer) {
    auto handle = some_so_map.find(id)->second;
    handle.do_something(return_buffer);
}

For the callee:

extern "C" void do_something(void* return_buffer) {
    new(return_buffer) T(...); // placement new
}
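
To see the three pieces working together, here is a minimal single-file sketch. It is not the real server: the plugins are simulated with ordinary functions instead of being loaded with dlopen, and the layout of so_handle_t, the functions plugin1_do_something and plugin2_do_something, the point struct, and the contents of some_so_map are all assumptions made up for illustration. The buffer is declared as an alignas(T) byte array, which plays the same role as std::aligned_storage_t (deprecated as of C++23), and the constructed T is explicitly destroyed after being moved out of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical handle: in the real server this would wrap a function
// pointer resolved from the shared object.
struct so_handle_t {
    void (*do_something)(void* return_buffer);
};

// Hypothetical plugin with id 1, whose T is std::string.
extern "C" void plugin1_do_something(void* return_buffer) {
    ::new (return_buffer) std::string("hello from plugin 1");  // placement new
}

// Hypothetical plugin with id 2, whose T is a small struct.
struct point {
    int x;
    int y;
};
extern "C" void plugin2_do_something(void* return_buffer) {
    ::new (return_buffer) point{3, 4};  // placement new
}

// Stand-in for the map that the server fills while loading plugins.
static const std::unordered_map<std::size_t, so_handle_t> some_so_map = {
    {1, so_handle_t{&plugin1_do_something}},
    {2, so_handle_t{&plugin2_do_something}},
};

// The proxy only forwards the buffer address; it never needs sizeof(T).
extern "C" void do_something_proxy(std::size_t id, void* return_buffer) {
    auto it = some_so_map.find(id);
    assert(it != some_so_map.end() && "unknown plugin id");
    it->second.do_something(return_buffer);
}

// The caller knows T, so it provides correctly sized and aligned storage.
template <typename T, std::size_t id>
T typed_do_something_proxy() {
    alignas(T) unsigned char return_buffer[sizeof(T)];
    do_something_proxy(id, return_buffer);
    // The plugin constructed a T in the buffer: move it out, then run the
    // destructor of the object that still lives in the buffer.
    T* constructed = std::launder(reinterpret_cast<T*>(return_buffer));
    T result = std::move(*constructed);
    constructed->~T();
    return result;
}
```

With these definitions, typed_do_something_proxy<std::string, 1>() yields the string built by the first simulated plugin and typed_do_something_proxy<point, 2>() yields the point built by the second, without any heap allocation for the result object itself. The caller and the plugin still have to agree on T for a given id; the proxy cannot check that.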