Tags: linux, shared-libraries, binaryfiles, qemu

Calling aarch64 shared library from amd64 executable, maybe using binary translation/QEMU


I have an aarch64 library for Linux that I want to use from an amd64 Linux install. I currently know one way to make this work: run an aarch64 executable that I compile myself under the qemu-aarch64-static user-mode emulator (qemu-arm-static only handles 32-bit ARM), and have that executable call dlopen on the aarch64 library and use it.

The problem is that integrating the aarch64 executable with my amd64 environment is awkward. For example, suppose this arm64 library comes from an IoT device and decodes a special video file in real time: how am I supposed to use the native libraries on my computer to play it? I end up using UNIX pipes, but I really dislike this solution.

Is there a way to use qemu-aarch64-static with only the library, so that an amd64 executable can call the library directly? If not, what is the best way to interface between the two architectures? Is it pipes?


Solution

  • The solution I implemented for this is shared memory IPC. It is particularly nice because it works well with fixed-length C structs: both ends can simply use the same struct.

    Let's say you have a function with a signature uint32_t so_lib_function_a(uint32_t c[2])

    You can write a wrapper function in an amd64 library: uint32_t wrapped_so_lib_function_a(uint32_t c[2]).

    Then, you create a shared memory structure:

    typedef struct {
       uint32_t c[2];
       uint32_t ret;
       int turn; // turn = 0 means amd64 library, turn = 1 means arm library
    } ipc_call_struct;
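
    Because this struct is compiled twice, once per architecture, both builds have to agree on its layout. A minimal sketch of a compile-time check, assuming a C11 compiler; the expected numbers (16 bytes in total, ret at offset 8, turn at offset 12) assume a 4-byte int, which holds for the usual LP64 ABIs on amd64 and aarch64 Linux:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>

typedef struct {
   uint32_t c[2];
   uint32_t ret;
   int turn; // turn = 0 means amd64 library, turn = 1 means arm library
} ipc_call_struct;

/* Compile these checks into both sides: the build fails on whichever
 * side lays the struct out differently from what is expected here. */
static_assert(sizeof(ipc_call_struct) == 16, "unexpected struct size");
static_assert(offsetof(ipc_call_struct, c) == 0, "unexpected offset of c");
static_assert(offsetof(ipc_call_struct, ret) == 8, "unexpected offset of ret");
static_assert(offsetof(ipc_call_struct, turn) == 12, "unexpected offset of turn");
```

    Both architectures are little-endian on Linux, so the byte order of the individual fields needs no conversion.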
    

    Initialise a struct like this, then call shmget(SOME_SHM_KEY, sizeof(ipc_call_struct), IPC_CREAT | 0777); and pass the ID it returns to shmat to get a pointer to the shared memory. Then copy the initialised struct into shared memory.
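
    The create-and-attach step, plus the teardown that is easy to forget, might look roughly like this. It is only a sketch: attach_segment and release_segment are hypothetical helper names, and the mode is 0600 rather than 0777 on the assumption that both processes run as the same user.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Creates (or opens) the segment for `key` and maps it into this process.
 * Returns NULL on failure; on success stores the segment ID in *shm_id_out. */
static void *attach_segment(key_t key, size_t size, int *shm_id_out) {
    int id = shmget(key, size, IPC_CREAT | 0600); /* owner-only access */
    if (id == -1) {
        perror("shmget");
        return NULL;
    }
    void *mem = shmat(id, NULL, 0);
    if (mem == (void *)-1) { /* shmat reports failure as (void *)-1, not NULL */
        perror("shmat");
        return NULL;
    }
    *shm_id_out = id;
    return mem;
}

/* Unmaps the segment and marks it for removal; the kernel frees it
 * once every process has detached. Returns 0 on success, -1 on error. */
static int release_segment(void *mem, int shm_id) {
    if (shmdt(mem) == -1) {
        perror("shmdt");
        return -1;
    }
    if (shmctl(shm_id, IPC_RMID, NULL) == -1) {
        perror("shmctl");
        return -1;
    }
    return 0;
}
```

    Without the IPC_RMID step, a System V segment outlives both processes and stays around until it is removed by hand (for example with ipcrm) or the machine reboots.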

    You then call shmget(2) and shmat(2) on the ARM binary side to get a pointer to the same shared memory. The ARM binary runs an infinite loop, waiting for its "turn." When the amd64 binary sets turn to 1, it blocks in a busy loop until turn is 0 again. Meanwhile the ARM binary executes the function, using the fields of the shared struct as parameters, and writes the return value back into the struct. It then sets turn to 0 and waits until turn is 1 again, which lets the amd64 binary carry on until it is ready to call the ARM function again.
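
    The handoff described above can also be written with explicit acquire/release operations, so that the argument and return-value fields are guaranteed to be visible before the other side sees turn change. This is a sketch rather than the code I actually ran: it assumes GCC or Clang (the __atomic builtins are compiler extensions), and pass_turn, wait_for_turn, call_remote, serve_one, demo_sum and the TURN_* constants are hypothetical names.

```c
#include <sched.h>
#include <stdint.h>

typedef struct {
   uint32_t c[2];
   uint32_t ret;
   int turn; // turn = 0 means amd64 library, turn = 1 means arm library
} ipc_call_struct;

enum { TURN_AMD64 = 0, TURN_ARM = 1 };

/* Publishes every earlier write to the struct, then flips the flag. */
static void pass_turn(ipc_call_struct *h, int next) {
    __atomic_store_n(&h->turn, next, __ATOMIC_RELEASE);
}

/* Polls until the flag equals `mine`, yielding the CPU between polls. */
static void wait_for_turn(ipc_call_struct *h, int mine) {
    while (__atomic_load_n(&h->turn, __ATOMIC_ACQUIRE) != mine) {
        sched_yield();
    }
}

/* amd64 side: one blocking call through shared memory. */
static uint32_t call_remote(ipc_call_struct *h, const uint32_t c[2]) {
    h->c[0] = c[0];
    h->c[1] = c[1];
    pass_turn(h, TURN_ARM);
    wait_for_turn(h, TURN_AMD64);
    return h->ret;
}

/* ARM side: serves exactly one request using the function pointer `fn`. */
static void serve_one(ipc_call_struct *h, uint32_t (*fn)(uint32_t c[2])) {
    wait_for_turn(h, TURN_ARM);
    h->ret = fn(h->c);
    pass_turn(h, TURN_AMD64);
}

/* Stand-in for the real library function, only for trying the protocol out. */
static uint32_t demo_sum(uint32_t c[2]) {
    return c[0] + c[1];
}
```

    The amd64 wrapper would call call_remote once per library call, and the ARM executable would call serve_one in a loop, passing the pointer it got from dlsym instead of demo_sum. It is still a polling loop, so each call costs at least one round of scheduling latency.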

    Here is an example (it might not compile yet, but it gives you a general idea):

    Our "unknown" library: shared.h (compiled into shared.so)

    #include <stdint.h>
    
    #define MAGIC_NUMBER 0x44E
    
    uint32_t so_lib_function_a(uint32_t c[2]) {
       // Adds the args and multiplies the sum by MAGIC_NUMBER
       uint32_t ret = 0; // must start at zero before accumulating
       for (int i = 0; i < 2; i++) {
         ret += c[i];
       }
    
       ret *= MAGIC_NUMBER;
       return ret;
    }
    

    Hooking into the "unknown" library: shared_executor.c

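    #include <dlfcn.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/shm.h>
    
    #define SHM_KEY 22828 // Some random SHM ID
    
    uint32_t (*so_lib_function_a)(uint32_t c[2]);
    
    typedef struct {
       uint32_t c[2];
       uint32_t ret;
       int turn; // turn = 0 means amd64 library, turn = 1 means arm library
    } ipc_call_struct;
    
    int main(void) {
       // volatile forces every access to go to shared memory, so the
       // polling loop below re-reads turn instead of caching it
       volatile ipc_call_struct *handle;
    
       void *lib_dlopen = dlopen("./shared.so", RTLD_LAZY);
       if (lib_dlopen == NULL) {
         fprintf(stderr, "dlopen: %s\n", dlerror());
         return 1;
       }
       so_lib_function_a = dlsym(lib_dlopen, "so_lib_function_a");
       if (so_lib_function_a == NULL) {
         fprintf(stderr, "dlsym: %s\n", dlerror());
         return 1;
       }
    
       // setup shm
       int shm_id = shmget(SHM_KEY, sizeof(ipc_call_struct), IPC_CREAT | 0777);
       if (shm_id == -1) {
         perror("shmget");
         return 1;
       }
       void *mem = shmat(shm_id, NULL, 0);
       if (mem == (void *)-1) { // shmat signals failure with (void *)-1
         perror("shmat");
         return 1;
       }
       handle = mem;
    
       // We expect the handle to already be initialised by the time we get here, so we don't have to do anything
    
       for (;;) {
         if (handle->turn == 1) { // our turn
           // copy the arguments out of shared memory before the call
           uint32_t args[2] = { handle->c[0], handle->c[1] };
           handle->ret = so_lib_function_a(args);
           handle->turn = 0; // hand off for later
         }
       }
    }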
    

    On the amd64 side: shm_shared.h

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/shm.h>
    
    typedef struct {
       uint32_t c[2];
       uint32_t ret;
       int turn; // turn = 0 means amd64 library, turn = 1 means arm library
    } ipc_call_struct;
    
    #define SHM_KEY 22828 // Some random SHM ID
    
    // volatile forces every access to go to shared memory, so the wait
    // loop below re-reads turn instead of caching it
    static volatile ipc_call_struct *handle;
    
    void wrapper_init(void) {
      // setup shm here
      int shm_id = shmget(SHM_KEY, sizeof(ipc_call_struct), IPC_CREAT | 0777);
      if (shm_id == -1) {
        perror("shmget");
        exit(1);
      }
      void *mem = shmat(shm_id, NULL, 0);
      if (mem == (void *)-1) { // shmat signals failure with (void *)-1
        perror("shmat");
        exit(1);
      }
      handle = mem;
    
      // Initialise the handle
      // Currently, we don't want to call the ARM library, so the turn is still zero
      handle->c[0] = 0;
      handle->c[1] = 0;
      handle->ret = 0;
      handle->turn = 0;
    
      // you should be able to fork the ARM binary using "qemu-aarch64-static" here
      // (and add code for that if you'd like)
    }
    
    uint32_t wrapped_so_lib_function_a(uint32_t c[2]) {
       // arrays cannot be assigned in C, so copy element by element
       handle->c[0] = c[0];
       handle->c[1] = c[1];
       handle->turn = 1; // hand off execution to the ARM library
       while (handle->turn != 0) {} // wait
       return handle->ret;
    }
    

    Again, there's no guarantee this code even compiles (yet), but just a general idea.
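
    The launch step that wrapper_init leaves as a comment could be sketched with fork and execvp. spawn_helper and reap_helper are hypothetical names, and the sketch makes no assumption about where QEMU or the ARM executable live: the caller supplies the whole command line.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks and runs argv[0] (looked up on PATH) with the given arguments.
 * Returns the child's PID in the parent, or -1 if fork failed. */
static pid_t spawn_helper(char *const argv[]) {
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        perror("execvp"); /* only reached if exec failed */
        _exit(127);
    }
    return pid;
}

/* Waits for the child to finish. Returns its exit code, or -1 if it
 * did not exit normally (for example because it was killed by a signal). */
static int reap_helper(pid_t pid) {
    int status = 0;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    A call at the end of wrapper_init might pass something like { "qemu-aarch64-static", "./shared_executor", NULL }, where ./shared_executor is an assumed path to the compiled ARM executable; QEMU's -L option may also be needed to point it at an aarch64 sysroot so the dynamic loader can be found. Since the executor loops forever, the amd64 side would have to kill it (for example with SIGTERM) before reaping it at shutdown.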