
Serialize a function pointer in C and save it in a file?


I am working on a C file register program that handles arbitrary generic data, so the user needs to supply the functions to be used. These functions are saved as function pointers in the register struct and work nicely. But I need to be able to run these functions again when the program is restarted, ideally without the user needing to supply them again. I serialize important data about the register structure and write it into a header.

I was wondering how I can save the functions there too. A compiled C function is just raw binary data, right? So there must be a way to store it in a file and load the function pointers from the content of the file, but I am not sure how to do this. Can someone point me in the right direction?

I am assuming it's possible to do this in C, since it allows you to do pretty much anything, but I might be missing something. Can I do this without system calls at all? Or, if not, what would be the simplest way to do this in POSIX?

The functions are supplied when creating the register or creating new secondary indexes:

registerHandler* createAndOpenRecordFile(
    int overwrite, char *filename, int keyPos,
    fn_keyCompare userCompare, fn_serialize userSerialize,
    fn_deserialize userDeserialize, int type, ...)

And saved as function pointers:

typedef void* (*fn_serialize)(void*);
typedef void* (*fn_deserialize)(void*);
typedef int (*fn_keyCompare) (const void *, const void *);

typedef struct {
    ...
    fn_serialize encode;
    fn_deserialize decode;
    fn_keyCompare compare;
} registerHandler;

Solution

  • While your logic makes some sort of sense, things are much, much more complex than that. My answer is going to contain most of the comments already made here, only in answer form...

    Let's assume that you have a pointer to a function. If that function has a jump instruction in it, that jump instruction could jump to an absolute address. That means that when you deserialize the function, you have to have a way to force it to be loaded at the same address, so that the absolute jump lands on the correct address.

    Which brings us to the next point. Your question is tagged posix, but there is no POSIX-compliant way to load code at a specific address. There's MAP_FIXED, but it's not going to work unless you write your own dynamic linker. Why does that matter? Because the function's machine code might reference the function's start address, for various reasons, the most prominent of which is that the function passes its own address as an argument to another function.

    Which actually brings us to our next point. If the serialized function calls other functions, you'd have to serialize them too. But that's the "easy" part. The hard part is if the function jumps into the middle of another function rather than calling it, which could happen e.g. as a result of tail-call optimization. That means you have to serialize everything the function jumps into (recursively), but if the function jumps to 0x00000000ff173831, how many bytes will you serialize from that address?

    For that matter, how do you know when any function ends in a portable way?

    Even worse, are you even guaranteed that the function is contiguous in memory? Sure, all existing, sane OS memory managers and hardware architectures make it contiguous in memory, but is it guaranteed to be so 1 year from now?

    Yet another issue is: what if the user passes a different function based on something dynamic? For example, if the environment variable X is true, we want function x(), otherwise we want y().

    We're not even going to think about discussing portability across hardware architectures, operating systems, or even versions of the same hardware architecture.

    But we are going to talk about security. If you no longer require the user to give you a pointer to their code, then when that code has a bug that they fix in a new version, you'll continue to use the buggy serialized version until the user remembers to "refresh" your data structures with the new code.

    And when I say "bug" above, you should read "security vulnerability". If the vulnerable function you're serializing launches a shell, or indeed refers to anything outside the process, it becomes a persistent exploit.

    In short, there's no way to do what you want to do in a sane and economic way. What you can do, instead, is to force the user to package these functions for you.

    The most obvious way to do it is to ask them to pass the filename of a library, which you then open with dlopen().
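
    As a rough illustration of the dlopen() route, here is a minimal sketch. It assumes the user's shared library exports the three callbacks under fixed names; the symbol names user_serialize, user_deserialize and user_compare, the userCallbacks struct and the loadUserCallbacks() helper are all made up for this example and are not part of the question's code. With this approach the register header only needs to store the library path, not any code.

```c
#include <dlfcn.h>
#include <stdio.h>

typedef void *(*fn_serialize)(void *);
typedef void *(*fn_deserialize)(void *);
typedef int (*fn_keyCompare)(const void *, const void *);

/* Hypothetical container: the library handle plus the three callbacks. */
typedef struct {
    void *lib;
    fn_serialize encode;
    fn_deserialize decode;
    fn_keyCompare compare;
} userCallbacks;

/*
 * Load the callbacks from the shared library at libPath.
 * The symbol names looked up here are an assumed convention.
 * Returns 0 on success, -1 on failure (out->lib is NULL on failure).
 */
int loadUserCallbacks(const char *libPath, userCallbacks *out)
{
    out->encode = NULL;
    out->decode = NULL;
    out->compare = NULL;

    out->lib = dlopen(libPath, RTLD_NOW | RTLD_LOCAL);
    if (out->lib == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    /* dlsym() returns void *; this cast form is the idiom shown in the
     * POSIX dlsym documentation for obtaining function pointers. */
    *(void **)(&out->encode) = dlsym(out->lib, "user_serialize");
    *(void **)(&out->decode) = dlsym(out->lib, "user_deserialize");
    *(void **)(&out->compare) = dlsym(out->lib, "user_compare");

    if (out->encode == NULL || out->decode == NULL || out->compare == NULL) {
        fprintf(stderr, "missing callback symbol in %s\n", libPath);
        dlclose(out->lib);
        out->lib = NULL;
        return -1;
    }
    return 0;
}
```

    On some systems you may need to link with -ldl. The user would build their callbacks as a shared object (for example with -shared -fPIC), and the library path saved in the header can be passed to this loader again on the next run.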

    Another way to do it is to pass something like a Lua or JavaScript string and embed an engine to execute these strings as code.

    Yet another way is to pass paths to executables, and execute these when the data needs to be processed. This is what git does.
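
    A minimal sketch of the executable route, using only POSIX calls. The helper name runCompareProgram() and the convention that the program receives the two keys as arguments and reports its result through the exit status are assumptions made for this example, not something git or the question's code prescribes.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run `program` with keyA and keyB as its arguments and return its exit
 * status (0..255), or -1 if it could not be run to a normal exit.
 * A status of 127 means the child could not exec the program.
 */
int runCompareProgram(const char *program, const char *keyA, const char *keyB)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace the process image with the user's program. */
        char *const argv[] = {
            (char *)program, (char *)keyA, (char *)keyB, NULL
        };
        execvp(program, argv);
        _exit(127); /* only reached if execvp failed */
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    if (!WIFEXITED(status))
        return -1; /* killed by a signal or otherwise abnormal */
    return WEXITSTATUS(status);
}
```

    Only the program's path has to be stored in the register header. The cost is one process launch per call, which is likely too slow for a comparison function invoked many times, so this fits coarse-grained operations better.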

    But what you should probably do is just require that the user always passes these functions. Keep it simple.