How to safely update a running dynamic library?

I am implementing a live code reload mechanism for a program in C, and I have a function like this:

#include <sys/types.h>
#include <sys/stat.h>
#include <dlfcn.h>

void module_load(mod_t *mod) {
    struct stat statbuf;
    if (stat(mod->path, &statbuf) != 0) {
        // ...
    }
    if (statbuf.st_mtime != mod->time) {
        if (mod->code != NULL) {  // THIS here seems unsafe
            dlclose(mod->code);
        }
        mod->code = dlopen(mod->path, RTLD_GLOBAL | RTLD_LAZY);
        if (mod->code != NULL) {
            mod->foo = dlsym(mod->code, "foo");
            mod->bar = dlsym(mod->code, "bar");
            // ...
            mod->time = statbuf.st_mtime;
        }
    }
}

And my functions are called like this:

mod->foo();
mod->bar();

The system works fine, and functions get properly updated, but something worries me. The module_load function runs in a loop in a detached thread so the update could happen at any time, and while it seems to work fine, I wonder what weird things would happen if a library were updated while a function from it is being called.

I know that I can call the function without the loop in a joinable thread instead, and then wait for it to finish. This is probably a lot safer, but I would rather not be creating a new thread and joining it every single time.

I tried temporarily keeping two "live" copies of the library at the same time, so that the old functions could still be used while the new ones were being loaded, but the code kept using the old functions no matter what changes I made.

How can I reload the library and its functions safely, preferably in a detached thread?

To keep two live copies, I tried doing something like this:

...
void *new_code = dlopen(mod->path, RTLD_GLOBAL | RTLD_LAZY);
if (new_code != NULL) {
    mod->foo = dlsym(new_code, "foo");
    mod->bar = dlsym(new_code, "bar");
    // ...
    void *tmp = mod->code;
    mod->code = new_code;
    if (tmp != NULL) {
        dlclose(tmp);
    }
    mod->time = statbuf.st_mtime;
}

Solution

The dlclose() call you marked as possibly unsafe is, in fact, definitely unsafe. Any function from the module that is still executing in another thread will have the rug pulled out from under it: once the last reference is dropped, the library can be unmapped, so the function's own code and its references to static data and other functions in the module become dangling pointers.

So I'd say that you're better off working out how to have two active versions of your module. You can't do that by just calling dlopen with the same path, because dlopen caches open handles: it will simply return the handle that is already open, with an incremented reference count. That also explains why your two-copy attempt kept using the old functions. Instead, you can probably do the following:

1. Compile your modules into files with a version number in the filename, and then symlink the official module filename to the latest version. (That's the way most .so files are installed on a typical Unix system.)
2. When you want to open a module, first use readlink(2) to find the currently-linked module version. Then open that path.

(I haven't actually tried that but I think it will work, at least on Unix-like systems.)
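A minimal sketch of the second step, under some assumptions: open_current_version is a hypothetical helper name, and I've used realpath(3) rather than raw readlink(2), because a symlink target may be relative to the link's directory and realpath resolves that for you. The RTLD_LOCAL | RTLD_NOW flags are my choice here, not something your code requires.

```c
#define _XOPEN_SOURCE 700  /* for realpath() under strict -std= modes */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the original code: resolve the symlink
 * at link_path to the versioned file behind it, then dlopen that file.
 * Returns the handle, or NULL on failure. */
void *open_current_version(const char *link_path)
{
    /* Passing NULL asks realpath() to malloc the result (POSIX.1-2008). */
    char *real = realpath(link_path, NULL);
    if (real == NULL) {
        perror(link_path);
        return NULL;
    }

    /* A new version lives in a different file, so this should yield a
     * fresh handle rather than the cached one. */
    void *handle = dlopen(real, RTLD_LOCAL | RTLD_NOW);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s): %s\n", real, dlerror());

    free(real);
    return handle;
}
```

In module_load you would then call open_current_version(mod->path) in place of dlopen(mod->path, ...). Note that this only helps if each build is written to a new file; overwriting the old file in place defeats the purpose.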

I'd suggest avoiding RTLD_GLOBAL if possible. Calling dlclose on a module opened with RTLD_GLOBAL is risky at best: the dlclose might orphan resolved symbols used by other dynamically loaded modules. (And if no other module is going to use the symbols exported by the module you might dlclose, then RTLD_GLOBAL was never necessary.) I'm not convinced that RTLD_LAZY is a good idea, either: with lazy binding, an unresolved function reference only shows up when the function is first called, whereas RTLD_NOW makes dlopen itself fail up front.
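As a rough illustration of those flag choices, here is a sketch that opens a module with RTLD_LOCAL | RTLD_NOW and checks dlsym through dlerror(). The helper names module_open_strict and module_symbol are made up for this example.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open a module privately (RTLD_LOCAL) and resolve
 * all of its undefined symbols immediately (RTLD_NOW). */
void *module_open_strict(const char *path)
{
    void *handle = dlopen(path, RTLD_LOCAL | RTLD_NOW);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
    return handle;
}

/* Hypothetical helper: look up a symbol and report failures.
 * Returns NULL if the lookup failed. */
void *module_symbol(void *handle, const char *name)
{
    dlerror();                      /* clear any stale error state */
    void *sym = dlsym(handle, name);
    const char *err = dlerror();    /* non-NULL means dlsym failed */
    if (err != NULL) {
        fprintf(stderr, "dlsym(%s): %s\n", name, err);
        return NULL;
    }
    return sym;
}
```

With this, a module that fails to load or lacks foo is rejected before any of its functions are installed into the mod structure, so the old version can stay in place.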

Finally, you'll have to come up with some way to know when it is possible to dlclose old modules. You can't do it until you are certain that no thread is currently calling a function from the obsolete module. You might want to consider putting a reference count into the module structure, and using a macro or wrapper to ensure that reference counts are incremented and decremented pre- and post-call. You'll also want to add some kind of mutex to the module structure in order to avoid race conditions.
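One way that reference-counting idea could look, as a sketch only: each loaded version carries its own count, the module holds one reference to whichever version is current, and callers take a reference for the duration of a call. All of the names here (version_t, module_t, module_acquire, module_release, module_publish) are hypothetical, and the layout differs from your mod_t.

```c
#include <dlfcn.h>
#include <pthread.h>
#include <stdlib.h>

/* One loaded version of the library (hypothetical layout). */
typedef struct version {
    void *code;          /* handle from dlopen, or NULL */
    void (*foo)(void);   /* symbols resolved from this version */
    int refs;            /* callers in flight, +1 while it is current */
} version_t;

typedef struct {
    pthread_mutex_t lock;  /* protects current and every refs field */
    version_t *current;
} module_t;

/* Pin the current version so it cannot be closed during a call. */
version_t *module_acquire(module_t *m)
{
    pthread_mutex_lock(&m->lock);
    version_t *v = m->current;
    if (v != NULL)
        v->refs++;
    pthread_mutex_unlock(&m->lock);
    return v;
}

/* Drop a reference; the last one out closes the library. */
void module_release(module_t *m, version_t *v)
{
    pthread_mutex_lock(&m->lock);
    int last = (--v->refs == 0);
    pthread_mutex_unlock(&m->lock);
    if (last) {
        if (v->code != NULL)
            dlclose(v->code);
        free(v);
    }
}

/* Called by the reloader thread: make fresh (malloc'd) the current
 * version and drop the module's reference to the previous one. */
void module_publish(module_t *m, version_t *fresh)
{
    fresh->refs = 1;                 /* the module's own reference */
    pthread_mutex_lock(&m->lock);
    version_t *old = m->current;
    m->current = fresh;
    pthread_mutex_unlock(&m->lock);
    if (old != NULL)
        module_release(m, old);
}
```

A call site then becomes: v = module_acquire(mod); if (v != NULL) { v->foo(); module_release(mod, v); }. The lock is held only while the count changes, not during the call itself, so the reloader thread never blocks on a long-running foo.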