Tags: c++, c, linux, chroot, ptrace

Detect program launch in another thread on linux


I am trying to sandbox ELF binaries by (among other things) chrooting them after they have been launched. To do so, a child process cloned with the CLONE_FS flag performs a chroot, while the parent runs the binary by calling an exec function.

The trick actually works if the chroot happens after the program has finished loading the shared libraries it needs. The problem is that I can't find a way to detect when this actually happens from the other process. Is there any way?


Solution

  • You can use a preload library with a function executed just prior to main(), a helper binary that has the CAP_SYS_CHROOT file capability in its permitted set, and a Unix domain socket pair between the two.
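
    To illustrate the "function executed just prior to main()" part, here is a minimal sketch using the GCC/Clang constructor attribute. The names premain, premain_ran, and premain_has_run are hypothetical and exist only for this illustration; the real preload library appears in premain.c further down.

    ```c
    /* Hypothetical flag, used only to show that the constructor ran. */
    static int premain_ran = 0;

    /* GCC/Clang extension: a function marked "constructor" is called
     * during program startup, before main() is entered. */
    static void premain(void) __attribute__((constructor));

    static void premain(void)
    {
        premain_ran = 1;
    }

    /* Returns 1 if the constructor has already run, 0 otherwise. */
    int premain_has_run(void)
    {
        return premain_ran;
    }
    ```

    When such a function lives in a shared library named in LD_PRELOAD, it runs in the target process after the dynamic linker has loaded the libraries needed at startup, which is the point in time the question asks about. Libraries opened later with dlopen() are not covered by this.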

    The helper binary creates the socket pair, then uses clone(CLONE_FS) to create a helper process that shares its filesystem information (root directory, current working directory, and umask). The helper binary itself then sets LD_PRELOAD to load the preload library, and executes the sandboxed binary. (exec recomputes the capabilities from the sandboxed binary's own file capabilities, so the sandboxed binary will not have any extra privileges at all.)
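
    The effect of CLONE_FS can be observed without any privileges by using chdir() instead of chroot(). The following is a rough sketch, not part of the implementation below; chdir_via_clone_fs, demo_child, and DEMO_STACK_SIZE are hypothetical names, and SIGCHLD is added to the clone flags here only so that a plain waitpid() can reap the child.

    ```c
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define DEMO_STACK_SIZE 65536  /* hypothetical size for this sketch */

    /* Runs in the cloned child: change the shared working directory. */
    static int demo_child(void *arg)
    {
        return chdir((const char *)arg) ? 1 : 0;
    }

    /* Clone a child that shares filesystem information with the caller,
     * let it chdir() to dir, and reap it.
     * Returns 0 if the child succeeded, -1 otherwise. */
    int chdir_via_clone_fs(const char *dir)
    {
        char *stack = malloc(DEMO_STACK_SIZE);
        pid_t child, p;
        int status = 0;

        if (!stack)
            return -1;

        /* The stack grows down on most architectures, so pass its top. */
        child = clone(demo_child, stack + DEMO_STACK_SIZE,
                      CLONE_FS | SIGCHLD, (void *)dir);
        if (child == -1) {
            free(stack);
            return -1;
        }

        do {
            p = waitpid(child, &status, 0);
        } while (p == (pid_t)-1 && errno == EINTR);
        free(stack);

        if (p != child || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return -1;
        return 0;
    }
    ```

    After a successful chdir_via_clone_fs("/"), getcwd() in the calling process reports "/", even though only the child called chdir(). A chroot() in the helper process reaches the sandboxed binary the same way.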

    The helper process adds CAP_SYS_CHROOT to the effective set, waits for the sandboxed binary (preload library) to notify it via the socket, calls chroot(), and notifies the sandboxed binary (preload library) of success.
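
    Both ends of this exchange need to send and receive a small fixed-size message (a pid_t in the code below) while coping with short transfers and EINTR. As a sketch of that pattern only, the loops could be factored into two helpers; send_all and recv_all are hypothetical names that do not appear in the implementation below, which writes the loops out inline.

    ```c
    #include <errno.h>
    #include <stddef.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    /* Send exactly len bytes. Returns 0 on success, -1 with errno set. */
    int send_all(int fd, const void *buf, size_t len)
    {
        const char *p = buf;
        const char *const q = p + len;

        while (p < q) {
            ssize_t n = send(fd, p, (size_t)(q - p), MSG_NOSIGNAL);
            if (n > 0)
                p += n;
            else if (n != -1) {
                errno = EIO;        /* Unexpected return value */
                return -1;
            } else if (errno != EINTR)
                return -1;          /* Real error; errno is already set */
        }
        return 0;
    }

    /* Receive exactly len bytes. Returns 0 on success, -1 with errno set.
     * If the peer closes the socket early, errno is set to EIO. */
    int recv_all(int fd, void *buf, size_t len)
    {
        char *p = buf;
        char *const q = p + len;

        while (p < q) {
            ssize_t n = recv(fd, p, (size_t)(q - p), MSG_WAITALL);
            if (n > 0)
                p += n;
            else if (n != -1) {
                errno = EIO;        /* End of stream before len bytes */
                return -1;
            } else if (errno != EINTR)
                return -1;
        }
        return 0;
    }
    ```

    With these, the preload side would amount to send_all(SOCKET_FD, &pid, sizeof pid) followed by recv_all(SOCKET_FD, &pid, sizeof pid), and the helper side to the same two calls in the opposite order with chroot() in between.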

    Note: There is absolutely no need to mark the helper binary setuid root, or to give the sandboxed binary any capabilities or privileges. We can do this with minimal privileges: the CAP_SYS_CHROOT capability alone is sufficient.

    I prefer to add the capability to the binary only into the permitted set, so that the binary itself has to add the capability to the effective set before chroot() works. I feel this approach reduces the effects of possible installation/administrator errors. If you disagree, feel free to omit the corresponding code from exec.c, and use =pe instead of =p in the setcap command in Makefile.
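
    To see the difference between the permitted and effective sets from inside a process, something like the following sketch can be used. It deliberately calls the raw capget system call instead of libcap so that it needs no extra library; that choice, and the name cap_state, are assumptions of this sketch and not what exec.c does.

    ```c
    #include <linux/capability.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Query one capability of the calling process.
     * Returns a bit mask (bit 0: permitted, bit 1: effective),
     * or -1 on error or if cap is out of range. */
    int cap_state(int cap)
    {
        struct __user_cap_header_struct hdr;
        struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
        int result = 0;

        if (cap < 0 || cap > CAP_LAST_CAP)
            return -1;

        hdr.version = _LINUX_CAPABILITY_VERSION_3;
        hdr.pid = 0;  /* 0 refers to the calling process */

        if (syscall(SYS_capget, &hdr, data) == -1)
            return -1;

        if (data[CAP_TO_INDEX(cap)].permitted & CAP_TO_MASK(cap))
            result |= 1;
        if (data[CAP_TO_INDEX(cap)].effective & CAP_TO_MASK(cap))
            result |= 2;
        return result;
    }
    ```

    Called as cap_state(CAP_SYS_CHROOT) at the start of a binary installed with =p, this would be expected to return 1 (permitted only); with =pe it would be expected to return 3. An ordinary unprivileged binary without file capabilities gets 0.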

    The neat thing here is that the preload library could also interpose desired C functions, and use the Unix domain socket to obtain the necessary information from the helper process; you can even use SCM_RIGHTS ancillary messages to transfer file descriptors from outside the chroot to the sandboxed binary. (In essence, this is what fakeroot does, but in reverse: instead of faking a chrooted environment, you can pick and choose which files the sandboxed binary can access from outside the chroot environment.) For that, just have the helper process stay alive as long as the other end of the socket is still open, so that it exits after the sandboxed binary exits.
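
    As a rough sketch of the SCM_RIGHTS part, the two functions below pass one open descriptor across a Unix domain socket. send_fd and recv_fd are hypothetical names, not part of the implementation below, and EINTR handling is left out to keep the sketch short.

    ```c
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send descriptor fd over Unix domain socket sock.
     * Returns 0 on success, -1 on failure. */
    int send_fd(int sock, int fd)
    {
        char byte = 0;  /* One byte of ordinary data carries the message */
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union {
            struct cmsghdr align;  /* Ensures suitable alignment */
            char buf[CMSG_SPACE(sizeof (int))];
        } ctrl;
        struct msghdr msg;
        struct cmsghdr *cmsg;

        memset(&ctrl, 0, sizeof ctrl);
        memset(&msg, 0, sizeof msg);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof ctrl.buf;

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof (int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof (int));

        return (sendmsg(sock, &msg, MSG_NOSIGNAL) == 1) ? 0 : -1;
    }

    /* Receive one descriptor from Unix domain socket sock.
     * Returns the new descriptor, or -1 on failure. */
    int recv_fd(int sock)
    {
        char byte;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union {
            struct cmsghdr align;
            char buf[CMSG_SPACE(sizeof (int))];
        } ctrl;
        struct msghdr msg;
        struct cmsghdr *cmsg;
        int fd;

        memset(&ctrl, 0, sizeof ctrl);
        memset(&msg, 0, sizeof msg);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof ctrl.buf;

        if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
            return -1;

        cmsg = CMSG_FIRSTHDR(&msg);
        if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
            cmsg->cmsg_type != SCM_RIGHTS ||
            cmsg->cmsg_len != CMSG_LEN(sizeof (int)))
            return -1;

        memcpy(&fd, CMSG_DATA(cmsg), sizeof (int));
        return fd;
    }
    ```

    In this scheme the helper process, which can still name paths outside the chroot by holding descriptors opened before chroot(), would call send_fd(), and an interposed function in the preload library would call recv_fd() and hand the descriptor to the sandboxed binary.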

    Here is my example implementation that starts the helper process as a child process to the sandboxed binary, with the helper process exiting (and preload library reaping it) before the sandboxed main() is started.

    exec.c:

    #define _GNU_SOURCE
    #define _POSIX_C_SOURCE 200809L
    #include <unistd.h>
    #include <sys/capability.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/mman.h>
    #include <sched.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>
    #include <errno.h>
    
    #ifndef  SOCKET_FD
    #error SOCKET_FD not defined!
    #endif
    
    #ifndef  LIBRARY_PATH
    #error LIBRARY_PATH not defined!
    #endif
    
    static size_t            helper_stack_size = 32768;
    static void             *helper_stack = NULL;
    static const char       *helper_chroot = NULL;
    static const cap_value_t helper_cap[] = { CAP_SYS_CHROOT };
    static const int         helper_caps = sizeof helper_cap / sizeof helper_cap[0];
    static int               socket_fd[2] = { -1, -1 };
    
    #ifdef __hppa
    #define helper_endstack  (helper_stack)
    #else
    #define helper_endstack  ((void *)((char *)helper_stack + helper_stack_size))
    #endif
    
    static int helper_main(void *arg)
    {
        const char *const argv0 = arg;
        pid_t pid;
        cap_t caps;
    
        close(socket_fd[0]);
    
        /* Read the target PID. */
        {   char       *p = (char *)(&pid);
            char *const q = (char *)(&pid) + sizeof pid;
            ssize_t     n;
    
            while (p < q) {
                n = recv(socket_fd[1], p, (size_t)(q - p), MSG_WAITALL);
                if (n > (ssize_t)0)
                    p += n;
                else
                if (n != (ssize_t)-1) {
                    fprintf(stderr, "%s: %s.\n", argv0, strerror(EIO));
                    return 127;
                } else
                if (errno != EINTR) {
                    fprintf(stderr, "%s: %s.\n", argv0, strerror(errno));
                    return 127;
                }
            }
        }
    
        if (pid < (pid_t)2) {
            shutdown(socket_fd[1], SHUT_RDWR);
            close(socket_fd[1]);
            return 127;
        }
    
        /* Enable CAP_SYS_CHROOT. */
        caps = cap_get_proc();
        if (cap_set_flag(caps, CAP_EFFECTIVE, helper_caps, helper_cap, CAP_SET)) {
            shutdown(socket_fd[1], SHUT_RDWR);
            close(socket_fd[1]);
            fprintf(stderr, "%s: %s.\n", argv0, strerror(errno));
            return 127;
        }
        if (cap_set_proc(caps)) {
            shutdown(socket_fd[1], SHUT_RDWR);
            close(socket_fd[1]);
            fprintf(stderr, "%s: %s.\n", argv0, strerror(errno));
            return 127;
        }
    
        /* Target is ready to be chrooted, so do it now. */
        if (chroot(helper_chroot)) {
            shutdown(socket_fd[1], SHUT_RDWR);
            close(socket_fd[1]);
            fprintf(stderr, "%s: Cannot chroot: %s.\n", argv0, strerror(errno));
            return 127;
        }
    
        /* Send my own pid, so this process will be reaped. */
        {   const char       *p = (char *)(&pid);
            const char *const q = (char *)(&pid) + sizeof pid;
            ssize_t           n;
    
            pid = getpid();
    
            while (p < q) {
                n = send(socket_fd[1], p, (size_t)(q - p), MSG_NOSIGNAL);
                if (n > (ssize_t)0)
                    p += n;
                else
                if (n != (ssize_t)-1) {
                    fprintf(stderr, "%s: %s.\n", argv0, strerror(EIO));
                    return 127;
                } else
                if (errno != EINTR) {
                    fprintf(stderr, "%s: %s.\n", argv0, strerror(errno));
                    return 127;
                }
            }
        }
    
        /* We won't be sending anything else. */
        shutdown(socket_fd[1], SHUT_WR);
    
        /* Ignore further input; wait for other end to close descriptor. */
        {   char    buf[16];
            ssize_t n;
    
            while (1) {
                n = recv(socket_fd[1], buf, sizeof buf, 0);
                if (n > (ssize_t)0)
                    continue;
                else
                if (n == (ssize_t)0)
                    break;
                else
                if (n != (ssize_t)-1) {
                    fprintf(stderr, "%s: %s.\n", argv0, strerror(EIO));
                    return 127;
                } else
                if (errno == EPIPE)
                    break;
                else
                if (errno != EINTR) {
                    fprintf(stderr, "%s: %s.\n", argv0, strerror(errno));
                    return 127;
                }
            }
        }
    
        /* Close the socket, and exit. */
        shutdown(socket_fd[1], SHUT_RDWR);
        close(socket_fd[1]);
        return 0;
    }
    
    int main(int argc, char *argv[])
    {
        if (argc < 4 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
            fprintf(stderr, "\n");
            fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
            fprintf(stderr, "       %s CHROOT WORKDIR COMMAND [ ARGS ... ]\n", argv[0]);
            fprintf(stderr, "\n");
            fprintf(stderr, "Note: . is a valid WORKDIR.\n");
            fprintf(stderr, "\n");
            return 1;
        }
    
        if (chdir(argv[2])) {
            fprintf(stderr, "%s: %s.\n", argv[2], strerror(errno));
            return 1;
        }
    
        helper_stack = mmap(NULL, helper_stack_size, PROT_READ | PROT_WRITE,
                                  MAP_ANONYMOUS | MAP_PRIVATE | MAP_STACK | MAP_GROWSDOWN, -1, (off_t)0);
        if ((void *)helper_stack == MAP_FAILED) {
            fprintf(stderr, "Cannot create helper process stack: %s.\n", strerror(errno));
            return 1;
        }
        helper_chroot = argv[1];
    
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, socket_fd)) {
            fprintf(stderr, "Cannot create an Unix domain stream socket pair: %s.\n", strerror(errno));
            return 1;
        }
    
        if (clone(helper_main, helper_endstack, CLONE_FS, argv[0]) == -1) {
            fprintf(stderr, "Cannot clone a helper process: %s.\n", strerror(errno));
            close(socket_fd[0]);
            close(socket_fd[1]);
            return 1;
        }
    
        close(socket_fd[1]);
        if (socket_fd[0] != SOCKET_FD) {
            if (dup2(socket_fd[0], SOCKET_FD) == -1) {
                fprintf(stderr, "Cannot move stream socket: %s.\n", strerror(errno));
                close(socket_fd[0]);
                close(SOCKET_FD);
                return 1;
            }
            close(socket_fd[0]);
        }
    
        setenv("LD_PRELOAD", LIBRARY_PATH, 1);
    
        /* Capabilities are reset over an execve(). */
        execvp(argv[3], argv + 3);
    
        close(SOCKET_FD);
        fprintf(stderr, "%s: %s.\n", argv[3], strerror(errno));
        return 1;
    }
    

    premain.c:

    #define _POSIX_C_SOURCE 200809L
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <fcntl.h>
    #include <string.h>
    #include <errno.h>
    
    #ifndef SOCKET_FD
    #error SOCKET_FD is not defined!
    #endif
    
    static void init(void) __attribute__ ((constructor (65535)));
    
    static void init(void)
    {
        pid_t pid;
    
        /* Note: We could probably only remove libpremain.so
         *       from the value, instead of clearing it altogether. */
        unsetenv("LD_PRELOAD");
    
        /* Verify SOCKET_FD is a Unix domain socket. */
        {   struct sockaddr_un  addr;
            socklen_t           addrlen = sizeof addr;
            memset(&addr, 0, sizeof addr);
    
            errno = EIO;
            if (getsockname(SOCKET_FD, (struct sockaddr *)&addr, &addrlen))
                switch (errno) {
    
                case EBADF:
                    /* SOCKET_FD is not open. Continue as if libpremain.so was never loaded. */
                    errno = 0;
                    return;
    
                case ENOTSOCK:
                    /* SOCKET_FD is not a socket. Continue as if libpremain.so was never loaded. */
                    errno = 0;
                    return;
    
                default:
                    /* All other errors are fatal. */
                    exit(127);
                }
    
            if (addr.sun_family != AF_UNIX) {
                /* SOCKET_FD is not an Unix domain socket. Continue as if libpremain.so was never loaded. */
                errno = 0;
                return;
            }
        }
    
        /* Make SOCKET_FD blocking and close-on-exec. */
        if (fcntl(SOCKET_FD, F_SETFD, (long)FD_CLOEXEC) ||
            fcntl(SOCKET_FD, F_SETFL, (long)0L))
            exit(127);
    
        /* Send our PID. */
        {   const char       *p = (const char *)(&pid);
            const char *const q = (const char *)(&pid) + sizeof pid;
    
            pid = getpid();
    
            while (p < q) {
                ssize_t n = send(SOCKET_FD, p, (size_t)(q - p), MSG_NOSIGNAL);
                if (n > (ssize_t)0)
                    p += n;
                else
                if (n != (ssize_t)-1)
                    exit(127);
                else
                if (errno != EINTR)
                    exit(127);
            }
        }
    
        /* Receive the PID from the other end. */
        {   char       *p = (char *)(&pid);
            char *const q = (char *)(&pid) + sizeof pid;
    
            pid = (pid_t)-1;
    
            while (p < q) {
                ssize_t n = recv(SOCKET_FD, p, (size_t)(q - p), MSG_WAITALL);
                if (n > (ssize_t)0)
                    p += n;
                else
                if (n != (ssize_t)-1)
                    exit(127);
                else
                if (errno != EINTR)
                    exit(127);
            }
        }
    
        shutdown(SOCKET_FD, SHUT_RDWR);
        close(SOCKET_FD);
    
        /* If the PID is > 1, we wait for it to exit.
         * If an error occurs, it's not a problem. */
        if (pid > (pid_t)1) {
            pid_t p;
            do {
                p = waitpid(pid, NULL, 0);
            } while (p == (pid_t)-1 && errno == EINTR);
        }
    
        /* All done. */
        return;
    }
    

    Makefile:

    CC  := gcc
    CFLAGS  := -Wall -O3
    LD  := $(CC)
    LDFLAGS := -lcap
    
    PREFIX  := /usr
    BINDIR  := $(PREFIX)/bin
    LIBDIR  := $(PREFIX)/lib
    
    SOCKFD  := 15
    
    .PHONY: all clean
    
    all: clean libpremain.so exec-chroot
    
    clean:
        rm -f libpremain.so exec-chroot
    
    libpremain.so: premain.c
        $(CC) $(CFLAGS) -DSOCKET_FD=$(SOCKFD) -fPIC -shared $^ -ldl -Wl,-soname,$@ $(LDFLAGS) -o $@
    
    exec-chroot: exec.c
        $(CC) $(CFLAGS) -DSOCKET_FD=$(SOCKFD) -DLIBRARY_PATH='"'$(LIBDIR)/libpremain.so'"' $^ $(LDFLAGS) -o $@
    
    install: libpremain.so exec-chroot
        sudo rm -f $(LIBDIR)/libpremain.so $(BINDIR)/exec-chroot
        sudo install -o `id -un` -g `id -gn` -m 00770 libpremain.so $(LIBDIR)/libpremain.so
        sudo install -o `id -un` -g `id -gn` -m 00770 exec-chroot $(BINDIR)/exec-chroot
        sudo setcap 'cap_sys_chroot=p' $(BINDIR)/exec-chroot
    
    uninstall:
        sudo rm -f $(LIBDIR)/libpremain.so $(BINDIR)/exec-chroot
    

    Note that the indentation in the Makefile is with tabs, not spaces. Run

    make PREFIX=/usr/local clean install
    

    to compile and install to /usr/local. The files are installed with mode 0770 and owned by the current user and group, so other users cannot execute them. You can also use clean all to just recompile everything, or uninstall to remove the installed files.

    This does require the libcap library. It is developed alongside the kernel (its sources are hosted on kernel.org) rather than as part of it, and you might need to install a libcap-dev, libcap-devel, or similarly named package to get all the files needed to compile against it.

    After installing, you can run e.g.

    exec-chroot /tmp /tmp ls -alF /
    

    to run ls -alF / in /tmp chrooted to /tmp. The output on Ubuntu machines is typically something like

    drwxrwxrwt 11    0    0  4096 May 29 23:55 ./
    drwxrwxrwt 11    0    0  4096 May 29 23:55 ../
    drwxrwxrwt  2    0    0  4096 May 29 17:15 .ICE-unix/
    -r--r--r--  1    0    0    11 May 29 17:15 .X0-lock
    drwxrwxrwt  2    0    0  4096 May 29 17:15 .X11-unix/
    drwx------  2 1000 1000  4096 May 29 17:15 .esd-1000/
    drwx------  2    0    0 16384 Dec  2  2011 lost+found/
    drwx------  2 1000 1000  4096 May 29 17:15 pulse-xxxxxxxxx/
    drwx------  2    0    0  4096 May 29 17:15 pulse-yyyyyyyyy/
    

    where the owners and groups are shown as numeric IDs (0 is root, 1000 is the user), because the passwd and group databases are inaccessible from within the chroot. However, as I already mentioned, this can be worked around by modifying and extending the above code as outlined.
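
    The numeric IDs are the usual fallback when a user lookup finds nothing. As an illustration only (this is a sketch of the general pattern, not the actual code of ls, and owner_name is a hypothetical name):

    ```c
    #include <pwd.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Write the user name for uid into buf, or the numeric ID if the
     * user database has no reachable entry for it. Returns buf. */
    char *owner_name(uid_t uid, char *buf, size_t len)
    {
        const struct passwd *pw = getpwuid(uid);

        if (pw && pw->pw_name && pw->pw_name[0])
            snprintf(buf, len, "%s", pw->pw_name);
        else
            snprintf(buf, len, "%lu", (unsigned long)uid);
        return buf;
    }
    ```

    Inside the chroot there is typically no /etc/passwd to consult, so getpwuid() returns NULL and the numeric branch is taken. An interposed getpwuid() in the preload library that asks the helper process over the socket would be one way to restore the names.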

    Although I did try to write the code with careful error handling, I have not really considered the overall operation thoroughly with respect to error conditions or security issues; that is why the files are installed only accessible to the current user.

    Questions?