Search code examples
cgccvfork

Is this unstable behavior of execve after vfork a gcc bug?


So the object of this question is to determine, if possible, whether or not I'm looking at the signature of a compiler bug from oddball code brokenness. I have done as much as possible to eliminate undefined behavior from the surrounding code; however, bisection of the code into a standalone reproduction is not possible. I think I need a bunch of variables on the stack to get it to reproduce.

In fact the bug does not depend on the vfork() behavior at all; the function merely needs to be called vfork(). If I change the actual system call to fork() by changing the system call number in the asm stub for vfork() to call fork() instead the bug remains.

Broken code:

pid_t pid;
memcpy(buf, "60000", 6);
volatile int execve_error = 0;
int status;
if ((pid = vfork()) == 0) {
    /* Compiler bug? moving this inside vfork child causes memory corruption */
    const char *argv[13] = { "/bin/false", "-md", namebuf,
    "-k", skelbufbase, "-s", shell,
        "-u", buf, "-g", create_group, username, NULL };
    execve_error = execve(argv[0], argv, NULL);
    _exit(0);
} else if (pid < 0) {
    writeblock(2, "can't fork\n", 11);
    r = pid;
    goto cleanup4;
}
wait4(pid, &status, 0, NULL);
if (execve_error) { r = execve_error; goto cleanup4; } /* execve failed */

Working code:

pid_t pid;
memcpy(buf, "60000", 6);
volatile int execve_error = 0;
int status;
if ((pid = vfork()) == 0) {
    /* Compiler bug? moving this inside vfork child causes memory corruption */
    const char *argv[13] = { "/bin/false", "-md", namebuf,
        "-k", skelbufbase, "-s", shell,
        "-u", buf, "-g", create_group, username, NULL };
    writeblock(2, "exec()\n", 7);
    execve_error = execve(argv[0], argv, NULL);
    _exit(0);
} else if (pid < 0) {
    writeblock(2, "can't fork\n", 11);
    r = pid;
    goto cleanup4;
}
wait4(pid, &status, 0, NULL);
if (execve_error) { r = execve_error; goto cleanup4; } /* execve failed */

Other working code:

pid_t pid;
memcpy(buf, "60000", 6);
volatile int execve_error = 0;
int status;
/* Compiler bug? moving this inside vfork child causes memory corruption */
const char *argv[13] = { "/bin/false", "-md", namebuf,
    "-k", skelbufbase, "-s", shell,
    "-u", buf, "-g", create_group, username, NULL };
if ((pid = vfork()) == 0) {
    execve_error = execve(argv[0], argv, NULL);
    _exit(0);
} else if (pid < 0) {
    writeblock(2, "can't fork\n", 11);
    r = pid;
    goto cleanup4;
}
wait4(pid, &status, 0, NULL);
if (execve_error) { r = execve_error; goto cleanup4; } /* execve failed */

This wasn't all that frustrating to get the code to work; but it defies normal sensibilities for why it works or does not work. writeblock() just calls write() in a loop in case of part writes. If I take out the call to writeblock() on stderr (it used to be /dev/null in an older version of this question) the code stops working. It's not just malfunctioning on the second return from vfork(); it's already messed up before then because execve() doesn't succeed.

On debugging with strace: argv[0] is trashed in the argv array (points to unmapped memory) but the correct first argument is passed to execve().

The code is also fixed by moving the argv array creation to outside the vfork() call. Declaring arrays with static sizes is not one of the things listed as can't do under vfork() or setjmp().

Almost-MVCE:

static int __attribute__((noinline)) mkuser(char *buf, char *namebuf, const char *username,
        const char *runas_group, const char *create_group,
        const char *create_home, const char *shell)
{
    int r;
    unsigned long uidrange[4096];
    memzero(uidrange, sizeof(uidrange));
    int h = open("uid.db", O_CREAT|O_RDWR,0600);
    if (h < 0) return -h;
    r = flock(h, LOCK_EX);
    if (r < 0) return -r;
    for(;;) {
        r = readblock(h, buf, 4096);
        if (r == -ENOSPC) break; /* Sneaky; depends on absolute block size being >= 4096 */
        if (r) return -r;
        /* Depends on uid.db being the 0 length file */
    }
    size_t basenamelen = strlen(create_home);
    memcpy(namebuf, create_home, basenamelen);
    namebuf[basenamelen] = '/';
    size_t usernamelen = strlen(username);
    memcpy(namebuf + basenamelen + 1, username, usernamelen + 1);
    r = fake_mkdir(namebuf, 0755);
    if (r && r != -EEXIST) return -r;
    memcpy(namebuf + basenamelen + usernamelen + 1, "/home", 6);
    r = fake_mkdir(namebuf, 0755);
    if (r && r != -EEXIST) goto cleanup1;
    memcpy(namebuf + basenamelen + usernamelen + 1, "/etc", 5);
    r = fake_mkdir(namebuf, 0755);
    if (r && r != -EEXIST) goto cleanup2;
    memcpy(namebuf + basenamelen + usernamelen + 5, "/skel", 6);
    char *skelbufbase;
    skelbufbase = namebuf + 1024 + 42 + 42 + 64;
    memcpy(skelbufbase, namebuf, basenamelen + usernamelen + 12);
    r = fake_mkdir(namebuf, 0755);
    if (r && r != -EEXIST) goto cleanup3;
    memcpy(namebuf + basenamelen + usernamelen + 1, "/home/", 6);
    memcpy(namebuf + basenamelen + usernamelen + 7, username, usernamelen + 1);
    for (unsigned ui = 0; ui < sizeof(uidrange) / sizeof(uidrange[0]); ui++)
        for (unsigned uj = 0; uj < 8 * sizeof(uidrange[0]); uj++)
        {
            if (uidrange[ui] & ((unsigned long)1 << uj)) continue;
            pid_t pid;
            memcpy(buf, "60000", 6);
            volatile int execve_error = 0;
            int status;
            if ((pid = vfork()) == 0) {
                /* Compiler bug? moving this inside vfork child causes memory corruption */
                const char *argv[13] = { "/bin/false", "-md", namebuf,
                    "-k", skelbufbase, "-s", shell,
                    "-u", buf, "-g", create_group, username, NULL };
                execve_error = execve(argv[0], argv, NULL);
                _exit(0);
            } else if (pid < 0) {
                writeblock(2, "can't fork\n", 11);
                r = pid;
                goto cleanup4;
            }
            wait4(pid, &status, 0, NULL);
            if (execve_error) { r = execve_error; goto cleanup4; } /* execve failed */
            switch ((status >> 8) & 255) {
                case 0: /* Success: record user in internal database */
                    writeblock(2, "false should fail\n", sizeof("false should fail"));
                    r = -EIO;
                    goto cleanup4;
                case 1: case 10: r = -EIO; goto cleanup4; /* can't update files */
                case 2: case 3: r = -EINVAL; goto cleanup4; /* bad argument */
                case 4: break; /* Try next number */
                case 6: return r = -EINVAL; goto cleanup4; /* group not found */
                case 9: return r = -EEXIST; goto cleanup4; /* username in use */
                case 12: return r = -EIO; goto cleanup4; /* can't create home dir */
                case 14: return r = -EIO; goto cleanup4; /* can't update selinux */
                default: return r = -ERANGE; goto cleanup4; /* should not happen */
            }
        }
    r = -ENOSPC;
cleanup4:
    memcpy(namebuf + basenamelen + usernamelen + 1, "/etc/skel", 10);
    fake_rmdir(namebuf);
cleanup3:
    memcpy(namebuf + basenamelen + usernamelen + 1, "/etc", 5);
    fake_rmdir(namebuf);
cleanup2:
    memcpy(namebuf + basenamelen + usernamelen + 1, "/home", 6);
    fake_rmdir(namebuf);
cleanup1:
    namebuf[basenamelen + usernamelen + 1] = 0;
    fake_rmdir(namebuf);
    return -r;
}

You might be able to run this. fake_mkdir() and fake_rmdir() need to be provided via a .o file or the reproduction doesn't reproduce the issue. The following implementation will do:

int fake_mkdir(const char *path, int mode) { return 0; }
int fake_rmdir(const char *path) { return 0; }

Similarly memzero needs to be handled the samey way:

void memzero(void *buf, size_t n) { memset(buf, 0, n); }

In addition; the code expects execve to return its failure code rather than update errno. The following should be a workaround but I can't test it:

                execve(argv[0], argv, NULL);
                execve_error = errno;

Compilation options: gcc -Wl,--no-dynamic-linker -pie -nostdlib -nostartfiles -O3 -s -fpic -fno-asynchronous-unwind-tables -ffreestanding

If you try to understand this MVCE for why it's the way it is, it will make no sense to you. I spent hours deleting pieces and verifying I still had a reproducible sample. The amount of memory on the stack is definitely important to the MVCE.

I do have a complete runnable package depending only on system copies of errno.h and asm/unistd.h but it's too large to post here.


Solution

  • I have found the bad assembly. There was some discussion on random changes mean undefined behavior; this is essentially true; and it was true in this case; however there isn't undefined behavior in the code.

    The problem is under some cases, -fpic -ffreestanding still emits relocations despite there being nothing in the source code that needs relocations; but there's nothing to process them at runtime.

    The faulting assembly:

    .LC16:
            .quad LC7
    

    The assembly that uses it:

            movq    .LC16(%rip), %xmm1
            leaq    .LC8(%rip), %rax
            movq    %rax, %xmm2
            punpcklqdq      %xmm2, %xmm1
            movaps  %xmm1, 48(%rsp)
    

    What the assembly should have been:

            leaq    .LC7(%rip), %rax
            movq    %rax, %xmm1
            leaq    .LC8(%rip), %rax
            movq    %rax, %xmm2
            punpcklqdq      %xmm2, %xmm1
            movaps  %xmm1, 48(%rsp)
    

    From some discussion on the gcc mailing list on the matter:

    Me: Avoiding this [generating relocations for source code constructs that don't have them] must be possible because the relocation engine avoids this problem.

    Florian: It's quite brittle and requires constant maintenance as the compiler changes. We've simplified the code significantly of the past few years and may finally be able to properly fix it, as far such a thing is possible without rewriting self-relocation processing in assembler.

    So that's that then. It's a maintenance headache for the near future.