linux x86 exec posix abi

Can exec*'s argv contain a value of 0 in multiple places?


I was reading that after exec starts a new program,

argv is an array of argument strings, with argv[argc] == 0

What happens if one of the other values within the array argv happens to be 0? Will the number of arguments (argc) be incorrectly calculated when the child process runs?

I read this on page 34 of the ABI of AMD64 (https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf).


Solution

  • The execve system call (which is used by all exec* functions) has an argument of the form char *const argv[]. The kernel calculates argc by iterating over the supplied argv as follows:

    static int count(struct user_arg_ptr argv, int max)
    {
        int i = 0;
    
        if (argv.ptr.native != NULL) {
            for (;;) {
                const char __user *p = get_user_arg_ptr(argv, i);
    
                if (!p)
                    break;
    
                if (IS_ERR(p))
                    return -EFAULT;
    
                if (i >= max)
                    return -E2BIG;
                ++i;
    
                if (fatal_signal_pending(current))
                    return -ERESTARTNOHAND;
                cond_resched();
            }
        }
        return i;
    }
    

    The function get_user_arg_ptr essentially calculates an index into the argv array and returns the pointer stored at that index. The loop terminates under four conditions; two of them are pertinent to your question:

    • On the first NULL seen in the argv array. If there are other pointers following the first NULL in argv, they are ignored. Having more than one NULL smells like a bug in the program that constructed argv.
    • When the number of non-NULL pointers exceeds MAX_ARG_STRINGS, which is defined as 0x7FFFFFFF. In this case, the system call fails with E2BIG.

    The value of i returned is assigned to argc when count returns.
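
    The truncation at the first NULL can be observed from user space. The following is a minimal sketch, not taken from the kernel: the helper names run_and_get_status and positional_count_seen_by_child are hypothetical, and it assumes a POSIX shell at /bin/sh. The shell script "exit $#" makes the child exit with the number of positional parameters it received, so the parent can read back how many arguments survived the exec.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: exec /bin/sh (assumed to exist) with the given
 * argument vector in a child process and return the child's exit status,
 * or -1 if anything goes wrong.
 */
int run_and_get_status(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execv("/bin/sh", argv);
        _exit(127);              /* only reached if execv itself failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/*
 * Hypothetical helper: build an argv with three trailing arguments and,
 * if embed_null is non-zero, overwrite the middle one with NULL.
 * The fourth element ("sh") becomes $0 inside the script, so $# counts
 * only the arguments that follow it.
 */
int positional_count_seen_by_child(int embed_null)
{
    char *args[] = { "sh", "-c", "exit $#", "sh",
                     "one", "two", "three", NULL };

    if (embed_null)
        args[5] = NULL;          /* "two" is replaced by an early NULL */

    return run_and_get_status(args);
}
```

    Calling positional_count_seen_by_child(0) should report three positional parameters, while positional_count_seen_by_child(1) should report only one: "three" is still present in the caller's array, but the new program never sees it.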

    Another case where the terminating NULL in argv matters is when the application itself uses argv as follows:

    for(char **p = argv; *p != NULL; ++p)
    {
       // ...
    }
    

    It's part of the Linux ABI that argv terminates with NULL, so such code is legal and portable across all Linux implementations. The same code is also legal on Windows. In that sense, argc is provided for convenience only.

    In addition, the C and C++ standards state, in 5.1.2.2.1 and 3.6.1 respectively, that argc is non-negative, that if argc is greater than zero then argv[0] through argv[argc-1] are non-null pointers to null-terminated strings, and that argv[argc] is a null pointer. See also this answer.
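
    Those guarantees can be written down as a small check. This is only an illustrative sketch; the function name argv_is_well_formed is hypothetical, and it assumes the array it is given has at least argc + 1 readable elements, which is always true for the argv passed to main.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if (argc, argv) satisfies the invariants
 * described above, 0 otherwise.
 *   - argc is non-negative
 *   - argv[0] .. argv[argc-1] are all non-null
 *   - argv[argc] is a null pointer
 * Assumption: argv has at least argc + 1 readable elements.
 */
int argv_is_well_formed(int argc, char *const argv[])
{
    if (argc < 0 || argv == NULL)
        return 0;

    for (int i = 0; i < argc; ++i) {
        if (argv[i] == NULL)
            return 0;            /* a NULL before index argc breaks the rule */
    }

    return argv[argc] == NULL;   /* the terminator must sit exactly at argc */
}
```

    An array such as { "prog", NULL, "b", NULL } passes this check only with argc == 1, which matches what the kernel would report for it.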