c exec posix

When calling the exec*() family of functions, do the char* elements of argv all have to be unique?


I'm trying to write a small utility that relays its argument list to an exec'd process, except some of the incoming arguments are repeated when building the new process's argument list.

Below is a very simplified version of what I'm looking to do, which simply duplicates each argument once:

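#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define PROG "ls"

int main(int argc, char* argv[] ) {

    int progArgCount = (argc-1)*2; // each incoming argument appears twice
    char** execArgv = malloc(sizeof(char*)*(progArgCount+2)); // +2 for PROG and final 0
    if (!execArgv) { perror("malloc"); return EXIT_FAILURE; }
    execArgv[0] = PROG;
    for (int i = 0; i<progArgCount; ++i)
        execArgv[i+1] = argv[i/2+1]; // both copies share the same pointer
    execArgv[progArgCount+1] = 0;

    execvp(PROG, execArgv );

    perror("execvp"); // only reached if execvp() fails
    return EXIT_FAILURE;

} // end main()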

Notice how the elements of execArgv are not unique. Specifically, the two elements in each duplication are the same, meaning they point to the same address in memory.

Does Standard C say anything about this usage? Is it incorrect, or undefined behavior? If not, is it still inadvisable, since the exec'd program might depend on the uniqueness of its argv elements? Please correct me if I'm wrong, but isn't it possible for programs to modify their argv elements directly, since they're non-const? Wouldn't that create a risk of the exec'd program blithely modifying argv[1] (say) and then accessing argv[2], falsely assuming that the two elements point to independent strings? I'm pretty sure I did this myself a few years ago when I was beginning to learn about C/C++, and I don't think it occurred to me at that time that the argv elements might not be unique.

I know that exec'ing involves "replacement of the process image", but I'm not sure what that entails exactly. I can imagine that it might involve deepcopying the given argv argument (execArgv in my example above) to fresh allocations of memory, which would probably uniquify the thing, but I don't know enough about the internals of the exec functions to say. And it would be wasteful, at least if the original data structure could instead be preserved across the "replacement" operation, so that's a reason for me to doubt that it happens. And perhaps different platforms/implementations behave differently in this respect? Can answerers please speak to this?


I tried to find documentation on this question, but I was only able to find the following, from http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html:

The arguments specified by a program with one of the exec functions shall be passed on to the new process image in the corresponding main() arguments.

The above doesn't clarify if it is a uniquified deepcopy of the arguments that is passed on to the new process, or not.

The argument argv is an array of character pointers to null-terminated strings. The application shall ensure that the last member of this array is a null pointer. These strings shall constitute the argument list available to the new process image. The value in argv[0] should point to a filename that is associated with the process being started by one of the exec functions.

Ditto for the above.

The argv[] and envp[] arrays of pointers and the strings to which those arrays point shall not be modified by a call to one of the exec functions, except as a consequence of replacing the process image.

I honestly don't know how to interpret the above. "Replacing the process image" is the entire point of the exec functions! If it's going to modify the array or the strings, then that would constitute a "consequence of replacing the process image", in one sense or another. This almost implies that the exec functions will modify argv. This excerpt simply reinforces my confusion.

The statement about argv[] and envp[] being constants is included to make explicit to future writers of language bindings that these objects are completely constant. Due to a limitation of the ISO C standard, it is not possible to state that idea in standard C. Specifying two levels of const-qualification for the argv[] and envp[] parameters for the exec functions may seem to be the natural choice, given that these functions do not modify either the array of pointers or the characters to which the function points, but this would disallow existing correct code. Instead, only the array of pointers is noted as constant. The table of assignment compatibility for dst= src derived from the ISO C standard summarizes the compatibility:

It's not clear what "The statement about argv[] and envp[] being constants" refers to; my leading theory is that it refers to the const-qualification of the parameters in the prototypes given at the top of the documentation page. But since those qualifiers only mark the pointers, and not the char data, it hardly makes explicit "that these objects are completely constant". Secondly, I don't know why the paragraph talks about "writers of language bindings"; bindings to what? How is that relevant to a general documentation page on the exec functions? Thirdly, the main thrust of the paragraph just seems to be saying that we are stuck with leaving the actual char content of the strings pointed to by the argv elements as non-const for the sake of backwards compatibility with the established ISO C standard and "existing correct code" that conforms to it. This is confirmed by the table which follows on the documentation page, which I will not quote here. None of this decisively answers my primary questions, although it does state fairly clearly in the middle of the excerpt that the exec functions, in themselves, do not modify the given argv object in any way.


I would greatly appreciate information pertaining to my primary questions as well as commentary on my interpretations and comprehension of the quoted documentation excerpts (particularly, if my interpretations are wrong in any way). Thanks!


Solution

  • Does Standard C say anything about this usage? Is it incorrect, or undefined behavior?

    There is no problem with two pointers pointing to the same memory location. This is not undefined behavior.

    If not, is it still inadvisable, since the exec'd program might depend on the uniqueness of its argv elements?

    The POSIX standard does not specify anything about the uniqueness of argv elements.

    Please correct me if I'm wrong, but isn't it possible for programs to modify their argv elements directly, since they're non-const?

    From the C standard, §5.1.2.2.1p2:

    The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

    So the answer is yes, it is possible.

    Wouldn't that create a risk of the exec'd program blithely modifying argv[1] (say) and then accessing argv[2], falsely assuming that the two elements point to independent strings?

    In computing, exec is a functionality of an operating system that runs an executable file in the context of an already existing process, replacing the previous executable.

    So, when an exec-family call is executed, the program given as its argument is loaded into the caller's address space and overwrites the program there. As a result, once the specified program starts executing, the original program in the caller's address space is gone; it has been replaced by the new program, and the argument list argv is stored in the newly replaced address space.

    The POSIX standard says:

    The number of bytes available for the new process' combined argument and environment lists is {ARG_MAX}. It is implementation-defined whether null terminators, pointers, and/or any alignment bytes are included in this total.

    And ARG_MAX:

    {ARG_MAX} Maximum length of argument to the exec functions including environment data.

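    That means some space is reserved for the new process's arguments, and we can safely assume that the argument strings are copied into that space. Since the caller's address space is discarded, the new program never sees the caller's pointers. Whether a string passed twice ends up as two separate copies is not spelled out by POSIX, though an implementation that copies each argv element in turn will produce independent strings.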

    I know that exec'ing involves "replacement of the process image", but I'm not sure what that entails exactly.

    Check this.

    And perhaps different platforms/implementations behave differently in this respect? Can answerers please speak to this?

    The implementation may vary from platform to platform, but POSIX-conforming systems follow the same standard in order to remain compatible. So I believe the observable behavior should be the same on those platforms.
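As a rough way to check the aliasing discussed above, here is a minimal sketch in C. Both function names, build_dup_argv and count_aliased, are hypothetical helpers introduced for illustration; they are not part of POSIX or of the question's code. The first builds the duplicated argument vector as in the question, and the second counts how many elements share an address with an earlier element.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: build a NULL-terminated argument vector for exec
 * in which every incoming argument after argv[0] appears twice.
 * Both occurrences are the same pointer; no string is duplicated. */
char **build_dup_argv(char *prog, int argc, char *argv[])
{
    assert(argc >= 1);
    size_t n = (size_t)(argc - 1) * 2;
    char **out = malloc(sizeof *out * (n + 2)); /* +2: prog and final NULL */
    if (out == NULL)
        return NULL;
    out[0] = prog;
    for (size_t i = 0; i < n; ++i)
        out[i + 1] = argv[i / 2 + 1];
    out[n + 1] = NULL;
    return out;
}

/* Hypothetical helper: count the elements whose address equals that of
 * an earlier element in the same NULL-terminated vector. */
int count_aliased(char *const v[])
{
    int aliased = 0;
    for (size_t i = 0; v[i] != NULL; ++i) {
        for (size_t j = 0; j < i; ++j) {
            if (v[i] == v[j]) {
                ++aliased;
                break;
            }
        }
    }
    return aliased;
}
```

In the calling process, count_aliased applied to the vector built by build_dup_argv reports one aliased element per duplicated argument. Calling count_aliased(argv) at the top of main() in the exec'd program would be one way to observe what a particular platform actually delivers; the result there depends on the implementation and is not something this sketch can promise.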