
How do optind and argv behave in C?


I've been reading Head First C and I'm currently stuck at understanding int main(int argc, char *argv[]) and the optind variable of getopt(). I'm troubled by the same program as the one in How does "optind" get assigned in C?, unfortunately I still can't get my head around what's going on even after reading that question and the man pages. My code is as follows:

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    char *delivery = "";
    int thick = 0;
    int count = 0;
    char ch;

    printf("optind before loop is %i\n", optind);

    while ( (ch = getopt(argc, argv, "d:t")) != EOF)
    {
        printf("optind before switch is %i\n", optind);
        switch(ch)
        {
            case 'd':
                delivery = optarg;
                printf("argc in case d is %i\n", argc);
                printf("argv in case d is %c\n", *argv);
                printf("optind in case d is %i\n", optind);
                break;
            case 't':
                thick = 1;
                printf("argc in case t is %i\n", argc);
                printf("argv in case t is %c\n", *argv);
                printf("optind in case t is %i\n", optind);
                break;
            default:
                fprintf(stderr, "Unknown option: '%s'\n", optarg);
                return 1;
        }
        printf("after switch optind is %i\n", optind);
        printf("----------------\n");
    }

    printf("after loop argc is %i\n", argc);
    printf("after loop argv is %c\n", *argv);
    printf("after loop optind is %i\n", optind);
    printf("----------------\n");

    argc -= optind;
    argv += optind;

    printf("final argc is %i\n", argc);
    printf("final argv is %c\n", *argv);
    printf("final optind is %i\n", optind);

    if (thick) puts("Thick crust.");
    if (delivery[0]) printf("To be delivered %s.\n", delivery);

    puts("Ingredients:");

    for (count = 0 ; count < argc; count++)
    {
        puts(argv[count]);
    }
    return 0;
}

I ran the program with ./order_pizza bacon -d now -t shrimp and got the following output:

optind before loop is 1
optind before switch is 4
argc in case d is 6
argv in case d is O
optind in case d is 4
after switch optind is 4
----------------
argc in case t is 6
argv in case t is O
optind in case t is 5
after switch optind is 5
----------------
after loop argc is 6
after loop argv is O
after loop optind is 4
----------------
final argc is 2
final argv is ]
final optind is 4
Thick crust.
To be delivered now.
Ingredients:
bacon
pineapple

The code works fine, but I don't fully understand its internals. Here are my issues:

  1. char *argv[] in main is an array of pointers, in which each pointer points to the first character of the respective argument string, and argv is actually argv[0] right? And argv[0] is the address of the first character of ./order_pizza, which is .. But why does my output say argv was first O then ] (characters that don't even exist in my input)?

  2. According to the getopt() man page, "getopt() permutes the contents of argv as it scans, so that eventually all the nonoptions are at the end". Does this mean that after scanning, argv becomes {"./order_pizza","-d","now","-t","bacon","shrimp"}? Would the order of options and arguments be kept the same (like, could the permutated array possibly be turned into {"./order_pizza","-t","now","-d","shrimp","bacon"} instead)?

  3. After the while loop, optind is 4, so argv += optind should be equivalent to argv[0] = argv[0] + 4;, which looks like adding 4 to the address of . (the first character ./order_pizza) to me. Given the array has now been permutated into {"./order_pizza","-d","now","-t","bacon","shrimp"}, shouldn't we be trying to do argv[0] = argv[4]; or argv = argv[optind] instead?

  4. The argument vector starts with the element ./order_pizza. The man page says "If there are no more option characters, getopt() returns -1. Then optind is the index in argv of the first argv-element that is not an option. ". Since the vector is {"./order_pizza","-d","now","-t","bacon","shrimp"} after the processing, why would optind be 4 (the index of bacon) instead of 0 (the index of ./order_pizza). I know the former is what we want, but does ./order_pizza not count as an argv-element?

  5. Head First C states that "optind stores the number of strings read from the command line to get past the options", which seems to conflict with the getopt() man page, which states optind is an index number. Is this an oversight of the book's authors, or did I miss something and the authors are indeed correct?

The answers to the old question regarding this program helped me a lot, yet I still have the problems listed above. Thanks!


Solution

    1. char *argv[] in main is an array of pointers,

    It's a pointer to the first char * in such an array, actually. It is a quirk of C that array syntax can be used in a function parameter list to indicate a pointer. It would be equivalent -- in a function parameter list only -- to declare argv as char **argv. And some people do.
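To make the equivalence concrete, here is a minimal sketch in which one function is declared with pointer syntax and defined with array syntax. The name count_args is hypothetical and chosen only for illustration. The pair compiles without a conflict because, in a parameter list, char *argv[] is adjusted to char **argv.

```c
#include <stddef.h>

/* Declared with pointer syntax ... */
size_t count_args(char **argv);

/* ... and defined with array syntax. Both spellings name the same
   parameter type, so this definition matches the declaration above. */
size_t count_args(char *argv[])
{
    size_t n = 0;

    /* The argument vector ends with a null pointer (argv[argc] == NULL). */
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Called with the argv that main receives, count_args(argv) should return the same value as argc.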

    in which each pointer points to the first character of the respective argument string,

    Yes.

    and argv is actually argv[0] right?

    No. argv[0] has one less level of indirection than does argv. The two cannot refer to the same object. For any valid pointer p, p[0] is equivalent to *p. That includes argv: argv[0] is equivalent to *argv, not to argv itself.

    And argv[0] is the address of the first character of ./order_pizza, which is ..

    Yes, when you launch the program via the command shown.

    But why does my output say argv was first O then ] (characters that don't even exist in my input)?

    It doesn't say that. Your program produces undefined behavior when it attempts to print *argv, a char * (see above), via printf conversion specifier %c, which is for values of type char.

    A good C compiler would warn you about that mismatch. If yours isn't warning then either turn up the warning level or get a better compiler.
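As a sketch of what matching specifiers look like: *argv (the same as argv[0]) is a char * and pairs with %s, while **argv (the same as argv[0][0]) is a char and pairs with %c. The helper name describe_argv0 is an assumption made for this example; it writes into a caller-supplied buffer so the result is easy to inspect.

```c
#include <stdio.h>

/* Formats the program name and its first character into buf.
   *argv is a char *, so it is printed with %s.
   **argv is a char, so it is printed with %c.
   Returns whatever snprintf() returns. */
int describe_argv0(char *argv[], char *buf, size_t size)
{
    return snprintf(buf, size,
                    "argv[0] is %s, its first character is %c",
                    *argv, **argv);
}
```

With GCC or Clang, compiling the original program with -Wall enables format checking, which reports the %c / char * mismatch.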


    2. According to the getopt() man page, "getopt() permutes the contents of argv as it scans, so that eventually all the nonoptions are at the end".

    Note well that this is a characteristic of the default behavior of the GNU implementation of getopt(), which differs in this regard from the behavior specified by POSIX (which is to stop option processing at the first non-option argument).

    Does this mean that after scanning, argv becomes {"./order_pizza","-d","now","-t","bacon","shrimp"}?

    In your example command, "./order_pizza" is the command name, "-d" and "now" are an option and its option-argument, and "-t" is another option (that does not accept an argument). The remaining elements of argv, "bacon" and "shrimp", are the non-option arguments, and it is these that are permuted to the end.

    Would the order of options and arguments be kept the same (like, could the permutated array possibly be turned into {"./order_pizza","-t","now","-d","shrimp","bacon"} instead)?

    The docs don't specify, but I expect the relative order of the non-option arguments to be retained. That is, I expect the first ordering you suggest, not the alternative order you propose. If getopt() reordered non-option arguments then it would be unsuitable for a variety of common command idioms. For example, it would be very bad if getopt() processing converted mv a b -f to mv -f b a. Such an argument is not the same thing as documentation, though.


    3. After the while loop, optind is 4, so argv += optind should be equivalent to argv[0] = argv[0] + 4;,

    Absolutely not. argv is not the same thing as argv[0] (ever). argv += optind is equivalent to argv = &argv[optind]. That is, it updates argv to point to (the pointer to) the first non-option argument.
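The pointer arithmetic can be sketched in isolation. The helper name advance is made up for this example; it returns the value that argv += n would store back into argv.

```c
/* Returns argv advanced by n elements. The result points at the n-th
   char * in the array. No string is modified and no character inside
   any string is skipped. */
char **advance(char **argv, int n)
{
    return argv + n;   /* same address as &argv[n] */
}
```

With the permuted array from the question and n equal to 4, the result points at the slot holding the pointer to "bacon", so indexing the result with 0 and 1 yields bacon and shrimp.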


    4. The argument vector starts with the element ./order_pizza. The man page says "If there are no more option characters, getopt() returns -1. Then optind is the index in argv of the first argv-element that is not an option. ". Since the vector is {"./order_pizza","-d","now","-t","bacon","shrimp"} after the processing, why would optind be 4 (the index of bacon) instead of 0 (the index of ./order_pizza).

    The text is a little misleading there. Instead of "the first argv-element that is not an option", it would be more accurate to say "the first non-option argument, or the trailing null pointer if there are no non-option arguments". This is also consistent with the rest of the description of how getopt() maintains the value of optind. It is also the behavior that makes sense, though what makes sense is not safe to use as a primary criterion for interpreting documentation.

    I know the former is what we want, but does ./order_pizza not count as an argv-element?

    Yes, ./order_pizza is an argv element, but it is not what the manual means here.


    5. Head First C states that "optind stores the number of strings read from the command line to get past the options", which seems to conflict with the getopt() man page, which states optind is an index number. Is this an oversight of the book's authors, or did I miss something and the authors are indeed correct?

    The book and the manual page do not conflict, though the book is a little looser when applied to the GNU version of getopt(). Suppose the command were issued as

    ./order_pizza -d now -t bacon shrimp
    

    After getopt() has processed all the options, as determined by it returning -1, the value of optind will be 4. This is the index of the first non-option argument, "bacon", in argv. It is also the number of preceding strings in argv. This is what the book means.

    The book is in fact exactly right for POSIX-conforming getopt(), but not entirely correct for the default behavior of GNU's version of getopt(). This is because GNU getopt() has the behavior we already discussed of looking through the whole argument list for options instead of stopping at the first non-option argument. Nevertheless, the idea it is trying to convey is the same: after getopt() has processed all the options, optind tells you where in argv the non-option arguments start.