Search code examples
cshellyacclexexecvp

execvp() prints invalid option -- '


I am creating simple shell interpreter. I am creating this shell using a scanner and a parser using lex and yacc utility. The problem arises when i give an argument to command as it returns invalid option.For example when I type in ls -l it returns ls: invalid option -- ' even though i have checked the value of the arg_list which holds the value of arguments and command and it stores the correct argument.

Can someone, please, help me understand why I am getting this error? For my code, the lex scans the input and returns a token to the parser whenever a string is matched.

(This code of program only has the capability to run a single argument).

This is my lex specification file. "shell.l"

%{
    #include<stdio.h>
    #include<stdlib.h>
    #include"y.tab.h"
    extern int len;
%}
%option noyywrap
letter [a-zA-Z]+

%%
(\-)?{letter}                       {yylval.id=yytext;len=yyleng;return WORD;}
[ \t\n]                         {;}
.                           {return NOTOKEN;}

%%

This is my yacc specification file and a my main() as well. "shell.y"

%{
    #include<stdio.h>
    #include<stdlib.h>
    #include<unistd.h>
    extern int yylex();
    void yyerror();
    int append =0;
    void insertArg();
    void insertComm();
    char *arg_list[20];
    int i,count=1;
    int len;
    char command[20];
%}
%union{
    float f;
    char  *id;
}
%start C
%token <id> WORD
%token  NOTOKEN
%type <id> C A

%%
A:A WORD    {$$=$2;insertArg($2,len);count++;}
 |
 ;
C:WORD {$$=$1;insertComm($1,len);/*printf("%s\n",$$);*/} A  
 ;

%%

void insertComm(char *c,int len)
{   

    for(i=0;i<len;i++)
    {   
        command[i]=c[i];

    }
    arg_list[0]=&command[0];
}
void insertArg(char *arg,int len)
{
    arg_list[count]=&arg[0];
}
void yyerror(char *msg){
    fprintf(stderr,"%s\n",msg);
    exit(1);
}

int main()
{   
    int status;
    while(1){
    yyparse();
    //arg_list[count]='\0';
    printf("\n arg_list[0]= %s",arg_list[0]);
    for(i=0;arg_list[i]!='\0';i++)
    {
        printf("\n arg_list[%d]= %s",i,arg_list[i]);
    }
    //printf("%s",sizeof(arg_list[1]));
    execvp(arg_list[0],arg_list);
    }
}

Solution

  • Diagnosis

    Your code ends up including a newline after the -l in the argument, and ls is complaining that it doesn't have a newline as a valid option letter.

    Diagnostic technique

    I modified the printing code in your main to read:

    printf("arg_list[0]= %s\n", arg_list[0]);
    for (i = 0; arg_list[i] != NULL; i++)
    {
        printf("arg_list[%d] = [%s]\n", i, arg_list[i]);
    }
    fflush(stdout);
    

    It produced:

    $ ./shell
    ls -l
    arg_list[0]= ls
    arg_list[0] = [ls]
    arg_list[1] = [-l
    ]
    ls: illegal option -- 
    
    usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
    $
    

    Notice that there is a newline after the -l and before the ] that marks the end of the string.

    There are various changes to note in the code. First, the print format strings end with a newline, not start — that helps ensure that the output is output in a timely manner. Without a newline at the end, the output may be delayed indefinitely, until something does produce a newline.

    Second, my compiler warned me about:

    shellg.y: In function ‘main’:
    shellg.y:60:24: warning: comparison between pointer and zero character constant [-Wpointer-compare]
         for(i=0;arg_list[i]!='\0';i++)
                            ^~
    shellg.y:60:13: note: did you mean to dereference the pointer?
         for(i=0;arg_list[i]!='\0';i++)
                 ^
    

    (Aside: I called the files shell.l and shellg.y — using the same basename twice leaves me at a loss because both object files would be shell.o under my normal build regime. I guess you use different rules for compiling your code.)

    I changed the '\0' (which is a "null pointer constant", but is an aconventional one and more typically indicates that the writer is confused) into NULL.

    The print format in the loop is important; notice how I enclosed the %s in marker characters [%s]. The placement of the ] in the output immediately shows that the newline is the problem. This is a valuable technique; it makes the invisible visible again.

    The final fflush(stdout) is not really crucial in this context, but it does ensure that any pending standard output has been generated before the execvp() replaces the program and loses that output forever. It would be reasonable to use fflush(0) or fflush(NULL) to ensure that other file streams (standard error) are also fully flushed, but standard error is normally not buffered very much.

    Prescription

    Clearly, the fix is to upgrade the lexical code to not include the newline in the argument. However, it isn't immediately obvious why that's happening at all. I altered the grammar to do some printing:

    %%
    A:A WORD    {printf("R1: len %d [%s]\n", len, $2); $$=$2;insertArg($2,len);count++;}
     |
     ;
    C:WORD {printf("R2A: len %d [%s]\n", len, $1); $$=$1;insertComm($1,len);/*printf("%s\n",$$);*/} A  
        {printf("R2B: len %d [%s]\n", len, $3);}
     ;
    
    %%
    

    Note in particular the R2B line; there could be an action there, but you didn't have one.

    When this is run, I get:

    $ ./shell
    ls -l
    R2A: len 2 [ls]
    R1: len 2 [-l]
    R2B: len 2 [-l
    ]
    arg_list[0]= ls
    arg_list[0] = [ls]
    arg_list[1] = [-l
    ]
    ls: illegal option -- 
    
    usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
    $
    

    It's interesting that the -l shows up in the R1 output without the newline, but a newline is added by the time of the R2B output. That was unexpected, but fortuitous. I added the printing to make sure that the coverage was complete; I'm very glad I did!

    So, what gives? Why isn't the token being reset to empty? Why is the newline added? Why isn't a copy of the token made, instead of storing a pointer to the yytext?

    The long term solution is going to be to make a copy; so we can start out right. I'm going to assume strdup() is available to you and will use that. I added #include <string.h> to the includes and used:

    void insertArg(char *arg,int len)
    {
        arg_list[count] = strdup(arg);
    }
    

    Hey presto! The desired output:

    $ ./shell
    ls -l
    R2A: len 2 [ls]
    R1: len 2 [-l]
    R2B: len 2 [-l
    ]
    arg_list[0]= ls
    arg_list[0] = [ls]
    arg_list[1] = [-l]
    total 128
    -rw-r--r--  1 jleffler  staff   1443 Apr 29 08:36 data
    -rwxr-xr-x  1 jleffler  staff  24516 Apr 29 09:08 shell
    -rw-r--r--  1 jleffler  staff    297 Apr 29 08:36 shell.l
    -rw-r--r--  1 jleffler  staff  13568 Apr 29 08:38 shell.o
    -rw-r--r--  1 jleffler  staff   4680 Apr 29 09:08 shellg.o
    -rw-r--r--  1 jleffler  staff   1306 Apr 29 09:08 shellg.y
    -rw-r--r--  1 jleffler  staff   2245 Apr 29 09:08 y.tab.h
    $
    

    Further observations

    You are going to need to make sure you release the duplicated strings. You also need to compile with warning options. Contrary to my normal practice, I compiled with just default (almost no) warnings. Compiling the grammar shows:

    yacc -d shellg.y
    shellg.y:28.3: warning: empty rule for typed nonterminal, and no action
    shellg.y:30.3-31.44: warning: unused value: $2
    gcc -O -c y.tab.c
    mv y.tab.o shellg.o
    rm -f y.tab.c
    

    You should resolve both those. You don't declare prototypes for the functions you define — you have:

    extern int yylex();
    void yyerror();
    int append =0;
    void insertArg();
    void insertComm();
    

    There are 4 function declarations, but none of them is a function prototype. You need to add the expected arguments or void where no arguments are expected.

    There are other problems too. With my normal compilation options (rmk is a variant of make; the -u means 'unconditional rebuild'), I get:

    $ rmk -ku
        yacc  shellg.y
    shellg.y:28.3: warning: empty rule for typed nonterminal, and no action
    shellg.y:30.3-31.44: warning: unused value: $2
        gcc -O3   -g         -std=c11   -Wall -Wextra -Werror -Wmissing-prototypes -Wstrict-prototypes         -c y.tab.c
    shellg.y:7:5: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
         extern int yylex();
         ^~~~~~
    shellg.y:8:5: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
         void yyerror();
         ^~~~
    shellg.y:10:5: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
         void insertArg();
         ^~~~
    shellg.y:11:5: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
         void insertComm();
         ^~~~
    shellg.y:36:6: error: no previous prototype for ‘insertComm’ [-Werror=missing-prototypes]
     void insertComm(char *c,int len)
          ^~~~~~~~~~
    shellg.y:45:6: error: no previous prototype for ‘insertArg’ [-Werror=missing-prototypes]
     void insertArg(char *arg,int len)
          ^~~~~~~~~
    shellg.y: In function ‘insertArg’:
    shellg.y:45:30: error: unused parameter ‘len’ [-Werror=unused-parameter]
     void insertArg(char *arg,int len)
                                  ^~~
    shellg.y: At top level:
    shellg.y:50:6: error: no previous prototype for ‘yyerror’ [-Werror=missing-prototypes]
     void yyerror(char *msg){
          ^~~~~~~
    shellg.y: In function ‘main’:
    shellg.y:57:9: error: unused variable ‘status’ [-Werror=unused-variable]
         int status;
             ^~~~~~
    cc1: all warnings being treated as errors
    rmk: error code 1
        lex  shell.l
        gcc -O3   -g         -std=c11   -Wall -Wextra -Werror -Wmissing-prototypes -Wstrict-prototypes         -c lex.yy.c
    lex.yy.c:1119:16: error: ‘input’ defined but not used [-Werror=unused-function]
         static int input  (void)
                    ^~~~~
    lex.yy.c:1078:17: error: ‘yyunput’ defined but not used [-Werror=unused-function]
         static void yyunput (int c, register char * yy_bp )
                     ^~~~~~~
    cc1: all warnings being treated as errors
    rmk: error code 1
    'shell' not remade because of errors.
    'all' not remade because of errors.
    $
    

    I was too lazy to fix all those problems too.

    I think you will need to fix your lexical analyzer too. The white space should not be included in the tokens returned, but it seems to be added to the end, but I'm not quite sure how/why.

    $ ./shell
    ls -l abelone ducks
    R2A: len 2 [ls]
    R1: len 2 [-l]
    R1: len 7 [abelone]
    R1: len 5 [ducks]
    R2B: len 5 [ducks
    ]
    arg_list[0]= ls
    arg_list[0] = [ls]
    arg_list[1] = [-l]
    arg_list[2] = [abelone]
    arg_list[3] = [ducks]
    ls: abelone: No such file or directory
    ls: ducks: No such file or directory
    $
    

    That R2B printing puzzles me. How/why is that being modified compared with the R1 before which shows ducks and no newline. You'll need to track that down, I think.

    Adding diagnostics to the analyzer:

    %%
    (\-)?{letter}        {printf("L1: [%s]\n", yytext); yylval.id=yytext;len=yyleng;return WORD;}
    [ \t\n]              {printf("L2: [%s]\n", yytext);}
    .                    {printf("L3: [%s]\n", yytext); return NOTOKEN;}
    %%
    

    and running it yields:

    $ ./shell
    ls -l
    L1: [ls]
    R2A: len 2 [ls]
    L2: [ ]
    L1: [-l]
    R1: len 2 [-l]
    L2: [
    ]
    R2B: len 2 [-l
    ]
    arg_list[0]= ls
    arg_list[0] = [ls]
    arg_list[1] = [-l]
    total 224
    -rw-r--r--  1 jleffler  staff   1443 Apr 29 08:36 data
    -rw-r--r--  1 jleffler  staff    303 Apr 29 09:16 makefile
    -rwxr-xr-x  1 jleffler  staff  24516 Apr 29 09:29 shell
    -rw-r--r--  1 jleffler  staff    385 Apr 29 09:29 shell.l
    -rw-r--r--  1 jleffler  staff  13812 Apr 29 09:29 shell.o
    -rw-r--r--  1 jleffler  staff   4680 Apr 29 09:08 shellg.o
    -rw-r--r--  1 jleffler  staff   1306 Apr 29 09:08 shellg.y
    -rw-r--r--  1 jleffler  staff  41299 Apr 29 09:16 y.tab.c
    -rw-r--r--  1 jleffler  staff   2245 Apr 29 09:08 y.tab.h
    $
    

    Have fun tracking how/why the R2B printout includes the newline.

    Detailed tracking — printing addresses

    JFTR, I'm working on a Mac running macOS 10.13.4 High Sierra, with GCC 7.3.0 (home-built), with Bison 2.3 running as Yacc, and Flex 2.5.35 Apple(flex-31) running as Lex.

    Here's a cleaned up shell.l — it compiles cleanly under my stringent warnings regime:

    %{
        #include <stdio.h>
        #include <stdlib.h>
        #include "y.tab.h"
        extern int len;
    %}
    %option noyywrap
    %option noinput
    %option nounput
    letter [a-zA-Z]+
    
    %%
    (\-)?{letter}       {printf("L1: [%s]\n", yytext); yylval.id=yytext;len=yyleng;return WORD;}
    [ \t\n]             {printf("L2: [%s]\n", yytext);}
    .                   {printf("L3: [%s]\n", yytext); return NOTOKEN;}
    %%
    

    And here's a more heavily diagnostic-laden version of shellg.y (that also compiles cleanly under my stringent warnings regime). I've reinstated the original code in insertArg() that copies the address into the arg_list array, but I've also added code to print out the complete contents of arg_list on each call. That turns out to be informative!

    %{
        #include<stdio.h>
        #include<stdlib.h>
        #include <string.h>
        #include<unistd.h>
        extern int yylex(void);
        void yyerror(char *msg);
        int append =0;
        void insertArg(char *arg, int len);
        void insertComm(char *arg, int len);
        char *arg_list[20];
        int i,count=1;
        int len;
        char command[20];
    %}
    %union{
        float f;
        char  *id;
    }
    %start C
    %token <id> WORD
    %token  NOTOKEN
    %type <id> C A
    
    %%
    A:  A   WORD    {printf("R1A: len %d %p [%s]\n", len, $2, $2); $$=$2;insertArg($2,len);count++;}
     |  /*Nothing */
                    {printf("R1B: - nothing\n");}
     ;
    
    C:  WORD
        {printf("R2A: len %d %p [%s]\n", len, $1, $1); $$=$1;insertComm($1,len);}
        A  
        {printf("R2B: len %d %p [%s]\n", len, $3, $3);}
     ;
    
    %%
    
    void insertComm(char *c, int len)
    {
        printf("Command: %d [%s]\n", len, c);
        for (i = 0; i < len; i++)
        {
            command[i] = c[i];
        }
        arg_list[0] = &command[0];
    }
    
    void insertArg(char *arg, int len)
    {
        printf("Argument: %d [%s]\n", len, arg);
        //arg_list[count] = strdup(arg);
        arg_list[count] = arg;
        for (int i = 0; i < count; i++)
            printf("list[%d] = %p [%s]\n", i, arg_list[i], arg_list[i]);
    }
    
    void yyerror(char *msg)
    {
        fprintf(stderr, "%s\n", msg);
        exit(1);
    }
    
    int main(void)
    {
        while (1)
        {
            yyparse();
            printf("arg_list[0]= %s\n", arg_list[0]);
            for (i = 0; arg_list[i] != NULL; i++)
            {
                printf("arg_list[%d] = [%s]\n", i, arg_list[i]);
            }
            fflush(stdout);
            execvp(arg_list[0], arg_list);
        }
    }
    

    When compiled and run, I can get the output:

    $ ./shell
    ls -l -rt makefile data shell
    L1: [ls]
    R2A: len 2 0x7f9dd4801000 [ls]
    Command: 2 [ls]
    R1B: - nothing
    L2: [ ]
    L1: [-l]
    R1A: len 2 0x7f9dd4801003 [-l]
    Argument: 2 [-l]
    list[0] = 0x10ace8180 [ls]
    L2: [ ]
    L1: [-rt]
    R1A: len 3 0x7f9dd4801006 [-rt]
    Argument: 3 [-rt]
    list[0] = 0x10ace8180 [ls]
    list[1] = 0x7f9dd4801003 [-l -rt]
    L2: [ ]
    L1: [makefile]
    R1A: len 8 0x7f9dd480100a [makefile]
    Argument: 8 [makefile]
    list[0] = 0x10ace8180 [ls]
    list[1] = 0x7f9dd4801003 [-l -rt makefile]
    list[2] = 0x7f9dd4801006 [-rt makefile]
    L2: [ ]
    L1: [data]
    R1A: len 4 0x7f9dd4801013 [data]
    Argument: 4 [data]
    list[0] = 0x10ace8180 [ls]
    list[1] = 0x7f9dd4801003 [-l -rt makefile data]
    list[2] = 0x7f9dd4801006 [-rt makefile data]
    list[3] = 0x7f9dd480100a [makefile data]
    L2: [ ]
    L1: [shell]
    R1A: len 5 0x7f9dd4801018 [shell]
    Argument: 5 [shell]
    list[0] = 0x10ace8180 [ls]
    list[1] = 0x7f9dd4801003 [-l -rt makefile data shell]
    list[2] = 0x7f9dd4801006 [-rt makefile data shell]
    list[3] = 0x7f9dd480100a [makefile data shell]
    list[4] = 0x7f9dd4801013 [data shell]
    L2: [
    ]
    R2B: len 5 0x7f9dd4801018 [shell
    ]
    arg_list[0]= ls
    arg_list[0] = [ls]
    arg_list[1] = [-l -rt makefile data shell
    ]
    arg_list[2] = [-rt makefile data shell
    ]
    arg_list[3] = [makefile data shell
    ]
    arg_list[4] = [data shell
    ]
    arg_list[5] = [shell
    ]
    ls: illegal option --  
    usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
    $
    

    Notice how the pointers in arg_list are pointing to different positions in a single string. The lexical analyzer inserts a null after the token when it returns, but replaces that null with the blank or newline (or tab if I'd typed any). This shows why it is necessary to copy the tokens. What's "in" arg_list[1] changes as the lexical analysis proceeds. It's why the newline appears.

    Note the comments

    Please note the comments by rici and follow the link too: