Tags: bison, context-free-grammar, bnf

How to fix the order in which a simple shell command is parsed?


Using the grammar rules defined below, I am trying to parse a simple shell command such as cd testFolder.

These are my rules defined in parser.y:

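%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char *s);
%}

%union{
    char *str;
}

%token <str> WORD
%token NEWLINE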

%%
command_list:/*empty*/
            |command_list command_line{  
                printf("myShell > ");
            }
            ;

arg_list:/*empty*/ 
        | arg_list WORD{
            printf("Args: %s\n", $2);
            free($2);
        }
        ;

cmd_and_args:
             WORD arg_list {
                printf("CMD: %s\n", $1);
                free($1);
             }
            ;

command_line:
            cmd_and_args NEWLINE {
                printf("NULL\n");
            }
            | NEWLINE {
                printf("NULL\n");
            }
            ;
%%

What I want the output to be is:

CMD: cd
Args: testFolder
NULL

but what I get is:

Args: testFolder
CMD: cd
NULL

For a command like vim -O test.c test1.c, I get:

Args: -O
Args: test.c
Args: test1.c
CMD: vim
NULL

The arguments come out in order, but the command ends up last. How do I get the command printed first?


Solution

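  • Bison produces bottom-up (LR) parsers, which means that if you think about the parse as a tree, every node is processed before its parent: a rule's action runs only when that rule is reduced, which happens after everything on its right-hand side has been recognized. (In other words, it's a post-order traversal.)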

    So the action for

     cmd_and_args:  WORD arg_list { … }
    

    is executed after all of the actions for arg_list, which is why every Args: line is printed before CMD:.
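    To make the post-order point concrete, here is a small C sketch that hand-builds the parse tree for cd testFolder under the grammar in the question and runs the rule actions in post-order. The struct node type, the rule strings and trace_cd_testFolder are illustrative inventions, not anything Bison generates; the actions append to a buffer instead of calling printf so the order can be checked.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical parse-tree node: `rule` names the production it was reduced
       by ("WORD" for a token leaf); `text` holds the token text for leaves. */
    struct node {
        const char *rule;
        const char *text;
        const struct node *kid[2];   /* this grammar needs at most two children */
    };

    /* Stand-in for the actions in parser.y: append to `out` instead of printf. */
    static void run_action(const struct node *n, char *out, size_t cap) {
        size_t len = strlen(out);
        if (strcmp(n->rule, "arg_list: arg_list WORD") == 0)
            snprintf(out + len, cap - len, "Args: %s\n", n->kid[1]->text);
        else if (strcmp(n->rule, "cmd_and_args: WORD arg_list") == 0)
            snprintf(out + len, cap - len, "CMD: %s\n", n->kid[0]->text);
    }

    /* Post-order walk: every child is finished before its parent's action runs,
       which is the order in which a bottom-up parser performs reductions. */
    static void post_order(const struct node *n, char *out, size_t cap) {
        for (int i = 0; i < 2; i++)
            if (n->kid[i] != NULL)
                post_order(n->kid[i], out, cap);
        run_action(n, out, cap);
    }

    /* Hand-built tree for "cd testFolder" under the question's grammar. */
    static void trace_cd_testFolder(char *out, size_t cap) {
        const struct node cd    = {"WORD", "cd", {NULL, NULL}};
        const struct node arg   = {"WORD", "testFolder", {NULL, NULL}};
        const struct node empty = {"arg_list: /*empty*/", NULL, {NULL, NULL}};
        const struct node args  = {"arg_list: arg_list WORD", NULL, {&empty, &arg}};
        const struct node cmd   = {"cmd_and_args: WORD arg_list", NULL, {&cd, &args}};
        out[0] = '\0';
        post_order(&cmd, out, cap);
    }

    int main(void) {
        char out[128];
        trace_cd_testFolder(out, sizeof out);
        fputs(out, stdout);   /* Args: testFolder, then CMD: cd */
        return 0;
    }
    ```

    The cmd_and_args node is the root of this subtree, so its action necessarily comes last, exactly as in the output from the question.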

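    I don't see why this would be a problem, since the command word is still available as $1 when that action runs. But you can change the order either by using a midrule action or by using a unit production that reduces the command word on its own, before arg_list is parsed.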

    Midrule Action

    cmd_and_args:  WORD { printf("CMD: %s\n", $1); free($1); } arg_list { /* the midrule action is $2; arg_list is now $3 */ }
    

    Unit production

    cmd_and_args: command_word arg_list { … }
    
    command_word: WORD { printf("CMD: %s\n", $1); free($1); }
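    Both fixes work the same way: Bison implements a midrule action as an anonymous empty nonterminal, so in either version the command word gets its own reduction, whose node sits to the left of arg_list in the tree. The C sketch below models the unit-production tree for vim -O test.c test1.c and walks it in post-order. The node arena, mk and trace_with_command_word are hypothetical helpers for illustration only.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical parse-tree node: `rule` names the nonterminal ("WORD" for a leaf). */
    struct node {
        const char *rule;
        const char *text;            /* token text, for WORD leaves only */
        const struct node *kid[2];
    };

    #define MAX_NODES 64
    static struct node pool[MAX_NODES];   /* simple arena, so no malloc/free */
    static int used;

    static const struct node *mk(const char *rule, const char *text,
                                 const struct node *a, const struct node *b) {
        assert(used < MAX_NODES);
        struct node *n = &pool[used++];
        n->rule = rule;
        n->text = text;
        n->kid[0] = a;
        n->kid[1] = b;
        return n;
    }

    /* Actions of the modified grammar, appending to `out` instead of printf. */
    static void act(const struct node *n, char *out, size_t cap) {
        size_t len = strlen(out);
        if (strcmp(n->rule, "command_word") == 0)
            snprintf(out + len, cap - len, "CMD: %s\n", n->kid[0]->text);
        else if (strcmp(n->rule, "arg_list") == 0 && n->kid[1] != NULL)
            snprintf(out + len, cap - len, "Args: %s\n", n->kid[1]->text);
    }

    /* Children first, then the node's own action (bottom-up reduction order). */
    static void post_order(const struct node *n, char *out, size_t cap) {
        for (int i = 0; i < 2; i++)
            if (n->kid[i] != NULL)
                post_order(n->kid[i], out, cap);
        act(n, out, cap);
    }

    /* Builds the tree for  cmd_and_args: command_word arg_list  from n >= 1 words
       (left-recursive arg_list, as in the question) and walks it. */
    static void trace_with_command_word(const char **words, int n, char *out, size_t cap) {
        used = 0;
        out[0] = '\0';
        const struct node *cw = mk("command_word", NULL, mk("WORD", words[0], NULL, NULL), NULL);
        const struct node *args = mk("arg_list", NULL, NULL, NULL);   /* arg_list: empty */
        for (int i = 1; i < n; i++)                                     /* arg_list: arg_list WORD */
            args = mk("arg_list", NULL, args, mk("WORD", words[i], NULL, NULL));
        post_order(mk("cmd_and_args", NULL, cw, args), out, cap);
    }

    int main(void) {
        const char *words[] = {"vim", "-O", "test.c", "test1.c"};
        char out[256];
        trace_with_command_word(words, 4, out, sizeof out);
        fputs(out, stdout);   /* CMD: vim first, then the three Args: lines */
        return 0;
    }
    ```

    The only change from the original tree is the extra command_word node; because its whole subtree is finished before arg_list's subtree starts, CMD: is printed first while the arguments keep their left-to-right order.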
    

    Note: this grammar does not represent the real shell grammar, which allows variable assignments to precede the command word (e.g. LC_ALL=C sort file.txt).
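    As a rough illustration of that difference, the C sketch below classifies words the way a lexer might before handing them to the parser. It treats a word as an assignment if the text before its first = is a valid name (a letter or underscore followed by letters, digits or underscores), and it takes the command word to be the first word that is not an assignment. The function names are hypothetical. The sketch ignores quoting and the other context rules the real shell applies; the POSIX shell grammar handles this case with a separate ASSIGNMENT_WORD token.

    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* True if `w` looks like NAME=value, where NAME starts with a letter or '_'
       and continues with letters, digits or '_'. Quoting is not handled. */
    static bool is_assignment_word(const char *w) {
        if (!(isalpha((unsigned char)w[0]) || w[0] == '_'))
            return false;
        size_t i = 1;
        while (isalnum((unsigned char)w[i]) || w[i] == '_')
            i++;
        return w[i] == '=';
    }

    /* Index of the command word: the first word that is not an assignment,
       or n if the line consists only of assignments. */
    static int command_word_index(const char **words, int n) {
        int i = 0;
        while (i < n && is_assignment_word(words[i]))
            i++;
        return i;
    }

    int main(void) {
        const char *line[] = {"LC_ALL=C", "sort", "file.txt"};
        int i = command_word_index(line, 3);
        printf("prefix assignments: %d, command: %s\n", i, line[i]);
        return 0;
    }
    ```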