Search code examples
bisonyacclex

Grammar not terminal?


I'm currently developing my own shell in C, Bison & Flex.

I just started learning and I can't find a way out of my grammar to make it work.

My problem is located at argList grammar (I think). The arg argList { $1->next = $2; $$ = $1; } is used to allow me to later pass more than one argument into a list.

Without it, compiled parser works as intended: it allows me to input the MOS token and it executes that function, allowing me after that to keep inputing MOS token with different (but limited to one) argument.

With that piece of grammar, the parser allows me to input the MOS token, but only executes that function when I Ctrl+D out of it. Why?

Thanks for any answer and I'm sorry if it looks bad, but I've tried to read all documentation available and I still can't figure this out.

Cheers.

BISON Code


commandList
  : command                { $$ = $1; }
  | command commandList  { $1->next = $2; $$ = $1; }
  ;


command
  : MOS argList             { $$ = insert_Mostra( NULL, $2 ); ExecuteCommands($$); }
  ;

argList
  : arg           { $$ = $1; }
  | arg argList   { $1->next = $2; $$ = $1; }
  ;

arg
  : VAR_VALUE { $$ = insert_Args(NULL, $1); }
  ;

Solution

  • With that piece of grammar, the parser allows me to input the MOS token, but only executes that function when I Ctrl+D out of it. Why?

    In a nutshell, because lex and yacc parsers are not suited for interactive input.

    If you want your shell to be interactive, and not only a batch scripting language, you need an input mechanism that obtains lines of input from the user, and then parses them as if they were tiny scripts.

    Of you're using lex, that means modifying the scanner so that it can take input from a buffer in memory, not only from a stream.

    You can't have an interactive shell by calling yyparse, which calls yylex , which pulls characters from stdin. If you do you run ito various issues.

    Firstly, the input will be line buffered (in the kernel, at the TTY level). lex doesn't see anything you type until you hit Enter.

    Secondly, bot the lexer and the parser use "lookahead". To recognize a token, the regex-driven lexer uses N character lookahead. The LALR(1) parser uses one token of lookahead. Lookahead adds up to confusion for the user, because the machine keeps waiting for more input even though it has already been given a complete phrase as far as the user is concerned.

    The user thinks: "why isn't the machine just evaluating the complete, syntactically valid expression I typed?".

    The machine thinks: "according to my grammar, this expression (though inarguably valid) could be just the prefix of a longer one; I need to see what, if anything, comes after it! I need either more input, or an EOF!".

    (Why would the lexer require N characters of lookahead? For instance, suppose we have a lexical grammar with three fixed tokens: aaaaab, a and x. Suppose the input is aaax. The scanner has to suspect that it may be looking at an aaaaab token, so it reads all the a characters right until it hits the x. At that point the regex automaton realizes that aaaaab cannot possibly match. And since the token doesn't start with x, it must be looking at an a token. So that a token is returned to the parser. Thus, the the input position must backtrack all the way to the second a. In other words, the machine had to look beyond the first a at aax to resolve the token: it used three characters of lookahead.)

    Lookahead parsing over free-format input will confuse an interactive end-user with irrecoverable errors; it's a user interface nonstarter.