I'm writing a PHP like interpreter and have some problems with shift/reduce and reduce/reduce conflicts. may someone help me to understand shift/reduce and reduce/reduce conflicts.
I have to write and interpreter that ship/echo non-valuables and valuate expressions starting with the "magic" @ character e.g. @if(cond) ... @end;. So "if" must be echoed while @if(cond) should be interpreted
Problem: scriptlang.y contains 21 shift/reduce conflicts and 2 reduce/reduce conflicts.
%union {
char* sval;
}
%token <sval> IDENTIFIER
%token <sval> RBRACKET
%token <sval> LBRACKET
%token <sval> KWSWITCH
%token <sval> KWIF
%token <sval> MAGICESC
%token MAGIC
%token ENDSTM
%type <sval> filechar
%start script
%%
script:
commands
;
commands:
/* empty */
| command
| commands command
;
command:
filechar {
analyser_echo($1,"filechar",analyser_canEcho);
}
| magic_command {}
;
filechar:
IDENTIFIER
| LBRACKET
| RBRACKET
| KWSWITCH
| KWIF
| MAGICESC
;
magic_command:
MAGIC valuation
| MAGIC alternative
;
valuation:
LBRACKET IDENTIFIER RBRACKET {
fprintf(yyout, "<val>");
}
;
alternative:
switch_alternative
| if_alternative
;
switch_alternative:
switch_block end_stm
;
switch_block:
switch_stm
| switch_stm commands
;
switch_stm:
KWSWITCH LBRACKET IDENTIFIER RBRACKET {}
;
if_alternative:
if_block end_stm
;
if_block:
if_stm
| if_stm commands
;
if_stm:
KWIF LBRACKET IDENTIFIER RBRACKET {}
;
end_stm:
ENDSTM
;
%%
flex file content:
"@@" {
yylval.sval = "@";
return MAGICESC;
}
"@" {
return MAGIC;
}
"(" {
yylval.sval = yytext;
return LBRACKET;
}
")" {
yylval.sval = yytext;
return RBRACKET;
}
"@end;" {
return ENDSTM;
}
"if" {
yylval.sval = yytext;
return KWIF;
}
"switch" {
yylval.sval = yytext;
return KWSWITCH;
}
[a-zA-Z][_a-zA-Z0-9]* {
yylval.sval = yytext;
return IDENTIFIER;
}
\n|. {
if(analyser_canEcho>0){
ECHO;
}
}
%%
The basic issue leading to the conflicts is your handling of commands
. Your intention is to define commands
as zero or more command
s, which is written as follows:
commands: %empty
| commands command
Had you meant to insist that there be at least one command, you would have written:
commands: command
| commands command
Mixing the two forms will not work, since the parser will not know whether to start a sequence of command
s with nothing (%empty
) or with a single command
. You should try to understand exactly why this leads to an ambiguity; you'll find a number of examples of similar problems on this site. For example, see this question.
That produces the 21 shift/reduce conflicts. The reduce/reduce conflicts are the result of the curious productions:
switch_block: switch_stm commands
if_block: if_stm commands
if
and switch
statements are single command
elements inside a commands
sequence; whatever follows the switch
or if
statement will be the next command
in the commands
. Defining switch_block
to include following commands is totally ambiguous: in effect, it is saying that the next command
might still be part of the switch_block
, or it might be a command
following the switch_block
.
Above, I specifically addressed the question you asked: the parsing table conflicts in your grammar. There are various other problems with your grammar and lexical specifications, and I strongly recommend you study whatever materials you have been given about bison/flex, and/or read the bison and flex manuals.
As a guide to your reading of the manuals or other materials, I'd suggest you focus on at least two things:
The handling of semantic values. It should never be necessary for a syntactic keyword to have its own representation as a semantic value; indeed, syntactic keywords rarely require semantic values at all. If a token does require that its semantic value be its representation, you need to remember that yytext
is a pointer into a private data buffer which you do not own, and which will be modified without warning. Copying is therefore required.
Embedded languages like your PHP variant involve two different lexical contexts. You have an outer, essentially uninterpreted context, and an embedded context which is contained between @
and @end;
. (F)lex provides start conditions to help deal with this sort of embedding. The manual has some examples, and there are many more floating around this site.