OK, I have tried to rewrite this Bison grammar three times now and keep running into shift/reduce and reduce/reduce conflicts. The Syntax that I am trying to parse is as follows. The items in the {...} are for one or the other. The items i n the [...] are optional.
CONSTANT constant-name constant-class
constant-class = { EQUALS expression numeric-options }
{ EQUALS STRING string string-options }
numeric-options = [ PREFIX prefix-string ]
[ TAG tag-string ]
[ COUNTER #local-name ]
[ TYPENAME type-name ] ;
string-options = [ PREFIX prefix-string ]
[ TAG tag-string ] ;
CONSTANT (constant-name,...) EQUALS expression
[ INCREMENT expression ]
[ PREFIX prefix-string ]
[ TAG tag-string ]
[ COUNTER #local-name ]
[ TYPENAME type-name ];
CONSTANT constant-name EQUALS expression,
.
.
.
;
CONSTANT (constant-name,...) EQUALS expression,
.
.
.
;
I'm having issues with the getting the last three to all work. I can get any one of them to work, but not all 4 of them. I right now have one shift/reduce and one reduce/reduce conflict. The grammar I have is as follows:
constant
: SDL_K_CONSTANT constant_style ';'
;
constant_style
: constant_name constant_class
| constant_list
| constant_set
;
constant_name
: sdl_name
;
constant_class
: SDL_K_EQUALS sdl_decimal
| SDL_K_EQUALS SDL_K_STRING sdl_string
;
constant_names
: constant_name
| constant_names ',' constant_name
| '(' constant_names ')'
;
names_equal
: constant_names SDL_K_EQUALS sdl_decimal
;
constant_list
: names_equal
;
constant_set
: names_equal
| constant_set ',' names_equal
;
I think the names are self documenting (at least you should be able to understand the types). Any help would be greatly appreciated. I have a feeling that I either simplified too much or not enough.
NOTE: I figured out how to edit my post and removed the options and changed SDL_K_COMMA and SDL_K_SEMI to ',' and ';', respectively.
Thanks.
Here are some examples that should parse:
CONSTANT block_node_size EQUALS 24;
CONSTANT Strcon EQUALS STRING "This is a string constant" PREFIX Jg$
#block_size = 24;
CONSTANT block_node_size EQUALS #block_size;
CONSTANT
xyz EQUALS 10,
alpha EQUALS 0,
noname EQUALS 63;
CONSTANT
(zyx, nameless) EQUALS 10,
(beta) EQUALS 1,
gamma EQUALS 42;
CONSTANT (
bits,
bytes,
words,
longs,
quads,
octas
) EQUALS 0 INCREMENT 1 PREFIX ctx$;
CONSTANT
(bad_block,bad_data,,,,
overlay,rewrite) EQUALS 0 INCREMENT 4;
CONSTANT (pli,c,bliss,macro)
EQUALS 4 INCREMENT 4 PREFIX lang$
COUNTER #lang;
CONSTANT (basic,pascal,fortran)
EQUALS #lang + 4 INCREMENT 4 PREFIX lang$;
I hope this helps.
BTW: Here is the EBNF (sort of) for this:
/*
* Define the CONSTANT construct (Left out Expression).
*/
Str ::= "\"" Printable* "\""
Name ::= "$" | "_" | [A-Za-z] ("$" | "_" | [A-Z0-9a-z])*
Names ::= Name | ("(" Name ("," Name )* ")")
Constant_class ::= "EQUALS" (Expression Numeric_options | "STRING" Str
String_options)
String_options ::= Prefix? Tag?
Numeric_options ::= String_options Counter? Typename? Radix?
Increment_options ::= Increment? Numeric_options
Constant_list ::= Names "EQUALS" Expression Increment_options
Constant_set ::= Names Equals Expression
("," Names "EQUALS" Expression)*
Constant ::= "CONSTANT" (Name Constant_class | Constant_list |
Constant_set) ";"?
Prefix ::= "PREFIX" Str
Tag ::= "TAG" Str
Radix ::= "RADIX" ("DEC" | "OCT" | "HEX")
Counter ::= "COUNTER" Variable
Increment ::= "INCREMENT" Expression
Typename ::= "TYPENAME" Name
I think that's about it.
It's a little hard for me to understand what you are actually trying to do, so I've made some assumptions and provided some alternatives below. I hope that it comes close.
The basic issue is ambiguity in your grammar specifications. One of these might just be a mistake: according to your templates, the left-hand hand side of the EQUAL
seems to be either a single name
or a comma-separated list of name
s surrounded by parentheses. However, your grammar allows a comma-separated list of name
s, with the possibility that the first (or only) item in the list could be a parenthesized lists of names:
constant_names
: constant_name
| constant_names ',' constant_name
| '(' constant_names ')
That will match a
, a, b
, (a, b)
, (a, b), c
and (a, b), c, d
. BUt I think only the first and third of those are actually intended.
In any event, you have two productions:
constant_style
: constant_name constant_class
| constant_list
For the first one, we have:
constant_class
: SDL_K_EQUALS sdl_decimal
while for the second, we have:
constant_list: names_equal
names_equal
: constant_names SDL_K_EQUALS sdl_decimal
Since constant_name
can match a single name
, there are two different ways to match answer = 42
, which will inevitably lead to a parsing conflict.
But there's yet another possible parse for answer = 42
:
constant_set
: names_equal
So let's start by simplifying all that stuff, and then we can maybe get back to what (might be) your original goal. The idea is to factor everything which is syntactically similar:
constant-stmt : "CONSTANT" clause-list ';'
clause-list : clause | clause-list ',' clause
clause : left-hand-side "EQUALS" right-hand-side
left-hand-side : name | '(' name-list ')'
name-list : name | name-list ',' name
right-hand-side: expression /* See below */
I hope that is all simple enough to understand.
But we can see from the original that (at least in certain circumstances), there are many more options for right-hand-side
than appear in the above snippet. It would be trivial to just add the other syntaxes to right-hand-side
. However, it appears that the intention is that these extra options are only available in the case that there is a single clause. In that case, we could do something like this:
constant-stmt : "CONSTANT" constant-body ';'
constant-body : complex-clause | clause-list
clause-list : clause | clause-list ',' clause
clause : left-hand-side "EQUALS" right-hand-side
right-hand-side: expression
complex-clause : left-hand-side "EQUALS" complex-rhs
complex-rhs : expression numeric-options
| "STRING" string-literal string-options
But unfortunately, that is back to being ambiguous because numeric-options
might be empty, so expression
will match both right-hand-side
and complex-right-hand-side
.
In practice, this ambiguity doesn't matter. There is no semantic difference between the declaration name EQUALS expression
as the only declaration in a CONSTANT
declaration or as one of a list of such declarations. So one possibility is to just ignore the resulting reduce/reduce conflict, possibly by putting %expect 1
into the file. But that's not very pleasant really. So we will try to eliminate the problem.
One way would be to insist that the first complex-rhs
have at least one numeric-option
. But that's going to be truly annoying. First, we'd have to create yet another clause type -- first-complex-clause
or some such -- and second, we'd have to write out the requirement that at least one option be present.
As a simpler alternative, we can just require that a clause-list
have at least two clause
s, rather than just one. Since every clause
can also match complex-clause
, the restriction on clause-list
will not reject any valid clause
. So that gives us the very minor change:
constant-body : complex-clause | clause-list
clause-list : clause ',' clause | clause-list ',' clause
With a more precise description of the intended syntax (although it is still missing some details), I modified the above grammar to attempt to parse the full language. The basic principle remains intact: a definition consists of a single complex-clause
(which includes the simple clause case), or a list of at least two simple clause
s. The only difference is the way the two clause types are defined.
I also fixed the name-list production so that individual names can be omitted (eg. (bad_block,bad_data,,,,overlay,rewrite)
). As indicated in the guide, the list must contain at least one name, which slightly complicates the grammar.
I added the options from the guide (but not the extra ones in the EBNF). I did not try to deal with omitted semicolons, which don't seem to be allowed in the guide although there is an example of a declaration without a trailing semicolon. (That could be a typo.) I added the definition of expression
, to the best of my understanding.
I also added the local-name assignment statement because it was in the test file and it was easy.
Here's the grammar:
definition : %empty
| definition constant-defn
| definition local-assign
local-assign : LOCAL_NAME '=' expression ';'
constant-defn : "CONSTANT" constant-body ';'
constant-body : complex-clause | clause-list
clause-list : clause ',' clause | clause-list ',' clause
clause : name "EQUALS" expression
| name-list "EQUALS" expression
complex-clause : name "EQUALS" expression numeric-options
| name-list "EQUALS" expression list-options
| name "EQUALS" "STRING" string-literal string-options
| name-list "EQUALS" "STRING" string-literal string-options
name-list : '(' names ')'
names : optional-commas name | names ',' name | names ','
optional-commas : %empty | optional-commas ','
string-options : prefix-option tag-option
numeric-options : string-options counter-option typename-option
list-options : increment-option numeric-options
increment-option: %empty | "INCREMENT" expression
prefix-option : %empty | "PREFIX" prefix-name
tag-option : %empty | "TAG" tag-name
counter-option : %empty | "COUNTER" LOCAL_NAME
typename-option : %empty | "TYPENAME" name
expression : '-' expression %prec UNOP
| expression '*' expression
| expression '/' expression
| expression '+' expression
| expression '-' expression
| expression '@' expression
| expression '&' expression
| expression '!' expression
| '(' expression ')'
| INTEGER
| IDENT
| LOCAL_NAME
name : IDENT | QUOTED_IDENT
prefix-name : IDENT | QUOTED_NULL | QUOTED_IDENT
tag-name : IDENT | QUOTED_NULL | QUOTED_IDENT | QUOTED_TAG
string-literal : QUOTED_NULL | QUOTED_IDENT | QUOTED_TAG | QUOTED_STRING
You'll see that I added a discrimination of various types of quoted strings so that it is possible to deal with (most of) the different contexts in which quoted strings may appear. I didn't include the use of quoted strings of up to 4 characters as integer constants, as indicated in the guide, because I didn't notice it in time, and also because I couldn't figure out with a brief reading whether the intent is to allow constants in expressions whose names must be quoted (because the name conflicts with a keyword); in that case, it seems to me that there is an ambiguity with a quoted name with four or fewer characters.
In case it's not obvious how the discrimination works (and it probably isn't), here's the accompanying flex definition:
id [[:alpha:]$_][[:alnum:]$_]*
%%
[[:space:]]+ ;
CONSTANT { return K_CONSTANT; }
COUNTER { return K_COUNTER; }
EQUALS { return K_EQUALS; }
INCREMENT { return K_INCREMENT; }
PREFIX { return K_PREFIX; }
STRING { return K_STRING; }
TAG { return K_TAG; }
TYPENAME { return K_TYPENAME; }
[#]{id} { yylval.str = strndup(yytext, yyleng); return LOCAL_NAME; }
{id} { yylval.str = strndup(yytext, yyleng); return IDENT; }
["]{id}["] { yylval.str = strndup(yytext+1, yyleng-2); return QUOTED_IDENT; }
["]["] { yylval.str = strndup(yytext+1, yyleng-2); return QUOTED_NULL; }
["][[:alnum:]$_]+["] { yylval.str = strndup(yytext+1, yyleng-2); return QUOTED_TAG; }
["][[:print:]]+["] { yylval.str = strndup(yytext+1, yyleng-2); return QUOTED_STRING; }
[[:digit:]]+ { yylval.dec = strtoul(yytext, 0, 0); return INTEGER; }
. { return *yytext; }
I did a rather inadequate test by running it with the definitions at the end of the OP (although I added a trailing ;
in the second line). I didn't actually check that the parse tree was correct, but it certainly does all pass through the parser.