I am writing a parser in GNU-Bison to parse captured bits of data of a proprietary protocol. The parser has following tokens:
H
....... HeaderD
....... Data T
....... Terminatorand five numbers of D
i.e. Data tokens make up a bulk B
i.e.
B : DDDDD
;
Ideally the input should be of the form
H DDDDD DDDDD DDDDD ... DDDDD T
aka
H B B B ... B T
So I have written the following grammar :
%%
CAPTURE : H PAYLOAD T { printf("[OK]");}
;
PAYLOAD : B
| PAYLOAD B
;
B : DDDDDD
%%
Now, to meet some practical conditions such as a pattern like following:
DD H DDDDD DDDDD ... DDDDD T
(extra D's at the prefix)H DDDDD DDDDD ... DDD T
(truncated last bulk having just 3 Data D
) I modified my grammar to
%%
CAPTURE : H PAYLOAD T { printf("[OK]");}
;
PAYLOAD : B
| PAYLOAD B
| D PAYLOAD
| PAYLOAD D
;
B : DDDDDD
%%
But It is giving shift/reduce conflict
.
Need help how to correct the grammar so that it recognises above two cases also and is shift/reduce conflict free.
%%
CAPTURE : OPTD1 H PAYLOAD OPTD2 T { printf("[OK]");}
;
PAYLOAD : B
| PAYLOAD B
;
B : D D D D D
;
OPTD1 :
| OPTD1 D
;
OPTD2 :
| D D D
;
%%
I added two new non-terminals OPTD1
and OPTD2
on the right-hand side of the first production and kept your original rule for PAYLOAD
as it was. OPTD1
can be rewritten as 0 or more D
terminal symbols and OPTD2
can be rewritten as 0 or 3 D
terminal symbols.
If your TOKENS, H, T and B were just the characters 'H', 'T' and 'B' respectively, you could easily recognize valid input with the following regular expression:
'^D*H(DDDDD)+(DDD)?T$'
In any event, you should be able to recognize valid input with a finite state automaton with the power of a pushdown automaton provided by YACC not being required.