In order to learn Lex/Yacc, I'm writing a CSV parser following the grammar specified on Page 3 of RFC 4180.
I've run into a reduce/reduce conflict, and I'm not sure how to proceed. It seems to be a conflict between Rules 1 and 3 of my grammar, but I don't know of any other way to describe a CSV file that may or may not have a line break after the last record. Also, when I remove Rule 10 (the empty field rule), the reduce/reduce conflict disappears; however, I need to handle empty fields.
What is the issue with my grammar and how should I correct it?
%token COMMA
%token DQUOTE
%token CRLF
%token TEXTDATA
%%
file: records CRLF
| records;
records: records CRLF record
| record;
record: fields;
fields: fields COMMA field
| field;
field: DQUOTE escaped DQUOTE
| TEXTDATA
| ;
escaped: escaped TEXTDATA
| escaped COMMA
| escaped CRLF
| escaped DQUOTE DQUOTE
| TEXTDATA
| COMMA
| CRLF
| DQUOTE DQUOTE;
yacc -v output:
State 14 conflicts: 1 reduce/reduce
Grammar
0 $accept: file $end
1 file: records CRLF
2 | records
3 records: records CRLF record
4 | record
5 record: fields
6 fields: fields COMMA field
7 | field
8 field: DQUOTE escaped DQUOTE
9 | TEXTDATA
10 | /* empty */
11 escaped: escaped TEXTDATA
12 | escaped COMMA
13 | escaped CRLF
14 | escaped DQUOTE DQUOTE
15 | TEXTDATA
16 | COMMA
17 | CRLF
18 | DQUOTE DQUOTE
Terminals, with rules where they appear
$end (0) 0
error (256)
COMMA (258) 6 12 16
DQUOTE (259) 8 14 18
CRLF (260) 1 3 13 17
TEXTDATA (261) 9 11 15
Nonterminals, with rules where they appear
$accept (7)
on left: 0
file (8)
on left: 1 2, on right: 0
records (9)
on left: 3 4, on right: 1 2 3
record (10)
on left: 5, on right: 3 4
fields (11)
on left: 6 7, on right: 5 6
field (12)
on left: 8 9 10, on right: 6 7
escaped (13)
on left: 11 12 13 14 15 16 17 18, on right: 8 11 12 13 14
state 0
0 $accept: . file $end
DQUOTE shift, and go to state 1
TEXTDATA shift, and go to state 2
$default reduce using rule 10 (field)
file go to state 3
records go to state 4
record go to state 5
fields go to state 6
field go to state 7
state 1
8 field: DQUOTE . escaped DQUOTE
COMMA shift, and go to state 8
DQUOTE shift, and go to state 9
CRLF shift, and go to state 10
TEXTDATA shift, and go to state 11
escaped go to state 12
state 2
9 field: TEXTDATA .
$default reduce using rule 9 (field)
state 3
0 $accept: file . $end
$end shift, and go to state 13
state 4
1 file: records . CRLF
2 | records .
3 records: records . CRLF record
CRLF shift, and go to state 14
$default reduce using rule 2 (file)
state 5
4 records: record .
$default reduce using rule 4 (records)
state 6
5 record: fields .
6 fields: fields . COMMA field
COMMA shift, and go to state 15
$default reduce using rule 5 (record)
state 7
7 fields: field .
$default reduce using rule 7 (fields)
state 8
16 escaped: COMMA .
$default reduce using rule 16 (escaped)
state 9
18 escaped: DQUOTE . DQUOTE
DQUOTE shift, and go to state 16
state 10
17 escaped: CRLF .
$default reduce using rule 17 (escaped)
state 11
15 escaped: TEXTDATA .
$default reduce using rule 15 (escaped)
state 12
8 field: DQUOTE escaped . DQUOTE
11 escaped: escaped . TEXTDATA
12 | escaped . COMMA
13 | escaped . CRLF
14 | escaped . DQUOTE DQUOTE
COMMA shift, and go to state 17
DQUOTE shift, and go to state 18
CRLF shift, and go to state 19
TEXTDATA shift, and go to state 20
state 13
0 $accept: file $end .
$default accept
state 14
1 file: records CRLF .
3 records: records CRLF . record
DQUOTE shift, and go to state 1
TEXTDATA shift, and go to state 2
$end reduce using rule 1 (file)
$end [reduce using rule 10 (field)]
$default reduce using rule 10 (field)
record go to state 21
fields go to state 6
field go to state 7
state 15
6 fields: fields COMMA . field
DQUOTE shift, and go to state 1
TEXTDATA shift, and go to state 2
$default reduce using rule 10 (field)
field go to state 22
state 16
18 escaped: DQUOTE DQUOTE .
$default reduce using rule 18 (escaped)
state 17
12 escaped: escaped COMMA .
$default reduce using rule 12 (escaped)
state 18
8 field: DQUOTE escaped DQUOTE .
14 escaped: escaped DQUOTE . DQUOTE
DQUOTE shift, and go to state 23
$default reduce using rule 8 (field)
state 19
13 escaped: escaped CRLF .
$default reduce using rule 13 (escaped)
state 20
11 escaped: escaped TEXTDATA .
$default reduce using rule 11 (escaped)
state 21
3 records: records CRLF record .
$default reduce using rule 3 (records)
state 22
6 fields: fields COMMA field .
$default reduce using rule 6 (fields)
state 23
14 escaped: escaped DQUOTE DQUOTE .
$default reduce using rule 14 (escaped)
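If the input is, for example, TEXTDATA CRLF, the grammar is ambiguous: the parser could derive file -> records CRLF and then derive records to a single record, or it could derive file -> records and then derive records to two records, the second of which contains only an empty field. This is the conflict reported in state 14: after records CRLF, with $end as the lookahead, the parser can reduce either by rule 1 (file) or by rule 10 (the empty field).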
To avoid this ambiguity, you can simply remove the file: records CRLF alternative. Files ending with a CRLF will still be accepted; they'll be treated as having a final record that consists of a single empty field.
If that's not what you want, you'll need to rewrite fields so that the last record is not allowed to be empty (and then keep the file: records CRLF production).
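One possible way to write the second option is sketched below. It is not the only formulation, and the nonterminal names rows, nonempty_record, and value are invented for this illustration. The idea is that every record except the last must be followed by a CRLF, and the last record may be anything other than a single empty field. By my reading of the LALR(1) states this should be conflict-free, but it is worth confirming with yacc -v.

```yacc
%token COMMA DQUOTE CRLF TEXTDATA
%%
/* Zero or more CRLF-terminated rows, then a final non-empty record,
   optionally followed by one trailing CRLF. */
file: rows nonempty_record
    | rows nonempty_record CRLF
    ;

/* Rows before the last record. A bare CRLF is a row holding one empty field. */
rows: /* empty */
    | rows CRLF
    | rows nonempty_record CRLF
    ;

/* Any record except the one made of a single empty field. */
nonempty_record: value
    | COMMA field                  /* empty first field, then at least one more */
    | nonempty_record COMMA field
    ;

field: /* empty */
    | value
    ;

value: TEXTDATA
    | DQUOTE escaped DQUOTE
    ;

/* Unchanged from the question. */
escaped: escaped TEXTDATA
    | escaped COMMA
    | escaped CRLF
    | escaped DQUOTE DQUOTE
    | TEXTDATA
    | COMMA
    | CRLF
    | DQUOTE DQUOTE
    ;
```

With this sketch, TEXTDATA CRLF has only one derivation (one record plus the trailing CRLF). Note the behavioural differences from the original grammar: completely empty input and input ending in two consecutive CRLFs are rejected, because the last record may not be empty.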
PS: On an unrelated note, it seems to me that you should move some of your parsing work to the lexer, specifically the part where you parse the contents of quoted strings. Something like "abc" would be best handled by making the lexer turn it into a single token.
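As a rough illustration of that suggestion, a lexer along these lines could return a whole quoted string as one token. This is a sketch under several assumptions: it uses flex (for %option noyywrap), the token name QUOTED is hypothetical and would need a matching %token QUOTED in the grammar, y.tab.h is the header produced by yacc -d, and the TEXTDATA character class is a loose approximation rather than the exact set from RFC 4180.

```lex
%{
#include "y.tab.h"   /* token codes generated by yacc -d (assumed file name) */
%}
%option noyywrap
%%
\"([^"]|\"\")*\"    { return QUOTED; }     /* quoted field; "" is an escaped quote */
,                   { return COMMA; }
\r\n                { return CRLF; }
[^",\r\n]+          { return TEXTDATA; }   /* unquoted text, approximated */
.|\n                { return yytext[0]; }  /* anything else: let the parser reject it */
%%
```

With a lexer like this, the escaped rules and the DQUOTE token would no longer be needed in the grammar; a field would be QUOTED, TEXTDATA, or empty. The quotes and doubled quotes are still present in yytext, so unescaping them would be up to the action code.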