I want to parse an ordered list, which is something like:
I - Something
II - Something else...
IX - Something weird
XIII - etc
So far, my Treetop grammar is:
rule text
  roman_numeral separator text newline
end
rule roman_numeral
  &. ('MMM' / 'MM' / 'M')?
     (('C' [DM]) / ('D'? ('CCC' / 'CC' / 'C')?))?
     (('X' [LC]) / ('L'? ('XXX' / 'XX' / 'X')?))?
     (('I' [VX]) / ('V'? ('III' / 'II' / 'I')?))?
end
rule separator
  [\s] "-" [\s]
end
rule text
  (!"\n" .)*
end
rule newline
  ["\n"]
end
However, the corresponding parser is unable to parse the text. What is broken?
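You accidentally defined text twice, so the second definition replaces the first and the root rule only matches up to the first newline. Rename the first rule to line, and then add a lines rule that matches any number of lines.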
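Why a duplicate rule breaks things: Treetop compiles each rule into a Ruby parsing method named after the rule, and when a method name is defined twice Ruby keeps the later body. The plain-Ruby class below is a hypothetical illustration (DuplicateRuleDemo is not Treetop output) of that same effect:

```ruby
# Hypothetical illustration, not Treetop-generated code: each "rule" is a
# method, and redefining a method name silently replaces the earlier body.
class DuplicateRuleDemo
  # Stands in for the first rule: roman_numeral separator text newline
  def text
    "roman_numeral separator text newline"
  end

  # Stands in for the second rule: (!"\n" .)*
  # This definition wins (ruby -w prints "method redefined").
  def text
    '(!"\n" .)*'
  end
end

puts DuplicateRuleDemo.new.text # prints (!"\n" .)*
```

Since the grammar's root rule is text, the parser ends up using only the second definition.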
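The quotes in the newline rule are also unnecessary: inside a character class they are not delimiters, so ["\n"] matches a double quote as well as a newline, while [\n] matches only the newline.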
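A small sketch using Ruby regexes as a stand-in, assuming Treetop character classes follow Ruby Regexp character-class syntax (which is how Treetop compiles them), shows that the quote becomes a member of the class:

```ruby
# Ruby Regexp stand-ins for the two Treetop character classes (assumption:
# Treetop's [...] behaves like a Ruby character class).
quoted   = /["\n"]/  # members: double quote, newline
unquoted = /[\n]/    # member: newline only

puts quoted.match?("\n")  # true
puts quoted.match?('"')   # true, the quote is part of the class
puts unquoted.match?('"') # false
```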
Side tip: you can reuse the newline rule in your text rule to keep it DRY.
grammar Roman
  rule lines
    line*
  end

  rule line
    roman_numeral separator text newline
  end

  rule roman_numeral
    &. ('MMM' / 'MM' / 'M')?
       (('C' [DM]) / ('D'? ('CCC' / 'CC' / 'C')?))?
       (('X' [LC]) / ('L'? ('XXX' / 'XX' / 'X')?))?
       (('I' [VX]) / ('V'? ('III' / 'II' / 'I')?))?
  end

  rule separator
    [\s] "-" [\s]
  end

  rule text
    (!newline .)*
  end

  rule newline
    [\n]
  end
end
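To experiment with what the corrected grammar accepts without installing Treetop, here is a hypothetical hand-written parser (RomanLines is an illustrative name) using Ruby's StringScanner, with one pattern or method per grammar rule. It is a sketch that mirrors the rules, not the parser Treetop generates:

```ruby
require "strscan"

# Hypothetical hand-written parser mirroring the corrected Roman grammar.
class RomanLines
  # roman_numeral: &. then four optional groups. Everything after the
  # lookahead is optional, so the regex never backtracks and takes the same
  # greedy, left-most-alternative path as the PEG ordered choices.
  ROMAN_NUMERAL = /(?=.)(?:MMM|MM|M)?(?:C[DM]|D?(?:CCC|CC|C)?)?(?:X[LC]|L?(?:XXX|XX|X)?)?(?:I[VX]|V?(?:III|II|I)?)?/m
  SEPARATOR = /\s-\s/  # separator: [\s] "-" [\s]
  TEXT      = /[^\n]*/ # text: (!newline .)*
  NEWLINE   = /\n/     # newline: [\n]

  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # lines: line*
  # Returns [numeral, text] pairs, or nil if input is left over
  # (Treetop likewise fails by default when input is not fully consumed).
  def parse
    entries = []
    while (entry = line)
      entries << entry
    end
    @scanner.eos? ? entries : nil
  end

  private

  # line: roman_numeral separator text newline
  # On failure the scanner is rewound, so nothing is consumed.
  def line
    start = @scanner.pos
    numeral = @scanner.scan(ROMAN_NUMERAL)
    if numeral && @scanner.scan(SEPARATOR) && (text = @scanner.scan(TEXT)) && @scanner.scan(NEWLINE)
      [numeral, text]
    else
      @scanner.pos = start
      nil
    end
  end
end

input = "I - Something\nII - Something else...\nIX - Something weird\nXIII - etc\n"
p RomanLines.new(input).parse
# => [["I", "Something"], ["II", "Something else..."], ["IX", "Something weird"], ["XIII", "etc"]]
```

Like the grammar, this sketch rejects input whose last line has no trailing newline.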
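You can simplify the grammar a little by removing the negative lookahead from text and the single-character classes from separator. One difference: " - " only accepts literal spaces around the dash, whereas [\s] also accepts tabs and other whitespace.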
rule separator
  " - "
end

rule text
  [^\n]*
end
The resulting syntax graph becomes much simpler.
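To compare the old and new rules quickly, Ruby regexes can serve as rough stand-ins (an assumption: they approximate the Treetop expressions and are not Treetop itself). The two text forms consume the same characters, while the two separators differ on non-space whitespace:

```ruby
# Regex stand-ins for the original and simplified rules (approximation only).
old_text = /(?:(?!\n).)*/ # (!newline .)*
new_text = /[^\n]*/       # [^\n]*
old_sep  = /[\s]-[\s]/    # [\s] "-" [\s]
new_sep  = / - /          # " - "

sample = "Something weird\nXIII - etc\n"
puts sample[/\A#{old_text}/] == sample[/\A#{new_text}/] # true, both stop at the newline

puts "\t-\t".match?(/\A#{old_sep}/) # true
puts "\t-\t".match?(/\A#{new_sep}/) # false, " - " only accepts spaces
```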