Search code examples
parsinghaskellebnfmegaparsec

Parse EBNF with Megaparsec nested sepBy


As an exercise I try to parse a EBNF/ABNF grammar with Megaparsec. I got trivial stuff like terminals and optionals working, but I'm struggling with alternatives. With this grammar:

S ::= 'hello' ['world'] IDENTIFIER LITERAL | 'test';

And this code:

production :: Parser Production
production = sepBy1 alternativeTerm (char '|') >>= return . Production

alternativeTerm :: Parser AlternativeTerm
alternativeTerm = sepBy1 term space >>= return . AlternativeTerm

term :: Parser Term
term = terminal
    <|> optional
    <|> identifier
    <|> literal

I get this error:

unexpected '|'
expecting "IDENTIFIER", "LITERAL", ''', '[', or white space

I guess the alternativeTerm parser is not returning to the production parser when it encounters a sequence that it cannot parse and throws an error instead.

What can I do about this? Change my ADT of an EBNF or should I somehow flatten the parsing. But then again, how can I do so?


Solution

  • It's probably best to expand my previous comment into a full answer.

    Your grammar is basically a list of list of terms seperated (and ended) by whitespace, which in turn is seperated by |. Your solution with sepBy1 does not work because there is a trailing whitespace after LITERAL - sepBy1 assumes there is another term following that whitespace and tries to apply term to the |, which fails.

    If your alternativeTerm is guaranteed to end with a whitespace character (or multiple), rewrite your alternativeTerm as follows:

    alternativeTerm = (term `sepEndBy1` space) >>= return . AlternativeTerm