
Using Pest.rs, how can I specify that comments are to be anchored and whole line?


Exim uses a rather awkward comment syntax:

Blank lines in the file, and lines starting with a # character (ignoring leading white space) are treated as comments and are ignored. Note: A # character other than at the beginning of a line is not treated specially, and does not introduce a comment.

This means that only the first of these two lines is a comment:

# This is a comment
This has no comments # at all
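As a quick reference for the rule quoted above, here is a minimal plain-Rust sketch (no Pest involved) of how a single line would be classified. `is_exim_comment` is a hypothetical helper name. Treating a whitespace-only line as "blank" is an assumption about what Exim means by a blank line.

```rust
/// Hypothetical helper: returns true if `line` is a comment under Exim's rule.
/// A line is a comment if it is blank (assumed: empty or whitespace-only) or if
/// its first non-whitespace character is '#'. A '#' anywhere else is ordinary text.
fn is_exim_comment(line: &str) -> bool {
    let rest = line.trim_start();
    rest.is_empty() || rest.starts_with('#')
}

fn main() {
    let samples = [
        "# This is a comment",
        "This has no comments # at all",
        "   # leading white space is ignored",
        "",
    ];
    for line in samples {
        println!("{:?} -> {}", line, is_exim_comment(line));
    }
}
```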

Is there a way to express this in Pest.rs? I've tried this:

COMMENT    = { "#" ~ (!NEWLINE ~ ANY)* ~ NEWLINE }
WHITESPACE = _{ " " }
main       = { SOI ~ ASCII_ALPHA* ~ EOI }

But this grammar still matches the following line, treating everything after the # as a comment:

MyText # Exim test this is not a comment 

How can I anchor the comment to the left?


Solution

  • This isn't possible with the built-in COMMENT rule, because Pest implicitly inserts WHITESPACE and COMMENT between the expressions of every ~ sequence in a non-atomic rule. The following two lines are equivalent:

    a = { b ~ c }
    a = { b ~ WHITESPACE* ~ (COMMENT ~ WHITESPACE*)* ~ c }
    

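    This means that once COMMENT is defined, a comment may appear at any ~ in a non-atomic rule, including in the middle of a line. The only way to prevent that is to make the affected rules atomic with @ (atomic) or $ (compound-atomic).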

    Instead, for a line-based grammar, I stopped using the built-in COMMENT rule and defined my own rule, _COMMENT. Pest does not insert it implicitly, so it only matches where the grammar explicitly asks for it:

    WHITESPACE = _{ " " }
    _COMMENT   = { "#" ~ (!NEWLINE ~ ANY)* ~ NEWLINE }
    expr       = { ASCII_ALPHA+ }
    stmt       = { expr ~ NEWLINE }
    conf       = { SOI ~ (stmt | _COMMENT | NEWLINE)+ ~ EOI }
    

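    Notice that both stmt and _COMMENT are NEWLINE-terminated, and conf matches one or more of them (or a bare NEWLINE for a blank line). Because _COMMENT is only tried at the start of a line, after any leading spaces skipped by the implicit WHITESPACE, a # in the middle of a statement is never treated as a comment. Since every item ends in NEWLINE, the file must also end with a newline.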
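To check the intended behavior of conf without pulling in Pest, here is a rough plain-Rust approximation of what it accepts, line by line. The `Line` enum and `parse_conf` function are hypothetical names. The sketch ignores Pest details such as the required final newline and implicit whitespace inside expr, so treat it as an illustration, not an equivalent parser.

```rust
/// Hypothetical classification mirroring the three alternatives in `conf`.
#[derive(Debug, PartialEq)]
enum Line<'a> {
    Blank,         // bare NEWLINE (possibly after spaces)
    Comment,       // _COMMENT: '#' after optional leading spaces
    Stmt(&'a str), // stmt: ASCII letters, then end of line
}

/// Approximates `conf`: returns the classified lines, or the 1-based
/// number of the first line that is neither blank, a comment, nor a stmt.
fn parse_conf(input: &str) -> Result<Vec<Line<'_>>, usize> {
    let mut out = Vec::new();
    for (i, raw) in input.lines().enumerate() {
        // Implicit WHITESPACE (" ") lets a line start with spaces.
        let line = raw.trim_start_matches(' ');
        if line.is_empty() {
            out.push(Line::Blank);
        } else if line.starts_with('#') {
            // Only a '#' at the start of a line introduces a comment.
            out.push(Line::Comment);
        } else {
            let word = line.trim_end_matches(' ');
            if word.chars().all(|c| c.is_ascii_alphabetic()) {
                out.push(Line::Stmt(word));
            } else {
                // e.g. "MyText # ...": the '#' is just an unexpected character here.
                return Err(i + 1);
            }
        }
    }
    Ok(out)
}

fn main() {
    let conf = "# a comment\n  # indented comment\n\nMyText\n";
    println!("{:?}", parse_conf(conf));
    println!("{:?}", parse_conf("MyText # Exim test this is not a comment\n"));
}
```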