Tags: javascript, parsing, grammar, peg, pegjs

How do you parse nested comments in pegjs?


I was wondering how to parse nested comments (say, à la Haskell) in pegjs.

The goal:

{-
    This is a comment and should parse.
    Comments start with {- and end with -}.
    If you've noticed, I still included {- and -} in the comment.
    This means that comments should also nest
    {- even {- to -} arbitrary -} levels
    But they should be balanced
-}

For example, the following should not parse:

{- I am an unbalanced -} comment -}

But you should also have an escape mechanism:

{- I can escape comment \{- characters like this \-} -}

This sorta seems like parsing s-expressions, but with s-expressions, it's easy:

sExpression = "(" [^)]* ")"

That works because the closing paren is just one character, so I can "not" it with the caret. As an aside, I'm wondering how one can "not" something that is longer than a single character in pegjs.

Thanks for your help.


Solution

  • This doesn't handle your escape mechanism, but it should get you started. (You can see it live on pegedit: just click Build Parser and then Parse at the top of the screen.)

    start = comment
    
    comment = COMSTART (not_com/comment)* COMSTOP
    
    not_com = (!COMSTOP !COMSTART.)
    
    COMSTART = '{-'
    
    COMSTOP = '-}'
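
One way to add the escape mechanism from the question is an extra rule that is tried before not_com. The sketch below is only an assumption about how that could look: the escaped rule and the JavaScript helper names (matchComment, isComment) are hypothetical and not part of the original answer. The grammar string shows the proposed rules; the hand-written functions mirror them so the behaviour can be checked in plain JavaScript without generating a parser.

```javascript
// Sketch of the answer's grammar with one hypothetical extra rule, `escaped`.
// String.raw keeps the backslashes exactly as they should appear in the
// grammar text that would be handed to the PEG.js parser generator.
const grammar = String.raw`
start    = comment
comment  = COMSTART (escaped / comment / not_com)* COMSTOP
escaped  = "\\" .                 // a backslash plus any one character
not_com  = !COMSTART !COMSTOP .   // any character that begins neither delimiter
COMSTART = '{-'
COMSTOP  = '-}'
`;

// Hand-written mirror of the same rules. matchComment returns the position
// just after a successfully matched comment, or -1 if no comment starts at pos.
const COMSTART = "{-";
const COMSTOP = "-}";

function matchComment(text, pos) {
  if (!text.startsWith(COMSTART, pos)) return -1;
  pos += COMSTART.length;
  for (;;) {
    // escaped: backslash followed by any single character
    if (text[pos] === "\\" && pos + 1 < text.length) {
      pos += 2;
      continue;
    }
    // comment: a nested, balanced comment
    const nested = matchComment(text, pos);
    if (nested !== -1) {
      pos = nested;
      continue;
    }
    // not_com: one character, provided neither delimiter starts here
    const atDelimiter =
      text.startsWith(COMSTART, pos) || text.startsWith(COMSTOP, pos);
    if (pos < text.length && !atDelimiter) {
      pos += 1;
      continue;
    }
    break; // no alternative matched, so the repetition ends
  }
  return text.startsWith(COMSTOP, pos) ? pos + COMSTOP.length : -1;
}

// Mirrors `start = comment`: the whole input must be exactly one comment.
function isComment(text) {
  return matchComment(text, 0) === text.length;
}
```

Under these assumptions, isComment accepts the nested and escaped examples from the question and rejects the unbalanced one. Putting escaped ahead of not_com matters because PEG choices are ordered: the backslash has to be consumed together with the character after it, before not_com gets a chance to take the backslash alone and expose the delimiter that follows.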
    

    To answer your general question:

    As an aside, I'm wondering how one can "not" something that is longer than a single character in pegjs.

    The simple way is (!rulename .), where rulename is another rule defined in your grammar. The !rulename part only asserts that whatever comes next doesn't match rulename; it consumes no input, so you still have to give the rule something to match, which is why I included the . (which matches any single character).
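
To make the lookahead idea concrete, here is a small JavaScript sketch of what a repetition like (!stop .)* does, where stop stands for any multi-character literal. The function name takeUntil and its return shape are made up for illustration; this is not the code PEG.js itself generates.

```javascript
// Mirrors `(!stop .)*`: keep consuming single characters for as long as
// `stop` does NOT match at the current position.
function takeUntil(text, pos, stop) {
  let end = pos;
  while (end < text.length && !text.startsWith(stop, end)) {
    end += 1; // the `.` part: consume exactly one character
  }
  // The lookahead itself consumed nothing, so `stop` is still unread at `end`.
  return { value: text.slice(pos, end), next: end };
}

const result = takeUntil("some text -} trailing", 0, "-}");
console.log(JSON.stringify(result.value), result.next);
```

Note that a lone - or } does not end the run; only the full two-character sequence does, which is what a single-character class like [^)] cannot express.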