Trouble with PEG.js end of input


I am trying to write a simple grammar for PEG.js that would match something like this:

some text;
arbitrary other text that can also have µnicode; different expression;
let's escape the \; semicolon, and \not recognized escapes are not a problem;
possibly last expression not ending with semicolon

So basically these are pieces of text separated by semicolons. My simplified grammar looks like this:

start
= flow:Flow

Flow
= instructions:Instruction*

Instruction
= Empty / Text

TextCharacter
= "\\;" /
.

Text
= text:TextCharacter+ ';' {return text.join('')}

Empty
= Semicolon

Semicolon "semicolon"
= ';'

The problem is that if I put anything other than a semicolon in the input, I get:

SyntaxError: Expected ";", "\\;" or any character but end of input found.

How can I solve this? I've read that PEG.js is unable to match end of input.


Solution

  • You have (at least) 2 problems:

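    Your TextCharacter should not match any character (the .). Because . also matches a semicolon, TextCharacter+ greedily consumes everything up to the end of the input, and since a PEG repetition never gives characters back, the ';' that follows in Text can only fail with "end of input found". TextCharacter should instead match any character except a backslash and semi-colon, or it should match an escaped character: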

    TextCharacter
     = [^\\;]
     / "\\" .
    

    The second problem is that your grammar requires every Text to end with a semi-colon, but the last expression in your input does not end with a ;.

    How about something like this instead:

    start
     = instructions
    
    instructions
     = instruction (";" instruction)* ";"?
    
    instruction
     = chars:char+ {return chars.join("").trim();}
    
    char
     = [^\\;]
     / "\\" c:. {return ""+c;}
    

    which would parse your input as follows:

    [
       "some text",
       [
          [
             ";",
             "arbitrary other text that can also have µnicode"
          ],
          [
             ";",
             "different expression"
          ],
          [
             ";",
             "let's escape the ; semicolon, and not recognized escapes are not a problem"
          ],
          [
             ";",
             "possibly last expression not ending with semicolon"
          ]
       ]
    ]
    
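The nested result above may be awkward to consume directly. As a rough sketch, assuming the parser returns exactly the shape shown (the first instruction, followed by a list of [";", instruction] pairs), a small post-processing step can flatten it. The name flattenInstructions is a hypothetical helper, not part of PEG.js:

```javascript
// flattenInstructions is a hypothetical helper, not part of PEG.js.
// It assumes the parse result has the shape shown above:
//   [firstInstruction, [[";", instruction], [";", instruction], ...]]
// Any further elements of the outer array are ignored.
function flattenInstructions(parsed) {
  const [first, rest] = parsed;
  // Each entry of `rest` is a [";", instruction] pair; keep only the instruction.
  return [first, ...rest.map((pair) => pair[1])];
}
```

Applied to a result such as ["a", [[";", "b"], [";", "c"]]], this would give ["a", "b", "c"]. The same flattening could probably be done inside the grammar with an action on the instructions rule; doing it afterwards in plain JavaScript just keeps the grammar unchanged.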

    Note that the trailing semi-colon is optional now.
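
To sanity-check what the grammar above is expected to accept, it can help to compare it against a hand-written scanner. The following is only a sketch in plain JavaScript, not PEG.js output: splitInstructions is a hypothetical name, and the error messages are made up for illustration. It tries to mirror the rules above: a backslash escapes the next character, an unescaped ; ends an instruction, each instruction needs at least one character, and the final ; is optional.

```javascript
// Hand-written approximation of the grammar above, for illustration only.
// splitInstructions is a hypothetical helper name, not part of PEG.js.
function splitInstructions(input) {
  const instructions = [];
  let current = ""; // raw characters of the instruction being read
  let i = 0;

  // Mirrors `instruction = char+`: at least one character is required.
  const finish = () => {
    if (current.length === 0) {
      throw new SyntaxError("Expected a character at offset " + i);
    }
    instructions.push(current.trim());
    current = "";
  };

  while (i < input.length) {
    const ch = input[i];
    if (ch === "\\") {
      // Mirrors `"\\" c:.`: keep the escaped character, drop the backslash.
      if (i + 1 >= input.length) {
        throw new SyntaxError("Dangling backslash at end of input");
      }
      current += input[i + 1];
      i += 2;
    } else if (ch === ";") {
      // An unescaped semicolon terminates the current instruction.
      finish();
      i += 1;
    } else {
      current += ch;
      i += 1;
    }
  }

  // Mirrors the optional trailing ";": leftover text is the last instruction.
  // Completely empty input is rejected, as the grammar needs one instruction.
  if (current.length > 0 || instructions.length === 0) {
    finish();
  }
  return instructions;
}
```

For example, splitInstructions("a; b\\;c;d") returns ["a", "b;c", "d"], and the trailing semicolon in "a;b;" is accepted, while "a;;b" throws because the instruction between the two semicolons is empty.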