javascript pegjs

Parsing Paragraphs in Peg.JS


I am trying to learn peg.js and want to parse simple "blocks" of text, but am struggling with how to group sequential lines without getting a "possible infinite loop" error from my syntax.

Goal:

line 1

line 3
line 4

line 6

When parsed would become:

{
   "type": "root",
   "children": [
      { type: "para", content: "line 1" },
      { type: "para", content: "line 3\nline 4" },
      { type: "para", content: "line 6" },
   ]
}

In other words:

  • line one is a paragraph of its own because it is followed by a blank line
  • lines three and four are a paragraph because they're followed by a blank line
  • line six is a paragraph because it's the last line(s) (one or more)
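
As a cross-check on the rules above, the intended grouping can be sketched in plain JavaScript (this is not a PEG grammar, just a reference for the expected output). The name `groupParagraphs` is hypothetical, and the sketch assumes a blank line is a truly empty line rather than one containing only whitespace.

```javascript
// Hypothetical helper (not part of peg.js): builds the expected tree by
// splitting on line breaks and grouping consecutive non-blank lines.
function groupParagraphs(text) {
  var children = [];
  var current = []; // lines of the paragraph being collected

  function flush() {
    if (current.length > 0) {
      children.push({ type: 'para', content: current.join('\n') });
      current = [];
    }
  }

  text.split(/\r\n|\r|\n/).forEach(function (line) {
    if (line.length > 0) {
      current.push(line); // non-blank line: extend the current paragraph
    } else {
      flush();            // blank line: close the current paragraph, if any
    }
  });
  flush();                // end of input also closes the last paragraph

  return { type: 'root', children: children };
}

console.log(JSON.stringify(
  groupParagraphs('line 1\n\nline 3\nline 4\n\nline 6'), null, 3));
```

Comparing a grammar's output against a helper like this is one way to check that a candidate grammar groups lines as intended.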

I can write a grammar that matches lines and blank lines (see http://peg.arcanis.fr/4f4NdP/), but anything I do to try to get multiple consecutive lines followed by a blank line (or EOF) turned into a paragraph ends up with recursion errors. I feel like this is a really simple n00b thing that I'm just missing because I haven't used a PEG before.

I know I could write a global function in the initializer block and track the last element and make it contextual, but I feel like that's not really using the grammar like I should be.


Solution

  • You know those weeks where you struggle with something for a day or so and then finally give up, swallow your pride and post a question to Stack Overflow ... and then ten minutes later figure out the answer? Yep! That's my week. I think the process of writing out the question makes you think about the problem in a different way and your synapses start firing again or something ...

    Anyway, here's the solution: http://peg.arcanis.fr/4f4NdP/2/

    Grammar for posterity:

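    // A document is one paragraph followed by zero or more
    // (separator newline, paragraph) pairs.
    start = head:Para tail:(newline Para)*
       {
          // Each tail element is [newline, para]; keep only the para.
          var rest = tail.map(function(element) {
             return element[1];
          });
    
          return {
             type: 'root',
             children: [ head ].concat(rest)
          };
       }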
    
    Para = text:LineOfText+
       { return { type: 'para', content: text.join('\n') } }
    
    LineOfText = text:$(char+) EOL
       { return text }
    
    char = [^\n\r]
    newline = '\n' / '\r' '\n'?
    EOL = newline / !.
    

    Input:

    line 1
    
    line 3
    line 4
    
    line 6
    

    Output:

    {
       "type": "root",
       "children": [
          {
             "type": "para",
             "content": "line 1"
          },
          {
             "type": "para",
             "content": "line 3\nline 4"
          },
          {
             "type": "para",
             "content": "line 6"
          }
       ]
    }
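
To see how the rules fit together, here is a hand-written JavaScript sketch that mirrors each rule of the grammar above as a function. It is an illustration only, not what peg.js actually generates; the function names and the `{ value, pos }` result shape are assumptions made for this example.

```javascript
// Each rule takes the input and a position and returns { value, pos }
// on success or null on failure, loosely imitating how a PEG rule behaves.

// newline = '\n' / '\r' '\n'?
function newline(input, pos) {
  if (input[pos] === '\n') return { value: '\n', pos: pos + 1 };
  if (input[pos] === '\r') {
    return input[pos + 1] === '\n'
      ? { value: '\r\n', pos: pos + 2 }
      : { value: '\r', pos: pos + 1 };
  }
  return null;
}

// EOL = newline / !.   (a line break, or end of input)
function eol(input, pos) {
  return newline(input, pos) ||
    (pos >= input.length ? { value: '', pos: pos } : null);
}

// LineOfText = text:$(char+) EOL
function lineOfText(input, pos) {
  var end = pos;
  while (end < input.length && input[end] !== '\n' && input[end] !== '\r') end++;
  if (end === pos) return null; // char+ needs at least one character
  var e = eol(input, end);
  return e && { value: input.slice(pos, end), pos: e.pos };
}

// Para = text:LineOfText+
function para(input, pos) {
  var lines = [];
  var r;
  while ((r = lineOfText(input, pos))) {
    lines.push(r.value);
    pos = r.pos;
  }
  if (lines.length === 0) return null;
  return { value: { type: 'para', content: lines.join('\n') }, pos: pos };
}

// start = head:Para tail:(newline Para)*
function start(input) {
  var head = para(input, 0);
  if (!head) throw new Error('expected a paragraph at offset 0');
  var children = [head.value];
  var pos = head.pos;
  for (;;) {
    var nl = newline(input, pos);
    var next = nl && para(input, nl.pos);
    if (!next) break; // the (newline Para) group fails as a whole
    children.push(next.value);
    pos = next.pos;
  }
  if (pos !== input.length) throw new Error('unexpected input at offset ' + pos);
  return { type: 'root', children: children };
}
```

Calling `start('line 1\n\nline 3\nline 4\n\nline 6')` yields the same tree as the output shown above. Note that `lineOfText` fails on an empty line because `char+` needs at least one character, and that failure is what ends a paragraph. If a line could match without consuming any input, the loop in `para` would never terminate, which is presumably the kind of situation the "possible infinite loop" error guards against. Like the grammar, this sketch rejects input containing two or more consecutive blank lines.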