Tags: python, json, sly

How to do multi-line parsing with Python's sly.Parser class?


I am trying to create my own programming language with a library called sly. I have written a lexer that tokenizes my program, but I am stuck on getting the parser to successfully parse multiple instructions. Before I took this into account, I wrote a program to test the parse tree, and it gave me an error saying that a bunch of tokens were invalid. The temporary solution I came up with was to use the following grammar rule:

@_("expr \n expr")
def expr(self, t):
    return [t.expr0, t.expr1]
    # I have also tried:
    #  return t.expr1, t.expr2

This leads to an annoying bug where the parse tree nests the statements too deeply:

[
 {
  "some statement",
  ...,
  "inner": [
   {
    "another statement",
    ...,
    "inner": [
     {
      "another statement",
      ...,
      "inner": []
     }
    ]
   }
  ]
 }
]

I want the parse tree to be flat, with only the necessary nesting, like the parse tree below:

[
 {
  "another statement",
  ...,
  "inner": []
 },
 {
  "another statement",
  ...,
  "inner": []
 },
 {
  "another statement",
  ...,
  "inner": []
 }
]

I was thinking about keeping the parse tree ugly and then reformatting it, but depending on the size of the tree it could make programs extremely slow.

The language is part of a larger project of mine; the GitHub repo is linked here:

All important files: https://github.com/0x32767/0x102-discord-bot/tree/star2py/jpl4py/lang

The parser class: https://github.com/0x32767/0x102-discord-bot/blob/a268fe9a63f87a9ebf39088cff13ee9a2edab931/jpl4py/lang/jpl.py#L81

The generated parse tree: https://github.com/0x32767/0x102-discord-bot/blob/star2py/jpl4py/lang/hWorld.jpl.out.json

The quick-fix line: https://github.com/0x32767/0x102-discord-bot/blob/a268fe9a63f87a9ebf39088cff13ee9a2edab931/jpl4py/lang/jpl.py#L217

Thank you for reading.


Solution

  • I'm still a bit of a newbie at SLY, having started with PLY, but what you need is a rule that covers multiple statements, and the easiest way to write one is with recursion. You would wind up with a base rule like the following:

      expr_set : expr_set expr NEWLINE
               | expr NEWLINE
    

    expr_set always produces a list of results. In the second case, the list holds that single expression; in the first, it is the existing list with the new expression appended. So you wind up with the following code:

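    # SLY splits rule strings on whitespace, so a "\n" inside the string would
    # simply vanish; NEWLINE must be a token that your lexer actually emits.
    @_("expr_set expr NEWLINE")
    def expr_set(self, t):
        exprs = t.expr_set        # list built from the earlier statements
        exprs.append(t.expr)      # add this statement at the same level
        return exprs
    
    @_("expr NEWLINE")
    def expr_set(self, t):
        return [t.expr]           # the first statement starts the list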
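To see why this left-recursive rule gives a flat list while the question's `expr \n expr` rule (which SLY reads as `expr expr`) nests, here is a plain-Python sketch that replays the semantic actions by hand; no SLY is involved. The function names are hypothetical stand-ins for the rule methods, and the right-to-left pairing assumes the parser resolves the ambiguous `expr expr` rule by shifting, which is the usual yacc-style default.

```python
# Plain-Python model of the semantic actions; no SLY involved.
# Each function stands in for one rule method.

def expr_set_first(expr):
    """expr_set : expr NEWLINE -> start a new list."""
    return [expr]


def expr_set_more(expr_set, expr):
    """expr_set : expr_set expr NEWLINE -> append to the existing list."""
    expr_set.append(expr)
    return expr_set


def pair(left, right):
    """The question's expr : expr expr action, returning [left, right]."""
    return [left, right]


def reduce_left_recursive(statements):
    # With left recursion the parser reduces after every statement:
    # the first statement starts the list and each later one is appended.
    result = expr_set_first(statements[0])
    for stmt in statements[1:]:
        result = expr_set_more(result, stmt)
    return result


def reduce_pairs_right(statements):
    # If pairs are combined from the right (a parser that prefers shifting
    # on an ambiguous expr : expr expr rule), each pair wraps all the rest.
    result = statements[-1]
    for stmt in reversed(statements[:-1]):
        result = pair(stmt, result)
    return result


stmts = ["a", "b", "c"]
print(reduce_left_recursive(stmts))  # ['a', 'b', 'c']
print(reduce_pairs_right(stmts))     # ['a', ['b', 'c']]
```

Left recursion is also the usual choice for LALR parsers such as SLY, because the parser can reduce after each statement instead of keeping every statement on its stack until the end of the input.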
    

    Does that make sense? As a side note, the same expr_set rule can be reused for any nested block of code, such as the body of a function or a loop. If newlines are significant for you (as seems to be the case), you may need a third or fourth clause to handle cases such as a line that contains only a newline, or the final statement of the program (or of a block) not being followed by a newline. I asked a similar question here.
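One alternative to writing those extra grammar clauses (a swapped-in technique, not what this answer describes) is to clean up the token stream before the parser sees it: drop NEWLINE tokens that end no statement and append one if the input does not end with a newline. This sketch assumes the lexer emits a token type named `NEWLINE` and that tokens have mutable `type` and `value` attributes, as SLY's token objects appear to; the `Tok` dataclass and `normalize_newlines` are hypothetical names used only for illustration.

```python
import copy
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a lexer token with .type and .value."""
    type: str
    value: str


def normalize_newlines(tokens):
    """Yield tokens with blank-line NEWLINEs removed and a final NEWLINE
    guaranteed, so a two-clause expr_set grammar is enough."""
    last = None
    for tok in tokens:
        # A NEWLINE at the very start or right after another NEWLINE
        # ends no statement (it is a blank line), so skip it.
        if tok.type == "NEWLINE" and (last is None or last.type == "NEWLINE"):
            continue
        yield tok
        last = tok
    # The last statement had no trailing newline: synthesize one by copying
    # the last token, so any position attributes (e.g. lineno) carry over.
    if last is not None and last.type != "NEWLINE":
        newline = copy.copy(last)
        newline.type = "NEWLINE"
        newline.value = "\n"
        yield newline


raw = [Tok("NEWLINE", "\n"), Tok("NAME", "a"), Tok("NEWLINE", "\n"),
       Tok("NEWLINE", "\n"), Tok("NAME", "b")]
print([t.type for t in normalize_newlines(raw)])
# ['NAME', 'NEWLINE', 'NAME', 'NEWLINE']
```

With SLY this would presumably be wired in as `parser.parse(normalize_newlines(lexer.tokenize(text)))`, since SLY's `parse` pulls tokens from an iterator. The trade-off is that the grammar stays simple while the newline policy lives in one small filter.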

    Another minor pitfall I ran into early in my implementation: don't implement the newline-ignoring rule suggested in the example code, because it consumes the newlines that you need to separate your statements.
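For reference, the example lexer in SLY's documentation tracks line numbers with a method named `ignore_newline`, decorated with `@_(r'\n+')`, that only does `self.lineno += t.value.count('\n')`; methods whose names start with `ignore_` are discarded. Keeping newlines instead means naming that method after a token (say `NEWLINE`, listed in `tokens`), keeping the line counting, and ending it with `return t`. Because a runnable SLY class needs the library installed, the sketch below shows the same idea with a hand-written `re` tokenizer. It is a stand-in, not SLY's API, and its token names are hypothetical: newlines are emitted as tokens while line numbers are still tracked.

```python
import re

# Hypothetical token set for illustration; only NEWLINE matters here.
TOKEN_RE = re.compile(r"""
    (?P<NEWLINE>\n+)          # kept as a token, not ignored
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<SKIP>[ \t]+)          # dropped, like SLY's `ignore` string
""", re.VERBOSE)


def tokenize(text):
    """Yield (type, value, lineno) tuples, emitting NEWLINE tokens
    while still tracking line numbers."""
    pos, lineno = 0, 1
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad character {text[pos]!r} on line {lineno}")
        kind, value = m.lastgroup, m.group()
        if kind != "SKIP":
            yield (kind, value, lineno)
        if kind == "NEWLINE":
            lineno += value.count("\n")  # same bookkeeping as ignore_newline
        pos = m.end()


for tok in tokenize("x 1\n\ny\n"):
    print(tok)
```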

    Lastly, as long as you don't need to handle blank lines or a final statement without a trailing newline, the following is valid for your purposes:

    # Zero or more "expr NEWLINE" pairs; t.expr is the list of every expr matched
    @_("{ expr NEWLINE }")
    def expr_set(self, t):
        return t.expr
    

    This uses SLY's "repetition" notation, { ... }, which matches zero or more occurrences. Note that there must be spaces between the curly braces and what is inside them. The contents are returned as a list; if there are several symbols inside the braces, each symbol gets its own list. Here is some example code from my scripting language for a statement list where there is at least one statement and every statement ends with a newline:

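    @_('statement NEWLINE { statement NEWLINE }')
    def statement_set(self, p):
        # p.statement0 is the first statement; p.statement1 is the list collected
        # by the braces. Keep source order and drop statements that returned None.
        statement_list = [s for s in [p.statement0] + p.statement1 if s is not None]
        return StatementSet(statement_list)  # StatementSet is my own AST node class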
    

    Again, you do need spaces between the curly braces and the other symbols, or SLY won't parse the rule correctly (you don't want to know how long that tripped me up).
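To make the list handling concrete, here is a plain-Python model of what the `statement_set` action receives. It is a sketch under an assumption: that SLY collects each match of `{ statement NEWLINE }` as a (statement, newline) pair and exposes each symbol inside the braces as its own list, as described above. `split_repetition` and `build_statement_list` are hypothetical helper names. The model also shows why the first statement, `p.statement0`, has to go in front of the repeated ones to keep source order.

```python
def split_repetition(matches):
    """Model (an assumption about SLY's behaviour, not its source): each
    match of { statement NEWLINE } is a (statement, newline) pair, and
    every symbol inside the braces is exposed as its own list."""
    statements = [m[0] for m in matches]
    newlines = [m[1] for m in matches]
    return statements, newlines


def build_statement_list(first, rest):
    """Combine the statement before the braces (p.statement0) with the list
    from the braces (p.statement1), keeping source order and dropping
    statements whose action returned None."""
    return [s for s in [first] + rest if s is not None]


matches = [("print b", "\n"), (None, "\n"), ("print c", "\n")]
rest, newlines = split_repetition(matches)
print(build_statement_list("print a", rest))
# ['print a', 'print b', 'print c']
```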