Tags: python, parsing, lark-parser

Newline handling in python3 lark parser


I am using Python 3.11.5 with lark-parser.

I am using the bundled Python 3 grammar directly, from https://github.com/lark-parser/lark/blob/master/lark/grammars/python.lark.

I am having trouble generating the parse tree for multiline dictionaries, multiline function calls, and similar constructs. For example:

data = {
    "name": "John",
    "age": 30,
    "city": "New York"
}

Parsing this fails with the error: Unexpected token Token('_NEWLINE', '\n    ')

So my question is: how should I handle this? I can think of two ideas:

  • a pre-processor that runs before parsing and joins everything onto a single line.
  • handling newlines in the grammar itself.

Any other suggestions or explanations are welcome.


Solution

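  • Found the correct Indenter class in the same linked GitHub repo, and it fixes the problem. Passed to Lark as the postlex argument, it tracks bracket nesting and drops _NEWLINE tokens that occur inside (), [] or {}, so multiline dictionaries and function calls parse without a pre-processor or grammar changes.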

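    from lark.indenter import Indenter

    class TreeIndenter(Indenter):
        NL_type = "_NEWLINE"
        # Newlines between these bracket tokens are dropped by the Indenter
        OPEN_PAREN_types = ['LPAR', 'LSQB', 'LBRACE']
        CLOSE_PAREN_types = ['RPAR', 'RSQB', 'RBRACE']
        INDENT_type = "_INDENT"
        DEDENT_type = "_DEDENT"
        tab_len = 4  # width a tab counts as when measuring indentation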
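
To see why this removes the _NEWLINE error, here is a minimal standard-library sketch of the idea behind the bracket handling. It is not lark's actual implementation: the (type, text) tuples are a hypothetical stand-in for lark's Token objects, drop_bracketed_newlines is a name made up for illustration, and the token stream is written by hand rather than produced by a lexer. It also leaves out the _INDENT/_DEDENT generation that the real Indenter performs.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical stand-in for lark's Token: (type, text)
Token = Tuple[str, str]

# Same token type names as in the TreeIndenter class above
OPEN_PAREN_TYPES = {"LPAR", "LSQB", "LBRACE"}
CLOSE_PAREN_TYPES = {"RPAR", "RSQB", "RBRACE"}
NL_TYPE = "_NEWLINE"


def drop_bracketed_newlines(tokens: Iterable[Token]) -> Iterator[Token]:
    """Yield tokens, skipping newline tokens that occur inside open brackets."""
    depth = 0
    for tok_type, text in tokens:
        if tok_type in OPEN_PAREN_TYPES:
            depth += 1
        elif tok_type in CLOSE_PAREN_TYPES:
            depth -= 1
        elif tok_type == NL_TYPE and depth > 0:
            continue  # newline inside (), [] or {}: not the end of a statement
        yield (tok_type, text)


# Hand-written token stream for:  data = {\n    "age": 30\n}\n
stream = [
    ("NAME", "data"), ("EQUAL", "="), ("LBRACE", "{"),
    ("_NEWLINE", "\n    "),
    ("STRING", '"age"'), ("COLON", ":"), ("NUMBER", "30"),
    ("_NEWLINE", "\n"),
    ("RBRACE", "}"),
    ("_NEWLINE", "\n"),
]

filtered = list(drop_bracketed_newlines(stream))
print([tok_type for tok_type, _ in filtered])
```

Only the final newline, the one after the closing brace, survives the filter, so a parser consuming the filtered stream sees the whole dictionary literal as a single logical line.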