Search code examples
pythonparsingpyparsing

Using pyparsing and recursion


I'm trying to make a parser to a string that contains one or more of the following patterns:

  1. -flag
  2. -flag object
  3. -flag object {nested}
  4. -flag {object {nested}}

For example I will use the following string:

-flag1 -flag2 object2 { -nested_flag1 nested_obj1 -nested_flag2 }

To parse it I use:

exp = '-flag1 -flag2 object2 { -nested_flag1 nested_obj1 -nested_flag2 }'
only_flag = '-' + Word(printables, excludeChars='-').setResultsName("flag")
flag_w_obj = only_flag + Optional("{") + Word(printables, excludeChars='-').setResultsName("object")
flag_w_obj_w_nested = flag_w_obj + originalTextFor(nestedExpr("{", "}")).setResultsName("nested_expr")

parser = flag_w_obj_w_nested | flag_w_obj | only_flag
parsed = parser.searchString(exp)

How can I evaluate the nested expression by the same rules to get the nested flags and objects?

My final desired result is to make a dict containing the data in the format:

{
  "flag1": null,
  "flag2": {
    "object1": {
      "nested_flag1": "nested_obj1",
      "nested_flag2": null
    }
  }
}

Solution

  • Most times, whenever you have an expression with nested contents, you will end up using a pyparsing Forward expression. A Forward allows you to refer to an expression that is not yet fully defined - it is a forward declaration. In your parser, the -flag args... form can itself contain flag nested within {}'s. Once the contents can be specified, then the "assignment" can be done using the <<= operator.

    I also strongly recommend writing a short design of the parser before you begin actually writing any code (do this no matter what parsing library you plan on using). In parsing, a BNF (Backus-Naur Form) is the typical format to use. It does not have to be ultra-rigorous, but enough detail so that you can manually work through it to make sure you have the pieces right, and there are no ambiguities. Also good to write out a few example strings that you expect the parser to be able to process.

    The following annotated code shows the BNF I wrote out for your problem, and the implementation of it into a pyparsing parser.

    """
    BNF
    
    flag_name := '-' alphanumeric...
    flag_arg_word := alphanumeric...
    
    flag_arg := flag_arg_word | '{' flag_expr... '}'
    flag_expr := flag_name [flag_arg...]
    """
    
    import pyparsing as pp
    
    # define punctuation
    LBRACE, RBRACE = map(pp.Suppress, "{}")
    
    # recursive parser requires a forward declaraction
    flag_expr = pp.Forward()
    
    # implement BNF definitions
    flag_name = pp.Word("-", pp.alphanums + "_")
    flag_arg_word = pp.Word(pp.alphas, pp.alphanums + "_")
    flag_arg = flag_arg_word | (LBRACE + flag_expr[...] + RBRACE)
    
    # use '<<=' operator to define recursive expression
    flag_expr <<= pp.Group(flag_name + pp.Group(flag_arg[...]))
    
    # a command is 0 or more flag_exprs
    cmd_expr = flag_expr[...]
    
    # test it out
    cmd = "-flag1 -flag2 object2 { -nested_flag1 nested_obj1 -nested_flag2 }"
    parsed = cmd_expr.parseString(cmd)
    
    # convert to a list and show with pprint
    from pprint import pprint
    pprint(parsed.asList(), width=12)
    
    # runTests is very useful for running several test strings through a parser
    cmd_expr.runTests("""\
    -flag1 -flag2 object2 { -nested_flag1 nested_obj1 -nested_flag2 }
    """)