
Using QuotedString in pyparsing


I'm having conceptual difficulty in understanding how to build a pyparsing parser. The steps are: 1) build a parser by combining subclasses of ParserElement, and 2) use the parser to parse a string.

The following example works fine:

from pyparsing import Word, Literal, alphas, alphanums, delimitedList, QuotedString

name = Word(alphas+"_", alphanums+"_")
field = name
fieldlist = delimitedList(field)
doc = Literal('<Begin>') + fieldlist + Literal('**End**')

dstring = '<Begin>abc,de34,f_o_o**End**'
print(doc.parseString(dstring))

yielding the expected sequence of tokens:

['<Begin>', 'abc', 'de34', 'f_o_o', '**End**']

But the QuotedString class, for example, does not take a ParserElement as an argument, so it seems it can't be combined with other elements in the same way. I'd expect to use it in the above example like this:

name = Word(alphas+"_", alphanums+"_")
field = QuotedString(name)     ### Wrong: doesn't allow "name" as an argument
fieldlist = delimitedList(field)

to parse a document of the form:

dstring = '<Begin>"abc", "de34", "f_o_o"**End**'

But since it can't be used that way, what is the proper syntax for including QuotedString in the construction of a parser for a list of quoted strings?
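
For reference, here is a minimal sketch of how QuotedString is normally constructed: it takes the quote character itself (a string) rather than another expression, and the resulting object combines with other elements like any ParserElement. Note that this sketch only strips the quotes; it makes no attempt to check that the quoted content is a valid name.

```python
from pyparsing import Literal, QuotedString, delimitedList

# QuotedString is built from the quote character, not from another expression
field = QuotedString('"')
fieldlist = delimitedList(field)
doc = Literal('<Begin>') + fieldlist + Literal('**End**')

dstring = '<Begin>"abc", "de34", "f_o_o"**End**'
tokens = doc.parseString(dstring).asList()
print(tokens)
```

By default the surrounding quotes are removed from the results, so the tokens match those of the unquoted example above.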

======== Edit ============

See answer below ...


Solution

  • QuotedString takes the quote character(s) as a string, not a ParserElement, so by itself it cannot check that the quoted content is a valid name. Combining alternatives with the ^ (Or) operator achieves the desired effect: it allows different forms of quotes while still validating the string contained within the quotes. The following code does that:

    from pyparsing import Word, Literal, alphas, alphanums, delimitedList
    from pyparsing import Group, ParseException, Suppress
    
    name = Word(alphas+"_", alphanums+"_")
    # Parentheses replace the backslash continuations: a comment after "\" is a syntax error
    field = (
        Suppress('"') + name + Suppress('"')        # double quote
        ^ Suppress("'") + name + Suppress("'")      # single quote
        ^ Suppress("<") + name + Suppress(">")      # html tag
        ^ Suppress("{{") + name + Suppress("}}")    # django template variable
    )
    fieldlist = Group(delimitedList(field))
    doc = Literal('<Begin>') + fieldlist + Literal('**End**')
    
    dstring = [
        '<Begin>"abc","de34","f_o_o"**End**',      # Good
        '<Begin><abc>,{{de34}},\'f_o_o\'**End**',  # Good
        '<Begin>"abc",\'de34","f_o_o\'**End**',    # Bad - mismatched quotes
        '<Begin>"abc","de34","f_o#o"**End**',      # Bad - invalid identifier
    ]
    
    for ds in dstring:
        print(ds)
        try:
            print('  ', doc.parseString(ds))
        except ParseException as err:
            print(" "*(err.column-1) + "^")
            print(err)
    

    This produces the desired output, accepting the two good test strings and rejecting the two bad ones:

    <Begin>"abc","de34","f_o_o"**End**
       ['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']
    <Begin><abc>,{{de34}},'f_o_o'**End**
       ['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']
    <Begin>"abc",'de34","f_o_o'**End**
                ^
    Expected "**End**" (at char 12), (line:1, col:13)
    <Begin>"abc","de34","f_o#o"**End**
                       ^
    Expected "**End**" (at char 19), (line:1, col:20)
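
As a possible alternative, QuotedString can probably still be used if the content check is attached separately. The sketch below assumes a condition added with addCondition is an acceptable substitute for reusing the name expression. NAME_RE and parse_fields are hypothetical names introduced only for illustration, and only the single- and double-quote forms are covered, not the html tag or django template forms.

```python
import re
from pyparsing import Literal, ParseException, QuotedString, delimitedList

# Hypothetical pattern mirroring Word(alphas+"_", alphanums+"_")
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Accept either quote style; QuotedString strips the quotes from the result
quoted = QuotedString('"') | QuotedString("'")
# Reject the match unless the unquoted content is a valid name
field = quoted.addCondition(lambda toks: NAME_RE.fullmatch(toks[0]) is not None)
fieldlist = delimitedList(field)
doc = Literal('<Begin>') + fieldlist + Literal('**End**')

def parse_fields(text):
    """Return the token list, or None if the text does not parse."""
    try:
        return doc.parseString(text).asList()
    except ParseException:
        return None

print(parse_fields('<Begin>"abc",\'de34\',"f_o_o"**End**'))
print(parse_fields('<Begin>"abc","de34","f_o#o"**End**'))
```

One difference from the ^ version: a mismatched pair such as 'de34","f_o_o' is first matched as one single-quoted string and only then rejected by the condition, so the rejection relies on the content check rather than on the quote matching.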
    

    Thank you Paul for all the help and for producing such a cool package.