I'm having conceptual difficulty in understanding how to build a pyparsing parser. The steps are: 1) build a parser by combining subclasses of ParserElement, and 2) use the parser to parse a string.
The following example works fine:
from pyparsing import Word, Literal, alphas, alphanums, delimitedList, QuotedString
name = Word(alphas+"_", alphanums+"_")
field = name
fieldlist = delimitedList(field)
doc = Literal('<Begin>') + fieldlist + Literal('**End**')
dstring = '<Begin>abc,de34,f_o_o**End**'
print(doc.parseString(dstring))
yielding the expected sequence of tokens:
['<Begin>', 'abc', 'de34', 'f_o_o', '**End**']
But, for example, the QuotedString class does not take a ParserElement as an argument (its first argument is the quote character), so I can't use it to build up a parser. I'd expect to use it in the above example like this:
name = Word(alphas+"_", alphanums+"_")
field = QuotedString(name) ### Wrong: doesn't allow "name" as an argument
fieldlist = delimitedList(field)
to parse a document of the form:
dstring = '<Begin>"abc", "de34", "f_o_o"**End**'
But since it can't be used that way, what is the proper syntax for including QuotedString in the construction of a parser for a list of quoted strings?
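For reference, here is a minimal sketch of what QuotedString does accept, assuming a pyparsing version that still provides the camelCase names parseString and delimitedList (2.x and 3.x both do). It takes the quote character as a plain string and strips the quotes, but it never checks the quoted text against name, so an invalid identifier such as f_o#o gets through:

```python
from pyparsing import Literal, QuotedString, delimitedList

# QuotedString's first argument is the quote character (a str), not a ParserElement
dq = QuotedString('"')

# The quotes are removed, but the contents are not validated as an identifier
print(dq.parseString('"f_o#o"'))  # ['f_o#o']

# It fits into the original grammar; it just can't check the field text
doc = Literal('<Begin>') + delimitedList(dq) + Literal('**End**')
print(doc.parseString('<Begin>"abc", "de34", "f_o_o"**End**'))
# ['<Begin>', 'abc', 'de34', 'f_o_o', '**End**']
```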
======== Edit ============
See answer below ...
QuotedString cannot be used for this task. But an Or expression (built with the ^ operator) can achieve the same effect: it allows different forms of quotes while still validating the string contained within the quotes. The following code does that:
from pyparsing import Word, Literal, alphas, alphanums, delimitedList
from pyparsing import Group, ParseException, Suppress

# A valid field: an identifier made of letters, digits and underscores
name = Word(alphas+"_", alphanums+"_")

# '+' binds tighter than '^', so each line below is one complete alternative.
# Wrapping the expression in parentheses allows a comment on every line.
field = (Suppress('"') + name + Suppress('"')        # double quote
         ^ Suppress("'") + name + Suppress("'")      # single quote
         ^ Suppress("<") + name + Suppress(">")      # html tag
         ^ Suppress("{{") + name + Suppress("}}"))   # django template variable
fieldlist = Group(delimitedList(field))
doc = Literal('<Begin>') + fieldlist + Literal('**End**')
dstring = [
'<Begin>"abc","de34","f_o_o"**End**', # Good
'<Begin><abc>,{{de34}},\'f_o_o\'**End**', # Good
'<Begin>"abc",\'de34","f_o_o\'**End**', # Bad - mismatched quotes
'<Begin>"abc","de34","f_o#o"**End**', # Bad - invalid identifier
]
for ds in dstring:
    print(ds)
    try:
        print(' ', doc.parseString(ds))
    except ParseException as err:
        print(" "*(err.column-1) + "^")
        print(err)
This produces the desired output, accepting the two good test strings and rejecting the two bad ones:
<Begin>"abc","de34","f_o_o"**End**
  ['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']
<Begin><abc>,{{de34}},'f_o_o'**End**
  ['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']
<Begin>"abc",'de34","f_o_o'**End**
            ^
Expected "**End**" (at char 12), (line:1, col:13)
<Begin>"abc","de34","f_o#o"**End**
                   ^
Expected "**End**" (at char 19), (line:1, col:20)
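A note on the operator: in pyparsing, ^ builds an Or, which tries every alternative and keeps the longest match, while | builds a MatchFirst, which keeps the first alternative that matches. A small sketch of the difference, assuming a current pyparsing install:

```python
from pyparsing import Literal

# '|' -> MatchFirst: alternatives are tried left to right, the first match wins
first = Literal("a") | Literal("ab")

# '^' -> Or: all alternatives are tried, the longest match wins
longest = Literal("a") ^ Literal("ab")

print(type(first).__name__, first.parseString("ab"))      # MatchFirst ['a']
print(type(longest).__name__, longest.parseString("ab"))  # Or ['ab']
```

In the field expression above, each alternative starts with a different opening delimiter, so | should give the same results there; ^ matters mainly when alternatives can match overlapping prefixes.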
Thank you Paul for all the help and for producing such a cool package.