I'm parsing a multiline quoted string with pyparsing.
The input file (test.txt) contains:
PROPERTY PName "Multiline quoted
string" ;
The Python code:
import pyparsing as pp

linebreak = pp.Suppress(';')
identifier = pp.Word(pp.alphanums + '._!<>/[]$')
qs = pp.QuotedString('"', multiline=True)

# Read the whole file at once so the quoted string can span lines
with open("test.txt", 'r') as ifile:
    test_string = ifile.read()

PROPERTY = (pp.Suppress(pp.Keyword('PROPERTY'))
            + identifier('propName')
            + qs('propValue')
            + linebreak
            )

# scanString yields (tokens, start, end) for each match
for t, s, e in PROPERTY.scanString(test_string):
    print(t.asDict())
Which yields:
"PROPERTY": {
"propName": "PName",
"propValue": "Multiline quoted \n string"
}
Is it possible to remove the '\n' at parse time?
This isn't really what the escChar argument is for; that argument indicates how to escape embedded characters that would normally be quote delimiters.
This is best handled with a parse action, which is a parse-time callback that can modify the tokens right after they are parsed but before they are returned to the caller. Here is your code as a console session, adding the parse action remove_newlines to qs:
>>> text = """PROPERTY PName "Multiline quoted
... string" ;"""
>>> import pyparsing as pp
>>> qs = pp.QuotedString('"', multiline=True)
>>> qs.searchString(text)
([(['Multiline quoted \nstring'], {})], {})
>>> def remove_newlines(t):
...     t[0] = t[0].replace('\n', '')
...
>>> qs.addParseAction(remove_newlines)
>>> qs.searchString(text)
([(['Multiline quoted string'], {})], {})
The remove_newlines function is called after a qs is successfully parsed, and the resulting tokens are passed to it as the t argument. The function replaces the newlines with the empty string and assigns the result back into the tokens, modifying them in place.
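To tie this back to the grammar in the question, here is a minimal sketch of the parse action attached to the full PROPERTY expression. It makes a few assumptions that are not in the original post: the input is an inline string rather than test.txt, the parse action returns the cleaned value (a parse action may return a replacement for the matched tokens as an alternative to editing them in place), and the action is attached before qs('propValue') is used, on the assumption that naming an expression works on a copy of it.

```python
import pyparsing as pp

def remove_newlines(tokens):
    # Returning a value replaces the matched token with that value
    return tokens[0].replace('\n', '')

identifier = pp.Word(pp.alphanums + '._!<>/[]$')

# Attach the parse action before the expression is given a results name
qs = pp.QuotedString('"', multiline=True).addParseAction(remove_newlines)

PROPERTY = (pp.Suppress(pp.Keyword('PROPERTY'))
            + identifier('propName')
            + qs('propValue')
            + pp.Suppress(';'))

# Inline stand-in for the contents of test.txt (assumed: a space before the newline)
text = 'PROPERTY PName "Multiline quoted \nstring" ;'

result = PROPERTY.parseString(text).asDict()
print(result)
```

If the line break should become a space instead of disappearing, replace('\n', ' ') in the parse action would be the corresponding change.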