Remove \n from multiline quoted string in PyParsing

I'm parsing a multiline quoted string with pyparsing.

The file with the string (test.txt):

PROPERTY PName "Multiline quoted 
string" ;

The Python code:

import pyparsing as pp

linebreak = pp.Suppress(';')
identifier = pp.Word(pp.alphanums + '._!<>/[]$')
qs = pp.QuotedString('"', multiline = True)

# Read the whole input file; the context manager closes it automatically
with open("test.txt", 'r') as ifile:
    test_string = ifile.read()

PROPERTY = (pp.Suppress(pp.Keyword('PROPERTY'))
            + identifier('propName')
            + qs('propValue')
            + linebreak
           )

# scanString yields (tokens, start, end) for every match in the input
for t, s, e in PROPERTY.scanString(test_string):
    print(t.asDict())

Which yields:

"PROPERTY": {
        "propName": "PName",
        "propValue": "Multiline quoted \n   string"
      }

Is it possible to remove the '\n' at parse time?

Solution

This isn't really what the escChar argument is for; it indicates how to escape embedded characters that would otherwise be treated as quote delimiters.

I would handle this with a parse action, which is a parse-time callback that can modify the tokens right after they are parsed but before they are returned to the caller. Here is your code as a console session, with the parse action remove_newlines added to qs:

>>> text = """PROPERTY PName "Multiline quoted 
... string" ;"""
>>> import pyparsing as pp

>>> qs = pp.QuotedString('"', multiline=True)

>>> qs.searchString(text)
([(['Multiline quoted \nstring'], {})], {})

>>> def remove_newlines(t):
...     t[0] = t[0].replace('\n', '')
...     
>>> qs.addParseAction(remove_newlines)

>>> qs.searchString(text)
([(['Multiline quoted string'], {})], {})

The remove_newlines function is called each time qs parses successfully, and the resulting tokens are passed to it as the argument t. The function replaces the newlines in the first token with the empty string and assigns the result back into t, which modifies the tokens in place.
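
If the continuation line in your file is indented (your output shows spaces after the '\n'), removing only the newline leaves that indentation inside the string. As a rough sketch, a variant callback could collapse each newline, together with the whitespace around it, into a single space. The name collapse_newlines is my own, not part of pyparsing, and it is exercised here on a plain list so the snippet runs without pyparsing installed; I'm assuming it would be attached with qs.addParseAction in the same way as remove_newlines.

```python
import re

def collapse_newlines(t):
    # Hypothetical parse-action-style callback (name is illustrative):
    # replace each newline, plus any whitespace around it, with one space.
    t[0] = re.sub(r'\s*\n\s*', ' ', t[0])

# A plain list stands in for the parsed tokens in this sketch.
tokens = ['Multiline quoted \n   string']
collapse_newlines(tokens)
print(tokens[0])

# Assumed usage with the parser from the question:
#   qs.addParseAction(collapse_newlines)
```

Whether you want an empty string or a single space as the replacement depends on whether the line break in the source text is meant to separate words.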