Tags: python, python-3.x, pyparsing

Remove \n from multiline quoted string in PyParsing


I'm parsing a multiline quoted string with PyParsing, using the input file and code below.

The file with the string (test.txt):

PROPERTY PName "Multiline quoted 
string" ;

The Python code:

import pyparsing as pp

linebreak = pp.Suppress(';')
identifier = pp.Word(pp.alphanums + '._!<>/[]$')
qs = pp.QuotedString('"', multiline=True)

# Read the whole input file; the with block closes it automatically
with open("test.txt", 'r') as ifile:
    test_string = ifile.read()

PROPERTY = (pp.Suppress(pp.Keyword('PROPERTY'))
            + identifier('propName')
            + qs('propValue')
            + linebreak
           )

for t, s, e in PROPERTY.scanString(test_string):
    print(t.asDict())

Which yields:

"PROPERTY": {
        "propName": "PName",
        "propValue": "Multiline quoted \n   string"
      }

Is it possible to remove the '\n' at parse time?


Solution

  • This isn't really what the escChar argument is for; its purpose is to indicate how to escape embedded characters that would otherwise be treated as quote delimiters.

    This is best handled with a parse action: a parse-time callback that can modify the tokens right after they are parsed, but before they are returned to the caller. Here is your code as a console session, with the parse action remove_newlines added to qs:

    >>> text = """PROPERTY PName "Multiline quoted 
    ... string" ;"""
    >>> import pyparsing as pp
    
    >>> qs = pp.QuotedString('"', multiline=True)
    
    >>> qs.searchString(text)
    ([(['Multiline quoted \nstring'], {})], {})
    
    >>> def remove_newlines(t):
    ...     t[0] = t[0].replace('\n', '')
    ...     
    >>> qs.addParseAction(remove_newlines)
    
    >>> qs.searchString(text)
    ([(['Multiline quoted string'], {})], {})
    

    The remove_newlines function is called each time qs parses successfully, and the resulting tokens are passed to it as the t argument. The function replaces the newlines in t[0] with the empty string and assigns the result back, modifying the tokens in place.
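
To tie this back to the grammar in the question, here is a minimal sketch of the full PROPERTY parser with a parse action attached. Several things in it are assumptions rather than part of the original post: the input is an inline string standing in for test.txt, the helper build_property_parser is a hypothetical name, and collapse_whitespace is an illustrative variant for the case where the continuation line is indented. Both parse actions return the cleaned string instead of assigning to t[0], so that the named result propValue is built from the cleaned value. The camelCase method names follow the question; pyparsing 3 also offers snake_case equivalents such as add_parse_action.

```python
import pyparsing as pp


def remove_newlines(t):
    # Parse action: return the quoted string with newlines removed.
    return t[0].replace('\n', '')


def collapse_whitespace(t):
    # Hypothetical variant: squeeze every run of whitespace
    # (newlines and indentation included) into a single space.
    return ' '.join(t[0].split())


def build_property_parser(clean):
    # Hypothetical helper: the grammar from the question, with the
    # given parse action attached to the quoted string.
    terminator = pp.Suppress(';')
    identifier = pp.Word(pp.alphanums + '._!<>/[]$')
    qs = pp.QuotedString('"', multiline=True).addParseAction(clean)
    return (pp.Suppress(pp.Keyword('PROPERTY'))
            + identifier('propName')
            + qs('propValue')
            + terminator)


# Inline stand-ins for the contents of test.txt (assumed sample data).
plain_text = 'PROPERTY PName "Multiline quoted \nstring" ;'
indented_text = 'PROPERTY PName "Multiline quoted \n   string" ;'

plain_results = [
    t.asDict()
    for t, s, e in build_property_parser(remove_newlines).scanString(plain_text)
]
indented_results = [
    t.asDict()
    for t, s, e in build_property_parser(collapse_whitespace).scanString(indented_text)
]

print(plain_results)
print(indented_results)
```

Which of the two actions fits depends on whether the whitespace around the line break is meaningful in your data: remove_newlines keeps every other character as written, while collapse_whitespace also normalizes spaces and tabs.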