
Preserve newlines in nestedExpr


Is it possible for nestedExpr to preserve newlines?

Here is a simple example:

import pyparsing as pp

# Parse expressions like: \name{body}
name = pp.Word( pp.alphas )
body = pp.nestedExpr( '{', '}' )
expr = '\\' + name('name') + body('body')

# Example text to parse (raw string, so '\w' and '\i' are not treated as escape sequences)
txt = r'''
This \works{fine}, but \it{
    does not
    preserve newlines
}
'''

# Show results
for e in expr.searchString(txt):
    print('name: ' + e.name)
    print('body: ' + str(e.body) + '\n')

Output:

name: works
body: [['fine']]

name: it
body: [['does', 'not', 'preserve', 'newlines']]

As you can see, the body of the second expression (\it{ ...}) is parsed even though it spans several lines, but I would have expected the result to store each line in a separate sublist. As it is, the newlines are discarded, so there is no way to tell a body written on a single line from one split across several lines.
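The newlines disappear because, by default, pyparsing skips spaces, tabs, carriage returns and newlines before it tries to match each element, so a '\n' never reaches the results as a token. A minimal sketch illustrating this, assuming pyparsing is installed (it uses the camelCase names from this thread; pyparsing 3 also provides snake_case equivalents such as set_whitespace_chars):

```python
import pyparsing as pp

# The default whitespace set includes '\n', so line breaks are skipped silently.
print(repr(pp.ParserElement.DEFAULT_WHITE_CHARS))

text = "does not\npreserve newlines"

words = pp.OneOrMore(pp.Word(pp.alphas))
default_result = words.parseString(text).asList()
print(default_result)  # ['does', 'not', 'preserve', 'newlines']: line structure is gone

# Restricting the whitespace set to spaces makes the newline significant:
# matching now stops at the end of the first line.
same_line_words = pp.OneOrMore(pp.Word(pp.alphas).setWhitespaceChars(' '))
same_line_result = same_line_words.parseString(text).asList()
print(same_line_result)  # ['does', 'not']
```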


Solution

  • I didn't get to look at your answer until just a few minutes ago, and I had already come up with this approach:

    body = pp.nestedExpr( '{', '}', content = (pp.LineEnd() | name.setWhitespaceChars(' ')))
    

    Changing body to this definition gives these results:

    name: works
    body: [['fine']]
    
    name: it
    body: [['\n', 'does', 'not', '\n', 'preserve', 'newlines', '\n']]
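With this first version, the '\n' tokens can act as line separators after parsing. A minimal sketch, assuming a body without nested braces; split_lines is a hypothetical helper (not part of pyparsing), and name.copy() is used so that name itself keeps its default whitespace instead of being modified in place:

```python
import pyparsing as pp

name = pp.Word(pp.alphas)
# The answer's first body definition, applied to a copy of `name`
body = pp.nestedExpr('{', '}',
                     content=(pp.LineEnd() | name.copy().setWhitespaceChars(' ')))

def split_lines(tokens):
    """Hypothetical helper: join the words between newline tokens into lines.

    Assumes a flat body (no nested braces); blank lines are dropped.
    """
    lines, current = [], []
    for tok in tokens:
        if tok == '\n':
            if current:
                lines.append(' '.join(current))
            current = []
        else:
            current.append(tok)
    if current:
        lines.append(' '.join(current))
    return lines

# asList() returns [[...]]; [0] is the contents of the braces
one_line = body.parseString('{fine}').asList()[0]
two_lines = body.parseString('{\n    does not\n    preserve newlines\n}').asList()[0]
print(split_lines(one_line))
print(split_lines(two_lines))
```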
    

    EDIT:

    Wait, if what you want are the separate lines, then perhaps this is more what you are looking for:

    single_line = pp.OneOrMore(name.setWhitespaceChars(' ')).setParseAction(' '.join)
    multi_line = pp.OneOrMore(pp.Optional(single_line) + pp.LineEnd().suppress())
    body = pp.nestedExpr( '{', '}', content = multi_line | single_line )
    

    Which gives:

    name: works
    body: [['fine']]
    
    name: it
    body: [['does not', 'preserve newlines']]
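For reference, a self-contained Python 3 sketch that plugs the second body definition into the question's grammar and driver loop. The names word_on_line and results are illustrative additions, and copy() is used so that the name matched after the backslash keeps pyparsing's default whitespace handling:

```python
import pyparsing as pp

name = pp.Word(pp.alphas)
# Words on a single line: only spaces are skipped, so '\n' ends the line
word_on_line = name.copy().setWhitespaceChars(' ')
single_line = pp.OneOrMore(word_on_line).setParseAction(' '.join)
# Zero or more lines, each terminated by a (suppressed) newline
multi_line = pp.OneOrMore(pp.Optional(single_line) + pp.LineEnd().suppress())
body = pp.nestedExpr('{', '}', content=multi_line | single_line)
expr = '\\' + name('name') + body('body')

txt = r'''
This \works{fine}, but \it{
    does not
    preserve newlines
}
'''

# Collect (name, body) pairs, then print them the way the question does
results = [(e.name, e.body.asList()) for e in expr.searchString(txt)]
for tag, lines in results:
    print('name: ' + tag)
    print('body: ' + str(lines) + '\n')
```

This should reproduce the output shown above. Each body now holds one string per non-blank line, so {fine} yields a single element while the \it{...} body yields two, which is the distinction the question asks for.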