Tags: python, parsing, abstract-syntax-tree, multiline, quoting

Using the Python ast parser to process multi-line strings


When I parse a script containing multi-line (triple-quoted) strings with Python's ast module and then unparse it, those strings always come back as single-line quoted strings. Example:

import ast

script = "text='''Line1\nLine2'''"

code = ast.parse(script, mode='exec')
print(ast.unparse(code))

# The assigned value is a Constant node; its position spans two source lines.
node = code.body[0].value
print(node.lineno, node.end_lineno)

The output is:

> text = 'Line1\nLine2'
> 1 2

So although the text was a multi-line string before parsing (the node still spans lines 1 to 2), it is reduced to a single-line quoted string when unparsed. This makes script transformation difficult, because the line breaks are lost when a transformed AST is unparsed.
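
A quick check, offered only as a sketch, suggests where the information goes: the parsed tree appears to keep just the string's value, not the quoting style it was written with. The variable names below are illustrative and not part of the original script.

```python
import ast

# Two spellings of the same assignment: a triple-quoted literal containing a
# real newline, and a single-quoted literal using the \n escape sequence.
triple_quoted = "text='''Line1\nLine2'''"
escaped = r"text='Line1\nLine2'"

tree_a = ast.parse(triple_quoted)
tree_b = ast.parse(escaped)

# ast.dump omits position attributes by default, so this compares only the
# tree structure and the constant values.
same_tree = ast.dump(tree_a) == ast.dump(tree_b)
print(same_tree)  # True

# The position attributes are the only remaining trace of the original layout.
print(tree_a.body[0].value.end_lineno, tree_b.body[0].value.end_lineno)  # 2 1
```

Since both spellings yield the same Constant value, any unparser has to pick a quoting style on its own; the line numbers are the only hint that the original literal was multi-line.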

Is there a way to parse and unparse scripts with multi-line strings correctly using ast?

Thank you in advance.


Solution

  • The source of ast.unparse shows that _write_constant, the helper that visit_Constant uses to emit a constant, writes the repr of a string unless the unparser has been told to avoid backslashes:

    class _Unparser:
        ...
        def _write_constant(self, value):
            if isinstance(value, (float, complex)):
                ...
            elif self._avoid_backslashes and isinstance(value, str):
                self._write_str_avoiding_backslashes(value)
            else:
                self.write(repr(value))
    

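    By default, _avoid_backslashes is False. Multi-line strings can still be emitted properly by subclassing the unparser, overriding visit_Constant, and calling _write_str_avoiding_backslashes directly whenever the string node spans more than one source line. Note that ast._Unparser is a private, undocumented class, so this relies on implementation details that may change between Python versions: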

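    import ast

    class Unparser(ast._Unparser):
        def visit_Constant(self, node):
            # Nodes created by a transformer may lack position info, so read it defensively.
            start = getattr(node, "lineno", None)
            end = getattr(node, "end_lineno", None)
            if isinstance(node.value, str) and start and end and start < end:
                # The literal spanned several source lines: keep the real newlines.
                self._write_str_avoiding_backslashes(node.value)
                return
            return super().visit_Constant(node)

    def _unparse(ast_node):
        # _Unparser.visit returns the generated source as a string.
        return Unparser().visit(ast_node)

    script = "text='''Line1\nLine2'''"
    print(_unparse(ast.parse(script)))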
    

    Output:

    text = """Line1
    Line2"""
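
Because the override changes how literals are written, it may be worth checking that the regenerated source still parses to the same tree. The following is a minimal sketch of such a round-trip check, not part of the original answer: MultilineUnparser and round_trips are illustrative names, the class repeats the idea of the Unparser subclass above, and the initial ast.unparse call is only a precaution based on the assumption that some Python versions may load the private class lazily.

```python
import ast

# Precaution (an assumption, not verified for every release): call the public
# API once so the private unparser class is certainly loaded.
ast.unparse(ast.parse("0"))


class MultilineUnparser(ast._Unparser):
    """Illustrative copy of the subclass idea shown above."""

    def visit_Constant(self, node):
        # Read positions defensively; synthesized nodes may not have them.
        start = getattr(node, "lineno", None)
        end = getattr(node, "end_lineno", None)
        if isinstance(node.value, str) and start and end and start < end:
            self._write_str_avoiding_backslashes(node.value)
            return
        return super().visit_Constant(node)


def round_trips(source: str) -> bool:
    """Return True if the regenerated source parses to an equal AST."""
    tree = ast.parse(source)
    regenerated = MultilineUnparser().visit(tree)
    return ast.dump(ast.parse(regenerated)) == ast.dump(tree)


samples = [
    "text='''Line1\nLine2'''",         # plain multi-line string
    "text='''He said \"hi\"\nbye'''",  # multi-line string containing quotes
    "text='single line'",              # control: falls through to the default
]
results = [round_trips(s) for s in samples]
print(results)
```

If every entry is True, the custom unparser changed only the spelling of the literals and not their values for these samples; more unusual strings (for example ones containing both kinds of triple quotes) would need their own checks.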