Tags: python, unicode, eval, python-2.x, literals

How can a representation of a literal be safely evaluated, assuming unicode_literals?


In Python 2, I would like to evaluate a string that contains the representation of a literal. I would like to do this safely, so I don't want to use eval()—instead I've become accustomed to using ast.literal_eval() for this kind of task.

However, I also want to evaluate under the assumption that string literals in plain quotes denote unicode objects—i.e. the kind of forward-compatible behavior you get with from __future__ import unicode_literals. In the example below, eval() seems to respect this preference, but ast.literal_eval() seems not to.

from __future__ import unicode_literals, print_function

import ast

raw = r"""   'hello'    """

value = eval(raw.strip())
print(repr(value))
# Prints:
# u'hello'

value = ast.literal_eval(raw.strip())
print(repr(value))
# Prints:
# 'hello'

Note that I'm looking for a general-purpose literal_eval replacement—I don't know in advance that the output is necessarily a string object. I want to be able to assume that raw is the representation of an arbitrary Python literal, which may be a string, or may contain one or more strings, or not.

Is there a way of getting the best of both worlds: a function that both securely evaluates representations of arbitrary Python literals and respects the unicode_literals preference?


Solution

  • Neither ast.literal_eval nor ast.parse offers an option to set compiler flags. However, ast.literal_eval accepts an already-parsed AST node as well as a string, so you can pass the appropriate flags to compile to parse the string with unicode_literals activated, then run ast.literal_eval on the resulting node:

    import ast
    
    # Not a future statement. This imports the __future__ module, and has no special
    # effects beyond that.
    import __future__
    
    unparsed = '"blah"'
    parsed = compile(unparsed,
                     '<string>',
                     'eval',
                     ast.PyCF_ONLY_AST | __future__.unicode_literals.compiler_flag)
    value = ast.literal_eval(parsed)
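
Since the question asks for a general-purpose literal_eval replacement, the steps above could be wrapped in a small helper. This is only a sketch: the name unicode_literal_eval is made up for illustration, and the strip() call is an assumption carried over from the question's example, where surrounding whitespace would otherwise make the parse fail in 'eval' mode.

```python
import ast

# A plain import of the module, not a future statement.
import __future__


def unicode_literal_eval(source):
    """Safely evaluate a literal, parsing it as if unicode_literals were active.

    Hypothetical helper: it parses `source` to an AST with the
    unicode_literals compiler flag set, then lets ast.literal_eval
    validate and evaluate the resulting tree.
    """
    flags = ast.PyCF_ONLY_AST | __future__.unicode_literals.compiler_flag
    # strip() avoids an indentation error on leading whitespace in 'eval' mode.
    tree = compile(source.strip(), '<string>', 'eval', flags)
    return ast.literal_eval(tree)


value = unicode_literal_eval("   'hello'    ")
nested = unicode_literal_eval("{'k': ['a', 1]}")
# On Python 2, repr(value) is u'hello' and repr(nested) is {u'k': [u'a', 1]}.
```

Strings nested inside containers are affected too, because the flag applies to the whole parse. Explicit b'...' literals remain byte strings, and anything that is not a literal is still rejected by ast.literal_eval with a ValueError, just as when it is given a string directly.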