Tags: python, python-2.7, serialization, abstract-syntax-tree

How do I debug an error in `ast.literal_eval`?


I wrote my data to a file using pprint.PrettyPrinter and I am trying to read it using ast.literal_eval. This has been working for me for quite some time, and I am reasonably satisfied with the text representation produced.

However, today I got this error on deserialization:

  File "/...mypath.../store.py", line 82, in <lambda>
    reader=(lambda fd: ast.literal_eval(fd.read())),
  File "/usr/lib64/python2.7/ast.py", line 80, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib64/python2.7/ast.py", line 60, in _convert
    return list(map(_convert, node.elts))
  File "/usr/lib64/python2.7/ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "/usr/lib64/python2.7/ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "/usr/lib64/python2.7/ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "/usr/lib64/python2.7/ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "/usr/lib64/python2.7/ast.py", line 79, in _convert
    raise ValueError('malformed string')
ValueError: malformed string

How do I fix this specific file?

The file in question is 17k lines (700 KB). I loaded it into Emacs and the parens are balanced. There are no non-ASCII characters in the file. I can "divide and conquer" (split the file in half and try to read each half), but this is rather tedious. Is there anything better?

I modified _convert (the helper inside ast.literal_eval) to print the offending node; it turned out to be <_ast.UnaryOp object at 0x110696510>, which is not very helpful on its own.

How do I ensure that this does not happen in the future?

I hope JSON is not the answer. ;-)

I am not using JSON because

  1. JSON object keys must be strings, so dicts with non-string keys do not round-trip
  2. JSON output has either too many newlines (with indentation) or none at all (without)
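
On the second question, one possible guard is to check at write time that the pprint output survives ast.literal_eval, so a bad value is reported when it is written rather than when it is read back. This is a minimal sketch, not something from the original post; the helper name dumps_checked is hypothetical.

```python
import ast
import pprint


def dumps_checked(obj):
    """Format obj with pprint and verify that literal_eval can read it back.

    Hypothetical helper: raises ValueError at write time if the text
    would not deserialize to an equal object.
    """
    text = pprint.pformat(obj)
    try:
        restored = ast.literal_eval(text)
    except (ValueError, SyntaxError) as ex:
        raise ValueError("not readable by ast.literal_eval: %s" % (ex,))
    if restored != obj:
        raise ValueError("round trip changed the data")
    return text


# A plain container of literals passes the check.
print(dumps_checked({1: [2, 3], (4, 5): "x"}))
```

The check roughly doubles the cost of serialization, which may or may not matter for a file of this size. Values whose repr is not a Python literal (for example float("-inf"), which pprint writes as -inf) are rejected before they reach the file.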

Solution

  • Quick and Dirty

    Apply this patch:

    --- /...../2.7/lib/python2.7/ast.py.old 2018-03-25 12:17:11.000000000 -0400
    +++ /...../2.7/lib/python2.7/ast.py 2018-03-25 12:17:18.000000000 -0400
    @@ -76,7 +76,7 @@ def literal_eval(node_or_string):
                     return left + right
                 else:
                     return left - right
    -        raise ValueError('malformed string')
    +        raise ValueError('malformed string', node.lineno, node.col_offset)
         return _convert(node_or_string)
     
    

    Reload ast:

    >>> reload(ast)
    

    Retry loading the offending file. The error now carries the position:

    ValueError: ('malformed string', 21161, 10)
    

    so the problem is at line 21161, column 10 (col_offset is 0-based).

    Bug report submitted.

  • Sophisticated

    Wrap the call in try/except, catch the error, and walk the traceback down to its innermost frame to access the node in question:

    import ast
    import sys

    try:
        ast.literal_eval(...)
    except ValueError as ex:
        _exc_type, exc_value, exc_traceback = sys.exc_info()
        print("ERROR: %r" % (exc_value,))
        # traceback.print_tb(exc_traceback)
        # Walk to the innermost frame, i.e. the call that raised.
        last_tb = exc_traceback
        while last_tb.tb_next:
            last_tb = last_tb.tb_next
        # That frame's local variable "node" is the offending AST node.
        node = last_tb.tb_frame.f_locals["node"]
        print("Error location: line=%d, col=%d" % (node.lineno, node.col_offset))
    

    prints

    ERROR: ValueError('malformed string')
    Error location: line=21933, col=15
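
A third option avoids both patching the standard library and reading frame locals: parse the text once with ast.parse, then keep descending into whichever child node ast.literal_eval rejects until no rejected child remains. This is a sketch under assumptions rather than part of the original answer; the function name find_bad_literal and the sample input are hypothetical.

```python
import ast


def _is_literal(node):
    """Return True if ast.literal_eval accepts this AST node."""
    try:
        ast.literal_eval(node)
    except (ValueError, TypeError, SyntaxError):
        return False
    return True


def find_bad_literal(text):
    """Return (lineno, col_offset) of the innermost rejected node, or None."""
    node = ast.parse(text, mode="eval").body
    if _is_literal(node):
        return None
    while True:
        # Only expression children carry positions; skip operators and contexts.
        bad = next((child for child in ast.iter_child_nodes(node)
                    if isinstance(child, ast.expr) and not _is_literal(child)),
                   None)
        if bad is None:
            # No child is at fault, so this node itself is the problem.
            return node.lineno, node.col_offset
        node = bad


sample = "[1,\n {2: -inf}]"  # hypothetical input; -inf is not a literal
print(find_bad_literal(sample))  # (2, 6)
```

Here lineno is 1-based and col_offset is 0-based. A file that is not valid Python syntax at all raises SyntaxError from ast.parse, which already reports its own line number. Each level re-evaluates a subtree, so the cost grows with nesting depth, which should be acceptable for a one-off diagnosis.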