Tags: python, pickle, corruption, dill

How can I recover a corrupted, partially pickled file?


My program was killed while serializing data (a dict) to disk with dill. I cannot open the partially-written file now.

Is it possible to partially or fully recover the data? If so, how?

Here's what I've tried:

>>> dill.load(open(filename, 'rb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lib/python3.4/site-packages/dill/dill.py", line 288, in load
    obj = pik.load()
EOFError: Ran out of input
>>> 

The file is not empty:

>>> os.stat(filename).st_size
31110059

Note: all data in the dictionary consisted of Python built-in types.


Solution

  • The pure-Python version of pickle.Unpickler (pickle._Unpickler) keeps its working stack around even after it hits an error, so you can probably get at least something out of it:

    import io
    import pickle
    
    # Swap in the pure-Python version before importing dill; the C version
    # does not expose its internal state.
    pickle.Unpickler = pickle._Unpickler
    
    import dill
    
    if __name__ == '__main__':
        obj = [1, 2, {3: 4, "5": ('6',)}]
        data = dill.dumps(obj)
    
        handle = io.BytesIO(data[:-5])  # drop the last 5 bytes to simulate truncation
    
        unpickler = dill.Unpickler(handle)
    
        try:
            unpickler.load()
        except EOFError:
            pass
    
        print(unpickler.stack)
    

    I get the following output:

    [3, 4, '5', ('6',)]
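
Note that the printed stack only holds what was collected since the most recent MARK opcode. In recent CPython versions the pure-Python unpickler parks the outer stacks in a second private attribute, `metastack`, so the enclosing containers can be inspected as well. The following is a minimal sketch using plain `pickle` instead of dill; `salvage_stacks` is a hypothetical helper name, and both `stack` and `metastack` are undocumented implementation details that may differ in your Python version.

```python
import io
import pickle


def salvage_stacks(data):
    """Run the pure-Python unpickler over `data` and return whatever it
    had built so far, outermost stack first (hypothetical helper)."""
    unpickler = pickle._Unpickler(io.BytesIO(data))
    try:
        unpickler.load()
    except Exception:
        # Truncation usually surfaces as EOFError, but a cut in the middle
        # of an opcode argument can raise other errors (e.g. struct.error).
        pass
    # `metastack` and `stack` are private attributes of CPython's
    # pure-Python unpickler and may change between versions.
    return list(unpickler.metastack) + [unpickler.stack]


obj = [1, 2, {3: 4, "5": ("6",)}]
data = pickle.dumps(obj, protocol=2)

# Drop the last 3 bytes (the SETITEMS, APPENDS and STOP opcodes).
stacks = salvage_stacks(data[:-3])
for depth, stack in enumerate(stacks):
    print(depth, stack)
```

Each level corresponds to one open container: the outer list, then the list items still waiting to be appended, then the dict keys and values still waiting to be paired up. Rebuilding the original object from these levels is left to you, since the opcodes that would have assembled them were lost.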
    

    The pickle data format isn't that complicated. Read through the Python module's source code and you can probably find a way to hook all of the load_ methods to give you more information.
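
One way to hook the load_ methods is sketched below, under the assumption that the pure-Python unpickler looks up its handlers through a `dispatch` dict (true for current CPython, but an implementation detail). `TracingUnpickler` and its `trace` attribute are hypothetical names introduced for illustration; the class wraps every handler so that the name of each opcode handler is recorded before it runs.

```python
import io
import pickle


class TracingUnpickler(pickle._Unpickler):
    """Pure-Python unpickler that records the name of each load_* handler
    it dispatches to (hypothetical helper, relies on private internals)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.trace = []
        # Shadow the class-level dispatch table with wrapped handlers.
        self.dispatch = {
            key: self._wrap(func)
            for key, func in pickle._Unpickler.dispatch.items()
        }

    def _wrap(self, func):
        def traced(unpickler):
            # Record first, so a handler that fails still shows up last.
            self.trace.append(func.__name__)
            func(unpickler)
        return traced


payload = pickle.dumps([1, 2, {3: 4, "5": ("6",)}], protocol=2)
tracer = TracingUnpickler(io.BytesIO(payload[:-3]))  # simulate truncation
try:
    tracer.load()
except EOFError:
    pass

print(tracer.trace)
```

The last entries of `trace` show how far the unpickler got before the data ran out, which helps decide how to interpret the leftover stack contents.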