Dill deletes object when using "load"


I'm having an error that is driving me nuts. I generate some numerical simulation data sim_data.dill and save it to a directory on my computer using

# binary mode ('wb'), since pickle data is not text
with open(os.path.join(original_directory, 'sim_data.dill'), 'wb') as f:
    dill.dump(outputs, f)

This data is about 1 GB and takes a while to generate. I then copied the file from original_directory to new_directory, and when I try to load it from a different program using

simfile = '/new_directory/sim_data.dill'
with open(simfile, 'rb') as f:  # read back in binary mode
    outputs = dill.load(f)

One of two things happens:

  1. The program says the file is missing: UnpicklingError: [Errno 2] No such file or directory: .../original_directory/sim_data.dill. This suggests dill stores original_directory in the metadata of the file and refuses to open it once the file has been moved; truly appalling behavior.
  2. When I copy the file back to new_directory, trying to open it gives an EOFError and dill truncates the file to zero bytes, essentially deleting it. This is even worse.

I can read the file just fine with a plain with open(simfile, 'r') as f: print f.readlines(), but obviously this does not help when trying to recover the internal class structure of the data.


Solution

  • Apparently this is normal behavior for dill; please see:

    https://github.com/uqfoundation/dill/issues/296

    Paraphrasing: the object being pickled contains an open file handle, and the file location is stored as part of that handle, so unpickling it without a file at that location is impossible. This means, apparently, that if you save such a .dill file while the handle points into one directory, move the files manually (for example to a more convenient directory), and then try to open the .dill file again, it won't work.

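    As for the deletion issue, the author of the post above recommends using fmode=FMODE_PRESERVEDATA or one of the other file modes listed at https://github.com/matsjoyce/dill/blob/087c00899ef55f31d36e7aee51a958b17daf8c91/dill/dill.py#L136-L145. The file is presumably emptied because the handle was pickled while open for writing, so reopening it in the same mode on load truncates it. In current dill releases the corresponding constants appear to be named HANDLE_FMODE, CONTENTS_FMODE and FILE_FMODE.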
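
If the open file handle is not actually needed in the saved data, one way to sidestep both problems is to leave it out of what gets pickled. The following is only a minimal sketch under stated assumptions: it uses the standard library pickle module so that it runs without extra dependencies (dill.dump and dill.load take the same object and file arguments), and the names without_file_handles, save_and_reload_after_move, run.log and sim_data.pkl are hypothetical, not taken from the question or from dill.

```python
import io
import os
import pickle
import shutil
import tempfile
from types import SimpleNamespace


def without_file_handles(obj):
    """Return obj's attributes as a dict, skipping open file objects."""
    return {
        name: value
        for name, value in vars(obj).items()
        if not isinstance(value, io.IOBase)
    }


def save_and_reload_after_move():
    original_directory = tempfile.mkdtemp()
    new_directory = tempfile.mkdtemp()
    try:
        # Hypothetical simulation result that also carries an open log file.
        log = open(os.path.join(original_directory, "run.log"), "w")
        outputs = SimpleNamespace(values=[1.0, 2.0, 3.0], log=log)

        # Pickles are binary, so the file is opened with "wb" and "rb".
        source = os.path.join(original_directory, "sim_data.pkl")
        with open(source, "wb") as f:
            pickle.dump(without_file_handles(outputs), f)
        log.close()

        # Move the pickle and delete the directory it was written in.
        moved = shutil.move(source, os.path.join(new_directory, "sim_data.pkl"))
        shutil.rmtree(original_directory)

        # Loading needs nothing from the original location any more.
        with open(moved, "rb") as f:
            restored = pickle.load(f)
        return restored, os.path.getsize(moved)
    finally:
        shutil.rmtree(original_directory, ignore_errors=True)
        shutil.rmtree(new_directory, ignore_errors=True)


restored, size_after_load = save_and_reload_after_move()
print(restored)
```

Because no handle is stored, the pickle records neither a path nor an open mode, so the file can be moved freely and loading it does not reopen or truncate anything. This sketch restores a plain dict; to keep the original class, the same filtering could instead be done in the class's __getstate__ method, with __setstate__ reopening any file explicitly.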