
How can I perform pickling so that it is robust against crashes?


I routinely use pickle.dump() to save large files in Python 2.7. I keep one .pickle file that I update on each iteration of my code, overwriting the same file every time.

However, I occasionally encounter crashes (e.g. from server issues). A crash in the middle of the dump leaves the pickle incomplete and the file unreadable, and I lose all the data from past iterations.

I guess one way to do it is to save one .pickle file per iteration and combine them all later. Are there any other recommended methods, or best practices for writing to disk, that are robust to crashes?


Solution

  • You're effectively doing backups, since your goal is the same as in disaster recovery: lose as little work as possible.

    Backups have a few standard practices, so choose whichever fits you best:

    • backing up
      • full backup (save everything each time; see the first sketch after this list)
      • incremental backup (save only what changed since the last backup; see the second sketch)
      • differential backup (save only what changed since the last full backup)
    • dealing with old backups
      • circular buffer/rotating copies (delete or overwrite backups older than X days/iterations, optionally renumbering the remaining copies)
      • consolidating old incremental/differential copies into the preceding full backup (as a failsafe, consolidate into a new file and only then delete the old ones)
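
    A minimal sketch of the full-backup-plus-rotation approach, assuming a POSIX filesystem where os.rename() over an existing file is atomic (in Python 2 on Windows that rename fails if the target exists, so a different replace step would be needed there). The helpers atomic_dump and rotating_dump are made-up names for illustration, not part of the pickle API:

        import os
        import pickle
        import tempfile

        def atomic_dump(obj, path):
            # Write to a temp file in the same directory, then rename it
            # into place; the rename is atomic on POSIX, so a crash
            # mid-dump leaves the previous file intact instead of a
            # truncated pickle.
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
            try:
                with os.fdopen(fd, 'wb') as f:
                    pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
                    f.flush()
                    os.fsync(f.fileno())  # force the bytes to disk before renaming
                os.rename(tmp, path)
            except Exception:
                os.remove(tmp)
                raise

        def rotating_dump(obj, basename, keep=5):
            # Full backup per iteration in a circular buffer:
            # basename.0.pickle is the newest copy, basename.<keep-1>.pickle
            # the oldest; anything older is dropped.
            def name(i):
                return '{0}.{1}.pickle'.format(basename, i)
            if os.path.exists(name(keep - 1)):
                os.remove(name(keep - 1))
            for i in range(keep - 2, -1, -1):  # shift surviving copies up by one
                if os.path.exists(name(i)):
                    os.rename(name(i), name(i + 1))
            atomic_dump(obj, name(0))

    Call rotating_dump(state, 'checkpoint') at the end of each iteration; after a crash, load checkpoint.0.pickle and fall back to .1, .2, ... if the newest copy is somehow bad.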
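
    The incremental variant can exploit the fact that successive pickle.dump() calls may be appended to one file and read back with repeated pickle.load(), so an interrupted write can damage at most the final record. Again a sketch, with made-up names (append_increment, replay_journal):

        import os
        import pickle

        def append_increment(delta, journal_path):
            # Append one pickled "delta" per iteration to a journal file.
            with open(journal_path, 'ab') as f:
                pickle.dump(delta, f, pickle.HIGHEST_PROTOCOL)
                f.flush()
                os.fsync(f.fileno())

        def replay_journal(journal_path):
            # Recover every complete record, stopping at the first torn one
            # (a truncated final record can raise any of several errors,
            # hence the broad except).
            deltas = []
            with open(journal_path, 'rb') as f:
                while True:
                    try:
                        deltas.append(pickle.load(f))
                    except Exception:
                        break
            return deltas

    Consolidation then amounts to folding the replayed deltas into a full snapshot (written with something like atomic_dump above) and truncating the journal only once that snapshot is safely on disk.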