Tags: python, pickle, numerical-methods

Frequently Updating Stored Data for a Numerical Experiment using Python


I am running a numerical experiment that requires many iterations. After each iteration, I would like to store the data in a pickle file or pickle-like file in case the program times out or a data structure becomes tapped. What is the best way to proceed? Here is the skeleton code:

import pickle

data_dict = {}                       # maybe a dictionary is not the best choice
for j in parameters:                 # j = (alpha, beta, gamma) and cycle through
    for k in range(number_of_experiments):  # lots of experiments (10^4)
        file = open('storage.pkl', 'ab')
        data = experiment()          # experiment returns some numerical value
                                     # experiment takes ~1 second, but this
                                     # increases as parameters scale
        data_dict.setdefault(j, []).append(data)
        pickle.dump(data_dict, file)
        file.close()

Questions:

  1. Is shelve a better choice here? Or is there some other Python library that I am not aware of?
  2. I am using a dictionary because it's easier to code and more flexible if I need to change things as I run more experiments. Would it be a huge advantage to use a pre-allocated array?
  3. Does opening and closing files affect run time? I do this so that I can check on the progress in addition to the text logs I have set up.

Thank you for all your help!


Solution

    1. Assuming you are using numpy for your numerical experiments, I would suggest numpy.savez instead of pickle. It stores several named arrays in a single .npz file; note that each call writes the whole file again rather than appending to it.
    2. Keep it simple, and optimize only if you feel that the script runs too long.
    3. Opening and closing files does affect the run time, but having a backup is worth that cost.
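
As a rough illustration of points 1 and 3, here is a minimal sketch of periodic checkpointing with numpy.savez. The names save_checkpoint, run_all and save_every are made up for this example, and it assumes that experiment accepts the parameter tuple as arguments, which the question does not specify. Saving only every few iterations is one way to limit the file overhead if it ever becomes noticeable.

```python
import os

import numpy as np


def save_checkpoint(results, path):
    """Write everything collected so far to one .npz file (hypothetical helper)."""
    # np.savez takes string keys, so (1, 2, 3) becomes "1_2_3" (assumed naming scheme).
    arrays = {"_".join(map(str, params)): np.asarray(values)
              for params, values in results.items()}
    # Write to a temporary file and rename it afterwards, so that a run that
    # dies mid-write does not destroy the previous checkpoint.
    tmp_path = path + ".tmp.npz"
    np.savez(tmp_path, **arrays)
    os.replace(tmp_path, path)


def run_all(parameters, n_experiments, experiment, path, save_every=100):
    """Run every experiment, checkpointing every `save_every` iterations."""
    results = {}
    for params in parameters:
        for k in range(n_experiments):
            # Assumption: experiment takes the parameters as positional arguments.
            results.setdefault(params, []).append(experiment(*params))
            if (k + 1) % save_every == 0:
                save_checkpoint(results, path)
        # Always save once a parameter set is finished.
        save_checkpoint(results, path)
    return results
```

The checkpoint can be inspected while the script is still running with np.load(path), which gives access to each array by its key.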

    I would also use collections.defaultdict(list) instead of a plain dict with setdefault.
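
For comparison with the setdefault call in the question, a small sketch of the defaultdict version follows. The parameter tuples and the stand-in for experiment() are invented for illustration only.

```python
from collections import defaultdict

# Missing keys are created automatically with an empty list as their value.
data_dict = defaultdict(list)

for params in [(1, 2, 3), (4, 5, 6)]:    # made-up parameter tuples
    for k in range(3):
        value = sum(params) + k           # stand-in for experiment()
        data_dict[params].append(value)   # no setdefault needed

# Convert to a plain dict if the automatic key creation is unwanted later on.
plain = dict(data_dict)
```

One thing to keep in mind is that merely looking up a missing key in a defaultdict creates it, so converting to a plain dict before analysis can avoid surprises.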