python · pandas · loops · for-loop · terminate

Does the dataframe remain when the console is terminated?


I have created a Pandas dataframe:

import pandas as pd

# one row per model run; the list columns collect per-run results
scores = pd.DataFrame(
        {"batch_size" : list(range(64)),
         "learning_rate" : list(range(64)),
         "dropout_rate" : list(range(64)),
         "accuracies" : [[0]]*64,
         "loss" : [[0]]*64,
         "training_time" : list(range(64)),
         }, index = list(range(64)))

Then, in a loop, I run 64 models and add the results to the lists.
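
Roughly, the loop looks like this; train_model is a hypothetical stand-in for my actual training call:

for i in scores.index:
    # train_model is a placeholder for the real training routine
    acc_history, loss_history, seconds = train_model(
        batch_size=scores.at[i, "batch_size"],
        learning_rate=scores.at[i, "learning_rate"],
        dropout_rate=scores.at[i, "dropout_rate"],
    )
    # store this run's results in the dataframe
    scores.at[i, "accuracies"] = acc_history
    scores.at[i, "loss"] = loss_history
    scores.at[i, "training_time"] = seconds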

The loop is still running and I don't expect it to finish before my deadline. Therefore, I would like to terminate the console and continue with the information that has been stored in scores so far. However, I only want to do this if I can still access the dataframe after I stop the loop.

Can I use the dataframe with intermediate results if I terminate the loop while it's still running?


Solution

    1. If possible, I would prioritize pandas methods over for loops, as that addresses the core problem of slow execution. If you can rewrite the for loops as pandas methods and want even faster execution, many of those methods are also available in Dask, a Python library for big data. Dask is a little more advanced, but I was in a similar position on a large project and it worked well; it took me about a day to get used to the library and port my code from pandas to Dask. A minimal sketch using Dask is included at the end of this answer.

    2. If you just want to keep your code as is and stay in pandas, then I would look into splitting the dataframe into chunks and writing each chunk out as soon as it has been processed, so that partial results are saved even if the whole run takes too long:

      # split the dataframe into chunks of n rows
      n = 100000
      scores_df_list = [scores[i:i + n] for i in range(0, scores.shape[0], n)]

      for i, df in enumerate(scores_df_list, start=1):
          # inefficient for loop code on large dataset...
          # inefficient for loop code on large dataset continued...
          # write each chunk to disk as soon as it has been processed
          df.to_csv(f'file{i}.csv')
      

    See the answer by @ScottBoston here, and kindly upvote his solution if it helps: Pandas - Slice Large Dataframe in Chunks.
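
    To illustrate point 1, here is a minimal sketch of moving the per-row work to Dask. It assumes the scores dataframe from the question, and compute_result is a hypothetical placeholder for whatever the loop body actually computes per row:

      import dask.dataframe as dd

      # split the pandas dataframe into partitions that Dask can process in parallel
      dask_scores = dd.from_pandas(scores, npartitions=4)

      # compute_result is a placeholder for the real per-row work
      def compute_result(row):
          return row["batch_size"] * row["learning_rate"]

      # apply builds the computation lazily; compute() actually runs it
      result = dask_scores.apply(compute_result, axis=1, meta=("result", "int64")).compute()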