
How to store a tuple of floats into a file, open and read it to extract the mean per column?


I am computing the following scores iteratively; each iteration produces a new set of scores:

add_score, keep_score, del_score = get_corpus_sari_operation_scores(sources, prediction, references)

I first want to store them in a file. Currently I add each set as a tuple to a list and write the list (~9000 lines) to a file:

stat = add_score, keep_score, del_score
stats.append(stat)

with open("./resources/outputs/generate/stats.txt", "w") as f:
    for stat in stats:
        print('stat type', type(stat))  # prints: stat type <class 'tuple'>
        f.write(str(stat))  # write() needs a str, so convert the tuple first
        f.write("\n")

The values in the stats.txt file look as follows:

(2.0, 28.25187646117879,  69.96132803170339) 
(0.0, 23.357228195937875, 50.342178147056195) 
(1.7241379310344827, 25.888065422949147, 40.21927767354597) 
(0.0, 47.375201191814064, 16.312725613543307) 
(1.7857142857142856, 14.565677966101696, 54.81682319618366) 
(0.0, 63.79656946826759, 9.200422070604626)
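Since each line above is a valid Python tuple literal, a file already written in this format can still be read back with the standard library alone. A minimal sketch, assuming every non-empty line looks like the sample (the helper name `column_means` is hypothetical): `ast.literal_eval` turns each line back into a tuple, and `zip(*rows)` transposes the rows into columns for `statistics.mean`.

```python
import ast
import statistics

# First three lines of the sample shown above.
lines = [
    "(2.0, 28.25187646117879,  69.96132803170339) ",
    "(0.0, 23.357228195937875, 50.342178147056195) ",
    "(1.7241379310344827, 25.888065422949147, 40.21927767354597) ",
]


def column_means(lines):
    """Hypothetical helper: mean of each column in tuple-per-line text.

    Accepts any iterable of strings, including an open file object.
    """
    # literal_eval safely parses "(a, b, c)" into a tuple of floats;
    # strip() removes trailing spaces/newlines, and blank lines are skipped.
    rows = [ast.literal_eval(line.strip()) for line in lines if line.strip()]
    # zip(*rows) regroups the values as (add...), (keep...), (del...).
    return tuple(statistics.mean(col) for col in zip(*rows))


add_mean, keep_mean, del_mean = column_means(lines)
print(add_mean, keep_mean, del_mean)
```

With the real file, `with open("./resources/outputs/generate/stats.txt") as f: column_means(f)` should work the same way, because iterating a file yields its lines.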

What I want to do is access this data again from another method by reading the file. My goal is to calculate the mean per column, i.e. mean(add_score), mean(keep_score), mean(del_score).

However, when I read the file back, the values come in as tuples/Series. I tried to convert the tuples into a DataFrame so that I could call mean() on each column, but I am struggling with that conversion.

Does anyone have a better idea of how to handle this data? I am wondering whether there is a better way to store all scoring results in one file and then calculate the mean of each column.


Solution

  • ... struggle with the conversion of the tuples to a dataframe.

    The underlying problem is the file format: writing str(stat) produces lines that are awkward to parse back into numbers. So use the familiar CSV format instead.

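    import csv
    
    # newline="" is what the csv module expects; it prevents blank lines between rows on Windows
    with open("resources/outputs/generate/stats.txt", "w", newline="") as f:
        sheet = csv.writer(f)
        sheet.writerow(('add', 'keep', 'del'))  # header row
        for stat in stats:
            sheet.writerow(stat)  # one (add, keep, del) tuple per row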
    

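    Then later a simple df = pd.read_csv('resources/outputs/generate/stats.txt') (after import pandas as pd) should suffice: the header row becomes the column names add, keep and del, and df.mean() returns the mean of each column.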
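A minimal round-trip sketch of the CSV approach, assuming pandas is installed. The three sample rows are taken from the question, and a temporary directory stands in for resources/outputs/generate/; neither is part of the original code.

```python
import csv
import os
import tempfile

import pandas as pd

# Sample rows standing in for the real `stats` list.
stats = [
    (2.0, 28.25187646117879, 69.96132803170339),
    (0.0, 23.357228195937875, 50.342178147056195),
    (1.7241379310344827, 25.888065422949147, 40.21927767354597),
]

# A temporary directory stands in for resources/outputs/generate/.
path = os.path.join(tempfile.mkdtemp(), "stats.txt")

# Write a header row, then one CSV row per (add, keep, del) tuple.
with open(path, "w", newline="") as f:
    sheet = csv.writer(f)
    sheet.writerow(("add", "keep", "del"))
    sheet.writerows(stats)

# Read it back; the header row supplies the column names.
df = pd.read_csv(path)

# DataFrame.mean() averages each column separately and returns a Series.
means = df.mean()
print(means["add"], means["keep"], means["del"])
```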


    Alternatively, build the frame directly with df = pd.DataFrame(stats, columns=['add', 'keep', 'del']) and save it with df.to_csv('stats.txt', index=False) instead of creating a csv.writer or csv.DictWriter.
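The DataFrame route can be sketched the same way, again assuming pandas is installed and using the question's sample rows and a temporary path as stand-ins. Passing index=False matters: without it, to_csv also writes the row index, which read_csv would load as an extra "Unnamed: 0" column that mean() would then average too.

```python
import os
import tempfile

import pandas as pd

# Sample rows standing in for the real `stats` list.
stats = [
    (2.0, 28.25187646117879, 69.96132803170339),
    (0.0, 23.357228195937875, 50.342178147056195),
    (1.7241379310344827, 25.888065422949147, 40.21927767354597),
]

# Each tuple becomes one row; the column labels name the three scores.
df = pd.DataFrame(stats, columns=["add", "keep", "del"])

path = os.path.join(tempfile.mkdtemp(), "stats.txt")
# index=False keeps the row index out of the file.
df.to_csv(path, index=False)

# Later, possibly in another method: reload and take per-column means.
means = pd.read_csv(path).mean()
print(means.to_dict())
```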