I am computing the following scores iteratively; each iteration generates a new set of scores:
add_score, keep_score, del_score = get_corpus_sari_operation_scores(sources, prediction, references)
I first want to store them in a file. Currently I add each set as a tuple to a list and store the list (~9000 lines) in a file:
stat = add_score, keep_score, del_score
stats.append(stat)
with open("./resources/outputs/generate/stats.txt", "w") as f:
    for stat in stats:
        print('stat type', type(stat))
        # write() only accepts strings, so convert the tuple first
        f.write(str(stat))
        f.write("\n")
The values in the stats.txt file look as follows:
(2.0, 28.25187646117879, 69.96132803170339)
(0.0, 23.357228195937875, 50.342178147056195)
(1.7241379310344827, 25.888065422949147, 40.21927767354597)
(0.0, 47.375201191814064, 16.312725613543307)
(1.7857142857142856, 14.565677966101696, 54.81682319618366)
(0.0, 63.79656946826759, 9.200422070604626)
What I want to do is access this data again in another method by reading it from the file. My goal is to calculate the mean per column, i.e. mean(add_score), mean(keep_score), mean(del_score).
However, the values of the file get accessed as tuples/Series.
I tried to convert the tuples into a dataframe so that I could use the mean() method per column, but I struggle with the conversion of the tuples to a dataframe.
Does anyone have a better idea of how to handle this data? I am wondering if there is a better way to store all scoring results in one file and then calculate the mean of each column.
... struggle with the conversion of the tuples to a dataframe.
You are complaining that the file format is inconvenient. So use the familiar CSV format instead.
import csv
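# newline="" is what the csv module recommends for files passed to csv.writer
with open("resources/outputs/generate/stats.txt", "w", newline="") as f:
    sheet = csv.writer(f)
    # header row, so pandas can name the columns when reading
    sheet.writerow(('add', 'keep', 'del'))
    for stat in stats:
        sheet.writerow(stat)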
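Then later a simple df = pd.read_csv('resources/outputs/generate/stats.txt') should suffice (after import pandas as pd), and df.mean() gives the mean of each column.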
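The write-then-read round trip described above can be sketched as follows. This is a minimal illustration, not the asker's actual pipeline: the three sample rows are copied from the question, and the temporary file path is a hypothetical stand-in for resources/outputs/generate/stats.txt.

```python
import csv
import os
import tempfile

import pandas as pd

# Sample rows copied from the question; the real `stats` list holds ~9000 tuples.
stats = [
    (2.0, 28.25187646117879, 69.96132803170339),
    (0.0, 23.357228195937875, 50.342178147056195),
    (1.7241379310344827, 25.888065422949147, 40.21927767354597),
]

# Hypothetical temporary location (assumption), so the sketch does not
# overwrite any real output file.
path = os.path.join(tempfile.mkdtemp(), "stats.csv")

# Write a header row followed by one CSV row per (add, keep, del) tuple.
with open(path, "w", newline="") as f:
    sheet = csv.writer(f)
    sheet.writerow(("add", "keep", "del"))
    sheet.writerows(stats)

# Read the file back; the header row becomes the column names.
df = pd.read_csv(path)

# DataFrame.mean() computes the mean of each numeric column.
means = df.mean()
print(means)
```

Here means is a Series indexed by column name, so means["add"] is the mean add score.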
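Alternatively, assign df = pd.DataFrame(stats, columns=('add', 'keep', 'del')) and then call df.to_csv('resources/outputs/generate/stats.txt', index=False) instead of creating a CSV writer or DictWriter. (The DataFrame method is to_csv; there is no write_csv. Passing index=False keeps the row index out of the file.)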
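The pandas-only variant might look like the sketch below. It assumes the same sample rows from the question and writes to an in-memory buffer (an assumption made for illustration) rather than to the real stats file; passing a file path to to_csv works the same way.

```python
import io

import pandas as pd

# Sample rows copied from the question; the real list is much longer.
stats = [
    (2.0, 28.25187646117879, 69.96132803170339),
    (0.0, 23.357228195937875, 50.342178147056195),
    (1.7241379310344827, 25.888065422949147, 40.21927767354597),
]

# Build the DataFrame directly from the list of tuples.
df = pd.DataFrame(stats, columns=["add", "keep", "del"])

# In-memory stand-in for the output file (assumption for this sketch).
buffer = io.StringIO()

# index=False omits the row index, so the file has exactly three columns.
df.to_csv(buffer, index=False)

# Rewind and read the CSV text back, as another method would do later.
buffer.seek(0)
restored = pd.read_csv(buffer)

# Mean per column of the restored data.
column_means = restored.mean()
print(column_means)
```

If the means are needed in the same process that produced the scores, df.mean() can be called directly on the DataFrame without going through a file at all.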