I am comparing pairs of strings using six fuzzywuzzy ratios, and I need to output the top three scores for each pair.
This line does the job:
final2_df = final_df[['nameHiringOrganization', 'mesure', 'name', 'valeur']].groupby(['nameHiringOrganization', 'name'])['valeur'].nlargest(3)
However, the Excel output table lacks the 'mesure' column, which contains the ratio's name. This is annoying, because I can't then identify which of the six ratios works best for any given pair.
I thought selecting the columns at the beginning (final_df[['columns', ...]]) might work, but it doesn't seem to.
Any thoughts on how I might add that info?
Many thanks in advance!
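I think another approach works here: sort by the three columns with DataFrame.sort_values, with 'valeur' descending, and then take the first three rows of each group with GroupBy.head. Selecting ['valeur'] before nlargest leaves only that one column in the result, whereas head returns the original rows with all their columns, including 'mesure':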
final2_df = (final_df.sort_values(['nameHiringOrganization', 'name', 'valeur'],
                                  ascending=[True, True, False])
                     .groupby(['nameHiringOrganization', 'name'])
                     .head(3))
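
To check the behaviour, here is a minimal sketch on made-up data. The column names come from the question, but the organisation names, ratio names and scores are invented for illustration, and only four ratios per pair are shown to keep it short.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
final_df = pd.DataFrame({
    "nameHiringOrganization": ["Acme"] * 4 + ["Globex"] * 4,
    "name": ["ACME Inc"] * 4 + ["Globex Corp"] * 4,
    "mesure": ["ratio", "partial_ratio", "token_sort_ratio", "token_set_ratio"] * 2,
    "valeur": [80, 95, 70, 90, 60, 65, 85, 75],
})

# Sort so the best scores come first within each pair,
# then keep the first three rows of every group.
final2_df = (
    final_df.sort_values(
        ["nameHiringOrganization", "name", "valeur"],
        ascending=[True, True, False],
    )
    .groupby(["nameHiringOrganization", "name"])
    .head(3)
)

# 'mesure' is still present, so each kept score can be traced to its ratio.
print(final2_df)
```

Because head only filters rows, the result keeps the original index and all four columns, so it should go to Excel with 'mesure' intact.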