I have a dataframe as below:
ID Reviews Sorted pairwise scores
A This is great 0 [(0, 1)] [0.26386763883335373]
A works well 1 [] []
B can this be changed 0 [(0, 1), (0, 2)] [0.1179287227608669, 0.36815020951152794]
B how to perform that 1 [(1, 2)] [0.03299057711398918]
B summarize it 2 [] []
Sorted would be the order of duplicates in ID. Pairwise would be the pairwise combination grouped by ID. I got the scores column by using the pairwise combination. Now I need to create a flag column such that if the scores > 0.15, then flag 'Yes' based off pairwise column. For instance, when grouped by ID, value B's scores > 0.15 is 0.36, when we look at pairwise column (0,2) i,e 0 and 2 rows should be flagged 'yes'.
my desired output is :
ID Reviews Sorted pairwise scores Flag
A This is great 0 [(0, 1)] [0.26386763883335373] yes
A works well 1 [] [] yes
B can this be changed 0 [(0, 1), (0, 2)] [0.1179287227608669, 0.36815020951152794] yes
B how to perform that 1 [(1, 2)] [0.03299057711398918] No
B summarize it 2 [] [] yes
I tried to use np.where for the scores, but did not work for me.
can anyone please suggest a work around or any ideas for it ? Thanks in Advance!
We do explode
,then merge
it back
s=df.scores.explode()
s=df.set_index('ID').pairwise.explode()[(s>0.15).values].explode()
df=df.merge(s.to_frame('Sorted').reset_index().assign(flag='Yes'),how='left')
df.flag.fillna('No',inplace=True)
df
scores pairwise Sorted ID flag
0 [0.26386763883335373] [(0, 1)] 0 A Yes
1 [] [] 1 A Yes
2 [0.1179287227608669, 0.36815020951152794] [(0, 1), (0, 2)] 0 B Yes
3 [0.03299057711398918] [(1, 2)] 1 B No
4 [] [] 2 B Yes