Search code examples
pythonpandascomparison

How to flag multiple dataframe rows based on a column value python


I have a dataframe as below:

ID Reviews              Sorted  pairwise         scores
A   This is great         0     [(0, 1)]         [0.26386763883335373]
A   works well            1     []               []
B   can this be changed   0     [(0, 1), (0, 2)] [0.1179287227608669, 0.36815020951152794]
B   how to perform that   1     [(1, 2)]         [0.03299057711398918]
B   summarize it          2     []               []

Sorted would be the order of duplicates in ID. Pairwise would be the pairwise combination grouped by ID. I got the scores column by using the pairwise combination. Now I need to create a flag column such that if the scores > 0.15, then flag 'Yes' based off pairwise column. For instance, when grouped by ID, value B's scores > 0.15 is 0.36, when we look at pairwise column (0,2) i,e 0 and 2 rows should be flagged 'yes'.

my desired output is :

ID Reviews              Sorted  pairwise         scores                                    Flag
A   This is great         0     [(0, 1)]         [0.26386763883335373]                      yes
A   works well            1     []               []                                         yes
B   can this be changed   0     [(0, 1), (0, 2)] [0.1179287227608669, 0.36815020951152794]  yes
B   how to perform that   1     [(1, 2)]         [0.03299057711398918]                      No
B   summarize it          2     []               []                                         yes

I tried to use np.where for the scores, but did not work for me.

can anyone please suggest a work around or any ideas for it ? Thanks in Advance!


Solution

  • We do explode ,then merge it back

    s=df.scores.explode()
    s=df.set_index('ID').pairwise.explode()[(s>0.15).values].explode()
    df=df.merge(s.to_frame('Sorted').reset_index().assign(flag='Yes'),how='left')
    df.flag.fillna('No',inplace=True)
    df
                                          scores          pairwise Sorted ID flag
    0                      [0.26386763883335373]          [(0, 1)]      0  A  Yes
    1                                         []                []      1  A  Yes
    2  [0.1179287227608669, 0.36815020951152794]  [(0, 1), (0, 2)]      0  B  Yes
    3                      [0.03299057711398918]          [(1, 2)]      1  B   No
    4                                         []                []      2  B  Yes