Search code examples
pythonpandasnumpygroup-by

Group values and check if the values ​from another column cancel each other out in python pandas dataframe


I need to check if the value inside "CP" columns of a dataframe, while my dataframe is grouped by "Centre" column, are canceling each others.

Here is an example : Sample :

Centre CP
3N7D 10
3N1F 2
3N90 -9
3N7D -1
3N7D -10
8D92 198
3N7D -10
3N90 2
8D92 -195
8D92 -3

result excepted :

Centre CP Cancel
3N7D 10 Canceled
3N1F 2
3N90 -9
3N7D -1
3N7D -10 Canceled
8D92 198 (not canceled with -195 and -3 because I need to cancel it only if there is 1 value that cancel, like -198 in one time)
3N7D -10 (not canceled because the only 10 is already canceled)
3N90 2
8D92 -195
8D92 -3

Here is what I tried :

grouped_df = df.groupby("Centre")
for name, group in grouped_df:
    cp_values = group["CP"].values
    for i in range(len(cp_values)):
        for j in range(i+1, len(cp_values)):
            if cp_values[i] == -cp_values[j]:
                print("i = ",i)
                print("j = ",j)
                print("CP = ",cp_values[i])
                print("CP2 = ",cp_values[j])

Thanks for your help.


Solution

  • You can use a custom pivot to identify the matches of a number with its negative counterpart (obtained with numpy.sign. You will need to deduplicate the values using groupby.cumcount for the pivot:

    tmp = (
     df.assign(sign=np.sign(df['CP']),
               n=df.groupby(['Centre', 'CP']).cumcount(),
               abs=df['CP'].abs()
              ).reset_index()
       .pivot(index=['Centre', 'abs', 'n'], columns='sign', values='index')
    )
    
    keep = tmp.where(tmp.notna().all(axis=1)).stack().values
    
    df.loc[keep, 'Cancel'] = 'Canceled'
    

    Output:

      Centre   CP    Cancel
    0   3N7D   10  Canceled
    1   3N1F    2       NaN
    2   3N90   -9       NaN
    3   3N7D   -1       NaN
    4   3N7D  -10  Canceled
    5   8D92  198       NaN
    6   3N7D  -10       NaN
    7   3N90    2       NaN
    8   8D92 -195       NaN
    9   8D92   -3       NaN
    

    Intermediate tmp:

    sign           -1    1
    Centre abs n          
    3N1F   2   0  NaN  1.0
    3N7D   1   0  3.0  NaN
           10  0  4.0  0.0  # indices 0 and 4 cancel each other
               1  6.0  NaN  # indice 6 is alone
    3N90   2   0  NaN  7.0
           9   0  2.0  NaN
    8D92   3   0  9.0  NaN
           195 0  8.0  NaN
           198 0  NaN  5.0