I need to check if the value inside "CP" columns of a dataframe, while my dataframe is grouped by "Centre" column, are canceling each others.
Here is an example : Sample :
Centre | CP |
---|---|
3N7D | 10 |
3N1F | 2 |
3N90 | -9 |
3N7D | -1 |
3N7D | -10 |
8D92 | 198 |
3N7D | -10 |
3N90 | 2 |
8D92 | -195 |
8D92 | -3 |
result excepted :
Centre | CP | Cancel |
---|---|---|
3N7D | 10 | Canceled |
3N1F | 2 | |
3N90 | -9 | |
3N7D | -1 | |
3N7D | -10 | Canceled |
8D92 | 198 | (not canceled with -195 and -3 because I need to cancel it only if there is 1 value that cancel, like -198 in one time) |
3N7D | -10 | (not canceled because the only 10 is already canceled) |
3N90 | 2 | |
8D92 | -195 | |
8D92 | -3 |
Here is what I tried :
grouped_df = df.groupby("Centre")
for name, group in grouped_df:
cp_values = group["CP"].values
for i in range(len(cp_values)):
for j in range(i+1, len(cp_values)):
if cp_values[i] == -cp_values[j]:
print("i = ",i)
print("j = ",j)
print("CP = ",cp_values[i])
print("CP2 = ",cp_values[j])
Thanks for your help.
You can use a custom pivot
to identify the matches of a number with its negative counterpart (obtained with numpy.sign
. You will need to deduplicate the values using groupby.cumcount
for the pivot
:
tmp = (
df.assign(sign=np.sign(df['CP']),
n=df.groupby(['Centre', 'CP']).cumcount(),
abs=df['CP'].abs()
).reset_index()
.pivot(index=['Centre', 'abs', 'n'], columns='sign', values='index')
)
keep = tmp.where(tmp.notna().all(axis=1)).stack().values
df.loc[keep, 'Cancel'] = 'Canceled'
Output:
Centre CP Cancel
0 3N7D 10 Canceled
1 3N1F 2 NaN
2 3N90 -9 NaN
3 3N7D -1 NaN
4 3N7D -10 Canceled
5 8D92 198 NaN
6 3N7D -10 NaN
7 3N90 2 NaN
8 8D92 -195 NaN
9 8D92 -3 NaN
Intermediate tmp
:
sign -1 1
Centre abs n
3N1F 2 0 NaN 1.0
3N7D 1 0 3.0 NaN
10 0 4.0 0.0 # indices 0 and 4 cancel each other
1 6.0 NaN # indice 6 is alone
3N90 2 0 NaN 7.0
9 0 2.0 NaN
8D92 3 0 9.0 NaN
195 0 8.0 NaN
198 0 NaN 5.0