INPUT DATA:
data = {'G1': ['a', 'a', 'a', 'a', 'a', 'b', 'b'], 'G2': ['b', 'b', 'c', 'c', 'd', 'c', 'c'], 'V1': [5, 15, 10, 20, 5, 10, 10], 'V2': [15, 5, 300, 10, 5, 10, 10]}
   | G1 G2 V1  V2
-- + -- -- -- ---
 0 | a  b   5  15
 1 | a  b  15   5
 2 | a  c  10 300
 3 | a  c  20  10
 4 | a  d   5   5
 5 | b  c  10  10
 6 | b  c  10  10
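For reference, the frame can be rebuilt from the dict above:

import pandas as pd

df = pd.DataFrame(data)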
OUTPUT:
   | G1 G2 V1  V2 E1 E2
-- + -- -- -- --- -- --
 0 | a  b   5  15  1  1
 1 | a  b  15   5  1  1
 2 | a  c  10 300  1  0
 3 | a  c  20  10  0  1
 4 | a  d   5   5 NA NA
 5 | b  c  10  10  1  1
 6 | b  c  10  10  1  1
DESCRIPTION:
How can I compute the columns E1 and E2 based on the following conditions? (A worked check on one group follows the list.)

- Groups with only one row should be ignored and get None or NA in E1 and E2.
- Groups with two rows:
  - If the value of V1 in the first row equals the value of V2 in the second row, E1 should be 1 (or True) in the first row and E2 should be 1 in the second.
  - If the value of V1 in the second row equals the value of V2 in the first row, E1 should be 1 (or True) in the second row and E2 should be 1 in the first.
  - If the values are not equal, the respective positions should be 0 (or False).
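For example, for the (a, c) group (rows 2 and 3), the intended check written out in plain Python would be:

v1, v2 = [10, 20], [300, 10]            # V1 and V2 of the (a, c) group
e1 = [v1[0] == v2[1], v1[1] == v2[0]]   # [True, False] -> E1 = 1, 0
e2 = [v1[1] == v2[0], v1[0] == v2[1]]   # [False, True] -> E2 = 0, 1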
If the description is not clear enough, please ask.
Thank you very much.
Probably not the best performance-wise, but here is one way, using a custom function built on np.diag:
import numpy as np
import pandas as pd

def func(arr):
    arr = arr.to_numpy()
    # single-row groups get NaN in both result columns
    if len(arr) < 2:
        return pd.DataFrame([[np.nan, np.nan]], columns=["E1", "E2"])
    # main diagonal pairs V1 of row 0 with V2 of row 1;
    # anti-diagonal pairs V2 of row 0 with V1 of row 1
    x, y = np.diag(arr), np.diag(np.fliplr(arr))
    return pd.DataFrame([[np.all(x == x[0]), np.all(y == y[0])],
                         [np.all(y == y[0]), np.all(x == x[0])]], columns=["E1", "E2"])

df[["E1", "E2"]] = df.groupby(["G1", "G2"])[["V1", "V2"]].apply(func).reset_index(drop=True)
print(df)
  G1 G2  V1   V2   E1   E2
0  a  b   5   15  1.0  1.0
1  a  b  15    5  1.0  1.0
2  a  c  10  300  1.0  0.0
3  a  c  20   10  0.0  1.0
4  a  d   5    5  NaN  NaN
5  b  c  10   10  1.0  1.0
6  b  c  10   10  1.0  1.0
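To see why the two diagonals encode the two conditions, take the V1/V2 block of the (a, b) group:

import numpy as np

arr = np.array([[ 5, 15],
                [15,  5]])         # rows 0 and 1 of group (a, b)
print(np.diag(arr))                # [5 5]   -> V1 of row 0 vs V2 of row 1
print(np.diag(np.fliplr(arr)))     # [15 15] -> V2 of row 0 vs V1 of row 1

Each diagonal is constant exactly when the corresponding condition holds, which is what np.all(x == x[0]) tests; single-row groups short-circuit to NaN before the diagonals are taken.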