I have a dataframe:
col1 col2 group
10 1 A
11 2 A
12 -2 A
13 -3 B
14 3 B
How do I choose rows if there is a first sign change in col2
by group
.
For instance, for the first unique value of group
('A'), the first sign change (+
to -
) occurs at the third row (col1
=12 and col2
= -2).
Desired result:
col1 col2 group
12 -2 A
14 3 B
data = {
'col1': [10, 11, 12, 13, 14],
'col2': [1, 2, -2, -3, 3],
'group': ['A', 'A', 'A', 'B', 'B']
}
df = pd.DataFrame(data)
import pandas as pd
ind = []
def my_func(x):
if x.loc[x.index[0], 'col2'] < 0:
ind.append(x[x['col2'] >= 0].index[0])
else:
ind.append(x[x['col2'] < 0].index[0])
df.groupby('group').apply(my_func)
df1 = df.loc[ind]
print(df1)
Output
col1 col2 group
2 12 -2 A
4 14 3 B
The dataframe is grouped by the 'group' column. In the my_func function, it is checked if the first number is negative, then we are looking for the first non-negative one, we get its index. In else: opposite condition. Indexes are written to the ind list, which is then substituted into the dataframe to select the desired rows. Explicit loc indexing is used.
Below is a variant with lambda:
bbb = df.groupby('group').apply(
lambda x: x[x['col2'] >= 0].index[0] if x.loc[x.index[0], 'col2'] < 0 else x[x['col2'] < 0].index[0])
df1 = df.loc[bbb.values]
print(df1)