I have a DataFrame of fruits:
fruit
0 apple
1 apple
2 apple
3 banana
4 apple
5 pear
How can I create a column indexy that holds the index position + 1 of the first row of each run, repeated until the value changes?
fruit indexy
0 apple 1
1 apple 1
2 apple 1
3 banana 4
4 apple 5
5 pear 6
Assuming a range index, you can take the index plus 1, keep it only at the points where the value changes, and ffill:
df['indexy'] = (df.index.to_series().add(1)
                  .where(df['fruit'].ne(df['fruit'].shift()))
                  .ffill().astype(int)
               )
Or, independently of the index, number the runs with cumsum and rank them with method='min':
df['indexy'] = (df['fruit'].ne(df['fruit'].shift()).cumsum()
                  .rank(method='min').astype(int)
               )
Output:
fruit indexy
0 apple 1
1 apple 1
2 apple 1
3 banana 4
4 apple 5
5 pear 6
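For anyone who wants to reproduce this, here is a minimal self-contained sketch that builds the sample frame and runs both approaches side by side. The names change, via_ffill and via_rank are only illustrative and do not appear in the snippets above.

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'apple', 'apple', 'banana', 'apple', 'pear']})

# True on the first row of each run of identical consecutive values
change = df['fruit'].ne(df['fruit'].shift())

# Approach 1: index + 1 at the change points, forward-filled
# (assumes the default 0..n-1 range index)
via_ffill = df.index.to_series().add(1).where(change).ffill().astype(int)

# Approach 2: number the runs with cumsum, then rank the run ids with method='min'
via_rank = change.cumsum().rank(method='min').astype(int)

df['indexy'] = via_ffill
print(df)
print(via_ffill.equals(via_rank))
```

Both series should hold the same values on this sample, so either can be assigned to indexy.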
Intermediates (first approach):
    fruit  index+1  change  where  ffill  astype(int)
0   apple        1    True    1.0    1.0            1
1   apple        2   False    NaN    1.0            1
2   apple        3   False    NaN    1.0            1
3  banana        4    True    4.0    4.0            4
4   apple        5    True    5.0    5.0            5
5    pear        6    True    6.0    6.0            6
Intermediates (second approach):
    fruit  ne(shift)  cumsum  rank(min)  astype(int)
0   apple       True       1        1.0            1
1   apple      False       1        1.0            1
2   apple      False       1        1.0            1
3  banana       True       2        4.0            4
4   apple       True       3        5.0            5
5    pear       True       4        6.0            6
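The first approach relies on the index being 0..n-1. If it is not, one possible variant (a sketch, not part of the answer above) is to swap the index for explicit row positions built with NumPy. The string index and the name pos are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Same data, but with a non-range index (hypothetical labels)
df = pd.DataFrame({'fruit': ['apple', 'apple', 'apple', 'banana', 'apple', 'pear']},
                  index=list('abcdef'))

# True on the first row of each run of identical consecutive values
change = df['fruit'].ne(df['fruit'].shift())

# 1-based row positions, aligned on df's own index
pos = pd.Series(np.arange(1, len(df) + 1), index=df.index)

# Keep the position only where a run starts, then forward-fill
df['indexy'] = pos.where(change).ffill().astype(int)
print(df)
```

The rank-based approach needs no such change, since it never looks at the index values.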