python pandas group-by

Generate a column holding the first index position until the value changes


I have a DataFrame of fruits:

    fruit
0    apple
1    apple
2    apple
3    banana
4    apple
5    pear

How can I create indexy, which holds the first index position + 1 of each run of equal values and repeats it until the value changes?

    fruit    indexy
0    apple    1
1    apple    1
2    apple    1
3    banana   4
4    apple    5
5    pear     6

Solution

  • Assuming a default range index (0, 1, 2, ...), you can take the index + 1, keep it only on the rows where the value changes, and forward-fill the gaps with ffill:

    df['indexy'] = (df.index.to_series().add(1)
                      .where(df['fruit'].ne(df['fruit'].shift()))
                      .ffill().astype(int)
                   )
    

    Or, independently of the index, number the runs of equal values with cumsum and rank them with method='min':

    df['indexy'] = (df['fruit'].ne(df['fruit'].shift()).cumsum()
                    .rank(method='min').astype(int)
                   )
    

    Output:

        fruit  indexy
    0   apple       1
    1   apple       1
    2   apple       1
    3  banana       4
    4   apple       5
    5    pear       6
    

    Intermediates (first approach):

        fruit  index+1  change  where  ffill  astype(int)
    0   apple        1    True    1.0    1.0            1
    1   apple        2   False    NaN    1.0            1
    2   apple        3   False    NaN    1.0            1
    3  banana        4    True    4.0    4.0            4
    4   apple        5    True    5.0    5.0            5
    5    pear        6    True    6.0    6.0            6
    

    Intermediates (second approach):

        fruit  ne(shift)  cumsum  rank(min)  astype(int)
    0   apple       True       1        1.0            1
    1   apple      False       1        1.0            1
    2   apple      False       1        1.0            1
    3  banana       True       2        4.0            4
    4   apple       True       3        5.0            5
    5    pear       True       4        6.0            6
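
For reference, here is a minimal self-contained sketch that rebuilds the question's frame and runs both approaches side by side. The names change, by_index, by_rank and by_position are only illustrative. The by_position variant is not part of the answer above: it is an assumption-labeled alternative that swaps the index for row positions, which should behave like the first approach when the index is not a default range index.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"fruit": ["apple", "apple", "apple", "banana", "apple", "pear"]})

# True on the first row of each run of consecutive equal values
change = df["fruit"].ne(df["fruit"].shift())

# Approach 1: index + 1 at the change points, forward-filled
# (relies on a default range index 0, 1, 2, ...)
by_index = (df.index.to_series().add(1)
              .where(change)
              .ffill().astype(int))

# Approach 2: rank of the run number; method='min' assigns each run
# 1 + the number of rows in all earlier runs
by_rank = change.cumsum().rank(method="min").astype(int)

# Illustrative variant (not from the answer): use row positions
# instead of index labels, so any index should work
by_position = (pd.Series(np.arange(1, len(df) + 1), index=df.index)
                 .where(change)
                 .ffill().astype(int))

df["indexy"] = by_rank
print(df)
```

On this data all three series hold the same values, so any of them can be assigned to indexy. The rank version is the shortest one that does not depend on the index labels.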