Search code examples
pythonpandasdataframe

How can I find the first row after a number of duplicated rows?


My DataFrame is:

import pandas as pd
df = pd.DataFrame(
    {
        'x': ['a', 'a', 'a','b', 'b','c', 'c', 'c',],
        'y': list(range(8))
    }
)

And this is the expected output. I want to create column z:

   x  y    z
0  a  0    NaN
1  a  1    NaN 
2  a  2    NaN
3  b  3    3
4  b  4    NaN
5  c  5    NaN
6  c  6    NaN
7  c  7    NaN

The logic is:

I want to find the first row after the first group of duplicated rows. For example in column x, the value a is the first duplicated value. I want to find one row after the a values end. And then put the y of that row for z column.

This is my attempt that did not give me the output:

m = (df.x.duplicated())
out = df[m]

Solution

  • One option, using a custom mask:

    # flag rows after the first group
    m = df['x'].ne(df['x'].iat[0]).cummax()
    
    # pick the first one
    out = df[m & ~m.shift(fill_value=False)]
    

    If your first value is always a and you want to find the first non-a you could also use:

    m2 = df['x'].eq('a')
    
    out = df[m2.shift(fill_value=False) & ~m2]
    

    Or, if you're sure there is at least one row after the leading as:

    out = df.loc[[df['x'].ne('a').idxmax()]]
    

    Output:

       x  y
    3  b  3
    

    Some intermediates (all approaches):

       x  y      m  ~m.shift(fill_value=False)     m2  m2.shift(fill_value=False)  df['x'].ne('a')
    0  a  0  False                        True   True                       False            False
    1  a  1  False                        True   True                        True            False
    2  a  2  False                        True   True                        True            False
    3  b  3   True                        True  False                        True             True
    4  b  4   True                       False  False                       False             True
    5  c  5   True                       False  False                       False             True
    6  c  6   True                       False  False                       False             True
    7  c  7   True                       False  False                       False             True