pandas - find first occurrence


Suppose I have a structured dataframe as follows:

import pandas as pd
df = pd.DataFrame({"A": ['a', 'a', 'a', 'b', 'b'],
                   "B": [1] * 5})

The A column is already sorted. I want to find the index of the first row where df.A != 'a'. The end goal is to use this index to split the dataframe into groups based on A.

I realise that groupby exists. However, the real dataframe is quite large (this is a simplified toy example), and since A is already sorted it should be faster to just find the first index where df.A != 'a'. It is therefore important that whatever method is used stops scanning as soon as the first match is found.


Solution

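  • idxmax and argmax return the location of the maximal value, or of its first occurrence if the maximum occurs more than once. In a boolean mask True is the maximum, so both give the first True. (idxmax returns the index label and argmax the integer position; they coincide here because df has the default integer index. If the mask contains no True at all, both point at the first row, so check for that case.)

    Use idxmax on df.A.ne('a'):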

    df.A.ne('a').idxmax()
    
    3
    

    Or the numpy equivalent:

    (df.A.values != 'a').argmax()
    
    3
    

    However, both of these build and scan the full boolean mask. If A has already been sorted, we can use searchsorted, which does a binary search instead:

    df.A.searchsorted('a', side='right')
    
    3    # older pandas versions returned array([3]) here
    

    Or the numpy equivalent

    df.A.values.searchsorted('a', side='right')
    
    3
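
Since the end goal in the question is to break the dataframe apart at that position, here is a minimal sketch of how the searchsorted result could be used for the split. The helper name split_after is made up for illustration (it is not a pandas function), and it assumes the column is sorted ascending, as in the question.

```python
import pandas as pd

df = pd.DataFrame({"A": ['a', 'a', 'a', 'b', 'b'],
                   "B": [1] * 5})


def split_after(frame, column, value):
    """Hypothetical helper: split `frame` just past the last row equal to `value`.

    Assumes `column` is sorted ascending, as in the question.
    """
    keys = frame[column].to_numpy(dtype=object)
    # Binary search: position just past the last occurrence of `value`.
    pos = int(keys.searchsorted(value, side='right'))
    # iloc slices by position, so this works whatever the index labels are.
    return pos, frame.iloc[:pos], frame.iloc[pos:]


pos, head, tail = split_after(df, "A", 'a')
print(pos)
print(head)
print(tail)
```

If value is the largest entry in the column, pos equals len(frame) and the second piece is simply empty, so that case needs no special handling.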