
Extract the rows where a column's values decrease in an ordered DataFrame


I have a DataFrame containing information about buildings, their status, and the time at which each status was recorded.

For simplicity, I have encoded these values as numbers in this example table:

id time status
101 1 1
101 2 2
101 3 3
102 1 2
102 2 1
102 3 2
102 4 3
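For a reproducible example, the table above can be built as a pandas DataFrame like this. This is a minimal sketch: the column names come from the table, and the variable name `df` matches the one used in the solution below.

```python
import pandas as pd

# Sample data from the table: one row per (id, time) status observation
df = pd.DataFrame({
    'id':     [101, 101, 101, 102, 102, 102, 102],
    'time':   [1, 2, 3, 1, 2, 3, 4],
    'status': [1, 2, 3, 2, 1, 2, 3],
})
print(df)
```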

I want to extract the IDs whose status is not strictly increasing. By "strictly increasing" I mean that the status always increases as time increases, as it does for ID 101.

As you can see, ID 102 has status 2 at time 1 but status 1 at time 2.

So, given this table, the function should return a list of IDs containing only 102.
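The "strictly increasing" condition can be checked on each ID's time-ordered status sequence by looking at the consecutive differences: every step must be positive. A small sketch with NumPy, using the two status sequences copied from the table:

```python
import numpy as np

# Status sequences from the example table, already ordered by time
status_101 = np.array([1, 2, 3])
status_102 = np.array([2, 1, 2, 3])

# Consecutive differences: a strictly increasing sequence has only positive steps
diff_101 = np.diff(status_101)  # [1, 1]
diff_102 = np.diff(status_102)  # [-1, 1, 1]

print(bool(np.all(diff_101 > 0)))  # True  -> 101 is strictly increasing
print(bool(np.all(diff_102 > 0)))  # False -> 102 is not
```

The negative step for ID 102 is exactly the drop from status 2 to status 1 between time 1 and time 2.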

Thank you in advance!


Solution

  • Compute the differences per id with np.diff inside a single groupby lambda (to avoid grouping twice) and check whether all of them are greater than 0. To get the list, filter the index with the inverted mask, keeping only the IDs that do NOT satisfy the condition:

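    import numpy as np

    # Sort by id and time first if the data is not already ordered
    df = df.sort_values(['id', 'time'])
    
    # True per id if every consecutive status difference is positive
    s = df.groupby('id')['status'].apply(lambda x: np.all(np.diff(x) > 0))
    
    # Invert the mask to keep only the ids that are NOT strictly increasing
    L = s.index[~s].tolist()
    print(L)
    [102]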
    

    If you need the rows instead, use GroupBy.transform: it returns a mask aligned with the original rows, which can then be used for boolean indexing:

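    import numpy as np

    # Sort by id and time first if the data is not already ordered
    df = df.sort_values(['id', 'time'])
    
    # transform broadcasts each group's True/False result back to every row of that group
    mask = df.groupby('id')['status'].transform(lambda x: np.all(np.diff(x) > 0))
    df1 = df[~mask]
    print(df1)
        id  time  status
    3  102     1       2
    4  102     2       1
    5  102     3       2
    6  102     4       3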
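As an alternative sketch (not the approach used in the answer above), the same result can be obtained without a Python lambda by using the built-in `GroupBy.diff`, which computes the step from the previous status within each id. Any step that is zero or negative breaks strict increase. The sample data is rebuilt from the question's table so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'id':     [101, 101, 101, 102, 102, 102, 102],
    'time':   [1, 2, 3, 1, 2, 3, 4],
    'status': [1, 2, 3, 2, 1, 2, 3],
})

# Order rows within each id by time (a no-op for this sample)
df = df.sort_values(['id', 'time'])

# Difference to the previous status within the same id; the first row of each id is NaN
step = df.groupby('id')['status'].diff()

# A step <= 0 breaks strict increase; NaN compares as False, so first rows are never flagged
bad = step.le(0)

# IDs with at least one non-increasing step
ids = df.loc[bad, 'id'].unique().tolist()
print(ids)  # [102]

# All rows belonging to those ids, like the transform-based filter
rows = df[df['id'].isin(ids)]
print(rows)
```

Because `diff` and `le` are vectorized, this avoids calling a Python function once per group, which may help when there are many ids.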