I have a data frame with information about buildings, their status, and the time at which each status was recorded.
For the sake of simplicity, I encoded the values as numbers in this example table:
id | time | status |
---|---|---|
101 | 1 | 1 |
101 | 2 | 2 |
101 | 3 | 3 |
102 | 1 | 2 |
102 | 2 | 1 |
102 | 3 | 2 |
102 | 4 | 3 |
I want to extract the IDs whose status is not strictly increasing. By that I mean the status should always increase as time increases, as it does for ID 101.
As you can see, ID 102 has status 2 at time 1, but status 1 at time 2.
So if you feed this table to the function, it should return a list of IDs containing only 102.
Thank you in advance!
Get the differences per id inside a single groupby: a lambda function (which avoids a second groupby) checks whether all differences are greater than 0.
To get the list, filter the index with the inverted mask, which keeps the ids that do NOT match the condition:
import numpy as np

# sort by id and time first, if necessary
df = df.sort_values(['id','time'])
# True for ids whose status strictly increases
s = df.groupby('id')['status'].apply(lambda x: np.all(np.diff(x) > 0))
# ids where the check failed
L = s.index[~s].tolist()
print(L)
[102]
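Since the question asks for a function, the same idea can be wrapped up as a minimal, self-contained sketch. The helper name `non_increasing_ids` is made up for illustration, and the data frame is rebuilt from the example table in the question:

```python
import numpy as np
import pandas as pd

# Example data copied from the table in the question.
df = pd.DataFrame({
    "id":     [101, 101, 101, 102, 102, 102, 102],
    "time":   [1, 2, 3, 1, 2, 3, 4],
    "status": [1, 2, 3, 2, 1, 2, 3],
})

def non_increasing_ids(frame):
    """Return the ids whose status is not strictly increasing over time.

    Hypothetical helper; assumes columns named 'id', 'time' and 'status'.
    """
    ordered = frame.sort_values(["id", "time"])
    # One boolean per id: True when every consecutive difference is positive.
    ok = (
        ordered.groupby("id")["status"]
        .apply(lambda x: bool(np.all(np.diff(x) > 0)))
        .astype(bool)
    )
    # Keep only the ids where the check failed.
    return ok[~ok].index.tolist()

print(non_increasing_ids(df))
```

Note that an id with a single row has no differences to check, so `np.all` on the empty array is `True` and that id is treated as increasing.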
If you need the rows instead, use GroupBy.transform, which broadcasts the per-id result to every row so it can be used as a mask in boolean indexing:
# sort by id and time first, if necessary
df = df.sort_values(['id','time'])
# keep the rows of ids where the strictly-increasing check failed
df1 = df[~df.groupby('id')['status'].transform(lambda x: np.all(np.diff(x) > 0))]
print(df1)
    id  time  status
3  102     1       2
4  102     2       1
5  102     3       2
6  102     4       3
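As an alternative to the lambda, the check can probably also be written with `GroupBy.diff`, which avoids calling a Python function per group. This is a different technique from the one above, shown only as a sketch on the example data from the question; the variable names `step`, `bad_ids` and `bad_rows` are illustrative:

```python
import pandas as pd

# Example data copied from the table in the question.
df = pd.DataFrame({
    "id":     [101, 101, 101, 102, 102, 102, 102],
    "time":   [1, 2, 3, 1, 2, 3, 4],
    "status": [1, 2, 3, 2, 1, 2, 3],
})
df = df.sort_values(["id", "time"])

# Difference to the previous status within each id.
# The first row of every id has no predecessor, so it is NaN.
step = df.groupby("id")["status"].diff()

# An id fails as soon as any step is zero or negative.
# NaN <= 0 evaluates to False, so first rows never trigger a failure.
bad_ids = df.loc[step.le(0), "id"].unique().tolist()

# All rows belonging to the failing ids.
bad_rows = df[df["id"].isin(bad_ids)]

print(bad_ids)
print(bad_rows)
```

Both results come from one pass over the data: `bad_ids` corresponds to the list from the first snippet and `bad_rows` to the filtered frame from the second.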