I want to subsample the rows of a dataframe so that every pair of consecutive values in a given column is different. If two consecutive values are the same, keep, say, the first one.
Here is an example:
import pandas as pd
p = [1,1,2,1,3,3,2,4,3]
t = range(len(p))
df = pd.DataFrame({'t':t, 'p':p})
df
p t
0 1 0
1 1 1
2 2 2
3 1 3
4 3 4
5 3 5
6 2 6
7 4 7
8 3 8
desiredDf
p t
0 1 0
2 2 2
3 1 3
4 3 4
6 2 6
7 4 7
8 3 8
In desiredDf, every pair of consecutive values in the p column is different.
How about this?
>>> df[df.p != df.p.shift()]
p t
0 1 0
2 2 2
3 1 3
4 3 4
6 2 6
7 4 7
8 3 8
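Explanation: df.p.shift() shifts the entries of column p down a row, so each row holds the previous row's value (the first row becomes NaN). df.p != df.p.shift() then checks whether each entry of df.p is different from the previous entry, returning a boolean Series. The first row is compared with NaN, which never equals anything, so it is always kept. Indexing df with that Series keeps only the rows where it is True.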
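To make the two steps visible, here is a minimal sketch that prints the intermediate values for the question's data. The names prev, mask and steps are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'t': range(9), 'p': [1, 1, 2, 1, 3, 3, 2, 4, 3]})

prev = df.p.shift()   # previous row's p; NaN in the first row
mask = df.p != prev   # True where p differs from the previous row

# Hypothetical helper frame, used only to inspect the intermediate values.
steps = pd.DataFrame({'p': df.p, 'prev': prev, 'keep': mask})
print(steps)

# Boolean indexing keeps the rows where mask is True.
result = df[mask]
print(result)
```

Rows 1 and 5 are the only ones where p equals the previous row's p, so they are the only rows dropped.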
This method works on runs of any length: e.g. if there is a run of three identical values, only the first row of that run is kept.
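As a quick check of the longer-run case, here is a small sketch on made-up data (the runs frame is hypothetical, not from the question). It also shows a variant, assuming you would rather keep the last row of each run: compare with the next row by using shift(-1).

```python
import pandas as pd

# Hypothetical data: a run of three 5s, a run of two 7s, then a single 5.
runs = pd.DataFrame({'p': [5, 5, 5, 7, 7, 5]})

# Keep the first row of each run (same technique as above).
first_of_run = runs[runs.p != runs.p.shift()]

# Variant: compare with the next row to keep the last row of each run.
last_of_run = runs[runs.p != runs.p.shift(-1)]

print(first_of_run)
print(last_of_run)
```

Note that the original index labels are preserved in both results; call reset_index(drop=True) if you want them renumbered.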