python pandas dataframe distinct-values subsampling

Pandas - consecutive values must be different

I want to subsample the rows of a dataframe so that every pair of consecutive values in a given column is different. If two consecutive values are the same, keep only one of them, say the first.

Here is an example

import pandas as pd

p = [1,1,2,1,3,3,2,4,3]
t = range(len(p))
df = pd.DataFrame({'t':t, 'p':p})

df

   p  t
0  1  0
1  1  1
2  2  2
3  1  3
4  3  4
5  3  5
6  2  6
7  4  7
8  3  8



desiredDf

   p  t
0  1  0
2  2  2
3  1  3
4  3  4
6  2  6
7  4  7
8  3  8

In desiredDf, every pair of consecutive values in the p column is different.

Solution

How about this?

>>> df[df.p != df.p.shift()]
   p  t
0  1  0
2  2  2
3  1  3
4  3  4
6  2  6
7  4  7
8  3  8

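Explanation: df.p.shift() shifts the entries of column p down one row, so each row lines up with the previous row's value (the first row gets NaN). df.p != df.p.shift() then compares each entry of df.p with the previous entry and returns a boolean Series, which is used as a mask to select rows. The first row is always kept, because any value compares as different from NaN.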
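
To make the intermediate steps visible, here is a small sketch that rebuilds the example and stores the shifted column and the mask separately. The names prev, mask and result are only illustrative and do not appear in the answer above.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'t': range(9), 'p': [1, 1, 2, 1, 3, 3, 2, 4, 3]})

prev = df.p.shift()   # previous row's p; NaN in the first row
mask = df.p != prev   # boolean Series: True where p differs from the previous row
result = df[mask]     # keep only the rows where the mask is True

# Show the value, the previous value and the keep/drop decision side by side.
print(pd.DataFrame({'p': df.p, 'prev': prev, 'keep': mask}))
```

One thing to watch for: since NaN never compares equal to NaN, a column that itself contains consecutive NaN values would keep all of them with this comparison.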

This method works for runs of any length: e.g. if there is a run of three identical values, only the first row of that run is kept.
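
As a quick check of the longer-run case, here is a sketch that wraps the same comparison in a helper and applies it to a column with a run of three. The function name drop_consecutive_duplicates and the sample frame runs are made up for illustration; they are not part of pandas.

```python
import pandas as pd

def drop_consecutive_duplicates(frame, column):
    """Keep only the first row of each run of equal values in `column`."""
    values = frame[column]
    return frame[values != values.shift()]

# Hypothetical data: a run of three 5s, a run of two 7s, then a single 5.
runs = pd.DataFrame({'p': [5, 5, 5, 7, 7, 5], 't': range(6)})
out = drop_consecutive_duplicates(runs, 'p')
print(out)
```

The result keeps the original index labels of the surviving rows. If a fresh 0..n-1 index is wanted, out.reset_index(drop=True) can be applied afterwards.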