python pandas dataframe distinct-values subsampling

Pandas - consecutive values must be different


I want to subsample the rows of a dataframe so that no two consecutive values in a given column are equal. If two consecutive values are the same, keep, say, the first one.

Here is an example

import pandas as pd

p = [1,1,2,1,3,3,2,4,3]
t = range(len(p))
df = pd.DataFrame({'p':p, 't':t})

df

   p  t
0  1  0
1  1  1
2  2  2
3  1  3
4  3  4
5  3  5
6  2  6
7  4  7
8  3  8



desiredDf

   p  t
0  1  0
2  2  2
3  1  3
4  3  4
6  2  6
7  4  7
8  3  8

In desiredDf, any two consecutive values in the p column are different.


Solution

  • How about this?

    >>> df[df.p != df.p.shift()]
       p  t
    0  1  0
    2  2  2
    3  1  3
    4  3  4
    6  2  6
    7  4  7
    8  3  8
    

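    Explanation: df.p.shift() shifts the entries of column p down by one row, so each row lines up with the previous row's value (the first row gets NaN). df.p != df.p.shift() then compares each entry of df.p with the entry before it and returns a boolean Series that is True wherever the value changes. The first row is compared with NaN, so it is always kept. Indexing df with that Series keeps only the True rows.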

    This method works for runs of any length: e.g. if there is a run of three identical values, only the first row of that run is kept.
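
To make the explanation above concrete, here is a minimal sketch that builds the mask step by step on a column containing a run of three identical values. The sample data and the names previous, mask, keep_first and keep_last are made up for illustration. The keep_last variant is not part of the original answer; it is an assumption about how you might keep the last row of each run instead.

```python
import pandas as pd

# Hypothetical sample data: a run of three 1s, then a run of two 2s.
df = pd.DataFrame({'p': [1, 1, 1, 2, 2, 3, 1], 't': range(7)})

# shift() moves every entry down one row; the first row becomes NaN.
previous = df['p'].shift()

# True where a value differs from the one before it. The first row is
# compared with NaN, which is never equal to anything, so it is kept.
mask = df['p'] != previous

# Boolean indexing keeps only the first row of each run.
keep_first = df[mask]

# Variant (not from the original answer): compare with the NEXT row
# instead, which keeps the last row of each run.
keep_last = df[df['p'] != df['p'].shift(-1)]

print(keep_first)
print(keep_last)
```

Here keep_first retains rows 0, 3, 5 and 6, while keep_last retains rows 2, 4, 5 and 6. One caveat: because NaN never compares equal to NaN, consecutive missing values in the column would all be kept by this comparison.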