python pandas duplicates data-analysis data-manipulation

Pandas: Replace/Change Duplicate Values within a Time Range

I have a pandas DataFrame in which I want to replace (change) duplicate values with 0, without deleting them, when the duplicates occur within a certain range of days.

For example, in the data below I want to replace duplicate values in every column with 0 when they occur within a range of, say, 3 days (the number should be adjustable). The desired result is given further below.

            A   B  C
01-01-2011  2  10  0
01-02-2011  2  12  2
01-03-2011  2  10  0
01-04-2011  3  11  3
01-05-2011  5  15  0
01-06-2011  5  23  1
01-07-2011  4  21  4
01-08-2011  2  21  5
01-09-2011  1  11  0

The output should look like this:

            A   B  C
01-01-2011  2  10  0
01-02-2011  0  12  2
01-03-2011  0   0  0
01-04-2011  3  11  3
01-05-2011  5  15  0
01-06-2011  0  23  1
01-07-2011  4  21  4
01-08-2011  2   0  5
01-09-2011  1  11  0

Any help will be appreciated.

Solution

You can use df.shift() to compare each value with the value a number of rows above or below it: .shift(x) moves the data down by x rows, so every row lines up with the value x rows above it (a negative x looks below instead).
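A small illustration of how .shift() lines rows up for comparison, using a made-up five-value Series (not the question's full data):

```python
import pandas as pd

s = pd.Series([2, 2, 2, 3, 5])

# shift(1) moves values down one row, so each row sees the value from the row above
above = s.shift(1)   # NaN, 2.0, 2.0, 2.0, 3.0
# shift(-1) moves values up one row, so each row sees the value from the row below
below = s.shift(-1)  # 2.0, 2.0, 3.0, 5.0, NaN

# Comparing with the shifted Series flags rows equal to the row above;
# comparisons against NaN are False, so the first row is never flagged
print((s == above).tolist())  # [False, True, True, False, False]
```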

You can combine .shift() with .loc to select the rows whose value is identical to one of the rows above it within the window (the 2 previous rows for a 3-day range) and replace that value with 0.
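For a single column, the shift-and-.loc idea might look like the sketch below (column A of the question, a window of 3 rows). The mask for every shift is OR-ed together first and the replacement happens once at the end:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 2, 2, 3, 5, 5, 4, 2, 1]})
number_of_days = 3  # current row plus the 2 rows above it

# One boolean mask built from the unmodified column: True where the value
# equals any of the previous (number_of_days - 1) rows
col = df["A"]
mask = pd.Series(False, index=df.index)
for x in range(1, number_of_days):
    mask |= col == col.shift(x)

# .loc with a boolean row mask and a column label assigns only the flagged cells
df.loc[mask, "A"] = 0
print(df["A"].tolist())  # [2, 0, 0, 3, 5, 0, 4, 2, 1]
```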

Something like this should work (edited to handle any number of columns and any number of days). Note that .shift() counts rows, not calendar days, so this assumes one row per day with no gaps:

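import pandas as pd

numberOfDays = 3  # window length in rows: the current day plus the 2 days before it

for col in df.columns:
    original = df[col].copy()  # compare against unmodified values so earlier zeros can't hide a duplicate
    mask = pd.Series(False, index=df.index)
    for x in range(1, numberOfDays):
        mask |= original == original.shift(x)  # True where the value matches the row x places above
    df.loc[mask, col] = 0

print(df)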

This gives me the output:

            A   B  C
date
01-01-2011  2  10  0
01-02-2011  0  12  2
01-03-2011  0   0  0
01-04-2011  3  11  3
01-05-2011  5  15  0
01-06-2011  0  23  1
01-07-2011  4  21  4
01-08-2011  2   0  5
01-09-2011  1  11  0
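The same logic can also be applied to all columns at once, since DataFrame.shift and DataFrame.eq work column-wise. This is a sketch; the helper name zero_recent_duplicates is hypothetical, and it assumes one row per day with no gaps, as above. It returns a new frame instead of modifying df in place:

```python
import pandas as pd

def zero_recent_duplicates(df: pd.DataFrame, number_of_days: int = 3) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df where a value becomes 0 if it
    equals a value in any of the previous (number_of_days - 1) rows of the
    same column. Assumes one row per day with no gaps."""
    mask = pd.DataFrame(False, index=df.index, columns=df.columns)
    for x in range(1, number_of_days):
        # df.shift(x) shifts every column at once; comparisons with NaN are False
        mask |= df.eq(df.shift(x))
    # DataFrame.mask replaces the cells where mask is True with 0
    return df.mask(mask, 0)

# The question's data, assuming the dates are month-day-year (consecutive days)
index = pd.date_range("2011-01-01", periods=9, freq="D")
df = pd.DataFrame(
    {
        "A": [2, 2, 2, 3, 5, 5, 4, 2, 1],
        "B": [10, 12, 10, 11, 15, 23, 21, 21, 11],
        "C": [0, 2, 0, 3, 0, 1, 4, 5, 0],
    },
    index=index,
)
result = zero_recent_duplicates(df, number_of_days=3)
print(result)
```

Because the mask is built entirely from the original values before anything is replaced, the last 5 in a column such as [5, 5, 3, 5] is still zeroed. Replacing values inside the shift loop would compare it against a cell that had already been set to 0 and miss it.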
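If some dates can be missing, shifting by rows no longer matches a time range in days. One possible approach, sketched here under the assumption that the frame has a DatetimeIndex, is to look back by timestamp instead. The helper zero_duplicates_within_days is hypothetical, "within 3 days" is read as the current day plus the two calendar days before it, and the double loop is O(n²), so it is only meant for small frames:

```python
import pandas as pd

def zero_duplicates_within_days(df: pd.DataFrame, days: int = 3) -> pd.DataFrame:
    """Hypothetical helper: replace a value with 0 if the same value occurs in
    the same column at an earlier timestamp within the previous (days - 1)
    calendar days. Assumes df has a DatetimeIndex."""
    span = pd.Timedelta(days=days - 1)
    result = df.copy()
    times = df.index
    for i, t in enumerate(times):
        # Boolean array of earlier rows whose timestamp lies in [t - span, t)
        earlier = (times >= t - span) & (times < t)
        for j, col in enumerate(df.columns):
            # Compare against the original df, not result, so zeros never hide a duplicate
            if (df.loc[earlier, col] == df[col].iloc[i]).any():
                result.iloc[i, j] = 0
    return result

# Example with a gap: 2011-01-03 and 2011-01-04 are missing
gappy = pd.DataFrame(
    {"A": [7, 1, 7]},
    index=pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-05"]),
)
print(zero_duplicates_within_days(gappy, days=3))
```

In the gappy example, 2011-01-05 is more than two days after 2011-01-01, so its 7 is kept, whereas comparing with .shift(2) would treat those two rows as neighbours and set the value to 0. On consecutive daily data both approaches give the same result.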