Search code examples
pythonpandasdataframedelete-row

Pandas delete rows if value is same for certain date range


I tried looking up for something similar but could find it. So I have below structure dataframe. I looking to delete rows that have same score for 5 days or more. So in below case PeronID AB-123's records from 2/1 to 2/6 needs to be deleted, also for DG-3465 from 2/3 to 2/10. But nothing for TY-9456. I was thinking for using rolling() but that will delete only 2/1-2/5 for AB-123 but not 2/6.

PersonID    Date    Score
AB-123  2/1/2016    0
AB-123  2/2/2016    0
AB-123  2/3/2016    0
AB-123  2/4/2016    0
AB-123  2/5/2016    0
AB-123  2/6/2016    0
AB-123  2/7/2016    67.5
AB-123  2/8/2016    73.4
AB-123  2/9/2016    70.5
AB-123  2/10/2016   68
DG-3465 2/1/2016    22.5
DG-3465 2/2/2016    25.6
DG-3465 2/3/2016    36.4
DG-3465 2/4/2016    36.4
DG-3465 2/5/2016    36.4
DG-3465 2/6/2016    36.4
DG-3465 2/7/2016    36.4
DG-3465 2/8/2016    36.4
DG-3465 2/9/2016    36.4
DG-3465 2/10/2016   36.4
TY-9456 2/1/2016    0
TY-9456 2/2/2016    0
TY-9456 2/3/2016    5.23
TY-9456 2/4/2016    4.12
TY-9456 2/5/2016    5.95
TY-9456 2/6/2016    6.97
TY-9456 2/7/2016    12.45
TY-9456 2/8/2016    15.61
TY-9456 2/9/2016    15.61
TY-9456 2/10/2016   15.61

Tried few different things but I am kinda stuck with nothing poping up in my head. What would you suggest? Using python pandas by the way ;)


Solution

  • You can roll on Score column, calculate the running standard deviation, and then drop rows where the standard deviations are zero along with five rows before them (this assumes you want to delete rows with same scores on consecutive days):

    df.drop(np.unique(df.Score.rolling(5).std()[lambda x: x == 0].index.values - pd.np.arange(5)[:, None]))
    

    enter image description here